RAG - Retrieval Augmented Generation

References and useful resources

Stages within RAG

  1. Loading: This refers to getting your data from where it lives - whether it's text files, PDFs, a website, a database, or an API - into your pipeline.

  2. Indexing: This means creating a data structure that allows for querying the data. For LLMs this nearly always means creating vector embeddings, numerical representations of the meaning of your data, as well as numerous other metadata strategies to make it easy to accurately find contextually relevant data.

  3. Storing: Once your data is indexed, you will want to store your index, along with any other metadata, to avoid the need to re-index it.

  4. Querying: For any given indexing strategy there are many ways you can utilize LLMs and data structures to query, including sub-queries, multi-step queries, and hybrid strategies.

  5. Evaluation: A critical step in any pipeline is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures of how accurate, faithful, and fast your responses to queries are.
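The five stages above can be sketched end-to-end in a few lines. This toy uses a bag-of-words "embedding" and cosine similarity so it runs without any external library; a real pipeline would swap in an embedding model and a vector store. All names here are illustrative, not from any specific framework.

```python
# Minimal sketch of the five RAG stages (toy bag-of-words index).
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector over lowercased tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Loading: documents would come from files, APIs, databases...
docs = [
    "RAG retrieves relevant context before generation.",
    "Vector embeddings capture the meaning of text.",
]

# 2. Indexing + 3. Storing: keep each vector alongside its raw text.
index = [(embed(d), d) for d in docs]

# 4. Querying: rank stored documents by similarity to the query.
def retrieve(query: str, k: int = 1) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# 5. Evaluation: spot-check that the expected document comes back.
print(retrieve("what do embeddings capture?"))
```

The retrieved snippets would then be stuffed into the LLM prompt as context; evaluation in practice uses metric suites (faithfulness, relevance, latency) rather than a single spot-check.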


Text-Only RAG vs Multimodal RAG

Text-only RAG

Multimodal RAG

Advanced RAG Querying strategies

⚠️ TODO: explain the strategies

Small-To-Big retrieval

HyDE Query Transform

Multi-Step Query engine

LongLLMLingua

(ReAct) Reason + Act prompting

Used for Common Sense / Math / Problem solving with external tools

ReAct prompts LLMs to generate verbal reasoning traces and actions for a task. This allows the system to perform dynamic reasoning to create, maintain, and adjust plans for acting, while also enabling interaction with external environments (e.g., Wikipedia) to incorporate additional information into the reasoning.
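The Thought / Action / Observation loop can be sketched as below. The "LLM" is a scripted stub and `lookup` stands in for an external environment such as a Wikipedia search; a real agent would call a model API and parse its output the same way.

```python
# Minimal ReAct loop: the model alternates Thought / Action /
# Observation until it emits a final answer (all names illustrative).
def lookup(entity: str) -> str:
    """Stand-in for an external tool such as a Wikipedia search."""
    facts = {"Mount Everest": "Mount Everest is 8,849 m tall."}
    return facts.get(entity, "No result.")

def scripted_llm(transcript: str) -> str:
    """Stub model: acts first, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[Mount Everest]"
    return "Final Answer: 8,849 m"

def react(question: str, max_steps: int = 3) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = scripted_llm(transcript)
        transcript += "\n" + step
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        if "Action: lookup[" in step:
            entity = step.split("lookup[", 1)[1].rstrip("]")
            transcript += f"\nObservation: {lookup(entity)}"
    return "No answer."

print(react("How tall is Mount Everest?"))
```

The key point is that observations are appended back into the transcript, so each reasoning step can condition on what the tools returned.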


(CoT) Chain of Thought

Used for Common Sense / Math / Problem solving

Enables complex reasoning capabilities through intermediate reasoning steps.

Few shots CoT
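Few-shot CoT prepends worked examples whose answers spell out the intermediate reasoning, so the model imitates the pattern on the new question. The prompt wording below is illustrative.

```python
# Few-shot CoT: demonstrations include explicit reasoning steps.
FEW_SHOT_COT = """\
Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: {question}
A:"""

prompt = FEW_SHOT_COT.format(
    question="A baker has 3 trays of 12 rolls. How many rolls?"
)
print(prompt)
```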


Zero-shot CoT

Another approach is zero-shot CoT, which essentially involves appending "Let's think step by step" to the original prompt, without any worked examples.
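This is trivial to implement as a prompt transformation:

```python
# Zero-shot CoT: no demonstrations, just append the trigger phrase.
def zero_shot_cot(question: str) -> str:
    return f"{question}\nLet's think step by step."

prompt = zero_shot_cot("If I have 3 apples and eat one, how many remain?")
print(prompt)
```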


Auto CoT

Finally, Auto-CoT tries to reduce the difficulty of crafting effective and diverse examples by hand, by automatically generating the reasoning demonstrations (e.g., running zero-shot CoT over a diverse sample of questions).


(CoD) Chain of Density

Used for Summarization

CoD prompting is a technique designed to optimize summarization tasks in language models. It controls the information density of generated summaries: the model iteratively rewrites its summary, packing in more salient entities at a fixed length. This is useful in practice for tasks requiring high-quality, contextually appropriate summaries.
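A CoD run can be driven by a single prompt that asks for several progressively denser rewrites. The wording below is an illustrative adaptation of the idea, not the exact prompt from the paper.

```python
# Sketch of a Chain-of-Density prompt: repeated same-length rewrites,
# each folding in a few more salient entities.
def cod_prompt(article: str, rounds: int = 5) -> str:
    return (
        f"Article: {article}\n\n"
        f"Write a short summary of the article. Then repeat the following "
        f"{rounds - 1} times: identify 1-3 salient entities missing from the "
        f"previous summary, and rewrite the summary at the same length so it "
        f"also covers them. Output all {rounds} summaries."
    )

print(cod_prompt("..."))
```

Keeping the length fixed while adding entities is what forces the density up; without that constraint the model simply writes longer summaries.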


(CoN) Chain of Note

(CCoT) Contrastive Chain of Thought

(Re2) Re-Reading Improves Reasoning in LLMs


⚠️ TODO: Rerankers section