Summary
Retrieval-Augmented Generation (RAG) combines information retrieval with text generation. At query time, relevant documents are retrieved from a knowledge base and supplied as context for the LLM's response. This grounding reduces hallucinations and helps keep responses anchored to current, authoritative information.
How it works
- Query Processing: Transform the user query into a retrieval-ready form (e.g. an embedding vector or keyword set)
- Document Retrieval: Search knowledge base for relevant passages
- Context Assembly: Combine retrieved documents with user query
- Generation: LLM produces grounded response from augmented prompt
- Verification: Optional citation or source attribution
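The steps above can be sketched end to end in a few lines of Python. This is a minimal, illustrative sketch: the term-overlap scorer stands in for a real retriever (BM25 or embeddings), the sample documents and prompt template are invented for the example, and the final prompt would be sent to an LLM for generation.

```python
from collections import Counter

# Toy in-memory knowledge base; production systems back this with an index.
DOCS = [
    "RAG grounds LLM answers in retrieved documents.",
    "BM25 is a sparse lexical retrieval function.",
    "Dense retrieval embeds queries and documents into vectors.",
]

def tokenize(text: str) -> list[str]:
    return [t.strip(".,?").lower() for t in text.split()]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Query processing + retrieval: term-overlap scoring stands in
    # for BM25 or embedding similarity.
    q_terms = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: -sum(q_terms[t] for t in tokenize(d)))
    return ranked[:k]

def assemble_prompt(query: str, passages: list[str]) -> str:
    # Context assembly: numbered passages enable downstream citation.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Answer using only the context below; cite passage numbers.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

query = "How does dense retrieval work?"
prompt = assemble_prompt(query, retrieve(query, DOCS))
# `prompt` would now be sent to the LLM for grounded generation.
print(prompt)
```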
RAG architectures
- Naive RAG: Retrieve-then-read pipeline
- Agentic RAG: Iterative retrieval with planning and tool use
- Hybrid RAG: Combined dense and sparse retrieval
- Graph RAG: Retrieval over knowledge-graph relationships
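Hybrid RAG needs a way to merge the dense and sparse rankings; Reciprocal Rank Fusion (RRF) is a common choice. A minimal sketch (the document IDs and rankings are invented for the example):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists from multiple retrievers.

    Each document scores 1 / (k + rank) per list; k=60 is the
    conventional smoothing constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["d3", "d1", "d2"]  # e.g. BM25 ranking
dense = ["d2", "d3", "d4"]   # e.g. embedding ranking
print(rrf([sparse, dense]))  # ['d3', 'd2', 'd1', 'd4']
```

Documents ranked highly by both retrievers (d3, d2) rise to the top, while documents seen by only one list still contribute.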
Component considerations
- Retriever: BM25, dense embeddings, hybrid approaches
- Index: FAISS, Pinecone, Weaviate, Elasticsearch
- Chunking: Fixed-size, semantic, recursive strategies
- Retrieval: Top-k, MMR, similarity thresholds
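Of the retrieval strategies listed, MMR (Maximal Marginal Relevance) is the least obvious: it greedily selects documents that are relevant to the query but not redundant with documents already chosen. A sketch with invented 2-D vectors; real systems would use high-dimensional embeddings:

```python
def mmr(query_vec, doc_vecs, lambda_=0.7, k=2):
    """Greedy MMR: balance query relevance against redundancy.

    lambda_ near 1 favors relevance; near 0 favors diversity.
    Returns indices of the selected documents.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb)

    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda i: lambda_ * cos(query_vec, doc_vecs[i])
            - (1 - lambda_)
            * max((cos(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected

query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.95, 0.05], [0.1, 0.9]]  # docs 0 and 1 are near-duplicates
print(mmr(query, docs, lambda_=0.3))  # [1, 2]: the near-duplicate of doc 1 is skipped
```

With a high lambda_ the same call would return the two near-duplicates; lowering it trades a little relevance for a more diverse context window.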