Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval to produce more accurate, up-to-date, and grounded responses.
In a typical RAG pipeline, a user query is first converted into a vector embedding, which is used to search a vector database for semantically similar documents. The retrieved documents are then injected into the LLM's prompt as context, allowing the model to generate responses grounded in factual, domain-specific information.
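The pipeline above can be sketched in a few lines. This is a minimal, dependency-free illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and the document list, queries, and function names are all hypothetical examples, not a specific library's API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a word-count vector. A real system would call an
    # embedding model and get back a dense float vector instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A stand-in for a vector store holding embedded documents.
documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our API rate limit is 100 requests per minute per key.",
    "Support hours are 9am to 5pm on weekdays.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Embed the query, then rank documents by semantic similarity.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Inject the retrieved documents into the prompt as context,
    # so the model's answer is grounded in them.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is the API rate limit?", documents))
```

In production, the bag-of-words `embed` would be replaced by a dense embedding model and the sorted list by a vector database's nearest-neighbor search, but the flow (embed query, retrieve, inject into prompt) is the same.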
RAG mitigates several key LLM limitations: hallucination (fabricating facts), knowledge cutoff dates, and lack of domain-specific expertise. It's widely used in enterprise chatbots, customer support systems, and internal knowledge assistants.
Key components of a RAG system include a document ingestion pipeline, a chunking strategy, an embedding model, a vector store, and a retrieval strategy (e.g., semantic search, hybrid search, or re-ranking).
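Of these components, the chunking strategy is often the simplest to illustrate. Below is a hedged sketch of fixed-size chunking with overlap; the chunk size and overlap values are illustrative, and real pipelines typically tune them and split on sentence or section boundaries rather than raw word counts.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    # Split text into word-based chunks, each sharing `overlap` words
    # with its neighbor so context isn't lost at chunk boundaries.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: a 120-word document yields three overlapping chunks.
doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc)
print(len(chunks))  # → 3
```

Overlap trades some storage and embedding cost for retrieval quality: a fact that straddles a chunk boundary still appears whole in at least one chunk.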