Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) combines the power of large language models with external knowledge retrieval to produce more accurate, up-to-date, and grounded responses.
In a typical RAG pipeline, a user query is first converted into a vector embedding, which is used to search a vector database for semantically similar documents. The retrieved documents are then injected into the LLM's prompt as context, allowing the model to generate responses grounded in factual, domain-specific information.
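The pipeline above can be sketched in a few lines. This is a minimal, dependency-free illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, and the document list, queries, and function names are all hypothetical examples, not a specific library's API.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a word-count vector. A real system would call an
    # embedding model and get back a dense float vector instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# A stand-in for a vector store holding embedded documents.
documents = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our API rate limit is 100 requests per minute per key.",
    "Support hours are 9am to 5pm on weekdays.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Embed the query, then rank documents by semantic similarity.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Inject the retrieved documents into the prompt as context,
    # so the model's answer is grounded in them.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is the API rate limit?", documents))
```

In production, the bag-of-words `embed` would be replaced by a dense embedding model and the sorted list by a vector database's nearest-neighbor search, but the flow (embed query, retrieve, inject into prompt) is the same.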
RAG mitigates several key LLM limitations: hallucination (fabricating facts), knowledge cutoff dates, and lack of domain-specific expertise. It's widely used in enterprise chatbots, customer support systems, and internal knowledge assistants.
Key components of a RAG system include a document ingestion pipeline, a chunking strategy, an embedding model, a vector store, and a retrieval strategy (e.g., semantic search, hybrid search, or re-ranking).
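Of these components, the chunking strategy is often the simplest to illustrate. Below is a hedged sketch of fixed-size chunking with overlap; the chunk size and overlap values are illustrative, and real pipelines typically tune them and split on sentence or section boundaries rather than raw word counts.

```python
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    # Split text into word-based chunks, each sharing `overlap` words
    # with its neighbor so context isn't lost at chunk boundaries.
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: a 120-word document yields three overlapping chunks.
doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc)
print(len(chunks))  # → 3
```

Overlap trades some storage and embedding cost for retrieval quality: a fact that straddles a chunk boundary still appears whole in at least one chunk.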