Retrieval-augmented generation (RAG) is a technique for “grounding” large language models (LLMs) in specific data sources, often sources that weren’t included in the models’ original training. RAG has three steps: retrieval from a specified source, augmentation of the prompt with the retrieved context, and generation using the model and the augmented prompt.
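To make the three steps concrete, here is a minimal sketch of the pipeline in Python. Everything in it is a stand-in: the `DOCUMENTS` list replaces a real document or vector store, the bag-of-words scorer replaces an embedding model, and the `generate` stub replaces an actual LLM call.

```python
from collections import Counter
import math

# Toy document store; a production system would use a vector database.
DOCUMENTS = [
    "RAG grounds LLM answers in external data sources.",
    "Vector databases store embeddings for similarity search.",
    "Graph databases model entities and their relationships.",
]

def score(query: str, doc: str) -> float:
    """Bag-of-words cosine similarity; real retrieval would use embeddings."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    overlap = sum(q[w] * d[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return overlap / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Step 1: retrieval from the specified source."""
    return sorted(DOCUMENTS, key=lambda d: score(query, d), reverse=True)[:k]

def augment(query: str, context: list[str]) -> str:
    """Step 2: augmentation of the prompt with the retrieved context."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Step 3: generation. Placeholder for a real LLM API call."""
    return f"[LLM response to a prompt of {len(prompt)} characters]"

query = "What does RAG do?"
print(generate(augment(query, retrieve(query))))
```

The point of the sketch is the shape of the pipeline, not the components: each stage can be swapped independently, which is why so many RAG variants exist.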
At one point, RAG seemed like it would be the answer to everything that’s wrong with LLMs. While RAG can help, it isn’t a magical fix, and it can introduce issues of its own. Moreover, as LLMs improve, gaining larger context windows and better search integrations, RAG is becoming less necessary for many use cases.
Meanwhile, several new and improved RAG architectures have been introduced. One combines RAG with a graph database, which can make the results more accurate and relevant, particularly when relationships and semantic content are important. Another, agentic RAG, expands the resources available to the LLM beyond external knowledge sources, such as text databases, to include tools and functions.
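As a rough illustration of the agentic idea, the sketch below widens the “sources” from the earlier pipeline to a registry that mixes a knowledge lookup with a callable tool. The names (`search_docs`, `calculator`, `route`, `RESOURCES`) are all hypothetical, and the keyword-based router is a stand-in: in a real agentic RAG system, the LLM itself decides which resource to invoke.

```python
from typing import Callable

def search_docs(query: str) -> str:
    """External knowledge source, standing in for the retrieval step above."""
    return "RAG grounds LLM answers in external data sources."

def calculator(query: str) -> str:
    """A tool/function the agent can invoke instead of, or alongside, retrieval."""
    digits = [int(tok) for tok in query.split() if tok.isdigit()]
    return str(sum(digits))

# Resource registry: agentic RAG treats tools and functions as first-class
# sources alongside external knowledge stores.
RESOURCES: dict[str, Callable[[str], str]] = {
    "knowledge": search_docs,
    "math": calculator,
}

def route(query: str) -> str:
    """Toy router; a real agentic system lets the LLM choose the resource."""
    return "math" if any(tok.isdigit() for tok in query.split()) else "knowledge"

def agentic_answer(query: str) -> str:
    context = RESOURCES[route(query)](query)
    return f"[LLM answer to {query!r} using context: {context}]"

print(agentic_answer("What is 2 plus 3"))   # routes to the calculator tool
print(agentic_answer("What does RAG do?"))  # routes to the knowledge source
```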