Large language models generate fluent text. On their own, though, they often fall short of requirements for grounding, traceability, freshness, and access control. Retrieval-augmented generation (RAG) addresses these gaps by having the model answer from retrieved external evidence rather than from its training data alone.
Early RAG used one simple pipeline: retrieve relevant documents, then generate an answer from them. Production systems now use multiple architecture patterns, each targeting a different failure mode. This post explains eight major RAG architectures used in production today.
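As a baseline for the patterns that follow, the simple pipeline can be sketched in a few lines. This is a minimal illustration, not a production design: `Document`, `retrieve`, `build_prompt`, and `answer` are hypothetical names, the word-overlap scoring is a stand-in for whatever retriever a real system would use (for example vector search), and `llm` is assumed to be any callable that maps a prompt string to a completion string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    doc_id: str  # hypothetical identifier, used for citations in the prompt
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip(".,?!:;").lower() for w in text.split()} - {""}


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Toy retriever: rank documents by word overlap with the query.

    Assumption: a real system would swap in a proper retrieval method here.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(d.text)), d) for d in corpus]
    scored = [s for s in scored if s[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query: str, evidence: list[Document]) -> str:
    """Put the retrieved evidence in front of the question, with source ids."""
    sources = "\n".join(f"[{d.doc_id}] {d.text}" for d in evidence)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )


def answer(query: str, corpus: list[Document], llm: Callable[[str], str]) -> str:
    """Retrieve, then generate: the single-pass pipeline."""
    evidence = retrieve(query, corpus)
    return llm(build_prompt(query, evidence))
```

Passing the model in as a plain callable keeps the sketch independent of any particular provider, and tagging each source with its id in the prompt is one simple way to support the traceability requirement mentioned earlier.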