AI glossary
RAG (Retrieval-Augmented Generation)
Fetching relevant documents from a database first, then asking the model to answer using only those documents. The default architecture for grounded chatbots and knowledge assistants.
The longer version
Production RAG isn't just 'fetch chunks and stuff them in the prompt.' It's a pipeline: hybrid search (BM25 keyword scoring + vector similarity) → reranking → citation-required prompting → refusal when retrieval confidence is low. Hallucination drops 80%+ with the full pattern compared with naive RAG. See /playbooks/rag for details.
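A minimal sketch of two of those stages, to make the shape concrete. The glossary doesn't say how the BM25 and vector results get merged, so this assumes Reciprocal Rank Fusion, one common option. The names `rrf_fuse` and `build_prompt`, the `k=60` constant, the `min_score` threshold and the refusal wording are all illustrative, not taken from the playbook.

```python
from typing import Dict, List, Optional

REFUSAL = "I don't have enough information in the provided sources to answer that."


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of doc ids with Reciprocal Rank Fusion (assumed fusion method)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k dampens the weight of top ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def build_prompt(question: str, docs: List[str], top_score: float,
                 min_score: float = 0.5) -> Optional[str]:
    """Return a citation-required prompt, or None when retrieval confidence is low."""
    if not docs or top_score < min_score:
        return None  # caller should reply with REFUSAL instead of calling the model
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(docs, start=1))
    return (
        "Answer using only the sources below. Cite every claim as [n]. "
        f"If the sources do not contain the answer, reply: {REFUSAL}\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Hypothetical rankings from a BM25 index and a vector index.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]])
prompt = build_prompt("What is RAG?", ["RAG retrieves documents first."], top_score=0.9)
```

Returning `None` rather than a weak prompt keeps the refusal decision outside the model, so a low-confidence retrieval never reaches generation at all.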
Related terms
Embedding
A dense numerical representation of text (or other media) that captures meaning, so that similar content maps to nearby vectors. Used for semantic search, clustering, and recommendation. Underlies most RAG systems.
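To illustrate how embeddings are compared, here is a small sketch of cosine similarity, a standard measure for this. The three-dimensional vectors are made up for the example. Real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```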
Vector database
A database optimized for similarity search over embeddings. Use pgvector if you're already on Postgres; Pinecone, Qdrant, or Weaviate when you need more scale.
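As a rough picture of the query a vector database answers, the sketch below does a brute-force nearest-neighbor search by Euclidean distance over an in-memory dict. It stands in for the concept only: production systems usually rely on approximate indexes to avoid scanning every vector. The `nearest` function and the toy index are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def nearest(query: Sequence[float], index: Dict[str, Sequence[float]],
            k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (doc_id, distance) pairs closest to the query, nearest first."""
    # Exhaustive scan: compute the distance from the query to every stored vector.
    scored = ((doc_id, math.dist(query, vec)) for doc_id, vec in index.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy index, invented for illustration only.
toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
hits = nearest([1.0, 0.0], toy_index, k=2)
```

An exhaustive scan like this is exact but costs one distance computation per stored vector, which is the cost a dedicated index is meant to cut.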
Reranking
Running retrieved candidates through a second model to reorder them by relevance. Reranking the top 50 results from vector search down to the top 5 dramatically improves precision. We use Cohere Rerank.
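The reorder step itself is small, as this sketch suggests. It takes any scoring function, so the reranker model is pluggable. The `overlap_score` function is a deliberately crude word-overlap stand-in so the example runs offline; it is not how Cohere Rerank or any real reranker scores documents, and both function names are hypothetical.

```python
from typing import Callable, List, Tuple


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float],
           top_n: int = 5) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the top_n best."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:top_n]


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a reranker: fraction of query words present in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# Hypothetical first-stage candidates (in practice, e.g. the top 50 from vector search).
candidates = [
    "how to reset a password",
    "billing and invoices",
    "password reset steps for admins",
]
best = rerank("reset password admins", candidates, overlap_score, top_n=2)
```

In a real pipeline, `score_fn` would wrap a call to the reranking model, and `candidates` would be the first-stage retrieval output.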