AI glossary

RAG (Retrieval-Augmented Generation)

Fetching relevant documents from a database first, then asking the model to answer using only those documents. The default architecture for grounded chatbots and knowledge assistants.

The longer version

Production RAG isn't just 'fetch chunks and stuff them in the prompt.' It's hybrid search (BM25 + vector) → reranking → citation-required prompting → refusal patterns when retrieval confidence is low. Hallucination rates drop by 80% or more with the full pattern compared with naive RAG. See /playbooks/rag for the full pattern.
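A minimal sketch of three of these stages, under stated assumptions. The fusion step uses reciprocal rank fusion (RRF), which is one common way to merge BM25 and vector rankings but is not named in the playbook above. The `Chunk` type, the `min_score` threshold of 0.5, and the prompt wording are all hypothetical; a real system would tune them against its own data.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str


def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc ids with reciprocal rank fusion.

    Assumption: RRF is used here as the hybrid-search merge step.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


REFUSAL = "I don't have enough information in the provided sources to answer that."


def build_prompt(question, chunks, rerank_scores, min_score=0.5):
    """Return a citation-required prompt, or None when retrieval confidence is low.

    min_score is a hypothetical threshold on the best reranker score.
    """
    if not chunks or max(rerank_scores) < min_score:
        return None  # Caller should answer with REFUSAL instead of calling the model.
    sources = "\n".join(
        f"[{i}] ({c.doc_id}) {c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite a source number like [1] after every claim. "
        f"If the sources do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

In use, the BM25 ranking and the vector ranking go into `rrf_fuse`, the fused candidates are reranked, and `build_prompt` either produces a grounded prompt or signals that the assistant should refuse.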

Related terms

Embedding
A dense numerical representation of text (or other media) that captures meaning. Used for semantic search, clustering, recommendation. Underlies most RAG systems.
Vector database
A database optimized for similarity search over embeddings. pgvector if you're on Postgres, Pinecone / Qdrant / Weaviate when you need more scale.
Reranking
Running retrieved candidates through a second model to reorder them by relevance. Reranking the top 50 results from vector search down to the top 5 dramatically improves precision. We use Cohere Rerank.
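The three terms above fit together roughly as follows. This is a toy sketch, not a production setup: the in-memory `index` dict stands in for a vector database, the embeddings are made-up two-dimensional vectors, and `overlap_score` is a hypothetical stand-in for a real reranking model (it is not the Cohere Rerank API).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vector_search(query_vec, index, top_k=50):
    """Brute-force similarity search; a vector database does this at scale."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def overlap_score(query, text):
    """Hypothetical reranker stand-in: fraction of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query, candidates, texts, score_fn, top_n=5):
    """Reorder candidates by a second relevance score and keep the best top_n."""
    ordered = sorted(candidates, key=lambda d: -score_fn(query, texts[d]))
    return ordered[:top_n]


# Toy data: doc id -> embedding, and doc id -> text.
index = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
texts = {
    "a": "cats sleep a lot",
    "b": "reset your password in settings",
    "c": "quarterly revenue report",
}

candidates = vector_search([1.0, 0.0], index, top_k=2)
best = rerank("reset password", candidates, texts, overlap_score, top_n=1)
```

The wide first pass favors recall and the narrow second pass favors precision, which is why the candidate set is deliberately larger than what finally reaches the prompt.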
