AI glossary

Cache (prompt caching)

Reusing the encoded representation of a long, static prompt prefix across many requests. Cuts cost 50-90% on use cases with large shared context (schemas, knowledge bases, instructions).

The longer version

Anthropic's prompt cache (and OpenAI's structured equivalents) lets you pay for the long static parts of your prompt — system instructions, schemas, large reference docs — once per cache TTL, then ~10% of normal input rate on subsequent reads. For RAG over a large knowledge base with a stable system prompt, this is a 5–10x cost reduction. We surface cache hit rate in observability so it's measurable, not assumed.

Want to talk about how this applies to your stack?

Book a 20-min call →Browse all terms

More terms

Agent
Agentic workflow
BAA (Business Associate Agreement)
Citations / grounding
Context window
DPA (Data Processing Agreement)