Skip to content
AIAn Alian Software company

AI glossary

Cache (prompt caching)

Reusing the encoded representation of a long, static prompt prefix across many requests. Cuts cost 50-90% on use cases with large shared context (schemas, knowledge bases, instructions).

The longer version

Anthropic's prompt cache (and OpenAI's structured equivalents) lets you pay for the long static parts of your prompt — system instructions, schemas, large reference docs — once per cache TTL, then ~10% of normal input rate on subsequent reads. For RAG over a large knowledge base with a stable system prompt, this is a 5–10x cost reduction. We surface cache hit rate in observability so it's measurable, not assumed.

Want to talk about how this applies to your stack?