AI glossary
Eval / evaluation suite
A set of test cases run against an AI system on every prompt change to catch quality regressions before users do. Non-optional for production: we build one for every system we ship.
The longer version
A suite starts with 20+ test cases at launch and grows weekly as production failures are added as new cases. Each case defines expected behavior (not just expected output), a scoring rubric, and a pass threshold. The suite runs in CI on every prompt change, and a failing eval blocks the merge. Every suite has a named owner on the client side. See /playbooks/eval-suite for the full pattern.
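As a rough illustration of the shape described above, here is a minimal Python sketch of a case with expected behavior, a rubric, and a pass threshold, plus a runner whose exit code could gate a merge in CI. It is not the playbook's implementation. Every name in it (`EvalCase`, `run_suite`, `exit_code`, `mentions_refund_policy`) is hypothetical, and the rubric is a deliberately simple keyword check standing in for whatever scoring a real suite would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One test case (hypothetical structure, for illustration only)."""
    name: str
    prompt: str
    expected_behavior: str           # what the system should do, in words
    rubric: Callable[[str], float]   # scores a response from 0.0 to 1.0
    pass_threshold: float            # minimum score that counts as a pass


def mentions_refund_policy(response: str) -> float:
    """Toy rubric: full marks if the response mentions refunds at all."""
    return 1.0 if "refund" in response.lower() else 0.0


def run_suite(cases: List[EvalCase], system: Callable[[str], str]) -> List[str]:
    """Return the names of cases that score below their pass threshold."""
    failures = []
    for case in cases:
        score = case.rubric(system(case.prompt))
        if score < case.pass_threshold:
            failures.append(case.name)
    return failures


def exit_code(failures: List[str]) -> int:
    """Non-zero when any case fails, so a CI step can block the merge."""
    return 1 if failures else 0


CASES = [
    EvalCase(
        name="refund-question",
        prompt="How do I get my money back?",
        expected_behavior="Points the user to the refund policy.",
        rubric=mentions_refund_policy,
        pass_threshold=1.0,
    ),
]
```

In a CI job, a script along these lines would call `run_suite(CASES, system)` with the real system under test and pass the result of `exit_code(...)` to `sys.exit`. Scoring against a rubric rather than comparing exact strings is what lets a case check behavior instead of one specific output.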