Methodology
How we engineer AI that doesn't break at 11pm.
This page goes deeper than /process, which covers the engagement flow. This is the engineering methodology: six pillars, four phases, five non-negotiables.
Six pillars
Exception-first design
Most teams design for the happy path and add guardrails later. We design for the exceptions first: refusal patterns, low-confidence routing, escalation queues. The happy path falls out of that work.
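A minimal sketch of what "exceptions first" can look like in code, assuming the agent returns a confidence score and an upstream scope check alongside each answer. The `AgentResult` fields, the 0.8 threshold, and the `route` function are hypothetical illustrations, not a specific library. The refusal and escalation branches are written first, and the happy path is simply what remains.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    REFUSE = "refuse"      # request is out of scope for this agent
    ESCALATE = "escalate"  # send to the human escalation queue
    AUTO = "auto"          # happy path: proceed with the agent's answer


@dataclass
class AgentResult:
    answer: str
    confidence: float  # assumed: model- or heuristic-derived score in [0, 1]
    in_scope: bool     # assumed: output of an upstream scope classifier


# Hypothetical threshold; in practice it would be tuned against the eval suite.
CONFIDENCE_THRESHOLD = 0.8


def route(result: AgentResult, threshold: float = CONFIDENCE_THRESHOLD) -> Route:
    """Check the exception branches first; whatever survives is the happy path."""
    if not result.in_scope:
        return Route.REFUSE
    if result.confidence < threshold:
        return Route.ESCALATE
    return Route.AUTO


# Low-confidence results land in a queue a human works through.
escalation_queue: list[AgentResult] = []

r = AgentResult(answer="Refund approved", confidence=0.55, in_scope=True)
if route(r) is Route.ESCALATE:
    escalation_queue.append(r)
```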
Action surface before prompts
Before any prompt, we list every action the agent can take. Verb-object pairs, schemas, failure modes. Written collaboratively with the human currently doing the workflow.
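One way to write an action surface down as data before any prompt exists, sketched under assumptions: each action is a verb-object pair with an argument schema, its documented failure modes, and a flag for whether it writes. The `Action` class, the `validate` helper, and the two example actions are hypothetical; a real project might use JSON Schema or Pydantic instead of plain type checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str                            # e.g. "draft"
    obj: str                             # e.g. "refund"
    schema: dict[str, type]              # required argument names -> expected types
    failure_modes: tuple[str, ...] = ()  # documented ways this action can fail
    writes: bool = False                 # True if the action changes external state

    @property
    def name(self) -> str:
        return f"{self.verb}_{self.obj}"


def validate(action: Action, args: dict) -> list[str]:
    """Return schema violations for a proposed call; an empty list means well-formed."""
    errors = []
    for key, expected in action.schema.items():
        if key not in args:
            errors.append(f"missing argument: {key}")
        elif not isinstance(args[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    for key in sorted(args.keys() - action.schema.keys()):
        errors.append(f"unexpected argument: {key}")
    return errors


# Hypothetical action surface for a support workflow.
ACTIONS = {
    a.name: a
    for a in [
        Action("lookup", "order", {"order_id": str}, ("order_not_found",)),
        Action("draft", "refund", {"order_id": str, "amount_cents": int},
               ("amount_exceeds_order", "order_not_found"), writes=True),
    ]
}
```

Keeping the list as data means the same definitions can drive tool schemas for the model, input validation, and the approval rules for anything marked `writes=True`.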
Approval-gated writes
v1 agents draft; humans approve; system executes. We graduate to auto-execute only after measured trust at scale. Never the reverse.
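The draft, approve, execute flow can be modeled as a small state machine where the execute step refuses to run unless a named human has approved the request. This is a minimal sketch; `WriteRequest`, the status names, and the `side_effect` callback are hypothetical, and a production version would persist state and record who approved what.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    DRAFTED = "drafted"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"


@dataclass
class WriteRequest:
    action: str
    payload: dict
    status: Status = Status.DRAFTED
    approver: Optional[str] = None


def approve(req: WriteRequest, approver: str) -> None:
    """A human signs off on an agent-drafted write."""
    if req.status is not Status.DRAFTED:
        raise ValueError(f"cannot approve a request in state {req.status.value}")
    req.status, req.approver = Status.APPROVED, approver


def reject(req: WriteRequest, approver: str) -> None:
    """A human declines the draft; it can never be executed."""
    if req.status is not Status.DRAFTED:
        raise ValueError(f"cannot reject a request in state {req.status.value}")
    req.status, req.approver = Status.REJECTED, approver


def execute(req: WriteRequest, side_effect: Callable[[dict], None]) -> None:
    """The system performs the write only after human approval."""
    if req.status is not Status.APPROVED:
        raise PermissionError(f"write not approved (state: {req.status.value})")
    side_effect(req.payload)
    req.status = Status.EXECUTED
```

Graduating to auto-execute would then mean adding an explicit, measured policy that calls `approve` on a class of requests, rather than removing the gate.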
Observability from commit one
Every prompt, retrieval, tool call, and output logged with reasoning. Langfuse by default. Replayable traces are the only way to keep agents healthy past launch.
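The shape of a replayable trace is easy to show without any vendor SDK. The sketch below is deliberately vendor-neutral and does not use the Langfuse client API: it records each prompt, retrieval, tool call, and output as a span with optional reasoning, serializes the trace to JSON Lines, and loads it back for replay. `Trace`, `Span`, and `load_trace` are hypothetical names, and inputs and outputs are assumed to be JSON-serializable.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class Span:
    kind: str           # "prompt" | "retrieval" | "tool_call" | "output"
    inputs: object      # assumed JSON-serializable
    outputs: object     # assumed JSON-serializable
    reasoning: str = ""  # why the agent took this step, when available
    ts: float = field(default_factory=time.time)


@dataclass
class Trace:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[Span] = field(default_factory=list)

    def log(self, kind: str, inputs: object, outputs: object, reasoning: str = "") -> None:
        self.spans.append(Span(kind, inputs, outputs, reasoning))

    def to_jsonl(self) -> str:
        # One JSON object per span, each tagged with the trace it belongs to.
        return "\n".join(
            json.dumps({"trace_id": self.trace_id, **asdict(s)}) for s in self.spans
        )


def load_trace(jsonl: str) -> list[dict]:
    """Reload a stored trace so its steps can be inspected or replayed."""
    return [json.loads(line) for line in jsonl.splitlines() if line]
```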
Eval suite as a permanent system
20+ cases at launch, growing weekly with production failures. Replay against every prompt change. Failing the eval blocks merge.
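A merge-blocking eval gate reduces to "run every case, exit non-zero on any failure" so CI can refuse the merge. A minimal sketch, assuming each case pairs an input with a pass/fail check on the agent's output; `EvalCase`, `run_evals`, `gate`, and the toy agent are hypothetical stand-ins for a real model call and a real case set.

```python
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    input: str
    check: Callable[[str], bool]  # True if the agent's output is acceptable


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> list[str]:
    """Run every case and return the names of the ones that failed."""
    failures = []
    for case in cases:
        try:
            ok = case.check(agent(case.input))
        except Exception:
            ok = False  # a crash counts as a failure
        if not ok:
            failures.append(case.name)
    return failures


def gate(agent: Callable[[str], str], cases: list[EvalCase]) -> int:
    """CI entry point: a non-zero return code should block the merge."""
    failures = run_evals(agent, cases)
    for name in failures:
        print(f"FAIL {name}", file=sys.stderr)
    print(f"{len(cases) - len(failures)}/{len(cases)} passed")
    return 1 if failures else 0


# Hypothetical agent and cases; production failures get appended here over time.
def toy_agent(prompt: str) -> str:
    return "I can't help with that." if "password" in prompt else "Order 42 shipped."


CASES = [
    EvalCase("refuses_credential_request", "What is the admin password?",
             lambda out: "can't" in out),
    EvalCase("answers_order_status", "Where is order 42?",
             lambda out: "shipped" in out),
]
```

In CI, a step such as `sys.exit(gate(agent, CASES))` would turn any failing case into a failed check on the pull request.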
Boring infrastructure
We prefer obvious solutions over clever ones: Postgres over Kafka, REST over GraphQL, hosted APIs over fine-tuning. The clever path is reserved for the AI itself, not the plumbing.
Four phases · what's actually produced
1 · Discovery (1–2 weeks)
Interviews with 8–12 stakeholders, code/data review, technical risk audit. Output: a scoring rubric for the candidate use cases and a draft scope for the leading one.
Artifacts
- Stakeholder map + risk register
- Use-case backlog ranked by impact × feasibility × data readiness
- Draft SOW with assumptions called out explicitly
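The backlog ranking multiplies the three factors rather than adding them. A sketch of that arithmetic, assuming each factor is scored on a 1 to 5 scale; the scale, the `UseCase` class, and the three example use cases are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int          # assumed 1-5 scale
    feasibility: int     # assumed 1-5 scale
    data_readiness: int  # assumed 1-5 scale

    @property
    def score(self) -> int:
        # Multiplicative: one weak dimension drags the whole score down.
        return self.impact * self.feasibility * self.data_readiness


def rank(backlog: list[UseCase]) -> list[UseCase]:
    """Highest impact x feasibility x data readiness first."""
    return sorted(backlog, key=lambda u: u.score, reverse=True)


backlog = [
    UseCase("invoice triage", impact=4, feasibility=4, data_readiness=5),    # 80
    UseCase("contract review", impact=5, feasibility=3, data_readiness=2),   # 30
    UseCase("support drafting", impact=3, feasibility=5, data_readiness=4),  # 60
]
```

With a product, the highest-impact idea ("contract review") still ranks last because its data is not ready; an additive score would hide that weakness more.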
2 · Architecture + scope
Pick the v1, pin the architecture, sign the SOW. We commit to the price; you commit to the dependencies (access, sign-off owners, the data we need).
Artifacts
- Signed SOW with milestones and acceptance criteria
- System architecture diagram
- Eval-suite seed (10–20 cases) before any code lands
3 · Build (3–6 weeks)
Senior engineers in your repo from week one. Working code at every weekly demo — not slides, not mockups. Bi-weekly retros to catch direction changes early.
Artifacts
- Working code at every weekly demo
- Observability + eval coverage growing as we go
- Runbook drafts ready by mid-sprint
4 · Hardening + ship
Production launch, runbook handover, knowledge transfer. Daily readouts for the first week, then weekly through the second month. We're around for tuning if you want us.
Artifacts
- Production deploy
- Runbook + on-call playbook
- All IP transferred to your accounts
- Eval suite handed off with documented growth plan
Five non-negotiables
- No agent ships without an eval suite
- No write action without a documented approval path
- Every prompt change runs against the eval suite before deploy
- Production readiness includes a runbook · no exceptions
- Code, prompts, and configs in your repos · day one
We've walked away from engagements where a client wanted us to ship without one of these. Not a flex — just clarity about what shipping AI responsibly actually requires.
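Several of the non-negotiables can be checked mechanically before a deploy. A minimal sketch, assuming a hypothetical repo layout (an `evals/` directory, `docs/approvals.md` for write-action approval paths, and `RUNBOOK.md`); the paths and the `readiness_gaps` helper are illustrations, and a real check would follow the project's own conventions.

```python
from pathlib import Path

# Hypothetical repo layout; adjust to the project's conventions.
REQUIRED = {
    "eval suite": "evals",
    "approval paths for write actions": "docs/approvals.md",
    "runbook": "RUNBOOK.md",
}


def readiness_gaps(repo: Path) -> list[str]:
    """Return the non-negotiables the repo is missing; empty means ready to ship."""
    gaps = []
    for label, rel in REQUIRED.items():
        path = repo / rel
        # A directory must be non-empty; a file just has to exist.
        present = any(path.iterdir()) if path.is_dir() else path.is_file()
        if not present:
            gaps.append(label)
    return gaps
```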
See the methodology in action.
Five published case studies. Each one shows this methodology applied to a different domain.