The controls cheap AI skips.
A starter that calls the model is a demo. Metered token accounting, a per-tenant circuit breaker, an eval gate in CI, and typed guardrails are the gap between a demo and a feature you can charge for. This kit is that gap, wired and tested.
$ caisson eval run --suite prompts/golden.yaml --ci
Running 24 cases against golden set...
  case helpfulness   score 0.91  prev 0.93  Δ -0.02  ok
  case accuracy      score 0.72  prev 0.91  Δ -0.19  FAIL
  case refusal_rate  score 0.98  prev 0.97  Δ +0.01  ok
✗ eval gate failed — accuracy regressed 0.19 > tolerance 0.02
  deploy blocked. fix the prompt or update the golden set.
  report: .caisson/eval/2026-06-27T14-09-11Z.json
Cheap AI boilerplate ships the demo, not the controls.
Three ways an AI feature turns into an incident: an unmetered loop triples the invoice, an eval regression ships on Friday, a prompt nobody can audit breaks in production. This kit puts a named control in front of each one.
Six controls. One kit.
Usage writes in the same Postgres transaction as the result — one atomic increment. Concurrent calls never double-count a charge or drop one under load.
Each tenant gets a hard cap. Cross it and the breaker opens: the next model call returns HTTP 402 instead of running up a surprise invoice, and the cap resets when the window rolls over.
Prompts run against a golden set on every pull request. A score drop past tolerance fails the check — the regression never reaches a customer.
Input and output cross a Zod-typed schema and a policy check on both sides of the model. Out-of-policy responses are rejected at the boundary, not forwarded.
Every prompt is versioned and addressable by id. A call references prompt@v7, not an inline string — diff it, roll it back, audit what the model was asked.
Typed agent and tool definitions with a per-tool allowlist. An agent calls only the tools its manifest declares — no implicit access, no surprise side-effect.
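As a rough illustration of the per-tool allowlist, the sketch below wraps tool dispatch in a check against the agent's manifest. The names `AgentManifest`, `makeToolCaller`, and `ToolNotAllowedError` are hypothetical and chosen for this example; they are not taken from the kit's actual API.

```typescript
// Hypothetical types for illustration only; not the kit's real API.
type ToolFn = (args: Record<string, unknown>) => unknown;

interface AgentManifest {
  name: string;
  tools: readonly string[]; // the only tools this agent may call
}

class ToolNotAllowedError extends Error {
  constructor(agent: string, tool: string) {
    super(`agent "${agent}" is not allowed to call tool "${tool}"`);
    this.name = "ToolNotAllowedError";
  }
}

// Returns a dispatcher that refuses any tool missing from the manifest,
// even when that tool exists in the shared registry.
function makeToolCaller(
  manifest: AgentManifest,
  registry: ReadonlyMap<string, ToolFn>,
) {
  const allowed = new Set(manifest.tools);
  return (tool: string, args: Record<string, unknown>): unknown => {
    if (!allowed.has(tool)) {
      throw new ToolNotAllowedError(manifest.name, tool);
    }
    const fn = registry.get(tool);
    if (!fn) {
      throw new Error(`tool "${tool}" is not registered`);
    }
    return fn(args);
  };
}

// Example: the registry holds two tools, the manifest declares one.
const registry = new Map<string, ToolFn>([
  ["search_docs", (args) => `results for ${String(args.query)}`],
  ["send_email", () => "sent"],
]);
const supportAgent: AgentManifest = { name: "support", tools: ["search_docs"] };
const callTool = makeToolCaller(supportAgent, registry);
```

Here `callTool("search_docs", { query: "caps" })` runs the tool, while `callTool("send_email", {})` throws before any side effect happens, because `send_email` is registered but not declared in the manifest.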
Every claim here is a control you can point at.
The caps, the breaker, and the eval gate are configuration checked into your repo and enforced at call time — not a dashboard you hope someone is watching.
# caisson.ai.toml — checked into your repo, enforced at call time
[caps.default]
daily_tokens = 1_000_000
on_exceed = "break"   # open the circuit, return 402

[evals]
gate = "ci"           # block the PR on regression
tolerance = 0.02      # max score drop before the check fails

[guardrails]
input_schema = "schemas/chat-input.json"
output_policy = "policies/content-policy.ts"
Own the code, or subscribe.
from $599
Own the AI Production Kit source outright — the six controls, wired and tested, plus all future patch releases.
Take metering, caps, or the eval harness on its own, from $49 per module.
Subscription — credits, framework updates, and private-registry pulls. Keeps the kit current as model APIs shift.
What does token metering actually prevent?
A runaway loop, a misconfigured agent, or a single burst of traffic can multiply your API invoice by 10× before you see it. Caisson writes usage in the same Postgres transaction as the result — an atomic increment — so concurrent calls can never double-count or drop a charge. Crossing the cap opens the circuit breaker and returns HTTP 402 before the next model call fires.
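One way to picture "usage in the same transaction as the result" is the sketch below. It is an assumption-laden illustration, not the kit's implementation: the `Queryable` interface, the `recordCompletion` function, and the `completions` and `token_usage` tables are all hypothetical. The increment itself relies on standard Postgres `INSERT ... ON CONFLICT ... DO UPDATE`, which adds to the existing row atomically instead of reading the counter and writing it back.

```typescript
// Hypothetical minimal client shape, similar to what a Postgres driver exposes.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Stores the model output and the token count in one transaction.
// Table and column names are assumptions made for this sketch.
async function recordCompletion(
  db: Queryable,
  tenantId: string,
  windowStart: string, // e.g. "2026-06-27" for a daily window
  output: string,
  tokens: number,
): Promise<void> {
  await db.query("BEGIN");
  try {
    await db.query(
      "INSERT INTO completions (tenant_id, output) VALUES ($1, $2)",
      [tenantId, output],
    );
    // Atomic increment: the database adds to the current value itself,
    // so concurrent calls cannot overwrite each other's counts.
    await db.query(
      `INSERT INTO token_usage (tenant_id, window_start, tokens_used)
       VALUES ($1, $2, $3)
       ON CONFLICT (tenant_id, window_start)
       DO UPDATE SET tokens_used = token_usage.tokens_used + EXCLUDED.tokens_used`,
      [tenantId, windowStart, tokens],
    );
    await db.query("COMMIT");
  } catch (err) {
    // If either write fails, neither the result nor the usage is kept.
    await db.query("ROLLBACK");
    throw err;
  }
}
```

Because both writes commit or roll back together, a stored result always has a matching usage record, and a failed call is never billed.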
What happens when a tenant hits their spend cap?
The breaker opens. The next model call returns HTTP 402 with a structured error body — the same as any other payment-required response in your API. The window resets on the configured interval (UTC midnight by default). No partial responses, no silent overages, no surprise invoice.
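The cap check and the 402 response might look roughly like the following. This is a minimal sketch under stated assumptions: the `CapState` shape, the `checkCap` function, and the error body fields (`error`, `tenantId`, `resetsAt`) are invented for illustration, and treating usage equal to the cap as "exceeded" is a choice made here, not something the kit documents.

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
interface CapState {
  tenantId: string;
  tokensUsed: number;
  dailyTokens: number;
}

type CapDecision =
  | { ok: true }
  | {
      ok: false;
      status: 402;
      body: { error: "cap_exceeded"; tenantId: string; resetsAt: string };
    };

// Next UTC midnight after `now`. Date.UTC normalizes day overflow,
// so month and year boundaries roll over correctly.
function nextUtcMidnight(now: Date): Date {
  return new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate() + 1),
  );
}

// Runs before the model call: an open breaker means the call never fires.
function checkCap(state: CapState, now: Date): CapDecision {
  if (state.tokensUsed >= state.dailyTokens) {
    return {
      ok: false,
      status: 402,
      body: {
        error: "cap_exceeded",
        tenantId: state.tenantId,
        resetsAt: nextUtcMidnight(now).toISOString(),
      },
    };
  }
  return { ok: true };
}
```

Returning the reset time in the body lets a client back off until the window rolls over instead of retrying blindly.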
How does the eval gate work in CI?
You commit a golden set of prompt → expected-output pairs alongside your prompt definitions. On every pull request, the eval runner scores the current prompts against the golden set. A score drop past the configured tolerance fails the check — the regression never merges. The gate is a GitHub Actions step; it reads from the prompt registry and writes results to a structured report.
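The tolerance rule can be sketched in a few lines. The types and the `evalGate` function below are hypothetical; the scores are the ones from the sample run at the top of this page, with the tolerance of 0.02 from the sample config. A small epsilon is added because floating-point subtraction would otherwise report a drop of exactly 0.02 as slightly more than 0.02.

```typescript
// Hypothetical shapes for illustration; not the kit's real report format.
interface CaseScore {
  name: string;
  score: number; // score for the prompts in this pull request
  prev: number;  // score recorded for the golden set baseline
}

interface GateResult {
  passed: boolean;
  regressions: string[]; // names of cases that dropped past tolerance
}

// Guards against floating-point noise, e.g. 0.93 - 0.91 is not exactly 0.02.
const EPSILON = 1e-9;

function evalGate(cases: CaseScore[], tolerance: number): GateResult {
  const regressions = cases
    .filter((c) => c.prev - c.score > tolerance + EPSILON)
    .map((c) => c.name);
  return { passed: regressions.length === 0, regressions };
}

// Scores taken from the sample run shown earlier on this page.
const gateResult = evalGate(
  [
    { name: "helpfulness", score: 0.91, prev: 0.93 },
    { name: "accuracy", score: 0.72, prev: 0.91 },
    { name: "refusal_rate", score: 0.98, prev: 0.97 },
  ],
  0.02,
);
```

With these inputs only `accuracy` is flagged: `helpfulness` drops by exactly the tolerance and passes, and `refusal_rate` improves. In a CI step, a failed gate would typically translate into a non-zero exit code so the check blocks the merge.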
Is Caisson an AI platform or a library?
A library — a codebase you own. It ships as typed TypeScript packages you install and configure in your own repository. There is no hosted control plane, no SDK that phones home, no vendor lock-in beyond the Postgres database you already run.
Ship the feature with the brakes on.
Scaffold a new project with the AI Production Kit included, or go straight to pricing to add it to an existing Caisson base.
$ npx create-caisson@latest