AI Production Kit
Token metering, spend caps, a circuit breaker, an eval gate in CI, and a versioned prompt registry — the production-rigor layer cheap AI boilerplate skips.
A single model call costs a fraction of a cent — invisible until a retry loop fires it ten
thousand times and the bill lands at the end of the month. The same week, a reworded prompt drifts
the summarizer's accuracy down and nobody notices until a customer does. Both failures are cheap to
prevent at the call site and expensive to find in production. @caisson/ai-kit is the layer that
caps the spike, gates the regression, and versions the prompt.
$ POST /v1/ai/complete
HTTP/1.1 402 Payment Required
{
  "error": "spend_cap_exceeded",
  "tenant": "acme",
  "window": "month",
  "capCredits": 250000,
  "spentCredits": 250000
}

The cap is enforced before the provider is called, in integer credit units, fail-closed: over budget returns 402, never an unbounded charge.
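A minimal sketch of the pre-call check described above, assuming a budget record shaped like the 402 body. The names `Budget`, `checkSpendCap`, and `estimatedCredits` are hypothetical and are not the @caisson/ai-kit API; the point is only that the decision uses integer credits and that any doubtful input is treated as a denial.

```typescript
// Hypothetical types and names for illustration; not the @caisson/ai-kit API.
type CapWindow = "day" | "month";

interface Budget {
  tenant: string;
  window: CapWindow;
  capCredits: number;   // integer credits allowed in the window
  spentCredits: number; // integer credits already charged in the window
}

interface CapDenied {
  status: 402;
  body: {
    error: "spend_cap_exceeded";
    tenant: string;
    window: CapWindow;
    capCredits: number;
    spentCredits: number;
  };
}

interface CapAllowed {
  status: 200;
  remainingCredits: number;
}

// Runs before the provider is called. Only a clean "this fits in the
// budget" answer is allowed through; everything else is denied (fail-closed).
function checkSpendCap(
  budget: Budget,
  estimatedCredits: number,
): CapAllowed | CapDenied {
  const inputsValid =
    Number.isSafeInteger(budget.capCredits) &&
    Number.isSafeInteger(budget.spentCredits) &&
    Number.isSafeInteger(estimatedCredits) &&
    estimatedCredits > 0;
  const remaining = budget.capCredits - budget.spentCredits;

  if (!inputsValid || estimatedCredits > remaining) {
    return {
      status: 402,
      body: {
        error: "spend_cap_exceeded",
        tenant: budget.tenant,
        window: budget.window,
        capCredits: budget.capCredits,
        spentCredits: budget.spentCredits,
      },
    };
  }
  return { status: 200, remainingCredits: remaining - estimatedCredits };
}
```

In a real deployment the check and the debit would have to happen as one atomic step so that two concurrent calls cannot both pass against the same remaining budget; this in-memory sketch leaves that part out.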
What it does
- Provider-agnostic config — one settings file; switch model or provider without touching call sites.
- Token metering — every call is recorded with Postgres-atomic accounting, so usage is exact under concurrency.
- Spend caps + circuit breaker — per-tenant budgets that deny over-limit calls and trip the breaker on a failing provider.
- Eval harness in CI — prompts are scored against a fixture set; a regression fails the build, not production.
- Prompt registry — prompts are versioned and append-only, so a change is a diff you can roll back.
- Guardrails + agent setup — input/output guards plus an agent that walks env, providers, and DB through first-run configuration.
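To make the "trip the breaker on a failing provider" item concrete, here is a small sketch of a conventional three-state circuit breaker. It is an assumption about the general technique, not the kit's implementation: the class name, the failure threshold, the cooldown, and the injected clock are all hypothetical.

```typescript
// Hypothetical sketch of a generic circuit breaker; not the @caisson/ai-kit API.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now(), // injectable clock for testing
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // closed: calls pass. open: calls are refused. half-open: the cooldown
  // has elapsed and a probe call is allowed through.
  state(): BreakerState {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  // Ask before calling the provider.
  allow(): boolean {
    return this.state() !== "open";
  }

  // A successful call resets the failure count and closes the breaker.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // A failure trips the breaker once the threshold is reached; a failed
  // probe in the half-open state re-opens it and restarts the cooldown.
  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.openedAt !== null || this.consecutiveFailures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }
}
```

A caller would check `allow()` before each provider request and report the outcome with `recordSuccess()` or `recordFailure()`, so a failing provider stops receiving traffic until the cooldown passes.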
The contract it upholds
Metering is Postgres-atomic: no double-count, no lost write when calls land concurrently — the ledger reconciles to the cent. Spend enforcement is fail-closed at the call site, so an unmetered path is a build error, not a leak. And the eval gate runs in CI on every change:
$ bun run eval
FAIL prompts/summarize@v3
faithfulness 0.71 gate >= 0.80
1 regression — exit 1. Build blocked.

A prompt regression stops at the pull request. It does not reach a customer.
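The gate in the output above reads as a per-metric threshold comparison. The sketch below shows one way such a check could be written, assuming each eval result carries a score and a minimum gate; `GateResult`, `findRegressions`, and `evalExitCode` are hypothetical names, not the harness's real API.

```typescript
// Hypothetical sketch of a CI eval gate; not the @caisson/ai-kit API.
interface GateResult {
  prompt: string; // e.g. "summarize@v3"
  metric: string; // e.g. "faithfulness"
  score: number;  // measured on the fixture set
  gate: number;   // minimum acceptable score
}

// A result regresses when its score is not provably at or above the gate.
// Written as !(score >= gate) so that a NaN score also counts as a failure.
function findRegressions(results: GateResult[]): GateResult[] {
  return results.filter((r) => !(r.score >= r.gate));
}

// Exit code for the CI step: non-zero blocks the build.
function evalExitCode(results: GateResult[]): number {
  return findRegressions(results).length > 0 ? 1 : 0;
}
```

A CI script could assign the value of `evalExitCode(results)` to the process exit code, so that a single metric below its gate fails the job.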
Usage
Illustrative — final API names may still change.
import type { TenantId } from "@caisson/tenancy-rls";
import { aiConfig, withSpendCap } from "@caisson/ai-kit";

// Provider-agnostic: model and provider live in config, not at the call site.
const ai = aiConfig({ provider: "anthropic", model: "claude" });

// Every call is metered (Postgres-atomic) and capped. Over budget -> 402, fail-closed.
export async function summarize(tenant: TenantId, input: string) {
  return withSpendCap(tenant, () =>
    ai.complete({ prompt: "summarize@v3", input }),
  );
}

Reference expanding
This page is the spine, not the manual. The full @caisson/ai-kit API
reference — config schema, the metering ledger, spend-cap policy, the
eval-harness CI contract, and the prompt registry — is still expanding.
Package names and the fail-closed guarantees above are stable.