Proof, not promises

Prove and govern your AI spend — do more per dollar, attested.

They estimate; we settle.

Change one line and route every provider through one fail-open gateway. See your spend in real time, set hard budget caps and a kill-switch, attribute every dollar by key, team, and feature, and keep an auditable, basis-labeled cost ledger. Verified savings (net of quality) and the 25% share stay off until the proof gate ships; we never bill a saving we can’t prove. OpenAI-compatible; your provider keys pass through.

Start free in observe-only mode — no card. Control levels up on the paid tiers. Your data in your tenant; fail-open.

OpenAI-compatible · one line
OpenAI · Anthropic · the OpenRouter long tail live today · coming soon: native Gemini · Bedrock · Azure
RECOVEA · COST LEDGER · Sample data · acct ████ · period 2026-05
lever | basis | status
prompt-cache prefix hygiene | planned | planned
Batch API migration | planned | planned
reasoning / effort trim | planned | planned
dedup / single-flight | concierge | concierge
exact cache | concierge | concierge
model routing / cascade | proof pending
context / RAG trim (gateway) | proof pending
spend visibility: live · every provider
cost ledger basis: labeled · auditable
verified savings: off · proof pending
The cost ledger is live and basis-labeled. Verified savings — net of quality, adjustments stripped — turn on only once the eval gate ships at volume; the 25% share stays off until then.
01 One line to start. See and govern your whole AI stack.
02 Optimization is a commodity. Proof is the product.
03 Caps, kill-switch, attribution: control, not guesswork.
04 Your data in your tenant. Fail-open. They estimate; we settle.

OpenAI-compatible · one line

OpenAI · Anthropic · OpenRouter · coming soon: Google Gemini (native) · AWS Bedrock · Azure OpenAI
The wedge

Every gateway caches and routes for free. We found none that prove what you actually saved.

Caching, batching, and routing ship for $0–500/mo across LiteLLM, Portkey, Helicone, Cloudflare AI Gateway, and OpenRouter. We use some of them ourselves. What we haven't found anywhere else is proof that the bill dropped without the product getting worse. That proof is the product. And we measure every method (our levers, your own prompt edits, and any token-reduction tool you already run) through one eval gate and one ledger.

Table stakes: free / $0–500/mo
  • Caching (exact + semantic)
  • Model routing / cascading
  • Batch API dispatch
  • Spend dashboards
  • Source-side reducers (Graphify, h5i)
The product: what we found nobody else building
  • The cost ledger: every dollar by route and lever, basis-labeled — verified savings off · proof pending.
  • The cross-family eval gate: non-inferiority proven per route, on your real traffic, before anything counts.
  • Designed to bill on cost-per-successful-output, so quality is inside the unit you pay on.
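
To make the last point concrete, here is a minimal sketch of a cost-per-successful-output unit. The function name and the figures are hypothetical and only illustrate the arithmetic; how "successful" is judged is left to the eval gate and is not modeled here.

```python
from decimal import Decimal


def cost_per_successful_output(total_cost_usd: Decimal, outputs: int, successes: int) -> Decimal:
    """Divide total spend by the outputs that passed the quality check.

    Hypothetical helper: the definition of a "successful" output is an
    assumption and would come from whatever quality check is in place.
    """
    if not 0 <= successes <= outputs:
        raise ValueError("successes must be between 0 and outputs")
    if successes == 0:
        raise ValueError("no successful outputs; the unit is undefined")
    return total_cost_usd / successes


# Hypothetical figures: the cheaper configuration lowers the bill by 20%
# but fails more often, so the cost of each usable output goes up.
before = cost_per_successful_output(Decimal("100.00"), outputs=1000, successes=950)
after = cost_per_successful_output(Decimal("80.00"), outputs=1000, successes=700)
print(f"before: ${before:.4f} per successful output")
print(f"after:  ${after:.4f} per successful output")
```

In this made-up case the total bill falls while the unit cost rises, which is why a unit that includes quality can disagree with the raw invoice.
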
How it works

Change one line. See and govern every dollar; optimize only what we can prove.

1

Point your base URL at Recovea

OpenAI on /v1, Anthropic on its native surface, and the OpenRouter long tail via vendor/model ids (more planned). Zero app change; your keys pass through; fail-open.
2

We record your baseline

Every request is stamped with what it would have cost untouched.
3

Safe levers ship; risky ones wait

Zero-risk levers go live day one. Risky ones wait behind a shadow-mode quality gate. A light source-side slim of the assembled request is planned.
4

The ledger shows it all

An auditable, basis-labeled cost ledger — spend by key, team, and lever. Verified savings stay off until the proof gate ships.
Do-first: cache, dedup · Planned: Batch API, effort-trim · Eval-gated: routing, compression
shadow → 1% → 5% → 20% → 50% → 100%

gate must hold at each rung · one-config rollback any time
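
The ladder above can be sketched as a small state machine. This is an illustration only: the names are hypothetical, and rolling back to shadow on a failed gate is an assumption about what "one-config rollback" resets to.

```python
# Rungs as listed on the page, from zero exposure to full traffic.
RUNGS = ["shadow", "1%", "5%", "20%", "50%", "100%"]


def next_rung(current: str, gate_passed: bool) -> str:
    """Advance one rung if the gate held, otherwise return to shadow.

    Assumption: a failed gate rolls the lever all the way back to shadow
    (zero user exposure); the page does not specify the rollback target.
    """
    if current not in RUNGS:
        raise ValueError(f"unknown rung: {current!r}")
    if not gate_passed:
        return "shadow"
    index = RUNGS.index(current)
    # Stay at 100% once the top rung is reached.
    return RUNGS[min(index + 1, len(RUNGS) - 1)]


rung = "shadow"
for held in [True, True, False]:
    rung = next_rung(rung, held)
    print(rung)
```

A lever never skips a rung in this sketch, so exposure only grows after the gate has held at the previous level.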

The cost ledger

An auditable, basis-labeled cost ledger — spend you can attribute and defend.

Every dollar attributed by key, team, feature, and lever, with the basis labeled. Verified savings — baseline minus realized, net of quality, adjustments we didn’t cause stripped — are off · proof pending: they turn on only once the eval gate ships at volume, and the 25% share stays off until then. The method spine is IPMVP performance contracting.
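
As a rough illustration of that definition, the sketch below computes a verified-savings figure for one ledger line. The field names and numbers are hypothetical, and treating "net of quality" as a pass/fail gate is an assumption; the actual ledger schema is not described here.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerLine:
    lever: str
    baseline_usd: Decimal              # what the traffic would have cost untouched
    realized_usd: Decimal              # what it actually cost
    external_adjustments_usd: Decimal  # reductions the lever did not cause
    gate_passed: bool                  # assumption: quality is a pass/fail gate


def verified_savings(line: LedgerLine) -> Decimal:
    """Baseline minus realized, with external adjustments stripped.

    Returns zero when the quality gate has not passed, so an unproven
    saving never counts.
    """
    if not line.gate_passed:
        return Decimal("0")
    gross = line.baseline_usd - line.realized_usd
    return max(gross - line.external_adjustments_usd, Decimal("0"))


line = LedgerLine(
    lever="exact cache",
    baseline_usd=Decimal("1000.00"),
    realized_usd=Decimal("700.00"),
    external_adjustments_usd=Decimal("50.00"),  # e.g. a provider price cut
    gate_passed=True,
)
print(verified_savings(line))
```

With these made-up figures, 300.00 of gross reduction becomes 250.00 once the 50.00 the lever did not cause is removed.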

How we measure it
non-inferiority test · Sample data · PASS
route: support-classify
candidate: gpt-4o-mini · baseline: gpt-4o
margin δ: 2.0 pts · observed: +0.4
oracle: golden set + x-family judge
→ promote to 5% canary · rollback armed

Quality is the gate, not a footnote

We never gamble your output quality to save money.

Risky levers run in shadow mode on your real traffic, with zero user exposure. They’re promoted only after a per-route non-inferiority test passes against a layered oracle: deterministic checks → golden-dataset gate → cross-family calibrated judge. One-config rollback at any step.
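
A per-route non-inferiority check can be sketched as follows. This is one common textbook form (a one-sided lower confidence bound on the difference of two pass rates, using a normal approximation), shown as an assumption rather than the exact test the gate runs; the function name, sample sizes, and alpha are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def non_inferior(cand_pass: int, cand_n: int, base_pass: int, base_n: int,
                 margin_pts: float = 2.0, alpha: float = 0.05) -> dict:
    """Pass if the candidate is at most `margin_pts` worse than the baseline.

    Uses a one-sided (1 - alpha) lower bound on candidate minus baseline
    pass rate. The candidate passes only if that bound is above -margin.
    """
    if cand_n <= 0 or base_n <= 0:
        raise ValueError("sample sizes must be positive")
    p_c = cand_pass / cand_n
    p_b = base_pass / base_n
    diff = p_c - p_b
    se = sqrt(p_c * (1 - p_c) / cand_n + p_b * (1 - p_b) / base_n)
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se
    return {
        "observed_pts": diff * 100,
        "lower_bound_pts": lower * 100,
        "passed": lower * 100 > -margin_pts,
    }


# Hypothetical counts giving an observed difference of +0.4 points.
result = non_inferior(cand_pass=9040, cand_n=10_000, base_pass=9000, base_n=10_000)
print(result["passed"], round(result["observed_pts"], 2))
```

The point of testing the bound rather than the observed difference is that a small sample with the same observed score can still fail: the uncertainty alone may exceed the margin.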

See the eval method
Two tiers, one ledger

One line today. Optionally go deeper at the source, soon. Every layer proven.

one line in, one line out
  import os
  from openai import OpenAI
  client = OpenAI(
-     base_url="https://api.openai.com/v1",
+     base_url="https://api.recovea.ai/v1",  # your keys pass through · fail-open
      api_key=os.environ["OPENAI_API_KEY"],
  )

Tier 1: Gateway · live today

One base-URL change, zero app change. On the already-assembled request the gateway runs dedup and exact cache today; routing/cascading shadows on your real traffic behind the eval gate, and Batch API plus a light source-side slim (obvious RAG/context trim) are planned.
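
For intuition, an exact cache on the already-assembled request can be as simple as the sketch below. It is an illustration under stated assumptions, not the gateway's implementation: the class and function names are hypothetical, the store is an in-memory dict, and expiry, tenancy, and concurrent single-flight dedup are left out.

```python
import hashlib
from typing import Callable, Dict


def cache_key(body: bytes) -> str:
    """Hash the raw request body, so only byte-identical requests match."""
    return hashlib.sha256(body).hexdigest()


class ExactCache:
    """Hypothetical in-memory exact cache keyed by request bytes."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, body: bytes, call_upstream: Callable[[bytes], bytes]) -> bytes:
        key = cache_key(body)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # Cache miss: forward to the provider and remember the response.
        self.misses += 1
        response = call_upstream(body)
        self._store[key] = response
        return response
```

Because the key is a hash of the exact bytes, a hit returns the same response the provider already gave for the same request, which is why this lever carries no quality risk in the sense the page describes.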

Tier 2: Recovea Intake · coming v1.1

A thin SDK with connectors for the frameworks you use (LangChain / LlamaIndex / raw / MCP). You wrap your retrieval and tool-output steps before the prompt is assembled, to unlock the bigger multiples on context- and output-heavy agents. Designed now; not yet shippable.

Stated plainly: a gateway sees your request after it’s assembled, so it can compress and trim what’s already in the prompt, but it can’t decide what your agent retrieves. The deepest reductions need a little app-level cooperation: the optional Intake SDK, which we’re building next. Either way, every reduction passes the same shadow-mode non-inferiority gate and lands in the same ledger; the ledger measures the incremental savings from your traffic.

Honest numbers

Realistic, not aspirational.

We lead with what's live today — cost visibility and control — and label everything else honestly. Cache/dedup savings are concierge-verified, never a self-serve billed dollar; verified savings and the 25% share are off · proof pending until the eval gate ships at volume.

Live today (cost visibility + control): spend, caps, and attribution from day one
Cache / dedup savings: concierge-verified, hands-on · not a self-serve billed dollar
Verified savings + the 25% share: off · proof pending; on only once the eval gate ships
What we don’t claim: headline routing maxima · never a single giant promise

Other tools’ routing headlines are conversational benchmarks on the routable fraction, not a blended number. We settle yours on your own traffic — you can verify it offline.

For AI builders

Inference dominates your COGS. Get it under control first.

You just raised; your inference bill grows faster than revenue; every optimization feels like a quality gamble you can’t prove. We change one base URL, give you caps, a kill-switch, and per-team attribution this week, and only prove savings once the eval gate can settle them.
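
A hard cap and a kill-switch can be pictured with a sketch like this one. The class and field names are hypothetical, and what happens to a refused request (block, queue, or alert) is a policy choice the sketch does not make.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class BudgetGuard:
    """Hypothetical per-team guard: a spend cap plus a manual kill-switch."""

    cap_usd: Decimal
    spent_usd: Decimal = Decimal("0")
    killed: bool = False

    def allow(self, estimated_cost_usd: Decimal) -> bool:
        """Refuse when the switch is thrown or the request would exceed the cap."""
        if self.killed:
            return False
        return self.spent_usd + estimated_cost_usd <= self.cap_usd

    def record(self, actual_cost_usd: Decimal) -> None:
        """Add the metered cost of a completed request."""
        self.spent_usd += actual_cost_usd


# One guard per team gives attribution and control in the same structure.
guards = {"search": BudgetGuard(cap_usd=Decimal("10.00"))}
guards["search"].record(Decimal("8.00"))
print(guards["search"].allow(Decimal("4.00")))  # would exceed the cap
print(guards["search"].allow(Decimal("2.00")))  # lands exactly on the cap
```

Keying the guards by team (or API key, or feature) is what lets the same counter serve both the cap and the attribution view.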

Sample data
spend control: live today
verified savings: off · proof pending
See the plans
Pricing

Start free. Pay a flat platform fee for control. Verified savings stay off until proven.

Free
$0
Observe-only · any spend · no card

The gateway in observe-only mode meters every request against a frozen price list and gives you read-only spend visibility across providers. Nothing about your traffic changes.
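
Metering against a frozen price list amounts to the arithmetic sketched below. The model names and per-million-token prices are placeholders invented for the example, not real provider prices, and the function name is hypothetical.

```python
from decimal import Decimal
from types import MappingProxyType

# Hypothetical price list in USD per 1M tokens. These are placeholder
# values, not real provider prices. The read-only mapping stands in for
# "frozen": the metering code cannot add or replace entries.
PRICE_LIST = MappingProxyType({
    "example-large": {"input": Decimal("2.50"), "output": Decimal("10.00")},
    "example-small": {"input": Decimal("0.15"), "output": Decimal("0.60")},
})

TOKENS_PER_UNIT = Decimal(1_000_000)


def meter(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Price one request from its token counts; the request itself is untouched."""
    price = PRICE_LIST[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / TOKENS_PER_UNIT


print(meter("example-large", input_tokens=1000, output_tokens=500))
```

Pricing every request from one fixed list keeps figures comparable across a period even if a provider changes its rates partway through.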

Professional · Start here
$249/mo
In-path gateway · sized by spend routed

One base_url change, BYO-key, fail-open: cross-provider cost visibility, concierge-verified cache/dedup savings, and an auditable, basis-labeled cost ledger. Verified savings and the 25% share: off · proof pending.

Scale & Enterprise
Talk to us
Higher volume · security review · white-glove

For higher volume and teams that need a security review and the white-glove inference teardown. Published as their own tiers as they ship — talk to us in the meantime.

Proof

Numbers, not testimonials.

Every figure carries its conditions, and we show what we deliberately don’t count. Named ledger excerpts land here as our first design-partner cases publish — until then, the proof we can offer is your own scan.

The shape of a first engagement

Control on day one; cache/dedup savings concierge-verified; verified savings off · proof pending

You get cost visibility and control immediately. Cache and dedup savings (byte-identical, zero quality risk) are concierge-verified hands-on, never a self-serve billed dollar. Eval-gated savings and the 25% share stay off until the proof gate ships at volume; we never bill a saving we can’t show in the ledger.

Validation

Paul Graham publicly described a YC startup doing exactly this (cutting customers’ inference costs and splitting the savings) and sized the market at “a quarter of the model companies’ corporate revenue.”

Who it’s for

Built for two buyers.

The technical budget-holder (CTO / CIO / platform lead)

The builder who owns the AI spend and can swipe a card, no procurement. You want caps, a kill-switch, and per-team attribution before the bill surprises you.

Need to show your board? (a second lane)

Finance and FinOps teams get an auditable, basis-labeled cost ledger to attribute and defend spend. Verified savings land here too — off · proof pending until the eval gate ships.

Spending under ~$15K/mo? We’ll tell you straight: at that scale you want control and visibility, not a savings pitch. Start free in observe-only mode — we’ll tell you the day a paid tier pays for itself.

Security & trust

We sit inline on your traffic. We treat that seriously.

No prompt bodies logged by default. Scoped access + audit log. Fail-open. Self-host in your own VPC and salted per-tenant cache: planned. SOC 2 on the roadmap.

Fail-open · Self-host VPC · Metadata-only · Audit log
Read security & trust

See and govern your AI spend, before the next surprise bill.

Start free in observe-only mode — no card. Control levels up on the paid tiers; verified savings turn on once we can prove them.