How Recovea works

Recovea sits on the wire between your app and your provider. To your code it looks like OpenAI: same request shapes, same response shapes, same streaming, same errors. Underneath, every request is metered against a frozen baseline and optimized through a risk-ordered set of levers, and any lever that could touch quality must be proven non-inferior before it can ever reach a user. This page walks through what actually happens to a request, end to end.

request ─► baseline stamp ─► do-first levers ─► eval-gated levers ─► provider ─► metered response
              (frozen          (byte-identical,    (run in SHADOW on        │
           price list)         ship immediately)   real traffic until       │
                 │                    │            the route gate passes)    │
                 └────────────────────┴──────────────► cost ledger ◄─────────┘
                                                (baseline vs realized,
                                                 net of quality, per lever)

1. Every request carries its baseline

The moment a request arrives, Recovea stamps it with a baseline: the model you would have used and what that call would have cost. That cost is priced against a frozen reference price list, a versioned, immutable snapshot pinned at the start of your engagement, not against your provider's live prices.

Cost is then computed twice for every request: the baseline cost (the counterfactual) and the realized cost (what we actually spent). That pair is the unit of proof. Because the price list is frozen, a mid-engagement provider price cut can never be claimed as Recovea savings. It shows up as an adjustment instead (see step 3).
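As a minimal sketch of the baseline/realized pair, the snippet below prices two calls against one pinned list. The model names, prices, and the `Usage` and `cost` helpers are hypothetical, invented for illustration. Pricing the realized call against the same list is a simplifying assumption, not a description of Recovea's internals.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical frozen price list (USD per 1M tokens), pinned at engagement start.
# Model names and prices are invented for this example.
FROZEN_PRICES_V1 = {
    "big-model": {"input": Decimal("10.00"), "output": Decimal("30.00")},
    "small-model": {"input": Decimal("1.00"), "output": Decimal("3.00")},
}


@dataclass(frozen=True)
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def cost(usage: Usage, price_list: dict) -> Decimal:
    """Price one call in USD against the given price list."""
    prices = price_list[usage.model]
    tokens_cost = (usage.input_tokens * prices["input"]
                   + usage.output_tokens * prices["output"])
    return tokens_cost / Decimal(1_000_000)


# The counterfactual call (what you would have sent) and the call actually made.
baseline = Usage("big-model", input_tokens=2_000, output_tokens=500)
realized = Usage("small-model", input_tokens=2_000, output_tokens=500)

baseline_cost = cost(baseline, FROZEN_PRICES_V1)
realized_cost = cost(realized, FROZEN_PRICES_V1)
```

Because `FROZEN_PRICES_V1` never changes during the engagement, a later provider price cut cannot move `baseline_cost`.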

2. The risk-ordered lever pipeline

Levers are ordered by risk, not by size of payoff. Two classes, and the line between them is non-negotiable.

Do-first levers (byte-identical, ship immediately)

These cannot change your output, so they need no quality gate. Two are live today; the rest are rolling out:

  • Exact cache (live): a hash of the canonical request returns the stored response on a hit; realized cost goes to roughly zero.
  • Dedup / single-flight (live): collapse identical concurrent requests into one upstream call; waiters receive a byte-identical response.
  • Prompt-cache prefix hygiene (planned): reorder the stable prefix ahead of volatile tokens so your provider's own prompt cache hits.
  • Batch API migration (planned): move batch-eligible traffic to the provider Batch tier (flat 50% off), with a latency fallback so a sync-expected route is never batched.
  • Reasoning-effort trim (planned): lower thinking effort on simple routes, off by default on reasoning-heavy or high-stakes routes.

Each is byte-identical to the baseline answer, so it carries no quality risk. (Until the verification engine is live end-to-end, every ledger figure — do-first included — is shown as a labeled estimate and no verified-savings share is billed; see step 4.)
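To make the exact-cache lever concrete, here is a minimal sketch of hashing a canonicalized request and replaying the stored bytes on a hit. The canonicalization rule (sorted keys, compact JSON) and the names `canonical_key` and `ExactCache` are assumptions for illustration; the actual canonical form Recovea uses is not specified on this page.

```python
import hashlib
import json


def canonical_key(request: dict) -> str:
    """Hash a request after normalizing key order and whitespace (assumed rule)."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExactCache:
    """In-memory sketch: a hit returns the stored response bytes unchanged."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.upstream_calls = 0

    def get_or_call(self, request: dict, call_upstream) -> bytes:
        key = canonical_key(request)
        if key in self._store:
            return self._store[key]      # hit: no upstream spend
        self.upstream_calls += 1         # miss: pay for one upstream call
        response = call_upstream(request)
        self._store[key] = response
        return response
```

Two requests that differ only in key order map to the same key, so the second is served from the store and the upstream is called once.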

Eval-gated levers (shadow first, promote only on proof)

These could affect output, so they are never allowed to touch a user request until they have earned it. Both are planned — they stay in shadow today (0% of live traffic; your users are always served by the baseline path):

  • Model routing / cascading (planned): send routable traffic to a cheaper model when a transparent classifier judges it safe.
  • Light Tier-1 source-side slim (planned): re-rank attached retrieval chunks, drop low-relevance context, prune unused tool definitions. Slim runs before the routing classifier, since compressing the input changes which tier gets picked.

A gated lever starts in shadow mode, and this collection step is live. Your user is always served by the baseline path while a copy of the request runs the cheaper candidate out-of-band, on a small, statistically driven sample of real traffic. Mirroring adds zero user latency and zero user exposure, and it runs under a hard per-route budget that auto-pauses mirroring before it could dent your bill.
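The shadow step can be sketched as a sampling decision plus a budget guard. Everything named here (`ShadowPolicy`, `sample_rate`, `budget_usd`, `handle`) is hypothetical, and a fixed sampling rate stands in for whatever statistically driven sampling Recovea applies.

```python
import random
from dataclasses import dataclass


@dataclass
class ShadowPolicy:
    sample_rate: float      # fraction of requests mirrored (hypothetical knob)
    budget_usd: float       # hard per-route cap on shadow spend
    spent_usd: float = 0.0
    paused: bool = False

    def should_mirror(self, rng: random.Random) -> bool:
        """Mirror only while unpaused, and only for the sampled fraction."""
        if self.paused:
            return False
        return rng.random() < self.sample_rate

    def record_spend(self, usd: float) -> None:
        """Accumulate shadow spend and pause mirroring once the cap is reached."""
        self.spent_usd += usd
        if self.spent_usd >= self.budget_usd:
            self.paused = True


def handle(request, policy, rng, serve_baseline, enqueue_shadow):
    """Serve the user from the baseline path; mirror a copy out-of-band."""
    if policy.should_mirror(rng):
        enqueue_shadow(request)     # queued for later; the user never waits on it
    return serve_baseline(request)  # the response always comes from baseline
```

The user-facing return value never depends on the shadow branch, which is the property that keeps user exposure at zero.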

The scoring side is rolling out. As it lands, each candidate is scored through a layered oracle, cheapest checks first: deterministic checks (schema, format, refusal/truncation detection), a golden-dataset regression gate, and a cross-family calibrated LLM judge drawn from a different model family than both the baseline and the candidate, so no family grades its own work. A lever is promoted only when a paired, per-route non-inferiority test clears: the cheaper path must be no worse than baseline by more than a contractual margin δ, at a 95% confidence bound. The promotion endpoint mechanically refuses to advance a route until that test passes (GATE-PROMO). Promotion then climbs a ladder (1% → 5% → 20% → 50% → 100%) with the gate re-running at every rung; a regression on a promoted route rolls back automatically and pages. Until this is live end-to-end, gated levers contribute only labeled estimates and never bill.
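One way to read the promotion gate is as a one-sided bound on paired score differences. The sketch below is an illustration under stated assumptions, not Recovea's test: it assumes higher scores are better, uses a normal approximation for the 95% bound (a t-quantile would be more appropriate for small samples), and the names `non_inferior` and `next_rung` are hypothetical.

```python
import math
import statistics

Z_95_ONE_SIDED = 1.6449        # normal quantile for a one-sided 95% bound
LADDER = [1, 5, 20, 50, 100]   # promotion rungs, in percent of live traffic


def non_inferior(baseline_scores, candidate_scores, delta):
    """Paired test: pass if the lower 95% bound on mean(candidate - baseline)
    stays above -delta. Returns (passed, lower_bound)."""
    n = len(baseline_scores)
    if n != len(candidate_scores) or n < 2:
        raise ValueError("need at least two paired scores")
    diffs = [c - b for b, c in zip(baseline_scores, candidate_scores)]
    mean = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    lower = mean - Z_95_ONE_SIDED * std_err
    return lower > -delta, lower


def next_rung(current_pct: int, gate_passed: bool) -> int:
    """Advance one rung only when the gate passed; otherwise refuse to move."""
    if not gate_passed:
        return current_pct
    for rung in LADDER:
        if rung > current_pct:
            return rung
    return LADDER[-1]
```

Calling `non_inferior` again before each `next_rung` call mirrors the gate re-running at every rung.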

3. The cost ledger: net of quality

Every request writes one row to a ledger that is both your dashboard and your invoice basis, so the numbers you audit and the numbers you are billed can't diverge. The identity per request:

net_attributable_saving = baseline_cost − realized_cost − itemized_adjustments

Two things make it credible:

  • Adjustments we did not cause are stripped out. Your traffic growth is normalized to a per-unit basis so the invoice doesn't inflate on volume. A provider price cut is neutralized by the frozen price list. Your own prompt edits re-baseline through change-control. Recovea's own shadow-eval spend is removed from realized cost. We do not claim savings we didn't produce.
  • Per-lever attribution sums exactly to the net. Each dollar is attributed to the lever that earned it, in actual application order, with no dollar lost or invented. Every number drills down to the underlying requests.
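The identity and the attribution rule can be checked with a few lines of arithmetic. This is a sketch with invented figures and hypothetical function names. It assumes adjustments are taken off the baseline before the levers are walked in application order, which is one way to make the per-lever shares telescope exactly to the net.

```python
from decimal import Decimal


def net_attributable_saving(baseline_cost, realized_cost, adjustments):
    """net = baseline_cost - realized_cost - sum of itemized adjustments."""
    return baseline_cost - realized_cost - sum(adjustments.values(), Decimal("0"))


def attribute(baseline_cost, adjustments, cost_after_each_lever):
    """Give each lever the cost drop it caused, in application order.

    Assumption: adjustments are removed from the starting point first, so the
    shares sum to the net rather than to the gross saving.
    """
    previous = baseline_cost - sum(adjustments.values(), Decimal("0"))
    shares = {}
    for lever, cost_after in cost_after_each_lever:
        shares[lever] = previous - cost_after
        previous = cost_after
    return shares


# Invented example figures (USD) for one request.
baseline_cost = Decimal("0.035")
adjustments = {"provider_price_cut": Decimal("0.002")}
steps = [("slim", Decimal("0.030")), ("routing", Decimal("0.010"))]
realized_cost = steps[-1][1]

net = net_attributable_saving(baseline_cost, realized_cost, adjustments)
shares = attribute(baseline_cost, adjustments, steps)
```

Using `Decimal` rather than floats keeps the sum of the shares exactly equal to the net, with no rounding residue.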

The ledger is append-only and hash-chained, so an auditor can confirm no row was edited after the fact. You can export it as CSV or JSON today. The export covers baseline, normalization rules, adjustments, and per-lever attribution, and every row carries its prev_hash / row_hash and the recompute recipe, so you can re-derive the chain from genesis yourself and reconcile the ledger line by line against your own provider invoice. (Planned: a signed-PDF proof exhibit and eval-verdict columns, which arrive with the verification engine.)
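Re-deriving a hash chain from genesis follows a standard pattern, sketched below. Only the field names prev_hash and row_hash come from this page; the genesis value, the SHA-256 choice, the payload serialization, and the helper names are assumptions. Follow the recompute recipe shipped with your export rather than this sketch.

```python
import hashlib
import json

GENESIS = "0" * 64   # assumed genesis value; the real one comes with the export


def row_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous row's hash together with this row's canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a row that commits to everything before it."""
    prev = chain[-1]["row_hash"] if chain else GENESIS
    chain.append({"prev_hash": prev,
                  "row_hash": row_hash(prev, payload),
                  "payload": payload})


def verify(chain: list) -> bool:
    """Walk from genesis and recompute every link; any edited row breaks it."""
    prev = GENESIS
    for row in chain:
        if row["prev_hash"] != prev:
            return False
        if row["row_hash"] != row_hash(prev, row["payload"]):
            return False
        prev = row["row_hash"]
    return True
```

Editing any field of an earlier row changes its recomputed hash, so `verify` fails at that row without needing to trust the exporter.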

4. Billed on cost-per-successful-output

Quality is baked into the billed unit. A response is metered as a saving only if it is a successful output; a cheaper answer that was truncated, refused, or failed its eval check contributes zero billable saving. This closes the obvious gaming loophole (you can't get cheaper by getting worse), and it is why we describe savings as net of quality rather than gross. See Pricing for how share-of-savings attaches to the net verified number, never the gross. Until the verification engine is live end-to-end, no output is billed as a verified saving — the pre-proof guard keeps the share off and every ledger figure reads as a labeled estimate.
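The billing rule above reduces to a guard in front of the net figure. A minimal sketch, with a hypothetical function name and flags:

```python
from decimal import Decimal

ZERO = Decimal("0")


def billable_saving(net_saving: Decimal, *, truncated: bool, refused: bool,
                    eval_passed: bool, verification_live: bool) -> Decimal:
    """Return the saving that may be billed for one response.

    Hypothetical flags: `verification_live` models the pre-proof guard, and the
    other three model the successful-output check.
    """
    if not verification_live:
        return ZERO          # pre-proof: estimates only, nothing billed
    if truncated or refused or not eval_passed:
        return ZERO          # an unsuccessful output earns no billable saving
    return net_saving
```

A cheaper but truncated answer therefore bills the same as no saving at all.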

5. Fail-open, and one-line rollback

If anything in Recovea's layer fails (a classifier error, a cache timeout, a slim failure), the request falls back to the baseline model and flows straight to your provider on your own key. Savings for that request go to zero; you never get a 5xx from us, and you never lose the request. Telemetry and ledger writes happen off the hot path, so a metering hiccup can't fail a call either.
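Fail-open behavior amounts to wrapping the optimized path so that any failure in it degrades to the baseline call. A minimal sketch, with hypothetical function names:

```python
def fail_open(request, optimized_path, baseline_path):
    """Try the optimized path; on any failure in it, serve from baseline.

    Returns (response, optimized), where `optimized` is False when the request
    fell back and so earns zero savings.
    """
    try:
        return optimized_path(request), True
    except Exception:
        # A classifier error, cache timeout, or slim failure must never
        # surface to the caller: fall back to the baseline model instead.
        return baseline_path(request), False
```

Errors raised by `baseline_path` itself are not caught, so provider errors still reach your code as they would on a direct call.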

The rollback path is the same as the integration path. Recovea is reversible in one line: point your base URL back at your provider and traffic flows direct, with no other change to your code.
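If the base URL lives in configuration, integration and rollback are the same one-value switch. In this sketch the environment variable name LLM_BASE_URL and the Recovea URL are placeholders, not real endpoints; use the values from your onboarding. The resolved value is what you would pass as the base URL of your OpenAI-compatible client (the official OpenAI Python SDK accepts a `base_url` argument).

```python
import os

PROVIDER_BASE_URL = "https://api.openai.com/v1"
RECOVEA_BASE_URL = "https://recovea.example/v1"   # hypothetical placeholder URL


def resolve_base_url(env=os.environ) -> str:
    """Read the base URL from config; default to the provider (direct traffic).

    LLM_BASE_URL is a hypothetical variable name chosen for this sketch.
    """
    return env.get("LLM_BASE_URL", PROVIDER_BASE_URL)
```

Setting LLM_BASE_URL to the Recovea URL integrates; unsetting it rolls back, and the application code is untouched either way.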

Next

  • Pricing: share-of-savings, only on net verified savings
  • Streaming: server-sent events, unchanged
  • Errors: status codes and the OpenAI error envelope