Embeddings

POST /v1/embeddings returns vector representations of text. Recovea proxies the OpenAI embeddings shape unchanged: you send the same request and get the same response back, so an existing OpenAI client works once its base URL and API key point at Recovea.

Recovea meters every embeddings call into your cost ledger using exact token accounting. In v1 there is no semantic cache on this endpoint: embeddings are billed on real, measured usage, not on a cache hit-rate estimate.

Request

from openai import OpenAI

client = OpenAI(
    base_url="https://api.recovea.ai/v1",
    api_key="rcv_live_…",
)

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumped over the lazy dog.",
)
print(resp.data[0].embedding[:5])

curl https://api.recovea.ai/v1/embeddings \
  -H "Authorization: Bearer rcv_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumped over the lazy dog."
  }'

Parameters

Field	Type	Required	Notes
`model`	string	yes	Embedding model id, e.g. `text-embedding-3-small`, `text-embedding-3-large`. Echoed back in the response.
`input`	string or array	yes	A single string, or an array of strings to embed in one call. Each array element returns its own vector.
`dimensions`	integer	no	Truncate output vectors to this length (supported by `text-embedding-3-*`).
`encoding_format`	string	no	`"float"` (default) returns arrays of floats; `"base64"` returns base64-encoded packed floats.
`user`	string	no	Opaque end-user identifier, passed through unchanged.
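
When `encoding_format` is `"base64"`, each embedding arrives as a string rather than an array, so the client has to unpack it. The sketch below shows one way to do that with the standard library. It assumes the payload is packed little-endian float32 values, which this page does not state; `decode_base64_embedding` is a hypothetical helper name, not part of any SDK.

```python
import base64
import struct


def decode_base64_embedding(payload: str) -> list[float]:
    """Decode a base64 embedding string into a list of Python floats.

    Assumption: the payload is packed little-endian float32 values.
    """
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check using values that float32 represents exactly.
sample = base64.b64encode(struct.pack("<3f", 0.5, -0.25, 1.0)).decode("ascii")
decoded = decode_base64_embedding(sample)
print(decoded)
```

Base64 is mainly useful for large batches, where it avoids the overhead of parsing long JSON arrays of floats.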

To embed several inputs in one request, pass an array. The response data[] preserves order via each entry's index:

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["first document", "second document"],
)
for item in resp.data:
    print(item.index, len(item.embedding))
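
If you want to line vectors up with the texts that produced them, keying on `index` is safer than relying on list position. The following is a minimal sketch that works on the `data[]` array as plain dictionaries (for example, the parsed JSON body); `vectors_in_input_order` is a hypothetical helper name.

```python
def vectors_in_input_order(inputs, data):
    """Return (input_text, embedding) pairs aligned by each entry's index."""
    if len(inputs) != len(data):
        raise ValueError("expected one embedding per input")
    # Map each 0-based index to its vector, regardless of list order.
    by_index = {item["index"]: item["embedding"] for item in data}
    return [(text, by_index[i]) for i, text in enumerate(inputs)]


# Entries deliberately listed out of order to show the re-alignment.
example_data = [
    {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
    {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
]
pairs = vectors_in_input_order(["first document", "second document"], example_data)
for text, vector in pairs:
    print(text, vector)
```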

Response

The top-level object is always "list". data[] holds one embedding per input, each tagged with its index, and the shape matches OpenAI field for field.

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [-0.0061, 0.0094, -0.0123, "…"]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 11,
    "total_tokens": 11
  }
}

Field	Type	Notes
`object`	string	Always `"list"`.
`data[]`	array	One `embedding` object per input, ordered by `index`.
`data[].object`	string	Always `"embedding"`.
`data[].index`	integer	Position of this vector in the input array (0-based).
`data[].embedding`	array or string	Array of floats, or a base64 string when `encoding_format` is `"base64"`.
`model`	string	The model that produced the vectors; echoes your request.
`usage`	object	`prompt_tokens` and `total_tokens` only, since embeddings have no completion tokens.

Metering and the ledger

Recovea records usage.prompt_tokens for every embeddings call and prices it against the frozen reference rate for that model, writing the result to your cost ledger. This is exact metering: the count comes from the model's own usage report, not an estimate.
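
To reconcile a ledger entry by hand, the arithmetic can be sketched as tokens multiplied by a per-token rate. The rate table below is a placeholder, not Recovea's actual reference rate, and the per-million-token unit and the `ledger_cost` helper are assumptions made for illustration only.

```python
from decimal import Decimal

# Placeholder rate per 1,000,000 prompt tokens. NOT a real price:
# substitute the reference rate that applies to your account.
PLACEHOLDER_RATE_PER_MILLION = {"text-embedding-3-small": Decimal("1.00")}


def ledger_cost(model, usage, rates=PLACEHOLDER_RATE_PER_MILLION):
    """Price a call from its usage block: prompt_tokens times the model's rate."""
    rate = rates[model]
    return Decimal(usage["prompt_tokens"]) * rate / Decimal(1_000_000)


usage = {"prompt_tokens": 11, "total_tokens": 11}
cost = ledger_cost("text-embedding-3-small", usage)
print(cost)
```

Decimal arithmetic is used here so that very small per-call amounts do not pick up binary floating-point rounding when summed across many calls.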

There is no semantic cache on /v1/embeddings in v1. Recovea does not deduplicate or replay near-identical inputs here, so you are never billed against a similarity heuristic. Each input you send is embedded and metered as sent. Caching levers that affect quality apply on the chat surface, behind the non-inferiority gate; embeddings stay pass-through and exact.

Headers and fail-open

Every response carries the standard OpenAI headers plus Recovea's x-recovea-trace-id, which correlates the call to its ledger entry. If Recovea's metering layer fails, the request flows straight through to your provider on your own key: you still get the embeddings, the call records zero savings (savings == 0), and you never see a Recovea-shaped error.
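
Logging the trace id alongside your own request id makes it easier to find the matching ledger entry later. Below is a minimal sketch that pulls the header out of any mapping of response headers; `trace_id_from_headers` is a hypothetical helper, and it returns None when the header is absent rather than assuming it is always present.

```python
TRACE_HEADER = "x-recovea-trace-id"


def trace_id_from_headers(headers):
    """Return the Recovea trace id from a headers mapping, or None if missing.

    HTTP header names are case-insensitive, so names are compared lowercased.
    """
    for name, value in headers.items():
        if name.lower() == TRACE_HEADER:
            return value
    return None


# Illustrative header values, not real output from the API.
example_headers = {"Content-Type": "application/json", "X-Recovea-Trace-Id": "trace-123"}
print(trace_id_from_headers(example_headers))
```

With the official OpenAI Python client, response headers should be reachable by calling `client.embeddings.with_raw_response.create(...)` and reading `.headers` on the result, then `.parse()` for the usual response object.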

Errors

Errors use the same envelope and status codes as the rest of the API. For example, an unknown model returns 404 with code: "model_not_found", and an input that exceeds the model's context window returns 400 with code: "context_length_exceeded".

{
  "error": {
    "message": "The model `text-embedding-9` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
  }
}
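
Because the envelope is stable, client code can branch on the status and `code` pair instead of matching on the message text. The sketch below parses a raw error body and maps the two cases named above to actions; the action labels and the `describe_error` helper are hypothetical and chosen only for illustration.

```python
import json


def describe_error(status, body):
    """Map an HTTP status and error envelope to a coarse, illustrative action label."""
    error = json.loads(body).get("error", {})
    code = error.get("code")
    if status == 404 and code == "model_not_found":
        return "fix-model"      # the requested model id is unknown or inaccessible
    if status == 400 and code == "context_length_exceeded":
        return "shorten-input"  # the input exceeds the model's context window
    return "unhandled"


example_body = json.dumps({
    "error": {
        "message": "The model `text-embedding-9` does not exist or you do not have access to it.",
        "type": "invalid_request_error",
        "param": None,
        "code": "model_not_found",
    }
})
print(describe_error(404, example_body))
```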

See Errors for the full status-code and envelope reference.

Chat Completions: the core inference endpoint
Models: list the models available to your key
Errors: status codes and the error envelope
