Embeddings
POST /v1/embeddings returns vector representations of text. Recovea proxies the OpenAI embeddings shape unchanged: you send the same request, you get the same response back, and your existing client works without modification.
Recovea meters every embeddings call into your cost ledger using exact token accounting. In v1 there is no semantic cache on this endpoint: embeddings are billed on real, measured usage, not on a cache hit-rate estimate.
Request
from openai import OpenAI

client = OpenAI(
    base_url="https://api.recovea.ai/v1",
    api_key="rcv_live_…",
)

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumped over the lazy dog.",
)
print(resp.data[0].embedding[:5])

curl https://api.recovea.ai/v1/embeddings \
  -H "Authorization: Bearer rcv_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumped over the lazy dog."
  }'
Parameters
| Field | Type | Required | Notes |
|---|---|---|---|
| model | string | yes | Embedding model id, e.g. text-embedding-3-small, text-embedding-3-large. Echoed back in the response. |
| input | string or array | yes | A single string, or an array of strings to embed in one call. Each array element returns its own vector. |
| dimensions | integer | no | Truncate output vectors to this length (supported by text-embedding-3-*). |
| encoding_format | string | no | "float" (default) returns arrays of floats; "base64" returns base64-encoded packed floats. |
| user | string | no | Opaque end-user identifier, passed through unchanged. |
To embed several inputs in one request, pass an array. The response data[] preserves order via each entry's index:
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["first document", "second document"],
)

# Each item carries the index of the input it was computed from.
for item in resp.data:
    print(item.index, len(item.embedding))
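If downstream code needs each vector paired with the input it came from, one defensive option is to key on index instead of relying on list position. The following is a minimal sketch: `vectors_in_input_order` is a hypothetical helper (not part of the OpenAI client or Recovea), and it assumes each item exposes `.index` and `.embedding` as in the response shape described on this page. The `fake_data` list stands in for `resp.data` so the example runs offline.

```python
from types import SimpleNamespace


def vectors_in_input_order(inputs, items):
    """Pair each input string with its vector, using item.index as the key."""
    if len(inputs) != len(items):
        raise ValueError("expected one embedding per input")
    ordered = sorted(items, key=lambda item: item.index)
    return [(inputs[item.index], item.embedding) for item in ordered]


# Stand-in for resp.data, deliberately out of order to show the re-sort.
fake_data = [
    SimpleNamespace(index=1, embedding=[0.3, 0.4]),
    SimpleNamespace(index=0, embedding=[0.1, 0.2]),
]
pairs = vectors_in_input_order(["first document", "second document"], fake_data)
```

With a real response, the call would be `vectors_in_input_order(my_inputs, resp.data)`.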
Response
The top-level object field is always "list"; data[] holds one embedding per input, each tagged with its index. The shape matches OpenAI field-for-field.
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [-0.0061, 0.0094, -0.0123, "…"]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 11,
    "total_tokens": 11
  }
}
| Field | Type | Notes |
|---|---|---|
| object | string | Always "list". |
| data[] | array | One embedding object per input, ordered by index. |
| data[].object | string | Always "embedding". |
| data[].index | integer | Position of this vector in the input array (0-based). |
| data[].embedding | array or string | Array of floats, or a base64 string when encoding_format is "base64". |
| model | string | The model that produced the vectors; echoes your request. |
| usage | object | prompt_tokens and total_tokens only, since embeddings have no completion tokens. |
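When encoding_format is "base64", data[].embedding arrives as a string that has to be unpacked before use. The sketch below shows one way to do that with the standard library. It assumes the payload is a sequence of little-endian 32-bit floats; this page only says "packed floats", so treat the byte layout as an assumption to verify against a real response. `decode_base64_embedding` is a hypothetical helper name.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string of packed float32 values (little-endian assumed)."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload is not a whole number of float32 values")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip check using values that float32 represents exactly.
sample = base64.b64encode(struct.pack("<3f", 0.5, -0.25, 1.0)).decode("ascii")
decoded = decode_base64_embedding(sample)
```

A quick sanity check on real data is to request the same input with both encodings and compare the decoded values against the "float" arrays.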
Metering and the ledger
Recovea records usage.prompt_tokens for every embeddings call and prices it against the frozen reference rate for that model, writing the result to your cost ledger. This is exact metering: the count comes from the model's own usage report, not an estimate.
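As a rough illustration of that arithmetic, the sketch below multiplies a token count by a per-million-token rate. Both the formula and the rate are assumptions for illustration: this page does not publish the reference rates or the exact rounding Recovea applies, and `embedding_cost` is a hypothetical helper, not a Recovea API.

```python
from decimal import Decimal


def embedding_cost(prompt_tokens, rate_per_million_tokens):
    """Tokens times the per-token rate, kept as a Decimal to avoid float drift."""
    return (
        Decimal(prompt_tokens)
        * Decimal(rate_per_million_tokens)
        / Decimal(1_000_000)
    )


# Hypothetical rate of 0.02 currency units per million tokens, not a real price.
cost = embedding_cost(prompt_tokens=11, rate_per_million_tokens="0.02")
```

Passing the rate as a string keeps the decimal value exact; a float literal would carry binary rounding error into the result.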
There is no semantic cache on /v1/embeddings in v1. Recovea does not deduplicate or replay near-identical inputs here, so you are never billed against a similarity heuristic. Each input you send is embedded and metered as sent. Caching levers that affect quality apply on the chat surface, behind the non-inferiority gate; embeddings stay pass-through and exact.
Headers and fail-open
Every response carries the standard OpenAI headers plus Recovea's x-recovea-trace-id, which correlates the call to its ledger entry. If Recovea's metering layer fails, the request flows straight through to your provider on your own key: you get the embeddings, savings == 0, never a Recovea-shaped error.
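To capture the trace id alongside the parsed result, the OpenAI Python client's `with_raw_response` accessor exposes the HTTP headers. The sketch below is one way to wire that up. `embed_with_trace` is a hypothetical helper, and it returns `None` for the trace id if the header is missing instead of raising; whether a fail-open response still carries the header is not stated here, so that handling is a precaution. The fake client exists only so the example runs without a network call.

```python
from types import SimpleNamespace


def embed_with_trace(client, text, model="text-embedding-3-small"):
    """Return (parsed_response, trace_id); trace_id is None if the header is absent."""
    raw = client.embeddings.with_raw_response.create(model=model, input=text)
    trace_id = raw.headers.get("x-recovea-trace-id")
    return raw.parse(), trace_id


# Offline stand-in that mimics the attributes the helper touches.
def _fake_create(model, input):
    parsed = SimpleNamespace(model=model, data=[])
    return SimpleNamespace(
        headers={"x-recovea-trace-id": "trace-example"},
        parse=lambda: parsed,
    )


fake_client = SimpleNamespace(
    embeddings=SimpleNamespace(
        with_raw_response=SimpleNamespace(create=_fake_create)
    )
)
resp, trace_id = embed_with_trace(fake_client, "hello")
```

With a real client, pass the `OpenAI(...)` instance from the Request section in place of `fake_client` and log `trace_id` next to your own request id.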
Errors
Errors use the same envelope and status codes as the rest of the API. For example, an unknown model returns 404 with code: "model_not_found", and an input that exceeds the model's context window returns 400 with code: "context_length_exceeded".
{
  "error": {
    "message": "The model `text-embedding-9` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
  }
}
See Errors for the full status-code and envelope reference.
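For callers that handle raw HTTP responses, the envelope above can be parsed with the standard library. The sketch below pulls out the code and maps the two codes mentioned in this section to a suggested next step. `parse_error_envelope`, `explain_error`, and the hint text are hypothetical and illustrative, not part of Recovea or the OpenAI client.

```python
import json

# Suggested next steps for the two codes named in this section (illustrative).
HINTS = {
    "model_not_found": "check the model id or your key's model access",
    "context_length_exceeded": "shorten or split the input",
}


def parse_error_envelope(body):
    """Return (code, message) from an error body, or (None, None) if not an envelope."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None, None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None, None
    return error.get("code"), error.get("message")


def explain_error(body):
    """Return (code, hint) for an error body, with a generic fallback hint."""
    code, _message = parse_error_envelope(body)
    return code, HINTS.get(code, "see the Errors reference")


example_body = json.dumps({
    "error": {
        "message": "The model `text-embedding-9` does not exist or you do not have access to it.",
        "type": "invalid_request_error",
        "param": None,
        "code": "model_not_found",
    }
})
code, hint = explain_error(example_body)
```

Branching on `code` instead of matching the message text keeps the handler stable if the wording of the message changes.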
Next
- Chat Completions: the core inference endpoint
- Models: list the models available to your key
- Errors: status codes and the error envelope