Request lifecycle

When a request passes through routeur.ai, there are four distinct views worth documenting: what the caller sent, what routeur.ai passed upstream, what the model returned, and what routeur.ai finally returned to the caller. Each page of the API reference describes a single hop in this chain; this page is the map.

Four layers

Each numbered box below is its own JSON document. Together they form the complete picture of any single request.

1 · Caller → routeur.ai

The original OpenAI-compatible request body from your application.

2 · routeur.ai → LLM

The upstream request after routing, DLP, and any per-request overrides have been applied.

3 · LLM → routeur.ai

The raw upstream response from the provider adapter.

4 · routeur.ai → caller

The OpenAI-compatible response or short JSON error the caller actually receives.
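
The four layers above can be held together in one small structure when writing audit tooling. The following is a minimal sketch, not part of any routeur.ai SDK: the class name `RequestLifecycle`, its field names, and the `request_changes` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any

JsonDoc = dict[str, Any]


@dataclass(frozen=True)
class RequestLifecycle:
    """Hypothetical container for the four JSON documents of one request."""

    caller_request: JsonDoc     # 1: caller -> routeur.ai
    upstream_request: JsonDoc   # 2: routeur.ai -> LLM
    upstream_response: JsonDoc  # 3: LLM -> routeur.ai
    caller_response: JsonDoc    # 4: routeur.ai -> caller

    def request_changes(self) -> dict[str, tuple[Any, Any]]:
        """Top-level request keys whose values differ between layers 1 and 2."""
        keys = self.caller_request.keys() | self.upstream_request.keys()
        return {
            key: (self.caller_request.get(key), self.upstream_request.get(key))
            for key in sorted(keys)
            if self.caller_request.get(key) != self.upstream_request.get(key)
        }


messages = [{"role": "user", "content": "hi"}]
lifecycle = RequestLifecycle(
    caller_request={"model": "auto", "messages": messages},
    upstream_request={"model": "gpt-4o-mini", "messages": messages},
    upstream_response={},
    caller_response={},
)
changes = lifecycle.request_changes()
print(changes)
```

Comparing layers 1 and 2 this way gives a quick view of what routing and DLP changed before the request left for the provider.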

Example: input DLP redaction

This is the most important example for security reviews because it shows that the LLM never sees the original sensitive value.

1 · Caller request
{
  "model": "auto",
  "messages": [{
    "role": "user",
    "content": "Repeat this card: 4111 1111 1111 1111"
  }]
}
2 · Upstream request
{
  "model": "gpt-4o-mini",
  "messages": [{
    "role": "user",
    "content": "Repeat this card: [REDACTED]"
  }]
}
3 · Upstream response
{
  "id": "chatcmpl_...",
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "I can't assist with that."
    },
    "finish_reason": "stop"
  }]
}
4 · Caller response
{
  "id": "chatcmpl_...",
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [ ... ],
  "usage": { ... },
  "routeur": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "route_reason": "default",
    "redactions": 1
  }
}

Security property. The caller sent the real card number, but the upstream provider only received [REDACTED].
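
To make the redaction step concrete, here is a minimal sketch of how a card number could be replaced before a request goes upstream. It is an illustration under stated assumptions, not routeur.ai's actual DLP engine: the pattern, the Luhn checksum filter, and the function names `redact_cards` and `redact_messages` are hypothetical. Only the `[REDACTED]` placeholder and the redaction count come from the example above.

```python
import re

# Assumption: a card number is 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used here to avoid redacting arbitrary digit runs."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_cards(text: str) -> tuple[str, int]:
    """Return the redacted text and the number of redactions made."""
    count = 0

    def replace(match: re.Match) -> str:
        nonlocal count
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # leave non-card digit runs untouched
        count += 1
        return "[REDACTED]"

    return CARD_PATTERN.sub(replace, text), count


def redact_messages(messages: list[dict]) -> tuple[list[dict], int]:
    """Apply redact_cards to every message without mutating the input."""
    redacted, total = [], 0
    for message in messages:
        content, count = redact_cards(message["content"])
        redacted.append({**message, "content": content})
        total += count
    return redacted, total


caller_messages = [
    {"role": "user", "content": "Repeat this card: 4111 1111 1111 1111"}
]
upstream_messages, redactions = redact_messages(caller_messages)
print(upstream_messages[0]["content"], redactions)
```

The sketch returns a new message list instead of editing the caller's in place, so the original request stays available for the payload archive while only the redacted copy is sent upstream.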

Example: output moderation block

Output moderation is the inverse case: the upstream model does produce content, but routeur.ai prevents that content from reaching the caller.

400 · application/json
Caller response
{
  "error": {
    "code": "blocked_by_moderation",
    "message": "moderation:secret_leak_block",
    "type": "routeur_error"
  }
}

Important distinction. With output moderation, the LLM has already seen the prompt and answered. The control protects the end user, not the upstream model.
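
On the caller side, a moderation block arrives as an error document instead of a completion, so client code has to branch on the body shape. The following is a minimal sketch of that branching. It assumes the error shape shown above; the exception classes and the `extract_content` helper are hypothetical names, not part of a routeur.ai client library.

```python
from typing import Any


class ModerationBlocked(Exception):
    """Raised when the model output was withheld (hypothetical name)."""


class RouteurError(Exception):
    """Any other error document returned to the caller (hypothetical name)."""


def extract_content(body: dict[str, Any]) -> str:
    """Return the assistant text, or raise if the body is an error document."""
    error = body.get("error")
    if error is None:
        return body["choices"][0]["message"]["content"]
    message = error.get("message", "")
    if error.get("code") == "blocked_by_moderation":
        raise ModerationBlocked(message)
    raise RouteurError(f"{error.get('code')}: {message}")


blocked_body = {
    "error": {
        "code": "blocked_by_moderation",
        "message": "moderation:secret_leak_block",
        "type": "routeur_error",
    }
}
ok_body = {
    "choices": [{
        "message": {"role": "assistant", "content": "I can't assist with that."},
        "finish_reason": "stop",
    }]
}

print(extract_content(ok_body))
try:
    extract_content(blocked_body)
except ModerationBlocked as exc:
    print("blocked:", exc)
```

Giving the moderation case its own exception type lets an application show a neutral message to the end user without treating the block as a transport failure.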

Trace records and payload archives

routeur.ai exposes two observability layers for audits and debugging.

  • Trace record: compact request metadata persisted for every request, including provider, requested model, final model, latency, token counts, cost, and an optional payload_url.
  • Payload archive: the full request and response bundle, encrypted at rest and fetched via a short-lived signed URL. The archive includes the caller request, upstream request, upstream response, and the response body returned to the caller.
Trace record
{
  "request_id": "01K...",
  "organization_id": "org_42",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "requested_model": "auto",
  "route_reason": "default",
  "latency_ms": 2939,
  "prompt_tokens": 13,
  "completion_tokens": 7,
  "cost_usd": 0.00000615,
  "payload_url": "https://payloads.routeur.ai/...?sig=..."
}
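
A trace record like the one above is small enough to load into a typed structure for audits. The following is a minimal sketch that assumes only the fields shown in the sample. The `TraceRecord` class and its derived properties are hypothetical, and fetching the payload archive from `payload_url` is deliberately left out because the bundle format is not described here.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical typed view of the trace record sample above."""

    request_id: str
    organization_id: str
    provider: str
    model: str
    requested_model: str
    route_reason: str
    latency_ms: int
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    payload_url: Optional[str] = None  # optional per the description above

    @classmethod
    def from_json(cls, raw: str) -> "TraceRecord":
        data = json.loads(raw)
        known = {field.name for field in fields(cls)}
        # Ignore keys this sketch does not model.
        return cls(**{key: value for key, value in data.items() if key in known})

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def model_changed(self) -> bool:
        """True when the final model differs from the requested one."""
        return self.requested_model != self.model


raw_record = """{
  "request_id": "01K...",
  "organization_id": "org_42",
  "provider": "openai",
  "model": "gpt-4o-mini",
  "requested_model": "auto",
  "route_reason": "default",
  "latency_ms": 2939,
  "prompt_tokens": 13,
  "completion_tokens": 7,
  "cost_usd": 0.00000615,
  "payload_url": "https://payloads.routeur.ai/...?sig=..."
}"""

record = TraceRecord.from_json(raw_record)
print(record.total_tokens, record.model_changed, record.payload_url is not None)
```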