Chat completion
curl https://api.routeur.ai/v1/chat/completions \
-H "Authorization: Bearer $ROUTEUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"messages": [{"role": "user", "content": "hi"}]
}'
Streaming
Add "stream": true to receive server-sent events. Use curl -N to disable buffering:
curl -N https://api.routeur.ai/v1/chat/completions \
-H "Authorization: Bearer $ROUTEUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "auto",
"stream": true,
"messages": [{"role": "user", "content": "stream please"}]
}'
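On the client side, the stream can be read one event at a time. The following is a minimal Python sketch, assuming the response uses standard server-sent events framing (`data:` lines, with a blank line ending each event). The helper name `iter_sse_data` is hypothetical, and the content of each payload is not specified here, so the sketch returns payloads as raw strings.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a line stream."""
    buffer = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # Lenient choice for this sketch: flush an event that was not
    # terminated by a final blank line instead of discarding it.
    if buffer:
        yield "\n".join(buffer)
```

A response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed directly as `lines`.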
Choosing a model
Every request picks an upstream model in one of three ways, listed in order of precedence:
- Routeur-* request headers. If Routeur-Provider or Routeur-Model are set, they win, regardless of what is in the body. Useful from middleware or transports that can't change the JSON.
- An explicit upstream id in model. Pass something like gpt-4o-mini or claude-3-5-sonnet to bypass routing rules and pin the request to that model.
- model: "auto" (or any alias). Lets routeur.ai pick the upstream model from your routing rules. This is the recommended default.
Force a specific provider+model on a single call via headers (the body still needs a model field, but it is ignored when headers are set):
curl https://api.routeur.ai/v1/chat/completions \
-H "Authorization: Bearer $ROUTEUR_KEY" \
-H "Routeur-Provider: openai" \
-H "Routeur-Model: gpt-4o" \
-H "Content-Type: application/json" \
-d '{ "model": "auto", "messages": [{"role":"user","content":"hi"}] }'
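The precedence rules in this section can be written as a small decision function. This is a Python illustration of the documented order only, not routeur.ai's implementation: the function name `effective_selection`, the returned labels, and the `ALIASES` set are hypothetical, and the way an alias is told apart from an upstream id is an assumption.

```python
ALIASES = {"auto"}  # hypothetical: the aliases configured for your account


def effective_selection(headers, body):
    """Return (mechanism, provider, model) following the documented precedence."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    provider = lowered.get("routeur-provider")
    model = lowered.get("routeur-model")

    # 1. Routeur-* headers win, regardless of what is in the body.
    if provider or model:
        return ("headers", provider, model)

    # 2. An explicit upstream id in the body bypasses routing rules.
    if body["model"] not in ALIASES:
        return ("explicit", None, body["model"])

    # 3. "auto" (or any alias) defers to the routing rules.
    return ("routing-rules", None, body["model"])
```

For the header example above, `effective_selection({"Routeur-Provider": "openai", "Routeur-Model": "gpt-4o"}, {"model": "auto"})` reports the `headers` mechanism, and the body's `"auto"` has no effect.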
Production tips
- Always set a connect + read timeout on your HTTP client. The platform default is rarely what you want.
- Stream long completions to surface first-token latency and let users cancel.
- Persist the X-Routeur-Trace-* response headers if you want to inspect a routed request after the fact; see Inspect trace.
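As a rough Python sketch of the timeout and trace-header tips, using only the standard library: the 30-second timeout is an assumed value, the helper names `trace_headers` and `chat` are hypothetical, and the response body is assumed to be JSON. No specific trace header names are assumed beyond the `X-Routeur-Trace-` prefix.

```python
import json
import os
import urllib.request

API_URL = "https://api.routeur.ai/v1/chat/completions"
TIMEOUT_SECONDS = 30  # assumed value; tune it for your workload


def trace_headers(headers):
    """Keep only X-Routeur-Trace-* headers, matched case-insensitively."""
    return {
        name: value
        for name, value in headers
        if name.lower().startswith("x-routeur-trace-")
    }


def chat(prompt):
    """Send one chat completion and return (parsed body, trace headers)."""
    payload = {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # a body makes this a POST
        headers={
            "Authorization": "Bearer " + os.environ["ROUTEUR_KEY"],
            "Content-Type": "application/json",
        },
    )
    # The timeout covers the connection attempt and each blocking socket
    # read, rather than the platform default of waiting indefinitely.
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        body = json.load(response)  # assumes a JSON response body
        return body, trace_headers(response.getheaders())
```

Storing the returned trace headers next to your own request log entry makes it possible to look up a routed request later.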