Rate limits
Routeur applies two independent caps per API key: requests per minute and tokens per minute (prompt + completion). Exceeding either cap returns HTTP 429 with the error code rate_limited.
Defaults
New keys start with conservative limits. The dashboard shows your current caps and lets you request increases.
Requests / minute: 600. Default per-key request cap.
Tokens / minute: 200,000. Combined prompt + completion tokens.
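As a rough illustration of how the two default caps interact, the sketch below works out which cap a steady workload would reach first. The constant and function names are hypothetical, and the values are the defaults listed above; your key's actual caps may differ.

```python
# Default caps from this page; check the dashboard for your key's real values.
REQUESTS_PER_MINUTE = 600
TOKENS_PER_MINUTE = 200_000

# Average request size at which both caps are reached together.
BREAK_EVEN_TOKENS = TOKENS_PER_MINUTE / REQUESTS_PER_MINUTE  # about 333.3


def binding_cap(avg_tokens_per_request: int) -> str:
    """Return which default cap a steady workload reaches first."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # Requests per minute that the token cap allows at this average size.
    token_limited_rpm = TOKENS_PER_MINUTE / avg_tokens_per_request
    return "tokens" if token_limited_rpm < REQUESTS_PER_MINUTE else "requests"
```

With these defaults, workloads averaging more than roughly 333 tokens per request are bounded by the token cap, and smaller requests are bounded by the request cap.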
Response headers
Every response includes the current state of both buckets so clients can self-throttle without polling.
X-Routeur-RateLimit-Requests-Remaining (integer): Requests left in the current minute window.
X-Routeur-RateLimit-Tokens-Remaining (integer): Tokens left in the current minute window.
Retry-After (seconds, on 429 only): Number of seconds to wait before retrying.
Hitting the limit
{
  "error": {
    "code": "rate_limited",
    "message": "requests per minute exceeded",
    "type": "routeur_error"
  }
}
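A minimal sketch of handling this response is to wait for the number of seconds given in Retry-After and then try again. The code below is not an official Routeur client: `send` is a hypothetical caller-supplied function that performs one request and returns a (status, headers, body) tuple, and the attempt limit and fallback wait are arbitrary choices.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def retry_after_seconds(headers: Mapping[str, str], default: float) -> float:
    """Parse Retry-After as seconds, falling back to `default`."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except (TypeError, ValueError):
                return default
    return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying on 429 after the advertised wait."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        # Return on any non-429 status, or when the attempts are used up.
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        sleep(retry_after_seconds(headers, default_wait))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays; in production the default `time.sleep` applies.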
Streaming caveat. Streamed responses count their completion tokens against your bucket as they're generated. A long stream can therefore push you over the cap mid-response.
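To make the caveat concrete, the sketch below takes the token budget left when a stream starts and finds the chunk at which cumulative completion tokens would pass it. The function name and the per-chunk token counts are illustrative assumptions; how Routeur responds once the cap is crossed mid-stream is not specified here.

```python
from typing import Iterable, Optional


def first_chunk_over_budget(
    tokens_remaining: int, chunk_token_counts: Iterable[int]
) -> Optional[int]:
    """Return the 0-based index of the chunk that exceeds the budget, or None."""
    used = 0
    for index, count in enumerate(chunk_token_counts):
        used += count  # completion tokens are counted as they are generated
        if used > tokens_remaining:
            return index
    return None
```

A conservative client could avoid this case by starting a stream only when the remaining token budget covers the prompt plus the maximum completion length it has requested.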