Limits keep the free tier sustainable and protect the API. They’re starting points and
may be tuned over time.
Limits
| Lane | Limit |
|---|---|
| Anonymous (by IP) | 15 / minute and 100 / day - both apply |
| Starter key | 60 / minute |
| Scale key | 300 / minute |
| Agency key | 1000 / minute |
Keyed limits apply per API key and are set by the team’s plan. The anonymous limit applies per client IP.
Over MCP, a single user question is usually several HTTP requests (the client lists
tools, then calls one or more). Budget the anonymous limit accordingly, or use a key.
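As a rough illustration of that budgeting, the sketch below divides the anonymous limits from the table by an assumed number of HTTP requests per question. The figure of 3 requests per question (one tool listing plus two tool calls) and the function name `questions_within_budget` are assumptions made for the example, not values the API reports.

```python
# Anonymous limits, taken from the table above.
ANON_PER_MINUTE = 15
ANON_PER_DAY = 100


def questions_within_budget(requests_per_question: int) -> dict:
    """Return how many whole user questions fit in each anonymous window."""
    if requests_per_question < 1:
        raise ValueError("requests_per_question must be at least 1")
    return {
        "per_minute": ANON_PER_MINUTE // requests_per_question,
        "per_day": ANON_PER_DAY // requests_per_question,
    }


# Assumed: each question costs 3 HTTP requests (1 tool listing + 2 tool calls).
print(questions_within_budget(3))  # {'per_minute': 5, 'per_day': 33}
```

Under that assumption, the daily cap runs out after about 33 questions, which is usually the point at which a key becomes worthwhile.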
When you’re limited
Requests over the limit receive HTTP 429 with the standard error envelope:
{
  "success": false,
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded",
    "details": { "retry_after_seconds": 12, "limit": 60, "remaining": 0 }
  }
}
…and these headers:
| Header | Meaning |
|---|---|
| Retry-After | Seconds to wait before retrying. |
| X-RateLimit-Limit | Requests allowed in the window. |
| X-RateLimit-Remaining | Requests left in the window. |
| X-RateLimit-Reset | Unix seconds when the window resets. |
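One way a client might turn a 429 response into a wait time is sketched below. It reads the `Retry-After` header first, then `retry_after_seconds` from the error envelope, then `X-RateLimit-Reset`. That order of preference, the 1-second final fallback, and the function name `seconds_to_wait` are assumptions made for the example; only the header and field names come from this page.

```python
import json
import time
from typing import Mapping, Optional


def _from_body(body_text: str) -> Optional[float]:
    """Pull error.details.retry_after_seconds out of the envelope, if present."""
    try:
        details = json.loads(body_text)["error"]["details"]
        return float(details["retry_after_seconds"])
    except (ValueError, KeyError, TypeError):
        return None


def seconds_to_wait(headers: Mapping[str, str], body_text: str,
                    now: Optional[float] = None) -> float:
    """Best-effort wait time, in seconds, for a rate-limited response."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lower = {name.lower(): value for name, value in headers.items()}

    if "retry-after" in lower:
        try:
            return max(0.0, float(lower["retry-after"]))
        except ValueError:
            pass  # Not a plain number of seconds; try the other sources.

    from_body = _from_body(body_text)
    if from_body is not None:
        return max(0.0, from_body)

    if "x-ratelimit-reset" in lower:
        try:
            current = time.time() if now is None else now
            return max(0.0, float(lower["x-ratelimit-reset"]) - current)
        except ValueError:
            pass

    return 1.0  # Assumed fallback when the response carries no hint.


sample_body = '{"success": false, "error": {"details": {"retry_after_seconds": 12}}}'
print(seconds_to_wait({}, sample_body))  # 12.0
```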
Handling it
Wait for the Retry-After interval, then retry; don’t hammer the API. A simple
exponential backoff that honors Retry-After is plenty. If you’re regularly hitting your
ceiling, a higher plan raises the per-minute limit.
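A minimal sketch of such a backoff loop follows. It is transport-agnostic: `send` is a hypothetical callable that you supply, which performs one request and returns `(status, headers, body)`. The attempt count, base delay, and cap are illustrative defaults, not values recommended by the API.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]


def _retry_after(headers: Mapping[str, str]) -> float:
    """Return the Retry-After value in seconds, or 0.0 if absent or unparsable."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return 0.0
    return 0.0


def call_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        status, headers, _body = response
        if status != 429 or attempt == max_attempts - 1:
            return response
        # Exponential backoff: base, 2x base, 4x base, ... capped at max_delay.
        backoff = min(max_delay, base_delay * 2 ** attempt)
        # Never wait less than the server asked for.
        sleep(max(backoff, _retry_after(headers)))
    raise AssertionError("unreachable")
```

Usage with a stand-in for a real HTTP call:

```python
replies = [(429, {"Retry-After": "3"}, ""), (200, {}, "ok")]
print(call_with_backoff(lambda: replies.pop(0), sleep=lambda s: None))
# (200, {}, 'ok')
```

The loop takes the larger of the backoff and `Retry-After`, so the client never retries sooner than the server asked. If many clients share one key, adding random jitter to the delay is a common refinement.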