Why Rate Limits Are a Product Decision, Not Just an Engineering One
Most engineers treat rate limiting as an infrastructure concern: protect the backend from abuse. That's half of it.
Rate limits also:
- Define your product tiers (free: 100 req/day, pro: 10,000 req/day)
- Create natural conversion triggers (user hits limit → upgrade prompt)
- Signal product value (high limits imply confidence in your infrastructure)
- Generate friction that filters out low-quality free users
Setting rate limits requires understanding both your infrastructure capacity and your business model. Set them too loosely for your infrastructure and you get DDoS'd or your database melts. Set them too loosely for your business model and free users never hit limits, never upgrade, and your server bill grows indefinitely.
The Rate Limit Hierarchy
Well-designed APIs have layered rate limits:
1. Per-second burst limit: The "spike absorber." Prevents single users from overwhelming the system in a short window.
- Typical: 10-50 req/sec for free, 50-500 for paid
- Implementation: token bucket or leaky bucket algorithm
2. Per-minute limit: The primary operational limit for most API use cases.
- Typical: 60-300 req/min for free, 600-6,000 for paid
- Implementation: sliding window counter
3. Daily limit: The conversion driver. This is the limit users actually notice and upgrade to increase.
- Typical: 100-1,000 req/day for free, 10,000-1,000,000+ for paid
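- Implementation: Redis counter with a TTL that expires at the daily reset (midnight UTC is simplest, though staggered or rolling resets avoid a synchronized spike)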
4. Monthly limit (usage-based pricing): The billing limit. Exceeding this triggers either a hard cutoff or additional charges.
- Typical: Varies by product; always clearly communicated at signup
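The per-second burst limit in item 1 names the token bucket algorithm. The following is a minimal single-process sketch of that idea, not a production limiter: the class name `TokenBucket`, its parameters, and the optional `now` argument (added only so the behavior can be checked without sleeping) are assumptions for illustration.

```python
import time
from typing import Optional


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can burst right away
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        """Spend `cost` tokens if available; return False when the bucket is too empty."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=10, capacity=10`, a client can send 10 requests at once, then is held to 10 req/sec on average. A multi-server deployment would need the bucket state in shared storage rather than in process memory.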
Calculating Your Rate Limits
Start from infrastructure capacity, work backward to tier limits:
Infrastructure capacity calculation:
If your API can handle 1,000 req/sec at baseline load (measured, not assumed):
- Reserve 30% headroom for traffic spikes: usable capacity = 700 req/sec
- Reserve 20% for internal traffic (monitoring, jobs): 560 req/sec for customers
- At 10,000 paying customers averaging 5 req/sec each: 50,000 req/sec needed, roughly 89x the 560 req/sec available
This reveals a capacity problem early — before rate limits are needed for protection, you need infrastructure scaling. Rate limits protect existing capacity; they don't substitute for insufficient capacity.
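The capacity arithmetic above can be written as a small helper so it is easy to rerun with your own measurements. This is a sketch under the same assumptions as the example (30% spike headroom, then 20% of the remainder for internal traffic); the function names are hypothetical.

```python
def customer_capacity(measured_rps: float,
                      spike_headroom: float = 0.30,
                      internal_share: float = 0.20) -> float:
    """Requests per second left for customers after both reserves are taken."""
    usable = measured_rps * (1 - spike_headroom)   # e.g. 1,000 -> about 700
    return usable * (1 - internal_share)           # e.g. 700 -> about 560


def capacity_gap(measured_rps: float, customers: int, avg_rps_per_customer: float) -> float:
    """Positive result: demand exceeds capacity by this many req/sec."""
    needed = customers * avg_rps_per_customer
    return needed - customer_capacity(measured_rps)


if __name__ == "__main__":
    available = customer_capacity(1_000)
    gap = capacity_gap(1_000, customers=10_000, avg_rps_per_customer=5)
    print(f"available for customers: {available:.0f} req/sec")
    print(f"shortfall: {gap:.0f} req/sec")
```

A positive gap means the problem is scaling, not limit design.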
Tier design calculation:
- Identify your target conversion metric: "X% of free users should hit the daily limit within 30 days"
- Measure actual free user request patterns (P50, P90, P99)
- Set free limit between P50 and P90 — blocks heavy free users who won't pay, allows light users to explore
Example: Free users make 42 req/day at the median (P50) and 180 req/day at P90.
- Limit at 100/day: falls between P50 and P90, so somewhere between 10% and 50% of free users hit it regularly → good conversion pressure
- Limit at 500/day: well above P90, so fewer than 10% hit it → little conversion pressure, high server cost
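To check a candidate limit against real usage rather than estimating from two percentiles, you can compute the share of users who would hit it directly. The sketch below assumes you have one daily request count per free user; the sample numbers are made up for illustration, and `percentile` uses the simple nearest-rank method.

```python
import math
from typing import Sequence


def percentile(values: Sequence[int], p: int) -> int:
    """Nearest-rank percentile (p is an integer from 1 to 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def share_hitting_limit(daily_requests: Sequence[int], limit: int) -> float:
    """Fraction of users whose daily request count reaches `limit`."""
    if not daily_requests:
        raise ValueError("need at least one user")
    return sum(1 for r in daily_requests if r >= limit) / len(daily_requests)


# Hypothetical sample: one daily request count per free user.
sample = [5, 10, 20, 30, 42, 60, 90, 120, 180, 400]

if __name__ == "__main__":
    print("P50:", percentile(sample, 50), "P90:", percentile(sample, 90))
    for limit in (100, 500):
        print(f"limit {limit}/day -> {share_hitting_limit(sample, limit):.0%} of users hit it")
```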
The Rate Limit Response Design
How you communicate rate limits matters as much as the limits themselves.
Good rate limit responses:
HTTP 429 Too Many Requests
{
  "error": "rate_limit_exceeded",
  "message": "You've used 100/100 daily requests. Resets in 4h 23m.",
  "limit": 100,
  "remaining": 0,
  "reset": "2025-03-15T04:00:00Z",
  "upgrade_url": "https://yourproduct.com/pricing"
}
Always include these rate limit headers:
- X-RateLimit-Limit: The limit
- X-RateLimit-Remaining: Requests remaining
- X-RateLimit-Reset: When the limit resets (Unix timestamp)
Developers will check these headers in their code to throttle automatically. Without them, they'll retry blindly and make your rate limit problem worse.
Retry-After header: In the 429 response, include Retry-After with the number of seconds until the limit resets (for example, Retry-After: 3600 when the reset is one hour away). Well-behaved API clients will back off automatically.
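Putting the body, the X-RateLimit headers, and Retry-After together, a 429 response could be assembled roughly as follows. This is a framework-agnostic sketch using only the standard library; `build_429` and its parameters are hypothetical names, and the function returns a plain `(status, headers, body)` tuple that you would adapt to your web framework.

```python
import json
import math
from datetime import datetime, timezone
from typing import Dict, Tuple


def build_429(limit: int, reset_at: datetime, now: datetime,
              upgrade_url: str) -> Tuple[int, Dict[str, str], str]:
    """Build status, headers, and JSON body for an exhausted daily quota."""
    seconds_left = max(0, math.ceil((reset_at - now).total_seconds()))
    hours, remainder = divmod(seconds_left, 3600)
    minutes = remainder // 60
    headers = {
        "Content-Type": "application/json",
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_at.timestamp())),  # Unix timestamp
        "Retry-After": str(seconds_left),                     # seconds until reset
    }
    body = {
        "error": "rate_limit_exceeded",
        "message": f"You've used {limit}/{limit} daily requests. "
                   f"Resets in {hours}h {minutes}m.",
        "limit": limit,
        "remaining": 0,
        "reset": reset_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "upgrade_url": upgrade_url,
    }
    return 429, headers, json.dumps(body)
```

Deriving both the human-readable message and Retry-After from the same `seconds_left` value keeps the two from drifting apart.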
Rate Limit Tiers for Common Monetization Models
Freemium developer API:
| Tier | Req/min | Req/day | Price |
|---|---|---|---|
| Free | 30 | 500 | $0 |
| Starter | 300 | 10,000 | $49/mo |
| Pro | 1,000 | 100,000 | $149/mo |
| Scale | 5,000 | Unlimited | $499/mo |
Usage-based API:
| Tier | Price | Overage |
|---|---|---|
| Pay-as-you-go | $0.01/req | Same |
| Volume 100K | $800/mo (100K req included) | $0.008/req |
| Volume 1M | $7,000/mo (1M req included) | $0.007/req |
Enterprise API: Custom limits, SLA guarantees, dedicated infrastructure. Never put hard rate limits on enterprise — negotiate appropriate limits in contract.
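For the usage-based table above, it helps to see where each plan becomes the cheapest option. The sketch below assumes the Volume plans include 100,000 and 1,000,000 requests respectively (inferred from the tier names) and bills overage per request beyond that; the `Plan` type and function names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Plan:
    name: str
    base: Decimal      # monthly fee in dollars
    included: int      # requests covered by the monthly fee (assumed)
    overage: Decimal   # dollars per request beyond `included`


PLANS = [
    Plan("Pay-as-you-go", Decimal("0"), 0, Decimal("0.01")),
    Plan("Volume 100K", Decimal("800"), 100_000, Decimal("0.008")),
    Plan("Volume 1M", Decimal("7000"), 1_000_000, Decimal("0.007")),
]


def monthly_cost(plan: Plan, requests: int) -> Decimal:
    """Base fee plus per-request overage for anything past the included volume."""
    extra = max(0, requests - plan.included)
    return plan.base + extra * plan.overage


def cheapest_plan(requests: int) -> Plan:
    """Plan with the lowest total cost at this monthly volume."""
    return min(PLANS, key=lambda p: monthly_cost(p, requests))


if __name__ == "__main__":
    for volume in (50_000, 150_000, 2_000_000):
        best = cheapest_plan(volume)
        print(f"{volume:>9,} req/mo -> {best.name} (${monthly_cost(best, volume):,.2f})")
```

Under these assumptions, pay-as-you-go stops being the cheapest option at 80,000 requests per month, where $0.01/req reaches the $800 base fee.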
Soft Limits vs. Hard Limits
Hard limits: API returns 429 when limit is hit. Clean, predictable, simple to implement.
Soft limits: API continues working past the limit but at degraded speed, or sends a warning without blocking. Better for user experience, harder to implement consistently.
Hybrid (recommended for most SaaS):
- Free tier: hard limit at daily quota
- Paid tiers: soft limit with notification at 80%, hard limit at 150% of quota
- Enterprise: no hard limit; overage billing
The hybrid approach prevents paid customers from experiencing hard outages while still capping the most extreme usage.
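The hybrid policy above reduces to a small decision function. In this sketch the tier names, the return values, and the choice to count `used` as including the current request are all assumptions; integer arithmetic is used for the 80% and 150% thresholds to avoid floating-point edge cases.

```python
def decide(tier: str, used: int, quota: int) -> str:
    """Return 'allow', 'warn', 'block', or 'overage' for a request.

    `used` counts requests in the current period, including this one.
    """
    if tier == "free":
        # Hard limit at the daily quota.
        return "block" if used > quota else "allow"
    if tier == "paid":
        if used * 2 > quota * 3:      # beyond 150% of quota: hard limit
            return "block"
        if used * 10 >= quota * 8:    # 80% of quota or more: notify, keep serving
            return "warn"
        return "allow"
    if tier == "enterprise":
        # Never blocked; usage past quota is billed as overage.
        return "overage" if used > quota else "allow"
    raise ValueError(f"unknown tier: {tier!r}")
```

For example, a paid customer with a quota of 100 is served normally through request 79, served with a warning from 80 through 150, and blocked from 151 onward.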
Common Rate Limit Mistakes
Resetting at midnight UTC (same time for everyone): Creates a thundering herd, because every client that was blocked gets its quota back at 00:00 UTC and retries at once. Solution: rolling windows or staggered reset times.
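One way to get a rolling window is to keep a log of recent request timestamps per user, so each request expires on its own schedule instead of all at once. The following is an in-memory sketch of that sliding-window log; the class name is hypothetical, and it stores up to `limit` timestamps per user, so a counter-based approximation may be preferable for large daily quotas.

```python
from collections import deque


class RollingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.hits: deque = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window
        # Drop requests that have aged out of the window.
        while self.hits and self.hits[0] <= cutoff:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False
```

Because capacity frees up one request at a time as old timestamps expire, there is no single moment when every blocked client is released together.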
Not rate limiting by endpoint: Your /search endpoint may be 100x more expensive than your /status endpoint. Apply limits per-endpoint or per-cost-unit, not just per-request.
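Per-cost-unit limiting can be sketched as a budget that each endpoint draws down by a different amount. The cost table below is hypothetical and simply mirrors the 100x ratio mentioned above; in practice the weights would come from measured backend cost.

```python
# Hypothetical cost weights: one /search call costs as much as 100 /status calls.
ENDPOINT_COST = {"/status": 1, "/search": 100}
DEFAULT_COST = 1


class CostBudget:
    """Quota measured in cost units rather than raw request count."""

    def __init__(self, units: int):
        self.remaining = units

    def charge(self, path: str) -> bool:
        """Deduct the endpoint's cost; return False if the budget cannot cover it."""
        cost = ENDPOINT_COST.get(path, DEFAULT_COST)
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True
```

With a budget of 250 units, a client can make two /search calls and still has 50 units left for cheap endpoints, rather than being cut off or allowed unlimited expensive calls.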
No communication before hitting limits: Users should see usage warnings at 75% and 90% of their limit. Surprise rate limit errors create support tickets. Visible progress meters create upgrade moments.
Too aggressive on new signups: New users exploring your API will hit limits immediately and churn before seeing the product value. Consider: unlimited for the first 7 days, then limits kick in.
Use our API Rate Limit Calculator to calculate sustainable rate limits for your infrastructure capacity and convert those into pricing tier designs.