Most companies don't know what they're spending on AI. They sign up for an API key, ship a feature, and get a surprise invoice three weeks later.
Here's the truth: a single GPT-4o request with a long context can cost $0.05 — meaning 100,000 requests cost $5,000. Per month. Just for one feature.
The Token Math Nobody Explains
OpenAI charges by the token, not by the request. One token ≈ 0.75 words. A typical ChatGPT conversation has 500-2,000 tokens per exchange.
| Usage level | Tokens/day | GPT-4o cost/month | GPT-4o mini cost/month |
|---|---|---|---|
| Side project | 50K | $6 | $0.40 |
| Startup feature | 500K | $60 | $4 |
| Growth product | 5M | $600 | $40 |
| Scale | 50M | $6,000 | $400 |
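The table above can be reproduced with a one-line cost formula. Note the blended per-million-token rates here are backed out of the table itself (roughly $4/1M for GPT-4o and one fifteenth of that for mini) — they are assumptions for estimation, not official pricing, since real bills depend on your input/output token split.

```python
# Monthly cost estimator using the blended per-1M-token rates implied
# by the table above (an assumption; check current OpenAI pricing).
BLENDED_RATE_PER_1M = {
    "gpt-4o": 4.00,            # blended $/1M tokens
    "gpt-4o-mini": 4.00 / 15,  # ~15x cheaper
}

def monthly_cost(tokens_per_day: int, model: str, days: int = 30) -> float:
    """Estimate monthly spend from average daily token volume."""
    tokens_per_month = tokens_per_day * days
    return tokens_per_month / 1_000_000 * BLENDED_RATE_PER_1M[model]

for label, tokens in [("Side project", 50_000), ("Startup feature", 500_000),
                      ("Growth product", 5_000_000), ("Scale", 50_000_000)]:
    print(f"{label}: ${monthly_cost(tokens, 'gpt-4o'):,.2f} (4o) "
          f"vs ${monthly_cost(tokens, 'gpt-4o-mini'):,.2f} (mini)")
```

Swap in your own daily token volume to see where you land on the table.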
The gap between GPT-4o and GPT-4o mini is 15x. For most use cases — customer support, summarization, classification — mini is indistinguishable in output quality.
Why Your Bill Is Higher Than You Think
Three hidden cost drivers most teams miss:
1. System prompts are counted every time. Your 500-token system prompt is billed on every single request, even if nothing changes. At 10,000 requests/day, that's 5M tokens per day, or 150M tokens per month, in overhead alone.
2. Context window bloat. Conversation history grows with every exchange. A 10-turn chat includes all previous turns in the next request. GPT-4o charges for every token in context, every time.
3. Retry logic amplifies errors. Failed requests that auto-retry still consume tokens. A 5% error rate with up to 3 retries per failure means as much as 15% more API spend with zero additional value.
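The arithmetic behind drivers 1 and 3 fits in a few lines. The request volume, prompt size, and retry policy below are the illustrative numbers from the list above, not universal constants.

```python
# Quick sketch of two hidden cost drivers: static system-prompt
# overhead and worst-case retry amplification.

def prompt_overhead_tokens(prompt_tokens: int, requests_per_day: int,
                           days: int = 30) -> int:
    """Tokens spent re-sending a static system prompt on every request."""
    return prompt_tokens * requests_per_day * days

def retry_overhead_factor(error_rate: float, max_retries: int) -> float:
    """Worst-case extra-spend fraction if every failure exhausts retries."""
    return error_rate * max_retries

overhead = prompt_overhead_tokens(500, 10_000)  # 150,000,000 tokens/month
extra = retry_overhead_factor(0.05, 3)          # 0.15 → up to 15% more spend
```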
The Optimization Playbook
Use prompt caching. OpenAI offers a 50% discount on cached input tokens (identical prefix of at least 1,024 tokens). Static system prompts are perfect candidates. Potential savings: up to $300/month on a $600 bill, if most of your spend is cacheable input tokens.
Route by task complexity. Use GPT-4o mini for routine tasks and GPT-4o only for complex reasoning. Even a simple router based on keyword heuristics can achieve a 70/30 split, cutting costs by roughly 60%.
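A keyword-heuristic router can be this small. The signal list and model names below are illustrative assumptions — tune them against your own traffic before trusting the split.

```python
# Minimal complexity router: reasoning-heavy phrasing goes to the
# expensive model, everything else to the cheap one.
COMPLEX_SIGNALS = ("explain why", "step by step", "analyze", "compare",
                   "write code", "debug")

def pick_model(user_message: str) -> str:
    """Route routine requests to GPT-4o mini, complex ones to GPT-4o."""
    text = user_message.lower()
    if any(signal in text for signal in COMPLEX_SIGNALS):
        return "gpt-4o"
    return "gpt-4o-mini"

pick_model("What are your business hours?")          # → "gpt-4o-mini"
pick_model("Analyze this stack trace and debug it")  # → "gpt-4o"
```

In production you'd log the routing decision alongside the request so you can audit how often the heuristic escalates.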
Batch non-real-time work. The Batch API offers 50% cost reduction for asynchronous processing — bulk analytics, overnight report generation, content moderation queues.
Compress context. Summarize older conversation history instead of passing raw text. Reduces context window by 40-60% on long conversations with minimal quality loss.
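One way to sketch context compression: keep the most recent turns verbatim and collapse everything older into a single summary message. The `summarize` function below is a placeholder assumption — in practice it would be a cheap model call (e.g. GPT-4o mini).

```python
# Sketch: replace all but the last N conversation turns with one
# summary turn, shrinking the context window on long chats.

def summarize(turns: list[dict]) -> str:
    # Placeholder assumption: a real version would call a cheap model.
    return "Summary of earlier conversation: " + " | ".join(
        t["content"][:40] for t in turns)

def compress_history(history: list[dict], keep_last: int = 4) -> list[dict]:
    """Keep the last `keep_last` turns raw; summarize the rest."""
    if len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    return [{"role": "system", "content": summarize(older)}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compressed = compress_history(history)
len(compressed)  # 5: one summary turn + the last 4 raw turns
```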
What It Actually Costs to Build an AI Chatbot
A customer support bot serving 1,000 users/day with an average of 5 messages per session:
- Total requests/day: 5,000
- Avg tokens/request: 800 (context + response)
- GPT-4o cost/month: ~$360
- GPT-4o mini cost/month: ~$25
- With caching and routing: ~$80-120/month
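The breakdown above can be checked with a few lines of arithmetic. The blended rates here ($3/1M for GPT-4o, ~$0.21/1M for mini) are backed out of the monthly figures quoted above — they are assumptions, not official pricing.

```python
# Reproducing the chatbot worked example: 1,000 users/day, 5 messages
# per session, 800 tokens per request.

def chatbot_monthly_cost(users_per_day: int, msgs_per_session: int,
                         tokens_per_request: int, rate_per_1m: float,
                         days: int = 30) -> float:
    requests_per_day = users_per_day * msgs_per_session
    tokens_per_month = requests_per_day * tokens_per_request * days
    return tokens_per_month / 1_000_000 * rate_per_1m

gpt4o = chatbot_monthly_cost(1_000, 5, 800, rate_per_1m=3.00)  # $360
mini = chatbot_monthly_cost(1_000, 5, 800, rate_per_1m=0.21)   # ~$25
```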
Use the calculator above to run your own numbers. The difference between naive implementation and an optimized stack at scale is often $10,000/month or more.