OpenAI updates its pricing more often than most developers notice. If you're still budgeting based on prices from six months ago, your estimates could be off by 40%.
This guide covers every current model, what actually determines your bill, and the techniques that reliably cut API costs by 50-60%.
Current Pricing (May 2025)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| o1 | $15.00 | $60.00 | 200K |
| o3-mini | $1.10 | $4.40 | 200K |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16K |
| text-embedding-3-small | $0.02 | — | — |
Prompt caching (for repeated prefixes of at least 1,024 tokens) gives a 50% discount on cached input tokens.
What Actually Drives Your Bill
System prompts repeat on every request. A 500-token system prompt sent 10,000 times/day costs $12.50/day at GPT-4o prices — before any user input. Enable prompt caching and that drops to $6.25.
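That arithmetic is easy to reproduce. A minimal check in Python, using the GPT-4o input rate from the table above and assuming every request after the first hits the cache:

```python
# Daily cost of a 500-token system prompt resent on every request, at GPT-4o rates.
INPUT_RATE = 2.50 / 1_000_000       # $ per uncached input token
CACHED_RATE = INPUT_RATE * 0.5      # 50% discount on cached input tokens

system_tokens = 500
requests_per_day = 10_000

print(f"without caching: ${system_tokens * requests_per_day * INPUT_RATE:.2f}/day")   # $12.50
print(f"with caching:    ${system_tokens * requests_per_day * CACHED_RATE:.2f}/day")  # $6.25
```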
Conversation history compounds. In a 10-turn conversation, your 10th request carries the full previous 9 turns, so each request's input keeps growing and the cumulative bill grows faster still. By turn 10, you might be sending 5x the tokens of your first message.
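To see why, count the input tokens per turn. The message sizes below are assumptions for illustration (500-token system prompt, 150-token user messages, 200-token replies), not measurements:

```python
# Input tokens sent on each turn when the full history is resent verbatim.
SYSTEM, USER, REPLY = 500, 150, 200   # assumed average sizes

for turn in range(1, 11):
    history = (turn - 1) * (USER + REPLY)        # every earlier turn rides along
    print(f"turn {turn:2}: {SYSTEM + history + USER} input tokens")
# turn 1: 650 tokens ... turn 10: 3,800 tokens, almost 6x the first request
```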
Verbose models output more than you need. Models allowed to reason freely will often output 600-900 tokens where 200 would suffice. A simple max_tokens: 300 cap on routine tasks cuts output cost by 60%.
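Setting the cap is a single parameter. A minimal sketch with the official openai Python SDK; the model and limit here are examples, not recommendations:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer in two sentences or fewer."},
        {"role": "user", "content": "Summarize the refund policy for monthly plans."},
    ],
    max_tokens=300,  # hard ceiling on billed output tokens for this routine task
)
print(response.choices[0].message.content)
```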
Prompt Caching: The Underused Feature
OpenAI's prompt caching automatically kicks in when:
- The cached prefix is at least 1,024 tokens
- The same prefix was used within the past 5-10 minutes
Best candidates for caching:
- Long system prompts (instructions, personas, context documents)
- RAG retrieved documents that don't change between calls
- Few-shot examples prepended to every request
A 2,000-token system prompt sent 50,000 times/month is 100M input tokens: without caching that's $250/month at GPT-4o rates. With caching (assuming an 80% cache hit rate), it's about $150/month: 80M tokens at the discounted $1.25 per 1M plus 20M at the full $2.50.
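Because only the prefix of a request is eligible, ordering matters: keep the static material (instructions, few-shot examples, reference documents) byte-identical and up front, and push anything that varies into the user message. A sketch of that layout; the prompt file name is a stand-in for your real 1,000+ token system prompt:

```python
from openai import OpenAI

client = OpenAI()

# Static, identical prefix of 1,024+ tokens: this is what gets cached.
STATIC_SYSTEM_PROMPT = open("support_agent_prompt.txt").read()

def answer(ticket_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": ticket_text},             # dynamic suffix
        ],
    )
    return response.choices[0].message.content
```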
Model Routing: The Biggest Lever
Most production workloads don't need GPT-4o for every request. A simple routing layer that sends:
- Simple classification, extraction, formatting → GPT-4o mini ($0.15/$0.60)
- Complex reasoning, generation → GPT-4o ($2.50/$10.00)
reduces costs by 70-80% with less than 2% quality loss on most tasks.
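A routing layer can be as simple as a lookup keyed on a task label your application already knows. A sketch under that assumption; the labels, models, and output cap are examples:

```python
from openai import OpenAI

client = OpenAI()

CHEAP, SMART = "gpt-4o-mini", "gpt-4o"
SIMPLE_TASKS = {"classification", "extraction", "formatting"}

def pick_model(task_type: str) -> str:
    return CHEAP if task_type in SIMPLE_TASKS else SMART

def run(task_type: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=pick_model(task_type),
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    return response.choices[0].message.content

run("extraction", "Pull the invoice number out of this email: ...")   # goes to mini
run("generation", "Draft a reply to this escalated complaint: ...")   # goes to GPT-4o
```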
Practical Cost Benchmarks
| Use case | Avg tokens/request | GPT-4o cost | GPT-4o mini cost |
|---|---|---|---|
| Customer support reply | 800 in / 200 out | $0.004 | $0.00024 |
| Email draft | 600 in / 400 out | $0.0055 | $0.00033 |
| Code review | 1,500 in / 800 out | $0.01175 | $0.000705 |
| Document summary | 3,000 in / 500 out | $0.0125 | $0.00075 |
At 10,000 requests/day, routing 80% to mini saves roughly $900/month on the customer support use case alone.
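Those per-request figures, and the routing savings, come straight from the pricing table; a small helper reproduces them:

```python
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}  # $ per 1M tokens (in, out)

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    rate_in, rate_out = PRICES[model]
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000

support = (800, 200)                                            # customer support reply
all_4o = 10_000 * request_cost("gpt-4o", *support)              # $40.00/day
routed = 8_000 * request_cost("gpt-4o-mini", *support) \
       + 2_000 * request_cost("gpt-4o", *support)               # $9.92/day
print(f"monthly savings: ${(all_4o - routed) * 30:,.0f}")       # roughly $900
```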
The Three Cuts That Work
- Cap max_tokens per task type. Simple tasks at 150 tokens max, complex at 500. Never leave it unlimited in production.
- Compress conversation history. After 6 turns, summarize older turns instead of appending them raw (a sketch follows this list). Cuts context size by 60%.
- Cache your system prompt aggressively. Keep it static. Move dynamic content to the user message, where caching doesn't apply anyway.
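For the history-compression cut, one workable pattern is to summarize everything older than the last few messages with the cheap model. A minimal sketch, assuming a standard messages list with the system prompt first; the thresholds and summary length are assumptions:

```python
from openai import OpenAI

client = OpenAI()

def compress_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Replace messages older than the last `keep_last` with a short summary."""
    if len(messages) <= keep_last + 1:           # +1 for the system message
        return messages
    system, old, recent = messages[0], messages[1:-keep_last], messages[-keep_last:]
    summary = client.chat.completions.create(
        model="gpt-4o-mini",                     # cheap model is fine for summarizing
        messages=old + [{"role": "user", "content":
                         "Summarize the conversation so far in under 150 tokens."}],
        max_tokens=200,
    ).choices[0].message.content
    return [system,
            {"role": "system", "content": f"Summary of earlier turns: {summary}"},
            *recent]
```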
Use the GPT API Cost Calculator to model your specific usage before committing to a tier.