Data · Updated May 2026
AI API Pricing Table 2026
Complete pricing for all major LLM providers, covering input tokens, output tokens, context windows, and caching and batch discounts. Use our API cost calculator to estimate your monthly spend.
| Provider | Model | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) | Cached-input discount | Batch discount | Notes |
|---|---|---|---|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K | 50% | 50% | Best overall; vision included |
| OpenAI | GPT-4o mini | $0.15 | $0.60 | 128K | 50% | 50% | Best budget model from OpenAI |
| OpenAI | o3 | $10.00 | $40.00 | 200K | 50% | 50% | Reasoning model; high latency |
| Anthropic | Claude Sonnet 4 | $3.00 | $15.00 | 200K | 90% | 50% | Best prompt caching in class |
| Anthropic | Claude Haiku 3.5 | $0.80 | $4.00 | 200K | 90% | 50% | Fast & cheap; great for classification |
| Google | Gemini 1.5 Pro | $3.50 | $10.50 | 2M | 75% | — | Longest context window available |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 | 1M | 75% | — | Cheapest large-context option |
| Meta / Together.ai | Llama 3.3 70B | $0.59 | $0.59 | 128K | — | — | Open-source; same price in/out |
| Groq | Llama 3.3 70B | $0.59 | $0.79 | 128K | — | — | Fastest inference available (~300 t/s) |
| Mistral | Mistral Large 2 | $2.00 | $6.00 | 128K | — | — | Strong multilingual; EU data residency |
| Cohere | Command R+ | $2.50 | $10.00 | 128K | — | — | RAG-optimized; grounding built-in |
| xAI | Grok 2 | $2.00 | $10.00 | 131K | — | — | Real-time X/Twitter data access |
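The per-1M prices in the table combine into a per-request cost with simple arithmetic. The Python sketch below shows that arithmetic. It assumes the cache discount applies only to the cached share of input tokens and that the batch discount stacks multiplicatively on top. Providers differ on whether the two discounts stack, so check before relying on that. The dictionary keys are hypothetical labels, not API model IDs, and only a few rows are copied in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_m: float            # USD per 1M input tokens
    output_per_m: float           # USD per 1M output tokens
    cache_discount: float = 0.0   # fraction off cached input tokens (0.90 = 90%)
    batch_discount: float = 0.0   # fraction off the whole request when sent via batch


# A few rows from the table; keys are illustrative labels, not API model IDs.
PRICES = {
    "gpt-4o": ModelPrice(2.50, 10.00, cache_discount=0.50, batch_discount=0.50),
    "gpt-4o-mini": ModelPrice(0.15, 0.60, cache_discount=0.50, batch_discount=0.50),
    "claude-sonnet-4": ModelPrice(3.00, 15.00, cache_discount=0.90, batch_discount=0.50),
    "gemini-1.5-flash": ModelPrice(0.075, 0.30, cache_discount=0.75),
}


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0, batch: bool = False) -> float:
    """USD cost of one request; cached_tokens is the share of input_tokens read from cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * price.input_per_m
        + cached_tokens * price.input_per_m * (1 - price.cache_discount)
        + output_tokens * price.output_per_m
    ) / 1_000_000
    if batch:
        # Assumption: the batch discount stacks multiplicatively with caching.
        cost *= 1 - price.batch_discount
    return cost


def monthly_cost(price: ModelPrice, requests_per_day: int, days: int = 30, **request) -> float:
    """Projected monthly spend for a steady daily request volume."""
    return request_cost(price, **request) * requests_per_day * days


gpt4o = PRICES["gpt-4o"]
print(round(request_cost(gpt4o, input_tokens=2_000, output_tokens=500), 4))  # 0.01
print(round(monthly_cost(gpt4o, 1_000, input_tokens=2_000, output_tokens=500), 2))  # 300.0
```

At GPT-4o prices, 1,000 requests a day of 2,000 input and 500 output tokens come to about $300 a month before any caching or batch savings.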
💰 Cheapest per token
Gemini 1.5 Flash at $0.075/1M input tokens is the cheapest option for high-volume workloads with large context needs.
🏆 Largest cache discount
Claude Haiku 3.5's 90% cache discount cuts cached input to $0.08/1M, the largest relative saving in the table. It is best for apps with long, repeated system prompts.
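The Python sketch below shows when caching pays off. It compares the cost of a shared system prompt sent with and without caching on Claude Haiku 3.5. The $0.80 base price and 90% discount come from the table. The 25% cache-write premium is an assumption taken from Anthropic's published pricing rather than from this table. The model also assumes every call lands before the cache expires.

```python
BASE_INPUT = 0.80                      # USD per 1M input tokens, Claude Haiku 3.5 (table)
CACHE_READ = BASE_INPUT * (1 - 0.90)   # 90% cache discount -> about $0.08 per 1M
CACHE_WRITE = BASE_INPUT * 1.25        # ASSUMPTION: 25% premium to write the cache


def prompt_cost(prompt_tokens: int, calls: int, cached: bool) -> float:
    """USD spent sending the same system prompt on `calls` requests."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    millions = prompt_tokens / 1_000_000
    if not cached:
        return calls * millions * BASE_INPUT
    # First call writes the cache; later calls read it (assumes it never expires in between).
    return millions * CACHE_WRITE + (calls - 1) * millions * CACHE_READ


print(f"${prompt_cost(10_000, 1_000, cached=False):.2f}")  # $8.00
print(f"${prompt_cost(10_000, 1_000, cached=True):.2f}")   # $0.81
```

Under these assumptions, caching a prompt used only once costs more than not caching it. The write premium is small, though, so caching already comes out ahead on the second call.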
⚡ Fastest inference
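Groq serves Llama 3.3 70B at ~300 tokens/second, roughly 10× faster than OpenAI's hosted models. Input costs the same as on Together.ai ($0.59/1M) and output only slightly more ($0.79 vs $0.59/1M), so it suits real-time apps.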
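Throughput converts to response time by simple division. The rough Python sketch below ignores network overhead. The `time_to_first_token` parameter is a hypothetical knob for startup latency, not a figure from this table.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float,
                       time_to_first_token: float = 0.0) -> float:
    """Rough streaming time: startup latency plus output tokens / decode throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token + output_tokens / tokens_per_second


print(generation_seconds(600, 300))   # 2.0 (Groq's ~300 t/s)
print(generation_seconds(600, 30))    # 20.0 (an endpoint 10x slower)
```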
Prices sourced directly from provider pricing pages. Last updated 2026-05-19. Prices change frequently, so verify with the provider before budgeting.