aicalcus.com
Data · Updated May 2026

AI API Pricing Table 2026

Complete pricing for all major LLM providers — input tokens, output tokens, context windows, caching and batch discounts. Use our API cost calculator to estimate your monthly spend.

| Provider | Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) | Context (tokens) | Cache discount | Batch discount | Notes |
|---|---|---|---|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K | 50% | 50% | Best overall; vision included |
| OpenAI | GPT-4o mini | $0.15 | $0.60 | 128K | 50% | 50% | Best budget model from OpenAI |
| OpenAI | o3 | $10.00 | $40.00 | 200K | 50% | 50% | Reasoning model; high latency |
| Anthropic | Claude Sonnet 4 | $3.00 | $15.00 | 200K | 90% | 50% | Best prompt caching in class |
| Anthropic | Claude Haiku 3.5 | $0.80 | $4.00 | 200K | 90% | 50% | Fast & cheap; great for classification |
| Google | Gemini 1.5 Pro | $3.50 | $10.50 | 2000K | 75% | | Longest context window available |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 | 1000K | 75% | | Cheapest large-context option |
| Meta / Together.ai | Llama 3.3 70B | $0.59 | $0.59 | 128K | | | Open-source; same price in/out |
| Groq | Llama 3.3 70B | $0.59 | $0.79 | 128K | | | Fastest inference available (~300 t/s) |
| Mistral | Mistral Large 2 | $2.00 | $6.00 | 128K | | | Strong multilingual; EU data residency |
| Cohere | Command R+ | $2.50 | $10.00 | 128K | | | RAG-optimized; grounding built-in |
| xAI | Grok 2 | $2.00 | $10.00 | 131K | | | Real-time X/Twitter data access |
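
As a rough illustration of how the per-1M-token prices in the table turn into a monthly bill, here is a minimal Python sketch. The `PRICES` dictionary and the `monthly_cost` helper are hypothetical names invented for this example, and the sketch assumes that a batch discount applies equally to input and output tokens, which may not hold for every provider.

```python
# Prices in USD per 1M tokens (input, output), copied from the table.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Sonnet 4": (3.00, 15.00),
}


def monthly_cost(model, input_tokens, output_tokens, batch_discount=0.0):
    """Estimate monthly spend in USD (hypothetical helper).

    batch_discount is a fraction, e.g. 0.50 for a 50% discount.
    Assumption: the discount applies to both input and output tokens.
    """
    input_price, output_price = PRICES[model]
    cost = (input_tokens / 1_000_000) * input_price
    cost += (output_tokens / 1_000_000) * output_price
    return cost * (1.0 - batch_discount)


# 100M input tokens and 20M output tokens per month on GPT-4o.
print(monthly_cost("GPT-4o", 100_000_000, 20_000_000))                       # 450.0
print(monthly_cost("GPT-4o", 100_000_000, 20_000_000, batch_discount=0.50))  # 225.0
```

Output tokens dominate the bill quickly: in this example 20M output tokens cost almost as much as 100M input tokens.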

💰 Cheapest per token

Gemini 1.5 Flash at $0.075/1M input tokens is the cheapest option for high-volume workloads with large context needs.
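
The same comparison can be made programmatically. The sketch below is only an illustration: `INPUT_PRICES` is a hypothetical dictionary filled with the input prices listed on this page, and it ranks models by input price alone, ignoring output price, discounts, and quality.

```python
# Input prices in USD per 1M tokens, as listed on this page.
INPUT_PRICES = {
    "OpenAI / GPT-4o": 2.50,
    "OpenAI / GPT-4o mini": 0.15,
    "OpenAI / o3": 10.00,
    "Anthropic / Claude Sonnet 4": 3.00,
    "Anthropic / Claude Haiku 3.5": 0.80,
    "Google / Gemini 1.5 Pro": 3.50,
    "Google / Gemini 1.5 Flash": 0.075,
    "Together.ai / Llama 3.3 70B": 0.59,
    "Groq / Llama 3.3 70B": 0.59,
    "Mistral / Mistral Large 2": 2.00,
    "Cohere / Command R+": 2.50,
    "xAI / Grok 2": 2.00,
}

# Pick the entry with the lowest input price.
cheapest = min(INPUT_PRICES, key=INPUT_PRICES.get)
print(cheapest, INPUT_PRICES[cheapest])  # Google / Gemini 1.5 Flash 0.075
```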

🏆 Best cached cost

Claude Haiku 3.5 with 90% cache discount drops to $0.08/1M cached input — best for apps with long, repeated system prompts.
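
The $0.08 figure is the list price times (1 − 0.90). In practice only part of the input is usually served from cache, so the blended price sits somewhere between the two. The sketch below shows that arithmetic; `effective_input_price` and `cached_fraction` are hypothetical names, and the sketch ignores any surcharge a provider may apply when writing to the cache.

```python
def effective_input_price(base_price, cache_discount, cached_fraction):
    """Blended input price in USD per 1M tokens (hypothetical helper).

    base_price:      list price per 1M input tokens
    cache_discount:  fraction off for cached tokens, e.g. 0.90
    cached_fraction: share of input tokens served from cache, 0.0 to 1.0
    Assumption: cache writes cost the same as ordinary input tokens.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_price = base_price * (1.0 - cache_discount)
    return cached_fraction * cached_price + (1.0 - cached_fraction) * base_price


# Claude Haiku 3.5: $0.80 list price, 90% cache discount.
print(round(effective_input_price(0.80, 0.90, 1.0), 4))   # 0.08 (fully cached)
print(round(effective_input_price(0.80, 0.90, 0.75), 4))  # 0.26 (75% cached)
```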

⚡ Fastest inference

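Groq's Llama 3.3 70B delivers ~300 tokens/second, about 10× faster than OpenAI. Its input price matches Together.ai's $0.59/1M for the same model (output is $0.79/1M versus $0.59/1M), which makes it ideal for real-time apps.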
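
To see what throughput means for a single response, the sketch below converts tokens/second into generation time. `generation_seconds` is a hypothetical helper, the 30 tokens/second figure is simply the ~300 figure divided by the 10× claim above, and the calculation ignores time-to-first-token and network latency.

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Time to stream a response, in seconds (hypothetical helper).

    Assumption: throughput is constant and startup latency is ignored.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# A 600-token answer at ~300 tokens/second versus a 10x slower endpoint.
print(generation_seconds(600, 300))  # 2.0
print(generation_seconds(600, 30))   # 20.0
```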

Prices sourced directly from provider pricing pages. Last updated 2026-05-19. Prices change frequently — verify with provider before budgeting.