Data · Updated May 2026

AI API Pricing Table 2026

Complete pricing for all major LLM providers — input tokens, output tokens, context windows, caching and batch discounts. Use our API cost calculator to estimate your monthly spend.

Provider	Model	Input ($/1M tokens)	Output ($/1M tokens)	Context (tokens)	Cache discount	Batch discount	Notes
OpenAI	GPT-4o	$2.50	$10.00	128K	50%	50%	Best overall; vision included
OpenAI	GPT-4o mini	$0.15	$0.60	128K	50%	50%	Best budget model from OpenAI
OpenAI	o3	$10.00	$40.00	200K	75%	50%	Reasoning model; high latency
Anthropic	Claude Sonnet 4	$3.00	$15.00	200K	90%	50%	Best prompt caching in class
Anthropic	Claude Haiku 3.5	$0.80	$4.00	200K	90%	50%	Fast & cheap; great for classification
Google	Gemini 1.5 Pro	$3.50	$10.50	2000K	75%	—	Longest context window in this table (2M tokens)
Google	Gemini 1.5 Flash	$0.075	$0.30	1000K	75%	—	Cheapest large-context option
Meta / Together.ai	Llama 3.3 70B	$0.59	$0.59	128K	—	—	Open-weight model; same price for input and output
Groq	Llama 3.3 70B	$0.59	$0.79	128K	—	—	Fastest inference in this table (~300 tokens/s)
Mistral	Mistral Large 2	$2.00	$6.00	128K	—	—	Strong multilingual; EU data residency
Cohere	Command R+	$2.50	$10.00	128K	—	—	RAG-optimized; grounding built-in
xAI	Grok 2	$2.00	$10.00	131K	—	—	Real-time X/Twitter data access
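The per-token prices above turn into a bill by simple arithmetic: divide each token count by one million, multiply by the matching per-1M price, and sum input and output. The sketch below applies that to a monthly workload. The `Pricing` type, the traffic numbers, and the assumption that the batch discount applies equally to input and output tokens are illustrative choices, not any provider's billing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical helper: USD per 1M tokens, copied from the table above."""
    input_per_m: float
    output_per_m: float
    batch_discount: float = 0.0  # e.g. 0.5 for a 50% batch discount


def request_cost(p: Pricing, input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """USD cost of one request: (tokens / 1M) x price, summed over input and output."""
    cost = (input_tokens / 1_000_000) * p.input_per_m + (output_tokens / 1_000_000) * p.output_per_m
    if batch:
        # Assumption: the batch discount applies to both input and output tokens.
        cost *= 1 - p.batch_discount
    return cost


def monthly_cost(p: Pricing, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30, batch: bool = False) -> float:
    """Per-request cost scaled to a month of identical requests."""
    return request_cost(p, input_tokens, output_tokens, batch) * requests_per_day * days


GPT_4O = Pricing(input_per_m=2.50, output_per_m=10.00, batch_discount=0.5)

# Hypothetical workload: 10,000 requests/day, 1,500 input + 300 output tokens each.
print(f"Standard: ${monthly_cost(GPT_4O, 1_500, 300, 10_000):,.2f}")
print(f"Batch:    ${monthly_cost(GPT_4O, 1_500, 300, 10_000, batch=True):,.2f}")
```

Any row from the table can be swapped in; for mixed traffic, compute each request type separately and add the results.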

💰 Cheapest per token

Gemini 1.5 Flash, at $0.075/1M input tokens, is the cheapest model in this table and the best fit for high-volume workloads with large context needs.
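Which model is cheapest depends on your input/output mix, because output tokens usually cost several times more than input tokens. A minimal sketch, assuming 75% of all tokens are input (an illustrative ratio, not taken from the table), ranks a few budget rows by blended price per 1M tokens.

```python
# (input $/1M, output $/1M) copied from the pricing table above
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "Claude Haiku 3.5": (0.80, 4.00),
    "Gemini 1.5 Flash": (0.075, 0.30),
    "Llama 3.3 70B (Together.ai)": (0.59, 0.59),
}


def blended_price(input_per_m: float, output_per_m: float, input_share: float) -> float:
    """Weighted $/1M tokens when `input_share` of all tokens are input tokens."""
    return input_share * input_per_m + (1 - input_share) * output_per_m


def rank_by_blended_price(prices: dict, input_share: float = 0.75) -> list:
    """Model names sorted from cheapest to most expensive blended price."""
    return sorted(prices, key=lambda name: blended_price(*prices[name], input_share))


for name in rank_by_blended_price(PRICES):
    print(f"{name}: ${blended_price(*PRICES[name], 0.75):.4f} per 1M tokens")
```

Lowering `input_share` (output-heavy workloads such as long-form generation) narrows the gap for models with flat pricing like Llama 3.3 70B on Together.ai.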

🏆 Best cached cost

Claude Haiku 3.5's 90% cache discount brings cached input down to $0.08/1M tokens ($0.80 × 0.10), making it the best fit for apps with long, repeated system prompts.
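The cached price is just the base input price times (1 − discount); what you actually pay depends on how often the prompt prefix hits the cache. A minimal sketch, assuming every cache miss is billed as a cache write at `base × write_multiplier` (Anthropic bills 5-minute cache writes at 1.25× the base input price; the 90% hit rate below is a hypothetical value):

```python
def effective_input_price(base_per_m: float, cache_discount: float,
                          hit_rate: float, write_multiplier: float = 1.0) -> float:
    """Blended $/1M input tokens when `hit_rate` of input tokens are served from cache.

    Assumption: every cache miss is billed as a cache write at base_per_m * write_multiplier.
    """
    cached_per_m = base_per_m * (1 - cache_discount)  # e.g. 0.80 * 0.10 = 0.08
    miss_per_m = base_per_m * write_multiplier
    return hit_rate * cached_per_m + (1 - hit_rate) * miss_per_m


# Claude Haiku 3.5: $0.80/1M base input, 90% cache discount (from the table).
print(round(effective_input_price(0.80, 0.90, hit_rate=1.0), 4))
print(round(effective_input_price(0.80, 0.90, hit_rate=0.9, write_multiplier=1.25), 4))
```

The second call shows why hit rate matters: even at 90% hits, the write premium on misses more than doubles the effective price compared with the ideal $0.08.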

⚡ Fastest inference

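Groq's Llama 3.3 70B delivers ~300 tokens/second, several times faster than OpenAI's hosted models. Input costs the same as on Together.ai ($0.59/1M); output is slightly higher ($0.79 vs. $0.59/1M). Ideal for real-time apps.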
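Throughput translates into user-facing latency roughly as output tokens divided by tokens per second, plus the time to first token. A rough sketch, assuming a 600-token response; the 100 tokens/s endpoint is a hypothetical comparison point, and network overhead is ignored.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float,
                       time_to_first_token: float = 0.0) -> float:
    """Rough wall-clock time to stream a response; ignores network overhead."""
    return time_to_first_token + output_tokens / tokens_per_second


# ~300 tokens/s is the Groq figure quoted above; 100 tokens/s is hypothetical.
for label, tps in [("~300 tokens/s", 300), ("100 tokens/s (hypothetical)", 100)]:
    print(f"{label}: {generation_seconds(600, tps):.1f} s for 600 output tokens")
```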

Calculate your exact costs

→ GPT API Cost Calculator
→ Claude Token Cost Calculator
→ LLM Cost Comparison
→ AI Inference Cost Calculator
→ Chatbot Pricing Calculator
→ Cost per MAU Calculator

Prices sourced directly from provider pricing pages. Last updated 2026-05-19. Prices change frequently; verify with the provider before budgeting.