OpenAI updates its pricing more often than most developers notice. If you're still budgeting based on prices from six months ago, your estimates could be off by 40%.
This guide covers every current model, what actually determines your bill, and the techniques that reliably cut API costs by 50-60%.
Current Pricing (May 2025)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| o1 | $15.00 | $60.00 | 200K |
| o3-mini | $1.10 | $4.40 | 200K |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16K |
| text-embedding-3-small | $0.02 | — | — |
Prompt caching (for repeated prefixes of at least 1,024 tokens) gives a 50% discount on cached input tokens.
What Actually Drives Your Bill
System prompts repeat on every request. A 500-token system prompt sent 10,000 times/day costs $12.50/day at GPT-4o prices — before any user input. Enable prompt caching and that drops to $6.25.
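That arithmetic is easy to reproduce. A minimal check in Python, using the GPT-4o input rate from the table above and assuming every request after the first hits the cache:

```python
# Daily cost of a 500-token system prompt resent on every request, at GPT-4o rates.
INPUT_RATE = 2.50 / 1_000_000       # $ per uncached input token
CACHED_RATE = INPUT_RATE * 0.5      # 50% discount on cached input tokens

system_tokens = 500
requests_per_day = 10_000

print(f"without caching: ${system_tokens * requests_per_day * INPUT_RATE:.2f}/day")   # $12.50
print(f"with caching:    ${system_tokens * requests_per_day * CACHED_RATE:.2f}/day")  # $6.25
```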
Conversation history compounds. In a 10-turn conversation, your 10th request carries the full previous 9 turns, so each request's input keeps growing and the cumulative bill grows faster still. By turn 10, you might be sending 5x the tokens of your first message.
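To see why, count the input tokens per turn. The message sizes below are assumptions for illustration (500-token system prompt, 150-token user messages, 200-token replies), not measurements:

```python
# Input tokens sent on each turn when the full history is resent verbatim.
SYSTEM, USER, REPLY = 500, 150, 200   # assumed average sizes

for turn in range(1, 11):
    history = (turn - 1) * (USER + REPLY)        # every earlier turn rides along
    print(f"turn {turn:2}: {SYSTEM + history + USER} input tokens")
# turn 1: 650 tokens ... turn 10: 3,800 tokens, almost 6x the first request
```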
Verbose models output more than you need. Models allowed to reason freely will often output 600-900 tokens where 200 would suffice. A simple max_tokens: 300 cap on routine tasks cuts output cost by 60%.
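Setting the cap is a single parameter. A minimal sketch with the official openai Python SDK; the model and limit here are examples, not recommendations:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer in two sentences or fewer."},
        {"role": "user", "content": "Summarize the refund policy for monthly plans."},
    ],
    max_tokens=300,  # hard ceiling on billed output tokens for this routine task
)
print(response.choices[0].message.content)
```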
Prompt Caching: The Underused Feature
OpenAI's prompt caching automatically kicks in when:
- The cached prefix is at least 1,024 tokens
- The same prefix was used within the past 5-10 minutes
Best candidates for caching:
- Long system prompts (instructions, personas, context documents)
- RAG retrieved documents that don't change between calls
- Few-shot examples prepended to every request
A 2,000-token system prompt sent 50,000 times/month is 100M input tokens: without caching that's $250/month at GPT-4o rates. With caching (assuming an 80% cache hit rate), it's about $150/month: 80M tokens at the discounted $1.25 per 1M plus 20M at the full $2.50.
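Because only the prefix of a request is eligible, ordering matters: keep the static material (instructions, few-shot examples, reference documents) byte-identical and up front, and push anything that varies into the user message. A sketch of that layout; the prompt file name is a stand-in for your real 1,000+ token system prompt:

```python
from openai import OpenAI

client = OpenAI()

# Static, identical prefix of 1,024+ tokens: this is what gets cached.
STATIC_SYSTEM_PROMPT = open("support_agent_prompt.txt").read()

def answer(ticket_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": ticket_text},             # dynamic suffix
        ],
    )
    return response.choices[0].message.content
```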
Model Routing: The Biggest Lever
Most production workloads don't need GPT-4o for every request. A simple routing layer that sends:
- Simple classification, extraction, formatting → GPT-4o mini ($0.15/$0.60)
- Complex reasoning, generation → GPT-4o ($2.50/$10.00)
reduces costs by 70-80% with less than 2% quality loss on most tasks.
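A routing layer can be as simple as a lookup keyed on a task label your application already knows. A sketch under that assumption; the labels, models, and output cap are examples:

```python
from openai import OpenAI

client = OpenAI()

CHEAP, SMART = "gpt-4o-mini", "gpt-4o"
SIMPLE_TASKS = {"classification", "extraction", "formatting"}

def pick_model(task_type: str) -> str:
    return CHEAP if task_type in SIMPLE_TASKS else SMART

def run(task_type: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=pick_model(task_type),
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
    )
    return response.choices[0].message.content

run("extraction", "Pull the invoice number out of this email: ...")   # goes to mini
run("generation", "Draft a reply to this escalated complaint: ...")   # goes to GPT-4o
```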
Practical Cost Benchmarks
| Use case | Avg tokens/request | GPT-4o cost | GPT-4o mini cost |
|---|---|---|---|
| Customer support reply | 800 in / 200 out | $0.004 | $0.00024 |
| Email draft | 600 in / 400 out | $0.0055 | $0.00033 |
| Code review | 1,500 in / 800 out | $0.01175 | $0.000705 |
| Document summary | 3,000 in / 500 out | $0.0125 | $0.00075 |
At 10,000 requests/day, routing 80% to mini saves roughly $900/month on the customer support use case alone.
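Those per-request figures, and the routing savings, come straight from the pricing table; a small helper reproduces them:

```python
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}  # $ per 1M tokens (in, out)

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    rate_in, rate_out = PRICES[model]
    return (tokens_in * rate_in + tokens_out * rate_out) / 1_000_000

support = (800, 200)                                            # customer support reply
all_4o = 10_000 * request_cost("gpt-4o", *support)              # $40.00/day
routed = 8_000 * request_cost("gpt-4o-mini", *support) \
       + 2_000 * request_cost("gpt-4o", *support)               # $9.92/day
print(f"monthly savings: ${(all_4o - routed) * 30:,.0f}")       # roughly $900
```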
The Three Cuts That Work
- Cap max_tokens per task type. Simple tasks at 150 tokens max, complex at 500. Never leave it unlimited in production.
- Compress conversation history. After 6 turns, summarize older turns instead of appending them raw (a sketch follows this list). Cuts context size by 60%.
- Cache your system prompt aggressively. Keep it static. Move dynamic content to the user message, where caching doesn't apply anyway.
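For the history-compression cut, one workable pattern is to summarize everything older than the last few messages with the cheap model. A minimal sketch, assuming a standard messages list with the system prompt first; the thresholds and summary length are assumptions:

```python
from openai import OpenAI

client = OpenAI()

def compress_history(messages: list[dict], keep_last: int = 6) -> list[dict]:
    """Replace messages older than the last `keep_last` with a short summary."""
    if len(messages) <= keep_last + 1:           # +1 for the system message
        return messages
    system, old, recent = messages[0], messages[1:-keep_last], messages[-keep_last:]
    summary = client.chat.completions.create(
        model="gpt-4o-mini",                     # cheap model is fine for summarizing
        messages=old + [{"role": "user", "content":
                         "Summarize the conversation so far in under 150 tokens."}],
        max_tokens=200,
    ).choices[0].message.content
    return [system,
            {"role": "system", "content": f"Summary of earlier turns: {summary}"},
            *recent]
```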
Use the GPT API Cost Calculator to model your specific usage before committing to a tier.