
OpenAI API Pricing: The Complete Guide for 2025

GPT-4o, GPT-4o mini, o1, o3 — pricing changes every quarter. Here's every current price, what determines your bill, and how to cut costs by 60%.

Alex Morgan

OpenAI updates its pricing more often than most developers notice. If you're still budgeting based on prices from six months ago, your estimates could be off by 40%.

This guide covers every current model, what actually determines your bill, and the techniques that reliably cut API costs by 50-60%.

Current Pricing (May 2025)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| o1 | $15.00 | $60.00 | 200K |
| o3-mini | $1.10 | $4.40 | 200K |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16K |
| text-embedding-3-small | $0.02 | n/a | n/a |

Prompt caching (for repeated prefixes of at least 1,024 tokens) gives a 50% discount on cached input tokens.
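The table above is easy to turn into a budgeting helper. This is an illustrative sketch using the May 2025 prices listed here; the `PRICES` mapping and `estimate_cost` function are our own names, not part of the OpenAI SDK.

```python
# Illustrative cost helper built from the May 2025 prices above.
# Prices are USD per 1M tokens: (input, output).
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
    "o3-mini": (1.10, 4.40),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Estimate request cost in USD. Cached input tokens bill at 50%."""
    inp, out = PRICES[model]
    uncached = input_tokens - cached_input_tokens
    return (uncached * inp
            + cached_input_tokens * inp * 0.5
            + output_tokens * out) / 1_000_000

# Example: an 800-token-in / 200-token-out request on GPT-4o.
print(estimate_cost("gpt-4o", 800, 200))  # 0.004
```

The same function can compare models side by side before you commit to one.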

What Actually Drives Your Bill

System prompts repeat on every request. A 500-token system prompt sent 10,000 times/day costs $12.50/day at GPT-4o prices — before any user input. Enable prompt caching and that drops to $6.25.

Conversation history compounds. In a 10-turn conversation, your 10th message carries the full previous 9 turns. The token count doesn't grow linearly — it compounds. By turn 10, you might be sending 5x the tokens of your first message.

Verbose models output more than you need. Models allowed to reason freely will often output 600-900 tokens where 200 would suffice. A simple max_tokens: 300 cap on routine tasks cuts output cost by 60%.
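One way to make that cap routine is to attach per-task limits when building the request kwargs. A minimal sketch; the task categories and the `TOKEN_CAPS` mapping are illustrative assumptions, not an OpenAI SDK feature.

```python
# Per-task output caps: routine tasks get tight limits, complex ones more room.
# The task categories here are illustrative assumptions.
TOKEN_CAPS = {"classification": 50, "support_reply": 300, "reasoning": 900}

def build_request(task: str, model: str, messages: list) -> dict:
    """Build kwargs for a chat completion call with a hard output cap."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": TOKEN_CAPS.get(task, 300),  # never unlimited in production
    }

req = build_request("support_reply", "gpt-4o-mini",
                    [{"role": "user", "content": "Where is my order?"}])
```

Passing the resulting dict to `client.chat.completions.create(**req)` keeps the cap in one place instead of scattered across call sites.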

Prompt Caching: The Underused Feature

OpenAI's prompt caching automatically kicks in when:

  • The cached prefix is at least 1,024 tokens
  • The same prefix was used within the past 5-10 minutes

Best candidates for caching:

  • Long system prompts (instructions, personas, context documents)
  • RAG retrieved documents that don't change between calls
  • Few-shot examples prepended to every request

A 2,000-token system prompt sent 50,000 times/month costs $250/month without caching. With caching at an 80% cache hit rate, the cached 80% bills at half price, bringing the total to about $150/month.
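The arithmetic behind that estimate can be worked through directly from the prices above (a sketch, assuming a 50% discount on cache hits; the 80% hit rate is an assumption, as real rates depend on traffic patterns):

```python
# Monthly cost of a repeated system prompt, with and without prompt caching.
PROMPT_TOKENS = 2_000
REQUESTS_PER_MONTH = 50_000
INPUT_PRICE = 2.50 / 1_000_000   # GPT-4o input, USD per token
CACHE_HIT_RATE = 0.80            # assumed; depends on traffic patterns

total_tokens = PROMPT_TOKENS * REQUESTS_PER_MONTH          # 100M tokens/month
no_cache = total_tokens * INPUT_PRICE
with_cache = (total_tokens * CACHE_HIT_RATE * INPUT_PRICE * 0.5
              + total_tokens * (1 - CACHE_HIT_RATE) * INPUT_PRICE)
print(no_cache, with_cache)
```

Note that the uncached 20% still bills at full price, which is why the total does not simply halve.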

Model Routing: The Biggest Lever

Most production workloads don't need GPT-4o for every request. A simple routing layer can send:

  • Simple classification, extraction, formatting → GPT-4o mini ($0.15/$0.60)
  • Complex reasoning, generation → GPT-4o ($2.50/$10.00)

This typically reduces costs by 70-80% with less than 2% quality loss on most tasks.
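The routing layer itself can start as a simple task-type dispatch. A minimal sketch; the task labels are illustrative, and production routers often use a cheap classifier call instead of a hardcoded set:

```python
# Route each request to the cheapest model that can handle the task.
# Task labels are illustrative assumptions.
CHEAP_TASKS = {"classification", "extraction", "formatting"}

def pick_model(task: str) -> str:
    """Return a model name for a task type (model names as of May 2025)."""
    if task in CHEAP_TASKS:
        return "gpt-4o-mini"   # $0.15 / $0.60 per 1M tokens
    return "gpt-4o"            # $2.50 / $10.00 per 1M tokens

print(pick_model("extraction"))  # gpt-4o-mini
```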

Practical Cost Benchmarks

| Use case | Avg tokens/request | GPT-4o cost | GPT-4o mini cost |
|---|---|---|---|
| Customer support reply | 800 in / 200 out | $0.004 | $0.00024 |
| Email draft | 600 in / 400 out | $0.0055 | $0.00033 |
| Code review | 1,500 in / 800 out | $0.01175 | $0.000705 |
| Document summary | 3,000 in / 500 out | $0.0125 | $0.00075 |

At 10,000 requests/day, routing 80% of traffic to mini saves roughly $900/month on the customer support use case alone.
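That savings figure falls straight out of the per-request costs in the benchmark table (a worked example, assuming a 30-day month and an 80/20 routing split):

```python
# Monthly savings from routing 80% of support traffic to GPT-4o mini,
# using per-request costs from the benchmark table (30-day month assumed).
REQUESTS_PER_DAY = 10_000
COST_4O = 0.004        # customer support reply on GPT-4o
COST_MINI = 0.00024    # customer support reply on GPT-4o mini
ROUTE_SHARE = 0.80     # fraction of traffic sent to mini

all_4o = REQUESTS_PER_DAY * 30 * COST_4O
routed = REQUESTS_PER_DAY * 30 * (ROUTE_SHARE * COST_MINI
                                  + (1 - ROUTE_SHARE) * COST_4O)
print(round(all_4o - routed, 2))  # monthly savings in USD
```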

The Three Cuts That Work

  1. Cap max_tokens per task type. Simple tasks at 150 tokens max, complex at 500. Never leave it unlimited in production.

  2. Compress conversation history. After 6 turns, summarize older turns instead of appending them raw. Cuts context size by 60%.

  3. Cache your system prompt aggressively. Keep it static. Move dynamic content to the user message where caching doesn't apply anyway.
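Cut #2 above can be sketched as a sliding window over the message list. The `summarize` helper here is a placeholder assumption; in practice it would be a call to a cheap model such as GPT-4o mini:

```python
# Keep the last N turns verbatim; collapse older turns into one summary message.
KEEP_TURNS = 6

def summarize(messages: list) -> str:
    """Placeholder: in production, call a cheap model to write the summary."""
    return "Summary of earlier conversation: " + " ".join(
        m["content"][:40] for m in messages)

def compress_history(messages: list) -> list:
    """Replace turns older than KEEP_TURNS with a single summary message."""
    if len(messages) <= KEEP_TURNS:
        return messages
    old, recent = messages[:-KEEP_TURNS], messages[-KEEP_TURNS:]
    return [{"role": "system", "content": summarize(old)}] + recent
```

For a 10-turn conversation this sends 7 messages (one summary plus the last 6 turns) instead of 10, and the gap widens as conversations grow.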

Use the GPT API Cost Calculator to model your specific usage before committing to a tier.
