The 50% Discount You're Probably Missing
Most developers using OpenAI's API for high-volume workloads are paying full price when half-price is available. The Batch API has existed since 2024, accepts async workloads, and delivers results within 24 hours at exactly 50% off standard pricing.
For a company spending $10,000/month on GPT-4o Mini for classification tasks, switching to the Batch API saves $5,000/month — $60,000/year — typically for 30-60 minutes of engineering work.
What Qualifies as a Batch Workload
Batch processing is appropriate when:
- Results aren't needed in real-time. The 24-hour SLA fits: nightly data enrichment, weekly content classification, periodic moderation runs, async report generation.
- Request volume is high and predictable. The Batch API shines at 10K+ requests per batch. Overhead amortizes at scale.
- The workload is embarrassingly parallel. Each request is independent — no request depends on the output of another request in the same batch.
Ideal batch workloads:
- Content moderation (classify 100K posts nightly)
- Data enrichment (add categories/tags to product catalog)
- Sentiment analysis on customer feedback
- Named entity extraction from documents
- Document summarization pipeline (process new uploads overnight)
- Translation of user-generated content
Poor fit for batch:
- Chatbots (users need immediate responses)
- Real-time recommendation systems
- Live classification (user sees result immediately)
- Anything where latency affects user experience
The OpenAI Batch API Implementation
Step 1: Create Your JSONL Input File
import json

requests = []
for i, document in enumerate(documents):
    request = {
        "custom_id": f"doc-{i}",  # Your identifier, echoed back in the results
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "Classify the sentiment: positive, negative, or neutral."},
                {"role": "user", "content": document["text"]}
            ],
            "max_tokens": 10
        }
    }
    requests.append(json.dumps(request))

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(requests))
Step 2: Upload and Submit the Batch
from openai import OpenAI
client = OpenAI()
# Upload the file
with open("batch_input.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")

# Create the batch
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
print(f"Batch created: {batch.id}")
Step 3: Poll and Retrieve Results
import time

while True:
    batch_status = client.batches.retrieve(batch.id)
    if batch_status.status == "completed":
        output_file = client.files.content(batch_status.output_file_id)
        results = [json.loads(line) for line in output_file.text.strip().split("\n")]
        break
    elif batch_status.status in ["failed", "expired", "cancelled"]:
        print(f"Batch failed: {batch_status.status}")
        break
    time.sleep(60)  # Check every minute
Total implementation: ~50 lines of Python. Engineering time: 30-60 minutes.
Calculating Your Savings
Example: Content moderation pipeline
Current approach (real-time API):
- 200,000 moderation requests/day
- Avg 150 input tokens + 20 output tokens per request
- GPT-4o Mini pricing: $0.15/M input + $0.60/M output
- Daily cost: (200K × 150 / 1M × $0.15) + (200K × 20 / 1M × $0.60) = $4.50 + $2.40 = $6.90/day
- Monthly: $207
With Batch API (50% off):
- Daily cost: $6.90 × 0.50 = $3.45/day
- Monthly: $103.50
- Monthly savings: $103.50
At 1M requests/day: monthly savings = $517.50. At 10M requests/day: $5,175/month in savings.
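The arithmetic above generalizes into a small helper. This is an illustrative sketch, not an official calculator — `batch_daily_cost` is a hypothetical function, and the default prices are the GPT-4o Mini rates quoted above, which you should verify against current pricing:

```python
def batch_daily_cost(requests_per_day, in_tokens, out_tokens,
                     in_price_per_m=0.15, out_price_per_m=0.60,
                     batch_discount=0.50):
    """Estimate daily cost at real-time rates and with the batch discount.

    Prices are per million tokens; defaults are GPT-4o Mini rates
    from this article (assumed, check current pricing).
    """
    input_cost = requests_per_day * in_tokens / 1_000_000 * in_price_per_m
    output_cost = requests_per_day * out_tokens / 1_000_000 * out_price_per_m
    realtime = input_cost + output_cost
    return realtime, realtime * batch_discount

realtime, batch = batch_daily_cost(200_000, 150, 20)
print(f"Real-time: ${realtime:.2f}/day, Batch: ${batch:.2f}/day")
# Real-time: $6.90/day, Batch: $3.45/day
```

Plug in your own volumes and token counts to reproduce the figures above.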
Limits and Considerations
Batch API limits (as of 2025):
- 50,000 requests per batch file
- 90,000 tokens per minute queue throughput
- Results available within 24 hours (often much faster)
- 7-day expiration on pending batches
For workloads exceeding 50K requests, submit multiple batches concurrently. OpenAI allows multiple active batches.
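One way to stay under the 50K-per-file cap is to chunk the request list before writing JSONL files. A minimal sketch — `chunk_requests` is a hypothetical helper, not part of the SDK:

```python
def chunk_requests(requests, max_per_batch=50_000):
    """Split a list of batch requests into chunks that each fit one batch file."""
    return [requests[i:i + max_per_batch]
            for i in range(0, len(requests), max_per_batch)]

# Example: 120K requests become three batch files
chunks = chunk_requests(list(range(120_000)))
print([len(c) for c in chunks])  # [50000, 50000, 20000]
```

Each chunk is then written to its own JSONL file and submitted exactly as in Step 2.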
Error handling: Individual requests can fail without failing the entire batch. Check error_file_id in the batch result for failed requests — reprocess those specifically.
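A sketch of that reprocessing step, assuming each line of the error file is JSON with a top-level custom_id — `failed_custom_ids` is an illustrative helper, and the exact error-file schema should be confirmed in the API reference:

```python
import json

def failed_custom_ids(error_file_text):
    """Extract custom_ids of failed requests from a batch error file,
    i.e. the text from client.files.content(batch.error_file_id)."""
    return [json.loads(line)["custom_id"]
            for line in error_file_text.strip().split("\n") if line]

# Simplified example line; real entries carry more fields.
sample = '{"custom_id": "doc-42", "error": {"message": "Rate limit exceeded"}}'
print(failed_custom_ids(sample))  # ['doc-42']
```

Matching the returned custom_ids against your original input lets you rebuild a small retry batch containing only the failures.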
Model availability: GPT-4o, GPT-4o Mini, and embedding models all support batch. GPT-4 (non-o) does not. o1 models have limited batch availability.
The Combined Optimization Strategy
Batch API is one lever. Combined with these optimizations:
| Optimization | Additional Savings |
|---|---|
| Batch API | 50% on qualifying workloads |
| Prompt caching (repeated system prompts) | 25-50% on cached input tokens |
| Model routing (Mini for simple tasks) | 30-80% on eligible requests |
| Output token reduction | 10-30% based on prompt engineering |
Stack all four and an initial $10,000/month bill can drop to $1,500-$2,000/month on the same workload volume. The engineering investment: 2-5 days total across all optimizations.
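As a rough illustration of how the stacking compounds: the discounts below are midpoints picked from the table above, and applying them sequentially to the remaining bill is a simplification, since each lever actually targets a different slice of spend.

```python
bill = 10_000.0  # starting monthly spend
for name, discount in [("Batch API", 0.50), ("prompt caching", 0.25),
                       ("model routing", 0.40), ("output reduction", 0.15)]:
    bill *= 1 - discount  # apply each discount to the remaining bill
    print(f"after {name}: ${bill:,.0f}/month")
# The final bill lands near the $1,500-$2,000 range cited above.
```

Even with these illustrative midpoints, the compounding effect is what drives the bill down by 80%+.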
Use our AI Batch Processing Cost Calculator to estimate your exact savings before implementing.