// FREE TOOL · COMPARE AI API COSTS

CostForge

AI API Cost Calculator

Compare Claude vs GPT vs Gemini API costs side-by-side. Calculate daily and monthly spend, find the cheapest model for your use case. Free, no sign-up.

💰 Cheapest: GPT-4.1 Nano, $12.00/mo
🔥 Most Expensive: o3, $1,200/mo
📊 Potential Savings: 99% by picking the right model
📈 Total Tokens/Day: 2.5M across 1,000 requests

Model | Provider | Context | Input $/1M | Output $/1M | Per Request | Daily Cost | Monthly Cost
--- | --- | --- | --- | --- | --- | --- | ---
GPT-4.1 Nano (BEST) | OpenAI | 1M | $0.1 | $0.4 | $0.0004 | $0.400 | $12.00
Gemini 2.5 Flash-Lite | Google | 1M | $0.1 | $0.4 | $0.0004 | $0.400 | $12.00
DeepSeek V4-Flash | DeepSeek | 1M | $0.14 | $0.28 | $0.0004 | $0.420 | $12.60
Mistral Small 4 | Mistral | 128K | $0.15 | $0.6 | $0.0006 | $0.600 | $18.00
DeepSeek V4-Pro | DeepSeek | 1M | $0.435 | $0.87 | $0.0013 | $1.30 | $39.15
GPT-4.1 Mini | OpenAI | 1M | $0.4 | $1.6 | $0.0016 | $1.60 | $48.00
Mistral Large 3 | Mistral | 128K | $0.5 | $1.5 | $0.0018 | $1.75 | $52.50
Gemini 2.5 Flash | Google | 1M | $0.3 | $2.5 | $0.0019 | $1.85 | $55.50
Grok 4.3 | xAI | 131K | $1.25 | $2.5 | $0.0037 | $3.75 | $113
Claude Haiku 4.5 | Anthropic | 200K | $1 | $5 | $0.0045 | $4.50 | $135
Mistral Medium 3.5 | Mistral | 256K | $1.5 | $7.5 | $0.0067 | $6.75 | $203
Gemini 2.5 Pro | Google | 1M | $1.25 | $10 | $0.0075 | $7.50 | $225
GPT-4.1 | OpenAI | 1M | $2 | $8 | $0.0080 | $8.00 | $240
GPT-4o | OpenAI | 128K | $2.5 | $10 | $0.010 | $10.00 | $300
Claude Sonnet 4.6 | Anthropic | 200K | $3 | $15 | $0.013 | $13.50 | $405
GPT-5.5 | OpenAI | 200K | $5 | $20 | $0.020 | $20.00 | $600
Claude Opus 4.7 | Anthropic | 200K | $5 | $25 | $0.022 | $22.50 | $675
o3 | OpenAI | 200K | $10 | $40 | $0.040 | $40.00 | $1,200
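
The per-request, daily, and monthly figures above can be reproduced with a few lines of arithmetic. The sketch below is not CostForge's own code. It assumes 2,000 input tokens and 500 output tokens per request, 1,000 requests per day, and a 30-day month; these values are inferred from the table (they add up to the 2.5M tokens/day shown) rather than stated by the tool. The `Model` class and function names are hypothetical.

```python
from dataclasses import dataclass

# Workload assumptions inferred from the table, not stated by CostForge.
INPUT_TOKENS = 2_000
OUTPUT_TOKENS = 500
REQUESTS_PER_DAY = 1_000
DAYS_PER_MONTH = 30


@dataclass(frozen=True)
class Model:
    name: str
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


def cost_per_request(model: Model) -> float:
    """USD for one request at the assumed token counts."""
    return (INPUT_TOKENS * model.input_per_million
            + OUTPUT_TOKENS * model.output_per_million) / 1_000_000


def daily_cost(model: Model) -> float:
    """USD per day at the assumed request volume."""
    return cost_per_request(model) * REQUESTS_PER_DAY


def monthly_cost(model: Model) -> float:
    """USD per month, treating a month as 30 days."""
    return daily_cost(model) * DAYS_PER_MONTH


# Prices copied from three rows of the table above.
models = [
    Model("o3", 10.00, 40.00),
    Model("GPT-4.1 Nano", 0.10, 0.40),
    Model("Claude Haiku 4.5", 1.00, 5.00),
]

# Print cheapest first, as the default sort does.
for m in sorted(models, key=monthly_cost):
    print(f"{m.name}: ${cost_per_request(m):.4f}/request, "
          f"${daily_cost(m):.2f}/day, ${monthly_cost(m):,.2f}/mo")
```

Under these assumptions the three rows come out at $12.00, $135.00, and $1,200.00 per month, matching the table.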

📝 Token Estimator

Estimates assume roughly 4 characters per token.

💰 Prompt Caching Savings

Enable Prompt Caching above to see savings per model. Claude offers up to 90% input cost reduction on cached system prompts.
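
To get a feel for what the 90% figure means per request, here is a minimal sketch. It assumes cached input tokens are billed at 10% of the normal input rate and ignores any surcharge for writing to the cache, so treat the result as an upper bound on savings. The function name and the 1,500-token system prompt are hypothetical.

```python
def input_cost_with_caching(input_tokens: int,
                            cached_tokens: int,
                            price_per_million: float,
                            cached_discount: float = 0.90) -> float:
    """USD input cost for one request when part of the prompt is a cache hit.

    cached_discount is an assumption: 0.90 means cached tokens cost 10%
    of the normal input price. Cache-write surcharges are not modeled.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cached_price = price_per_million * (1 - cached_discount)
    return (fresh_tokens * price_per_million
            + cached_tokens * cached_price) / 1_000_000


# Example: 2,000 input tokens, of which a 1,500-token system prompt is cached,
# at the Claude Sonnet 4.6 input price of $3 per 1M tokens.
without_cache = input_cost_with_caching(2_000, 0, 3.00)
with_cache = input_cost_with_caching(2_000, 1_500, 3.00)
saving = 1 - with_cache / without_cache
print(f"${without_cache:.5f} -> ${with_cache:.5f} ({saving:.1%} lower input cost)")
```

In this example the input cost drops by about two thirds, not 90%, because only the cached portion of the prompt gets the discount.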

💡 Cost Optimization Tips

Use Batch API
Most providers offer 50% discounts for batch/async requests. Toggle 'Batch Pricing' above to see savings.
Prompt Caching
Claude's prompt caching can cut input costs by up to 90% for repeated system prompts. Cache your system prompts and reuse them across requests.
Right-Size Your Model
Use smaller models (Haiku, Mini, Flash) for simple tasks. Reserve flagship models for complex reasoning.
Reduce Token Usage
Compress prompts, use structured outputs (JSON mode), and limit max_tokens. Every token saved reduces cost.
Monitor & Set Budgets
Set hard spend limits in your API dashboard. Use rate limiting to prevent runaway costs during development.
Hybrid Routing
Route simple queries to cheap models and complex ones to flagship models. MCPlex Gateway supports this.
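
The effect of hybrid routing on a monthly bill can be estimated with a weighted average. This is a rough sketch, not how MCPlex Gateway or CostForge works. It assumes simple and complex requests use the same token counts, so each model's monthly cost scales linearly with its share of traffic. The function name and the 80/20 split are hypothetical.

```python
def blended_monthly_cost(cheap_monthly: float,
                         flagship_monthly: float,
                         cheap_share: float) -> float:
    """Monthly USD cost when cheap_share of requests go to the cheap model.

    Both inputs are each model's monthly cost at 100% of the traffic.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_monthly + (1 - cheap_share) * flagship_monthly


# Example with table figures: GPT-4.1 Nano ($12.00/mo) takes 80% of requests,
# Claude Opus 4.7 ($675/mo) takes the remaining 20%.
blended = blended_monthly_cost(12.00, 675.00, cheap_share=0.80)
print(f"Blended: ${blended:.2f}/mo versus $675.00/mo for flagship only")
```

With this split the blended cost is about $144.60 per month, roughly a fifth of the flagship-only figure.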

Why CostForge?

20 Models Compared

Side-by-side pricing for Claude, GPT-5.5, Gemini, DeepSeek V4, Grok 4.3, and Mistral: 6 providers in one view.

Real-Time Calculations

Drag sliders or type exact numbers. See per-request, daily, and monthly costs update instantly across all models.

6 Use Case Presets

Pre-configured token counts for Chatbot, Code Assistant, Data Processing, RAG Pipeline, Content Generation, and Summarization.

Batch vs Standard

Toggle between standard and batch pricing. See exactly how much you save with async/batch processing for each provider.

Visual Cost Bars

Horizontal bars make it easy to compare relative costs at a glance. The cheapest model is highlighted with a BEST badge.

Optimization Tips

Expert advice on reducing API costs: prompt caching, model routing, token compression, and budget monitoring strategies.

Always Up-to-Date

Pricing data is based on the latest published rates from Anthropic, OpenAI, Google, DeepSeek, xAI, and Mistral as of May 2026.

100% Private

All calculations happen in your browser. No data is sent anywhere. No account needed, no tracking.

Frequently Asked Questions

How accurate are these prices?

Prices are based on the latest published pricing from Anthropic, OpenAI, Google, DeepSeek, xAI, and Mistral as of May 2026. We update regularly, but always verify against the official pricing pages for the most current rates. Actual costs may vary with prompt caching, commitments, or volume discounts.

What's the difference between standard and batch pricing?

Batch/async pricing is typically 50% cheaper but has higher latency (responses delivered within hours, not seconds). Use batch for non-real-time tasks like data processing, analysis, and bulk content generation.
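
As a quick illustration, the sketch below applies a batch discount to the part of a workload that can tolerate delayed responses. The 50% default is the typical figure quoted above, not a guaranteed rate for any provider, and the function name is hypothetical.

```python
def monthly_cost_with_batch(standard_monthly: float,
                            batch_share: float,
                            batch_discount: float = 0.50) -> float:
    """Monthly USD cost when batch_share of the workload runs at batch rates.

    batch_discount=0.50 is an assumed typical discount; check each
    provider's pricing page for the actual rate.
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    return standard_monthly * (1 - batch_share * batch_discount)


# Example: Claude Sonnet 4.6 at $405/mo standard pricing (from the table).
print(monthly_cost_with_batch(405.00, batch_share=1.0))  # everything batched
print(monthly_cost_with_batch(405.00, batch_share=0.5))  # half batched
```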

How do I estimate my token usage?

A rough rule: 1 token ≈ 4 characters in English. A typical user message is 50-500 tokens, a code file is 1,000-5,000 tokens, and a long document is 10,000+ tokens. For most tasks, output tokens run 20-50% of input tokens.
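
The rule of thumb translates directly into code. This is a minimal sketch that only counts characters. It is not a real tokenizer, and actual counts vary by model and language. The function names are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # rough English-language average, as stated above


def estimate_tokens(text: str) -> int:
    """Approximate token count from character count, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_output_range(input_tokens: int) -> tuple[int, int]:
    """Low and high output-token estimates at 20% and 50% of input."""
    return input_tokens * 20 // 100, input_tokens * 50 // 100


prompt = "Summarize the attached quarterly report in three bullet points."
tokens = estimate_tokens(prompt)
low, high = estimate_output_range(tokens)
print(f"{len(prompt)} chars, about {tokens} input tokens, "
      f"expect roughly {low}-{high} output tokens")
```

For budgeting, an estimate like this is usually close enough. For exact counts, use the tokenizer or token-counting endpoint your provider documents.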

Which model should I choose?

For simple Q&A and classification: use DeepSeek V4-Flash/Gemini Flash-Lite/Mistral Small (cheapest). For coding and analysis: use Sonnet 4.6/GPT-4.1/Grok 4.3 (best value). For complex reasoning: use Opus 4.7/GPT-5.5/o3/Gemini Pro (most capable). Always benchmark on your specific use case.

Does this include hidden costs like network, storage, etc.?

No. CostForge calculates pure API token costs only, meaning what the providers charge per input and output token. Other costs such as network bandwidth, compute for pre- and post-processing, and database storage are separate and depend on your infrastructure.
