Last updated 2026-05-13. Pricing, model names, and provider policies change frequently.
Quick answer
LLM API pricing is usually driven by input tokens, output tokens, model tier, request volume, caching, and any platform or routing fees. Output tokens often cost more than input tokens, so verbose answers can be the biggest cost driver.
Use this guide when
You need a first budget before launch
Start here when you need to turn rough product-usage assumptions into a cost range that finance, founders, or procurement can react to.
Token tables are not making the decision clearer
This article helps when provider price pages look precise but still do not tell you what your monthly bill will actually look like.
You are comparing official APIs with routers or cheap open-model routes
It is especially useful when you are weighing direct provider cost against marketplace, routing, or open-model tradeoffs.
Token pricing basics
Most providers charge per million input tokens and per million output tokens. Input tokens include your system prompt, user message, retrieved context, tool schemas, and conversation history. Output tokens are what the model generates.
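Here is a minimal Python sketch of the per-request arithmetic. It assumes simple per-million-token rates with no caching or batch discounts. The prices and token counts are hypothetical placeholders, not any provider's actual rates, so swap in figures from your provider's pricing page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given $ per 1M input and output tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical placeholder prices, not real provider rates.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.00

# Everything sent to the model counts as input, not just the user's message.
prompt_parts = {
    "system_prompt": 800,
    "user_message": 150,
    "retrieved_context": 2_000,
    "tool_schemas": 400,
    "history": 1_200,
}
input_tokens = sum(prompt_parts.values())  # 4,550 tokens
output_tokens = 500

cost = request_cost(input_tokens, output_tokens, INPUT_PRICE_PER_M, OUTPUT_PRICE_PER_M)
print(f"{input_tokens} input + {output_tokens} output tokens: ${cost:.6f}")
# prints: 4550 input + 500 output tokens: $0.003275
```

In this example the user's own message is only 150 of the 4,550 input tokens. The rest is context the application adds on every request.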
Why monthly cost surprises teams
Costs grow when prompts include long retrieved documents, chat history is replayed every turn, agents call tools repeatedly, or users generate longer outputs than expected. Request count alone is not enough to estimate spend.
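To see why request count alone misleads, this small sketch counts billed input tokens when a chat app resends the full history on every turn. It assumes fixed, hypothetical token counts per message and no prompt caching.

```python
def conversation_input_tokens(turns: int, system_tokens: int,
                              user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens billed over a conversation that replays full history each turn."""
    total = 0
    history = 0
    for _ in range(turns):
        # Each request carries the system prompt, all earlier turns, and the new message.
        total += system_tokens + history + user_tokens
        # The new message and the model's reply become history for the next turn.
        history += user_tokens + reply_tokens
    return total


# Hypothetical sizes: 500-token system prompt, 100-token user messages, 300-token replies.
naive = 10 * (500 + 100)  # assumes every turn costs the same as the first
actual = conversation_input_tokens(10, 500, 100, 300)
print(naive, actual)  # prints: 6000 24000
```

Because history grows every turn, total input grows roughly with the square of conversation length. In this example, ten turns already bill four times the naive estimate.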
Aggregator and gateway fees
Aggregators can simplify provider access, but you need to understand whether pricing is pass-through, marked up, credit-based, or plan-based. Always compare total cost, not only model token rates.
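One way to keep the comparison honest is to fold every fee into a single monthly number. The sketch below models the fee types in simplified form. The `PricingModel` class, its field names, and all rates are hypothetical. Real aggregators may combine or structure fees differently, so check their billing terms.

```python
from dataclasses import dataclass


@dataclass
class PricingModel:
    """Hypothetical, simplified fee structure for an API route."""
    name: str
    markup_rate: float = 0.0        # surcharge on token rates, e.g. 0.10 = 10%
    purchase_fee_rate: float = 0.0  # fee on prepaid credit purchases
    monthly_fee: float = 0.0        # flat plan or platform fee in dollars

    def monthly_total(self, base_token_cost: float) -> float:
        """Total monthly spend, given what the same tokens cost at the provider's list rates."""
        token_spend = base_token_cost * (1 + self.markup_rate)
        return token_spend * (1 + self.purchase_fee_rate) + self.monthly_fee


base = 1_000.0  # hypothetical monthly token cost at list rates
routes = [
    PricingModel("direct / pass-through"),
    PricingModel("credit-based", purchase_fee_rate=0.05),
    PricingModel("marked up", markup_rate=0.10),
    PricingModel("plan-based", monthly_fee=200.0),
]
for route in routes:
    print(f"{route.name:>22}: ${route.monthly_total(base):,.2f}")
```

A flat fee weighs more at low volume, while a percentage fee scales with spend. Rerun the comparison at both expected and peak volumes.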
Example decision paths
Chatbot looks cheap until output grows
A product team may estimate a chatbot on short prompts, then discover the real cost driver is long assistant answers and replayed history across every conversation turn.
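A quick way to check whether answer length dominates is to hold the prompt fixed and vary the average reply length. This sketch uses hypothetical placeholder prices and volumes, not real provider figures.

```python
def monthly_cost_and_output_share(requests: int, input_tokens: int, output_tokens: int,
                                  input_price_per_m: float,
                                  output_price_per_m: float) -> tuple[float, float]:
    """Return (monthly dollars, fraction of that spent on output tokens)."""
    input_cost = requests * input_tokens * input_price_per_m / 1_000_000
    output_cost = requests * output_tokens * output_price_per_m / 1_000_000
    total = input_cost + output_cost
    return total, output_cost / total


# Hypothetical: 100,000 requests/month, 1,000 input tokens, $0.50 in / $2.00 out per 1M.
for reply_tokens in (150, 600):
    total, share = monthly_cost_and_output_share(100_000, 1_000, reply_tokens, 0.50, 2.00)
    print(f"{reply_tokens}-token replies: ${total:,.2f}/month, {share:.0%} from output")
```

With these placeholder numbers, quadrupling reply length more than doubles the monthly bill even though the prompt never changed.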
RAG system hides prompt cost
A retrieval app may focus on the model price and miss the fact that long retrieved context and repeated system instructions are adding most of the bill.
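Breaking a single request's cost down by component shows where a RAG bill actually comes from. The token counts and prices below are hypothetical placeholders for an illustrative retrieval setup, not measurements.

```python
def cost_shares(input_parts: dict[str, int], output_tokens: int,
                input_price_per_m: float, output_price_per_m: float) -> dict[str, float]:
    """Fraction of one request's cost contributed by each prompt component and the output."""
    costs = {name: tokens * input_price_per_m / 1_000_000
             for name, tokens in input_parts.items()}
    costs["output"] = output_tokens * output_price_per_m / 1_000_000
    total = sum(costs.values())
    return {name: cost / total for name, cost in costs.items()}


# Hypothetical request: 8 retrieved chunks of ~750 tokens each, plus repeated system instructions.
parts = {"system_instructions": 1_200, "retrieved_context": 6_000, "question": 100}
shares = cost_shares(parts, output_tokens=400, input_price_per_m=0.50, output_price_per_m=2.00)
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>20}: {share:.1%}")
```

In this example, retrieved context plus system instructions account for roughly 81% of the request cost. That makes chunk count, chunk size, and repeated instructions the first levers worth checking.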
DeepSeek-V4 changes the cost baseline
DeepSeek-V4-Flash and the currently discounted DeepSeek-V4-Pro challenge the older assumption that only third-party open-model routes are cheap: an official API can now be a low-cost option too.
Cheap provider still loses on total cost
A lower token rate can still become more expensive overall if retries, weak formatting, or longer responses create more product and engineering overhead.
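A rough way to fold retries into the comparison is to divide the cost per attempt by the probability that an attempt produces usable output. This sketch assumes attempts fail independently and that you retry until one succeeds, so the expected number of attempts is 1/p. All prices and success rates below are hypothetical.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per usable result when failed attempts are retried until one succeeds.

    Assumes independent attempts, so expected attempts = 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical: the cheaper model often breaks the required output format.
cheap = cost_per_success(0.0008, success_rate=0.60)
stronger = cost_per_success(0.0012, success_rate=0.98)
print(f"cheap: ${cheap:.5f}  stronger: ${stronger:.5f}")
```

Here the model with the 50% higher token cost ends up cheaper per usable result. The sketch still leaves out the engineering time spent building validation and retry handling.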
Provider examples to compare
| Provider | Category | Supported models | OpenAI-compatible | Starting price | Context | Tool calling | Vision | Streaming | Status | Trust |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI | Official APIs | GPT, reasoning models, embeddings, image | Yes | Budget to premium GPT tiers | Short to very long, model dependent | Yes | Yes | Yes | Available | 12/15 |
| Anthropic | Official APIs | Claude Haiku, Claude Sonnet, Claude Opus | No | Mid to premium Claude tiers | Long context options | Yes | Yes | Yes | Available | 10/15 |
| DeepSeek | Official APIs | DeepSeek-V4-Flash, DeepSeek-V4-Pro | Yes | Low-cost flash to discounted pro tiers | 1M context, up to 384K output | Yes | No | Yes | Available | 11/15 |
| Google Gemini | Official APIs | Gemini, embedding models, multimodal models | Yes | Low-cost flash to premium tiers | Short to million-token-class options | Yes | Yes | Yes | Available | 11/15 |
| DeepInfra | Inference Providers | Llama, Qwen, DeepSeek-V4, Mistral | Yes | Often low for open models | Broad open-model range, model dependent | Model dependent | Yes | Yes | Available | 10/15 |
| OpenRouter | LLM API Aggregators | GPT, Claude, Gemini, DeepSeek-V4 | Yes | Varies by model route | Model dependent across upstream routes | Model dependent | Yes | Yes | Available | 11/15 |
Checklist
- Estimate input and output tokens separately.
- Include hidden prompt tokens: system prompts, tool schemas, retrieved context, and history.
- Model peak usage, not just average daily traffic.
- Review cache discounts, batch pricing, and volume commitments carefully.
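To check what a cache discount is really worth, the sketch below splits input into a reusable prefix (system prompt, tool schemas, shared context) and fresh per-request tokens. The cached rate, hit rate, and volumes are hypothetical. The sketch also ignores the extra charge some providers apply when writing to the cache, so read your provider's caching terms before relying on numbers like these.

```python
def monthly_input_cost(requests: int, prefix_tokens: int, fresh_tokens: int,
                       hit_rate: float, input_price_per_m: float,
                       cached_price_per_m: float) -> float:
    """Monthly input cost when a shared prompt prefix is served from cache at hit_rate."""
    prefix_total = requests * prefix_tokens
    cached = prefix_total * hit_rate
    uncached = prefix_total * (1 - hit_rate) + requests * fresh_tokens
    return (cached * cached_price_per_m + uncached * input_price_per_m) / 1_000_000


# Hypothetical: 100,000 requests, 3,000-token shared prefix, 200 fresh tokens each,
# $0.50 per 1M normal input, $0.05 per 1M cached input.
no_cache = monthly_input_cost(100_000, 3_000, 200, 0.0, 0.50, 0.05)
with_cache = monthly_input_cost(100_000, 3_000, 200, 0.8, 0.50, 0.05)
print(f"no cache: ${no_cache:,.2f}  80% hit rate: ${with_cache:,.2f}")
```

The saving depends heavily on hit rate. Hit rate drops if the prefix changes between requests or the cache expires during quiet periods.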
Recommended next step
Run the numbers with realistic token sizes (a cost calculator helps), then switch to a cheaper alternative only if its quality stays acceptable on your own evals.
FAQ
Why are output tokens often more expensive?
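Input tokens are processed together in a single parallel pass, while output tokens are generated one at a time, each requiring its own decoding step. Generation therefore uses more inference time and capacity per token, and providers commonly price output tokens higher than input tokens.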
Is price per million tokens enough to compare providers?
No. You also need to weigh quality, latency, retry rate, context-length needs, support, and platform fees.
How can I reduce cost without hurting quality?
Shorten prompts, cache repeated context, route simple tasks to cheaper models, and measure quality with evals.
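Routing savings can be estimated as a blend of the two models' per-request costs, gated by an eval check so quality does not drop silently. In this sketch the function names, the 2-point quality tolerance, and all costs and pass rates are hypothetical. Use your own eval set and thresholds.

```python
def blended_cost(routed_share: float, cheap_cost: float, strong_cost: float) -> float:
    """Average cost per request when routed_share of traffic goes to the cheaper model."""
    return routed_share * cheap_cost + (1 - routed_share) * strong_cost


def routing_passes_eval(cheap_pass_rate: float, strong_pass_rate: float,
                        max_drop: float = 0.02) -> bool:
    """Accept routing only if the cheap model stays within max_drop of the strong model's pass rate."""
    return strong_pass_rate - cheap_pass_rate <= max_drop


# Hypothetical: 70% of requests are simple; the cheap model costs $0.0005 and the
# strong model $0.005 per request. Pass rates are measured on the simple-task slice of an eval set.
if routing_passes_eval(cheap_pass_rate=0.94, strong_pass_rate=0.95):
    cost = blended_cost(0.70, 0.0005, 0.005)
    print(f"blended: ${cost:.5f} vs ${0.005:.5f} without routing")
```

Keeping the eval gate next to the cost calculation makes it harder to ship a routing rule that saves money by quietly lowering answer quality.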