Input vs output tokens
Most APIs charge separately for tokens you send and tokens the model generates. Output tokens are often more expensive.
Compare pricing assumptions across official APIs, inference platforms, aggregators, and OpenAI-compatible endpoints.
Use this lens when long prompts, retrieval context, or repeated system instructions dominate cost.
LLM API pricing is only useful when tied to your workload. The cheapest token rate does not guarantee the lowest total cost once output length, retries, routing, and support burden are included. DeepSeek-V4-Flash and DeepSeek-V4-Pro also show that official APIs can now compete on price with the open-model platforms many teams used to treat as the default low-cost option.
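To make the workload point concrete, here is a minimal sketch of a monthly cost estimate. The function name, its parameters, and the retry model are illustrative assumptions, not part of any provider's billing API; in particular it assumes every retry is billed again for the full input and output, which is a simplification.

```python
def monthly_cost(requests, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m, retry_rate=0.0):
    """Rough monthly spend in dollars (hypothetical helper).

    requests            -- requests per month
    input_tokens        -- average input tokens per request
    output_tokens       -- average output tokens per request
    input_price_per_m   -- dollars per 1M input tokens
    output_price_per_m  -- dollars per 1M output tokens
    retry_rate          -- assumed fraction of requests that are retried
                           and billed a second time in full
    """
    billed_requests = requests * (1 + retry_rate)
    input_cost = billed_requests * input_tokens / 1_000_000 * input_price_per_m
    output_cost = billed_requests * output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Same token rates, different workload assumptions.
    base = monthly_cost(1_000_000, 1_000, 500, 0.15, 0.60)
    with_retries = monthly_cost(1_000_000, 1_000, 500, 0.15, 0.60, retry_rate=0.1)
    print(f"no retries:  ${base:,.2f}")
    print(f"10% retries: ${with_retries:,.2f}")
```

Running the same rates with a longer average output or a higher retry rate shows how quickly the total moves away from what the headline input price suggests.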
Use the table below as a starting point; verify final rates on official pricing pages.
| Provider | Category | Supported models | OpenAI-compatible | Starting price | Context | Tool calling | Vision | Streaming | Status | Trust score (out of 15) | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI | Official APIs | GPT, reasoning models, embeddings, image | Yes | Budget to premium GPT tiers | Short to very long, model based | Yes | Yes | Yes | Available | 12/15 | |
| Anthropic | Official APIs | Claude, Claude Haiku, Claude Sonnet, Claude Opus | No | Mid to premium Claude tiers | Long context options | Yes | Yes | Yes | Available | 10/15 | |
| Google Gemini | Official APIs | Gemini, embedding models, multimodal models | Yes | Low-cost flash to premium tiers | Short to million-token-class options | Yes | Yes | Yes | Available | 11/15 | |
| Mistral AI | Official APIs | Mistral, Mixtral, Codestral, embeddings | Yes | Open and premium model tiers | Short to long, model based | Yes | No | Yes | Available | 11/15 | |
| DeepSeek | Official APIs | DeepSeek-V4-Flash, DeepSeek-V4-Pro | Yes | Low-cost flash to discounted pro tiers | 1M context, up to 384K output | Yes | No | Yes | Available | 11/15 | |
| xAI | Official APIs | Grok | Yes | Frontier-model pricing tiers | Mid to long, model based | Yes | Yes | Yes | Available | 11/15 | |
| Cohere | Official APIs | Command, Embed, Rerank | No | Enterprise and task-specific tiers | Task and model based | Yes | No | Yes | Available | 10/15 | |
| Together AI | Inference Providers | Llama, Qwen, DeepSeek-V4, Mistral | Yes | Often competitive for open models | Broad open-model range | No | Yes | Yes | Available | 11/15 | |
| Fireworks AI | Inference Providers | Llama, Qwen, DeepSeek-V4, Mistral | Yes | Competitive serverless tiers for open models | Broad open-model range, model specific | No | Yes | Yes | Available | 11/15 | |
| Groq | Inference Providers | Llama, Mixtral, Gemma, Whisper-like speech models | Yes | Speed-oriented model tiers | Selected fast-serving model range, model specific | Yes | No | Yes | Available | 11/15 | |
| DeepInfra | Inference Providers | Llama, Qwen, DeepSeek-V4, Mistral | Yes | Often low for open models | Broad open-model range, model specific | No | Yes | Yes | Available | 10/15 | |
| Replicate | Inference Providers | open models, image models, audio models, video models | No | Runtime dependent | Model dependent | No | Yes | Yes | Available | 10/15 | |
| Baseten | Inference Providers | custom models, open models | No | Deployment dependent | Model dependent | No | Yes | Yes | Available | 10/15 | |
| OpenRouter | LLM API Aggregators | GPT, Claude, Gemini, DeepSeek-V4 | Yes | Varies by model route | Model dependent across upstream routes | No | Yes | Yes | Available | 11/15 | |
| Portkey | LLM API Aggregators | GPT, Claude, Gemini, DeepSeek-V4 | Yes | Plan dependent plus provider spend | Provider dependent | No | Yes | Yes | Available | 11/15 | |
| LiteLLM Cloud | LLM API Aggregators | GPT, Claude, Gemini, Mistral | Yes | Plan dependent | Provider dependent | No | Yes | Yes | Unclear | 10/15 | |
| Helicone | LLM API Aggregators | provider dependent | Yes | Plan dependent | Provider dependent | No | Yes | Yes | Available | 11/15 | |
| Perplexity API | OpenAI-Compatible APIs | Sonar, online models | Yes | Varies by model | Model dependent | No | No | Yes | Available | 10/15 | |
| Novita AI | OpenAI-Compatible APIs | Llama, Qwen, DeepSeek-V4, image models | Yes | Varies by model | Model dependent | No | Yes | Yes | Unclear | 10/15 | |
| AI/ML API | OpenAI-Compatible APIs | GPT-style models, Claude-style access, Gemini-style access, open models | Yes | Varies by model | Model dependent | No | Yes | Yes | Unclear | 9/15 | |
| Anyscale Endpoints | Inference Providers | open models, custom deployments | Yes | Unclear | Model dependent | No | No | Yes | Unclear | 10/15 | |
| Voyage AI | Official APIs | embeddings, rerankers | No | Varies by model | Model dependent | No | No | No | Unclear | 9/15 | |
Turn broad pricing research into a more realistic buying decision.
Start with request count, input tokens, output tokens, and cache assumptions instead of relying on a single token-rate headline.
Workflow: Learn how output length, repeated context, routing, and retries can matter more than the homepage price table.
Workflow: Compare lower-cost routes only after checking quality, support, and trust posture for your real workload.
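The cache assumption mentioned above can be folded into a single blended input rate. This is a sketch only: the helper name, the cache hit fraction, and the discount are hypothetical inputs you would replace with figures from a provider's own pricing page, and not every provider offers cached-input pricing.

```python
def effective_input_price(input_price_per_m, cache_hit_fraction, cached_discount):
    """Blend full-price and cached input rates (hypothetical helper).

    input_price_per_m   -- dollars per 1M uncached input tokens
    cache_hit_fraction  -- assumed share of input tokens served from cache (0..1)
    cached_discount     -- assumed fraction taken off the price of cached tokens (0..1)
    """
    if not 0 <= cache_hit_fraction <= 1 or not 0 <= cached_discount <= 1:
        raise ValueError("fractions must be between 0 and 1")
    cached_price = input_price_per_m * (1 - cached_discount)
    uncached_share = 1 - cache_hit_fraction
    return uncached_share * input_price_per_m + cache_hit_fraction * cached_price


if __name__ == "__main__":
    # Assumed example: half of the input tokens hit the cache at a 90% discount.
    print(effective_input_price(3.0, 0.5, 0.9))
```

A repeated system prompt or shared retrieval context raises the cache hit fraction, which is why prompt-heavy workloads can end up cheaper than the list price implies.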
Aggregators can simplify access and routing, but pricing may include upstream model costs, routing choices, and platform fees. DeepSeek-V4 is a useful benchmark here because it shows how an official API can sometimes compete directly with cheaper marketplace assumptions.
Low token cost is not enough. Check model quality, latency, rate limits, support, and transparency before production use.
Editable defaults used by the calculator.
| Provider | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
|---|---|---|
| OpenAI | 0.15 | 0.60 |
| OpenAI | 2.50 | 10.00 |
| Anthropic | 3.00 | 15.00 |
| DeepSeek | 0.14 | 0.28 |
| DeepSeek | 0.435 | 0.87 |
| Google Gemini | 0.10 | 0.40 |
| DeepInfra | 0.05 | 0.20 |
| OpenRouter | 0.20 | 0.80 |
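As a rough illustration of how these defaults might be compared for one workload, the sketch below ranks them by token cost alone. The rates are copied from the defaults above; the labels "first listed" and "second listed" are placeholders, because the page does not say which model each duplicate provider entry refers to. The function name and the workload size are assumptions.

```python
# (label, input $ per 1M tokens, output $ per 1M tokens), from the defaults above.
DEFAULT_RATES = [
    ("OpenAI (first listed)", 0.15, 0.60),
    ("OpenAI (second listed)", 2.50, 10.00),
    ("Anthropic", 3.00, 15.00),
    ("DeepSeek (first listed)", 0.14, 0.28),
    ("DeepSeek (second listed)", 0.435, 0.87),
    ("Google Gemini", 0.10, 0.40),
    ("DeepInfra", 0.05, 0.20),
    ("OpenRouter", 0.20, 0.80),
]


def rank_by_token_cost(input_millions, output_millions, rates=DEFAULT_RATES):
    """Return (label, cost) pairs sorted from cheapest to most expensive.

    input_millions / output_millions are monthly token volumes in millions.
    Only token charges are counted; retries, platform fees, and quality
    differences are deliberately left out of this sketch.
    """
    costs = [
        (label, in_price * input_millions + out_price * output_millions)
        for label, in_price, out_price in rates
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Assumed workload: 100M input tokens and 20M output tokens per month.
    for label, cost in rank_by_token_cost(100, 20):
        print(f"{label:<28} ${cost:>8.2f}")
```

The ranking reflects only the default rates; it says nothing about output quality, latency, or trust, which still need to be checked separately.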
Only consider these cheaper routes after checking output quality, latency, and trust signals.
Official DeepSeek API for the DeepSeek-V4 family, with OpenAI and Anthropic compatible formats plus very large context windows.
Models: DeepSeek-V4-Flash, DeepSeek-V4-Pro
Inference provider known for very fast LPU-backed serving of selected open and partner models.
Models: Llama, Mixtral, Gemma
Serverless inference platform with a broad model catalog and OpenAI-compatible endpoints for many models.
Models: Llama, Qwen, DeepSeek-V4
Use request volume, input/output tokens, users, and cache assumptions.