LLMEndpoint

LLM API Pricing Comparison

Compare pricing assumptions across official APIs, inference platforms, aggregators, and OpenAI-compatible endpoints.

What are you optimizing for first?

Use this lens when long prompts, retrieval context, or repeated system instructions dominate cost.

Short answer

LLM API pricing is only useful when tied to your workload. The cheapest token rate is not automatically the lowest total cost once output length, retries, routing, and support burden are included. DeepSeek-V4-Flash and DeepSeek-V4-Pro also show that official APIs can now compete on price with the open-model platforms that many teams used to treat as the only low-cost option.
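To make the retry point concrete, here is a minimal sketch of cost per successful request. It assumes every failed attempt is billed in full, failures are independent, and the client retries until it succeeds, so the expected number of attempts is 1 / (1 - failure rate). The function name `cost_per_success` and both example providers are hypothetical, not measurements.

```python
def cost_per_success(cost_per_attempt: float, failure_rate: float) -> float:
    """Expected USD cost of one successful request when failures are retried.

    Assumes each failed attempt is billed in full and attempts are independent,
    so the expected number of attempts is 1 / (1 - failure_rate).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    return cost_per_attempt / (1 - failure_rate)


# Hypothetical numbers: a cheaper route that fails often vs. a pricier, steadier one.
cheap_route = cost_per_success(0.0010, failure_rate=0.30)
steady_route = cost_per_success(0.0013, failure_rate=0.02)

print(f"cheap route:  ${cheap_route:.5f} per successful request")
print(f"steady route: ${steady_route:.5f} per successful request")
```

Under these assumed failure rates, the route with the lower sticker price ends up costing more per useful answer.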

Provider Pricing Table

Use this as a starting point; verify final rates on official pricing pages.

| Provider | Category | Supported models | OpenAI-compatible | Starting price | Context | Tool calling | Vision | Streaming | Status | Trust |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI | Official APIs | GPT, reasoning models, embeddings, image | Yes | Budget to premium GPT tiers | Short to very long, model based | Yes | Yes | Yes | Available | 12/15 |
| Anthropic | Official APIs | Claude, Claude Haiku, Claude Sonnet, Claude Opus | No | Mid to premium Claude tiers | Long context options | Yes | Yes | Yes | Available | 10/15 |
| Google Gemini | Official APIs | Gemini, embedding models, multimodal models | Yes | Low-cost flash to premium tiers | Short to million-token-class options | Yes | Yes | Yes | Available | 11/15 |
| Mistral AI | Official APIs | Mistral, Mixtral, Codestral, embeddings | Yes | Open and premium model tiers | Short to long, model based | Yes | No | Yes | Available | 11/15 |
| DeepSeek | Official APIs | DeepSeek-V4-Flash, DeepSeek-V4-Pro | Yes | Low-cost flash to discounted pro tiers | 1M context, up to 384K output | Yes | No | Yes | Available | 11/15 |
| xAI | Official APIs | Grok | Yes | Frontier-model pricing tiers | Mid to long, model based | Yes | Yes | Yes | Available | 11/15 |
| Cohere | Official APIs | Command, Embed, Rerank | No | Enterprise and task-specific tiers | Task and model based | Yes | No | Yes | Available | 10/15 |
| Together AI | Inference Providers | Llama, Qwen, DeepSeek-V4, Mistral | Yes | Often competitive for open models | Broad open-model range | No | Yes | Yes | Available | 11/15 |
| Fireworks AI | Inference Providers | Llama, Qwen, DeepSeek-V4, Mistral | Yes | Competitive serverless tiers for open models | Broad open-model range, model specific | No | Yes | Yes | Available | 11/15 |
| Groq | Inference Providers | Llama, Mixtral, Gemma, Whisper-like speech models | Yes | Speed-oriented model tiers | Selected fast-serving model range, model specific | Yes | No | Yes | Available | 11/15 |
| DeepInfra | Inference Providers | Llama, Qwen, DeepSeek-V4, Mistral | Yes | Often low for open models | Broad open-model range, model specific | No | Yes | Yes | Available | 10/15 |
| Replicate | Inference Providers | open models, image models, audio models, video models | No | Runtime dependent | Model dependent | No | Yes | Yes | Available | 10/15 |
| Baseten | Inference Providers | custom models, open models | No | Deployment dependent | Model dependent | No | Yes | Yes | Available | 10/15 |
| OpenRouter | LLM API Aggregators | GPT, Claude, Gemini, DeepSeek-V4 | Yes | Varies by model route | Model dependent across upstream routes | No | Yes | Yes | Available | 11/15 |
| Portkey | LLM API Aggregators | GPT, Claude, Gemini, DeepSeek-V4 | Yes | Plan dependent plus provider spend | Provider dependent | No | Yes | Yes | Available | 11/15 |
| LiteLLM Cloud | LLM API Aggregators | GPT, Claude, Gemini, Mistral | Yes | Plan dependent | Provider dependent | No | Yes | Yes | Unclear | 10/15 |
| Helicone | LLM API Aggregators | provider dependent | Yes | Plan dependent | Provider dependent | No | Yes | Yes | Available | 11/15 |
| Perplexity API | OpenAI-Compatible APIs | Sonar, online models | Yes | Varies by model | Model dependent | No | No | Yes | Available | 10/15 |
| Novita AI | OpenAI-Compatible APIs | Llama, Qwen, DeepSeek-V4, image models | Yes | Varies by model | Model dependent | No | Yes | Yes | Unclear | 10/15 |
| AI/ML API | OpenAI-Compatible APIs | GPT-style models, Claude-style access, Gemini-style access, open models | Yes | Varies by model | Model dependent | No | Yes | Yes | Unclear | 9/15 |
| Anyscale Endpoints | Inference Providers | open models, custom deployments | Yes | Unclear | Model dependent | No | No | Yes | Unclear | 10/15 |
| Voyage AI | Official APIs | embeddings, rerankers | No | Varies by model | Model dependent | No | No | No | Unclear | 9/15 |
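One way to use the table is to encode the rows you care about and filter on hard requirements before looking at price. The sketch below is illustrative only: the `Provider` class and `shortlist` function are hypothetical names, and just six rows are copied from the table above, so verify the flags against each provider's own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    category: str
    openai_compatible: bool
    tool_calling: bool
    vision: bool
    trust: int  # trust score out of 15, as listed in the table


# A subset of rows copied from the provider table.
PROVIDERS = [
    Provider("OpenAI", "Official APIs", True, True, True, 12),
    Provider("Anthropic", "Official APIs", False, True, True, 10),
    Provider("DeepSeek", "Official APIs", True, True, False, 11),
    Provider("Groq", "Inference Providers", True, True, False, 11),
    Provider("DeepInfra", "Inference Providers", True, False, True, 10),
    Provider("OpenRouter", "LLM API Aggregators", True, False, True, 11),
]


def shortlist(providers, *, openai_compatible=False, tool_calling=False,
              vision=False, min_trust=0):
    """Return the names of providers that meet every requested requirement."""
    return [
        p.name
        for p in providers
        if (p.openai_compatible or not openai_compatible)
        and (p.tool_calling or not tool_calling)
        and (p.vision or not vision)
        and p.trust >= min_trust
    ]


candidates = shortlist(PROVIDERS, openai_compatible=True, tool_calling=True, min_trust=11)
print(candidates)
```

Filtering on capabilities first keeps the later price comparison limited to providers that could actually serve the workload.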

Input vs output tokens

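Most APIs charge separately for input tokens (what you send) and output tokens (what the model generates). Output tokens usually cost more: in the sample cost models on this page, output rates are 2x to 5x the input rates.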
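The arithmetic behind a single request can be sketched as follows. The function name `request_cost` and the token counts are assumptions chosen for illustration; the rates are the "Claude balanced model" sample values from this page ($3 input / $15 output per 1M tokens).

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD of one request; rates are USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical request: 2,000 tokens in, 500 tokens out.
cost = request_cost(input_tokens=2_000, output_tokens=500,
                    input_rate=3.0, output_rate=15.0)
print(f"${cost:.4f} per request")
```

In this example the 500 output tokens are only a fifth of the tokens processed but account for more than half of the cost, which is why output length deserves as much attention as prompt size.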

Official API vs aggregator pricing

Aggregators can simplify access and routing, but the price you pay may combine upstream model costs, routing choices, and platform fees. DeepSeek-V4 is a useful benchmark here because it shows that an official API can sometimes compete directly with the low rates teams expect to find only on marketplaces.
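A rough way to compare a direct route with an aggregator is to model the aggregator as upstream spend plus overhead. Fee structures differ between platforms, so the percentage markup and flat monthly fee below are purely hypothetical parameters, as is the function name `aggregator_monthly_cost`.

```python
def aggregator_monthly_cost(upstream_spend: float, markup_pct: float = 0.0,
                            platform_fee: float = 0.0) -> float:
    """Monthly USD cost via an aggregator.

    upstream_spend: what the same traffic would cost at the upstream provider.
    markup_pct:     hypothetical fractional markup on usage (0.05 means 5%).
    platform_fee:   hypothetical flat monthly plan fee.
    """
    return upstream_spend * (1 + markup_pct) + platform_fee


direct = 1_000.00  # hypothetical monthly spend when calling the provider directly
via_aggregator = aggregator_monthly_cost(direct, markup_pct=0.05, platform_fee=49.0)
overhead = via_aggregator - direct
print(f"Aggregator overhead: ${overhead:.2f} per month")
```

Whether that overhead is worth paying depends on what the routing, fallback, and unified billing save you elsewhere.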

Cheapest provider caveat

Low token cost is not enough. Check model quality, latency, rate limits, support, and transparency before production use.

Sample Cost Models

Editable defaults used by the calculator.

| Sample model | Provider | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
|---|---|---|---|
| OpenAI fast general model | OpenAI | $0.15 | $0.60 |
| OpenAI premium model | OpenAI | $2.50 | $10.00 |
| Claude balanced model | Anthropic | $3.00 | $15.00 |
| DeepSeek V4 Flash | DeepSeek | $0.14 | $0.28 |
| DeepSeek V4 Pro | DeepSeek | $0.435 | $0.87 |
| Gemini fast model | Google Gemini | $0.10 | $0.40 |
| DeepInfra low-cost route | DeepInfra | $0.05 | $0.20 |
| OpenRouter mixed route | OpenRouter | $0.20 | $0.80 |
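The sample rates above can be ranked for a specific workload with a few lines of code. This is a sketch, not the calculator's actual logic: the helper names and the workload (100,000 requests per month, 1,500 input and 400 output tokens each) are assumptions, and the rates are the editable defaults listed on this page rather than verified current prices.

```python
# (input rate, output rate) in USD per 1M tokens, copied from the sample cost models.
SAMPLE_RATES = {
    "OpenAI fast general model": (0.15, 0.60),
    "OpenAI premium model": (2.50, 10.00),
    "Claude balanced model": (3.00, 15.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "DeepSeek V4 Pro": (0.435, 0.87),
    "Gemini fast model": (0.10, 0.40),
    "DeepInfra low-cost route": (0.05, 0.20),
    "OpenRouter mixed route": (0.20, 0.80),
}


def monthly_cost(requests, input_tokens, output_tokens, input_rate, output_rate):
    """Monthly USD cost for a fixed number of identical requests."""
    total_in = requests * input_tokens
    total_out = requests * output_tokens
    return (total_in * input_rate + total_out * output_rate) / 1_000_000


def rank_by_cost(requests, input_tokens, output_tokens):
    """Return (name, monthly cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: monthly_cost(requests, input_tokens, output_tokens, *rates)
        for name, rates in SAMPLE_RATES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


ranked = rank_by_cost(requests=100_000, input_tokens=1_500, output_tokens=400)
for name, cost in ranked:
    print(f"{name:<28} ${cost:>9.2f}")
```

Changing the input-to-output ratio can reorder the middle of the list, so rerun the ranking with your own token counts instead of relying on the headline rates.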

Lower-cost providers to review next

Only consider these cheaper routes after checking output quality, latency, and trust signals.

Official APIs

DeepSeek

Official DeepSeek API for the DeepSeek-V4 family, with OpenAI- and Anthropic-compatible formats and very large context windows.

Models: DeepSeek-V4-Flash, DeepSeek-V4-Pro

cost-effective long-context apps | Low-cost flash to discounted pro tiers | 1M context, up to 384K output
OpenAI-compatible: Yes | Tool calling: Yes | Trust: 11/15

Inference Providers

Groq

Inference provider known for very fast LPU-backed serving of selected open and partner models.

Models: Llama, Mixtral, Gemma

low-latency chat | Speed-oriented model tiers | Selected fast-serving model range, model specific
OpenAI-compatible: Yes | Tool calling: Yes | Trust: 11/15

Inference Providers

DeepInfra

Serverless inference platform with a broad model catalog and OpenAI-compatible endpoints for many models.

Models: Llama, Qwen, DeepSeek-V4

low-cost open model inference | Often low for open models | Broad open-model range, model specific
OpenAI-compatible: Yes | Tool calling: not listed | Trust: 10/15

Turn pricing into an estimate.

Use request volume, input/output tokens, users, and cache assumptions.
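A minimal sketch of such an estimate is shown below. It is not the calculator's implementation: the function name, the way caching is modeled (a share of input tokens billed at a discounted rate), and every workload number are assumptions. Cache discounts vary by provider, so treat `cached_input_discount` as a placeholder. The rates are the DeepSeek V4 Flash sample values from this page.

```python
def estimate_monthly_cost(users, requests_per_user, input_tokens, output_tokens,
                          input_rate, output_rate,
                          cache_hit_rate=0.0, cached_input_discount=0.0):
    """Estimate monthly USD spend; rates are USD per 1M tokens.

    cache_hit_rate:        assumed share of input tokens served from a prompt cache.
    cached_input_discount: assumed fractional discount on cached input tokens.
    """
    requests = users * requests_per_user
    total_in = requests * input_tokens
    cached_in = total_in * cache_hit_rate
    fresh_in = total_in - cached_in
    input_cost = (fresh_in * input_rate
                  + cached_in * input_rate * (1 - cached_input_discount)) / 1_000_000
    output_cost = requests * output_tokens * output_rate / 1_000_000
    return input_cost + output_cost


# Hypothetical workload: 500 users, 200 requests each, 1,500 tokens in, 400 out.
workload = dict(users=500, requests_per_user=200, input_tokens=1_500,
                output_tokens=400, input_rate=0.14, output_rate=0.28)

no_cache = estimate_monthly_cost(**workload)
with_cache = estimate_monthly_cost(**workload, cache_hit_rate=0.5,
                                   cached_input_discount=0.5)
print(f"no cache:   ${no_cache:.2f}")
print(f"with cache: ${with_cache:.2f}")
```

Caching only reduces the input side of the bill, so its effect shrinks as responses get longer relative to prompts.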
