LLMEndpoint

LLM API Pricing Comparison

Compare pricing assumptions across official APIs, inference platforms, aggregators, and OpenAI-compatible endpoints.

What are you optimizing for first?

Lower input-token pressure: use this lens when long prompts, retrieval context, or repeated system instructions dominate cost.

Providers matched to this lens: DeepSeek, DeepInfra, Together AI, and OpenRouter.

Short answer

LLM API pricing is only meaningful when tied to your workload. The cheapest per-token rate does not automatically produce the lowest total cost once output length, retries, routing, and support burden are included. DeepSeek-V4-Flash and DeepSeek-V4-Pro also show that official APIs can now compete on price with the open-model platforms many teams used to treat as the default low-cost option.
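A minimal sketch of why the cheapest rate can lose: it computes the expected cost of one successful request on two routes. Every rate, output length, and failure rate here is a hypothetical placeholder rather than a real provider quote, and the model assumes failed attempts are billed in full and retried independently until one succeeds.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """A hypothetical pricing route; rates are USD per 1M tokens (placeholders)."""
    name: str
    input_per_m: float
    output_per_m: float
    avg_output_tokens: int  # typical completion length on this route
    failure_rate: float     # fraction of attempts that must be retried


def cost_per_success(route: Route, input_tokens: int) -> float:
    """Expected USD cost of one successful request.

    Assumes failed attempts are billed like successful ones and fail
    independently, so the expected number of attempts is 1 / (1 - failure_rate).
    """
    if not 0.0 <= route.failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    per_attempt = (
        input_tokens * route.input_per_m
        + route.avg_output_tokens * route.output_per_m
    ) / 1_000_000
    expected_attempts = 1 / (1 - route.failure_rate)
    return per_attempt * expected_attempts


# Hypothetical routes: "cheap-rate" has lower prices but longer outputs and more retries.
cheap = Route("cheap-rate", input_per_m=0.20, output_per_m=0.80, avg_output_tokens=1_200, failure_rate=0.20)
steady = Route("steady", input_per_m=0.30, output_per_m=1.00, avg_output_tokens=600, failure_rate=0.02)

for r in (cheap, steady):
    print(f"{r.name}: ${cost_per_success(r, input_tokens=2_000):.6f} per successful request")
```

With these placeholder numbers the cheaper-rate route costs about $0.0017 per successful request versus about $0.0012 for the steadier one, because longer outputs and more retries outweigh its lower token prices.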


Provider Pricing Table

Use this as a starting point; verify final rates on official pricing pages.

Provider	Category	Supported models	OpenAI-compatible	Pricing tier	Context	Tool calling	Vision	Streaming	Status	Trust score
OpenAI	Official APIs	GPT, reasoning models, embeddings, image models	Yes	Budget to premium GPT tiers	Short to very long, model based	Yes	Yes	Yes	Available	12/15
Anthropic	Official APIs	Claude, Claude Haiku, Claude Sonnet, Claude Opus	No	Mid to premium Claude tiers	Long context options	Yes	Yes	Yes	Available	10/15
Google Gemini	Official APIs	Gemini, embedding models, multimodal models	Yes	Low-cost flash to premium tiers	Short to million-token-class options	Yes	Yes	Yes	Available	11/15
Mistral AI	Official APIs	Mistral, Mixtral, Codestral, embeddings	Yes	Open and premium model tiers	Short to long, model based	Yes	No	Yes	Available	11/15
DeepSeek	Official APIs	DeepSeek-V4-Flash, DeepSeek-V4-Pro	Yes	Low-cost flash to discounted pro tiers	1M context, up to 384K output	Yes	No	Yes	Available	11/15
xAI	Official APIs	Grok	Yes	Frontier-model pricing tiers	Mid to long, model based	Yes	Yes	Yes	Available	11/15
Cohere	Official APIs	Command, Embed, Rerank	No	Enterprise and task-specific tiers	Task and model based	Yes	No	Yes	Available	10/15
Together AI	Inference Providers	Llama, Qwen, DeepSeek-V4, Mistral	Yes	Often competitive for open models	Broad open-model range	No	Yes	Yes	Available	11/15
Fireworks AI	Inference Providers	Llama, Qwen, DeepSeek-V4, Mistral	Yes	Competitive serverless tiers for open models	Broad open-model range, model specific	No	Yes	Yes	Available	11/15
Groq	Inference Providers	Llama, Mixtral, Gemma, Whisper-like speech models	Yes	Speed-oriented model tiers	Selected fast-serving model range, model specific	Yes	No	Yes	Available	11/15
DeepInfra	Inference Providers	Llama, Qwen, DeepSeek-V4, Mistral	Yes	Often low for open models	Broad open-model range, model specific	No	Yes	Yes	Available	10/15
Replicate	Inference Providers	Open models, image models, audio models, video models	No	Runtime dependent	Model dependent	No	Yes	Yes	Available	10/15
Baseten	Inference Providers	Custom models, open models	No	Deployment dependent	Model dependent	No	Yes	Yes	Available	10/15
OpenRouter	LLM API Aggregators	GPT, Claude, Gemini, DeepSeek-V4	Yes	Varies by model route	Model dependent across upstream routes	No	Yes	Yes	Available	11/15
Portkey	LLM API Aggregators	GPT, Claude, Gemini, DeepSeek-V4	Yes	Plan dependent plus provider spend	Provider dependent	No	Yes	Yes	Available	11/15
LiteLLM Cloud	LLM API Aggregators	GPT, Claude, Gemini, Mistral	Yes	Plan dependent	Provider dependent	No	Yes	Yes	Unclear	10/15
Helicone	LLM API Aggregators	Provider dependent	Yes	Plan dependent	Provider dependent	No	Yes	Yes	Available	11/15
Perplexity API	OpenAI-Compatible APIs	Sonar, online models	Yes	Varies by model	Model dependent	No	No	Yes	Available	10/15
Novita AI	OpenAI-Compatible APIs	Llama, Qwen, DeepSeek-V4, image models	Yes	Varies by model	Model dependent	No	Yes	Yes	Unclear	10/15
AI/ML API	OpenAI-Compatible APIs	GPT-style models, Claude-style access, Gemini-style access, open models	Yes	Varies by model	Model dependent	No	Yes	Yes	Unclear	9/15
Anyscale Endpoints	Inference Providers	Open models, custom deployments	Yes	Unclear	Model dependent	No	No	Yes	Unclear	10/15
Voyage AI	Official APIs	Embeddings, rerankers	No	Varies by model	Model dependent	No	No	No	Unclear	9/15

How to use this pricing page

Turn broad pricing research into a more realistic buying decision.

Workflow

Estimate real monthly spend

Start with request count, input tokens, output tokens, and cache assumptions instead of relying on a single token-rate headline.
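A minimal monthly-spend estimator along those lines, assuming prices are quoted in USD per million tokens and that a fixed fraction of each prompt is served from a prompt cache at a discounted rate. The function name, parameters, and example prices are hypothetical; cache pricing and eligibility rules differ by provider.

```python
from typing import Optional


def monthly_spend(
    requests_per_month: int,
    input_tokens: int,
    output_tokens: int,
    input_per_m: float,
    output_per_m: float,
    cached_fraction: float = 0.0,
    cached_input_per_m: Optional[float] = None,
) -> float:
    """Estimated monthly USD spend; all rates are USD per 1M tokens (hypothetical model)."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if cached_input_per_m is None:
        cached_input_per_m = input_per_m  # no cache discount assumed
    cached = input_tokens * cached_fraction  # prompt tokens billed at the cached rate
    uncached = input_tokens - cached         # prompt tokens billed at the normal rate
    per_request = (
        uncached * input_per_m
        + cached * cached_input_per_m
        + output_tokens * output_per_m
    ) / 1_000_000
    return requests_per_month * per_request


# Hypothetical workload: 100k requests/month, 3k-token prompts, 500-token answers.
with_cache = monthly_spend(100_000, 3_000, 500, input_per_m=0.50, output_per_m=2.00,
                           cached_fraction=0.6, cached_input_per_m=0.05)
no_cache = monthly_spend(100_000, 3_000, 500, input_per_m=0.50, output_per_m=2.00)
print(f"with cache: ${with_cache:,.2f}/month, without: ${no_cache:,.2f}/month")
```

With these placeholder inputs the estimate is $169.00 per month with caching and $250.00 without it, which is why cache assumptions belong in the model before any vendor comparison.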

Understand what actually drives cost

Learn how output length, repeated context, routing, and retries can matter more than the headline per-token prices.

Shortlist cheaper providers carefully

Compare lower-cost routes only after checking quality, support, and trust posture for your real workload.
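One way to encode that order of operations: apply hard requirements and a minimum trust score first, and only then rank the survivors by your own cost estimate. The feature flags, statuses, and trust scores below are transcribed from a few rows of the provider table; the cost estimates and the `shortlist` helper are hypothetical, and requiring an OpenAI-compatible endpoint is an assumption about your client code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Provider:
    name: str
    openai_compatible: bool
    tool_calling: bool
    status: str
    trust: int  # trust score out of 15, as listed in the provider table


# A few rows transcribed from the provider table (flags as listed there).
PROVIDERS = [
    Provider("DeepSeek", openai_compatible=True, tool_calling=True, status="Available", trust=11),
    Provider("Groq", openai_compatible=True, tool_calling=True, status="Available", trust=11),
    Provider("Together AI", openai_compatible=True, tool_calling=False, status="Available", trust=11),
    Provider("DeepInfra", openai_compatible=True, tool_calling=False, status="Available", trust=10),
    Provider("Novita AI", openai_compatible=True, tool_calling=False, status="Unclear", trust=10),
]


def shortlist(
    providers: List[Provider],
    estimated_monthly_usd: Dict[str, float],
    need_tools: bool = True,
    min_trust: int = 10,
) -> List[Provider]:
    """Filter on hard requirements and trust first, then sort survivors by estimated cost."""
    eligible = [
        p
        for p in providers
        if p.openai_compatible
        and (p.tool_calling or not need_tools)
        and p.status == "Available"
        and p.trust >= min_trust
        and p.name in estimated_monthly_usd  # skip providers you have not costed yet
    ]
    return sorted(eligible, key=lambda p: estimated_monthly_usd[p.name])


# Hypothetical monthly estimates from your own cost modeling, not published prices.
estimates = {"DeepSeek": 180.0, "Groq": 240.0, "Together AI": 150.0, "DeepInfra": 140.0, "Novita AI": 120.0}
print([p.name for p in shortlist(PROVIDERS, estimates)])
```

With `need_tools=True` the two cheapest estimates drop out because the table does not list tool calling for them, so the shortlist is DeepSeek followed by Groq. Filtering before sorting keeps a low price from pulling an unsuitable route onto the list.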

Input vs output tokens

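Most APIs charge separately for the tokens you send (input) and the tokens the model generates (output). Output tokens are often priced higher, so output-heavy workloads can cost more than their total token count alone suggests.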
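A worked version of that split, where T_in and T_out are input and output token counts and p_in and p_out are prices in USD per million tokens. The example rates are hypothetical placeholders, not any provider's published prices.

```latex
C_{\text{request}} = \frac{T_{\text{in}}\, p_{\text{in}} + T_{\text{out}}\, p_{\text{out}}}{10^{6}}

% Hypothetical example: T_in = 2000, p_in = 0.50, T_out = 500, p_out = 2.00
C_{\text{request}} = \frac{2000 \times 0.50 + 500 \times 2.00}{10^{6}}
                   = \frac{1000 + 1000}{10^{6}} = \$0.002
```

In this example output is only a fifth of the tokens but half of the cost.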

Official API vs aggregator pricing

Aggregators can simplify access and routing, but their pricing may combine upstream model costs, routing choices, and platform fees. DeepSeek-V4 is a useful benchmark here because its official API can sometimes match or undercut the low prices teams tend to expect only from marketplaces.
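A minimal sketch of that comparison, assuming the aggregated route bills upstream token cost plus a percentage fee and/or a flat plan fee (the "plan dependent plus provider spend" pattern in the table). The function names, fee levels, and monthly token costs are hypothetical; check each platform's actual terms.

```python
def effective_monthly_cost(token_cost_usd: float, percent_fee: float = 0.0, flat_fee_usd: float = 0.0) -> float:
    """Upstream token spend plus platform fees (fee structure is an assumption)."""
    return token_cost_usd * (1 + percent_fee) + flat_fee_usd


def break_even_token_cost(official_cost_usd: float, percent_fee: float, flat_fee_usd: float) -> float:
    """Highest upstream token spend at which the aggregated route costs no more than the official API."""
    return (official_cost_usd - flat_fee_usd) / (1 + percent_fee)


# Hypothetical month: the same usage billed directly vs. through an aggregator.
official = effective_monthly_cost(token_cost_usd=400.0)
aggregated = effective_monthly_cost(token_cost_usd=380.0, percent_fee=0.05, flat_fee_usd=20.0)
break_even = break_even_token_cost(official, percent_fee=0.05, flat_fee_usd=20.0)

print(f"official: ${official:.2f}, via aggregator: ${aggregated:.2f}, break-even token spend: ${break_even:.2f}")
```

With these placeholder numbers the aggregated route costs $419.00 against $400.00 for the official API, and it would need upstream token spend at or below about $361.90 to break even. Routing or reliability benefits may still justify the difference, but they should be weighed against a number like this.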