LLM API Pricing Explained

Last updated 2026-05-13. Pricing, model names, and provider policies change frequently.

Quick answer

LLM API pricing is usually driven by input tokens, output tokens, model tier, request volume, caching, and any platform or routing fees. Output tokens often cost more than input tokens, so verbose answers can be the biggest cost driver.

Use this guide when

You need a first budget before launch

It helps you translate vague product usage assumptions into a monthly cost range that finance, founders, or procurement can react to.

Token tables are not making the decision clearer

This article helps when provider price pages look precise but still do not tell you what your monthly bill will actually look like.

You are comparing official APIs with routers or cheap open-model routes

It is especially useful when you are weighing direct provider cost against marketplace, routing, or open-model tradeoffs.

Token pricing basics

Most providers charge per million input tokens and per million output tokens. Input tokens include your system prompt, user message, retrieved context, tool schemas, and conversation history. Output tokens are what the model generates.
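
The per-million billing described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's billing logic: the rates, the token counts for each prompt component, and the monthly request volume are all placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Cost of one request: each side is billed at its own per-million rate."""
    return (
        input_tokens / 1_000_000 * rates.input_per_million
        + output_tokens / 1_000_000 * rates.output_per_million
    )


# Placeholder rates for illustration only.
example_rates = Rates(input_per_million=1.00, output_per_million=4.00)

# Input side: everything sent to the model on this request (assumed sizes).
system_prompt, user_message, retrieved_context = 400, 50, 3_000
tool_schemas, conversation_history = 300, 1_250
input_tokens = (
    system_prompt + user_message + retrieved_context
    + tool_schemas + conversation_history
)
output_tokens = 500  # what the model generates

cost = request_cost(input_tokens, output_tokens, example_rates)
monthly = cost * 100_000  # assumed 100,000 requests per month
print(f"per request: ${cost:.4f}, per month: ${monthly:,.2f}")
```

In this made-up case the user message is only 50 of the 5,000 input tokens, which is why the input side has to be itemized rather than guessed from message length.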

Why monthly cost surprises teams

Costs grow when prompts include long retrieved documents, chat history is replayed every turn, agents call tools repeatedly, or users generate longer outputs than expected. Request count alone is not enough to estimate spend.
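
To see why replayed history matters, here is a small sketch that counts billed input tokens across a conversation when the full history is resent on every turn. The function name and all token sizes are hypothetical, and it assumes no truncation, summarization, or caching.

```python
def replayed_input_tokens(
    turns: int, system_tokens: int, user_tokens: int, reply_tokens: int
) -> int:
    """Total input tokens billed when every turn resends the whole history."""
    total = 0
    history = 0  # tokens from earlier user messages and assistant replies
    for _ in range(turns):
        # Each request carries the system prompt, all prior turns, and the new message.
        total += system_tokens + history + user_tokens
        # After the reply, both the message and the reply join the history.
        history += user_tokens + reply_tokens
    return total


# Assumed sizes: 500-token system prompt, 100-token messages, 300-token replies.
naive = 10 * (500 + 100)  # request count times a history-free prompt
actual = replayed_input_tokens(turns=10, system_tokens=500, user_tokens=100, reply_tokens=300)
print(naive, actual)
```

With these sizes, ten turns bill 24,000 input tokens, four times the 6,000 that a request-count estimate without history would suggest.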

Aggregator and gateway fees

Aggregators can simplify provider access, but you need to understand whether pricing is pass-through, marked up, credit-based, or plan-based. Always compare total cost, not only model token rates.
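
One way to compare these pricing models is to put them on the same monthly basis. The sketch below is a simplified model under stated assumptions: the parameter names, the fee percentages, and the plan price are hypothetical, and real aggregators may combine or structure fees differently.

```python
def total_monthly_cost(
    token_cost: float,
    markup: float = 0.0,
    credit_fee: float = 0.0,
    plan_fee: float = 0.0,
) -> float:
    """Monthly total in USD under a simplified fee model.

    token_cost: spend at the upstream provider's token rates
    markup:     fractional markup applied to token rates (0.10 = 10%)
    credit_fee: fractional fee charged when purchasing credits
    plan_fee:   flat monthly subscription or platform fee
    """
    usage = token_cost * (1 + markup)
    return usage * (1 + credit_fee) + plan_fee


upstream = 1_000.00  # assumed monthly token spend at direct provider rates

# All fee values below are placeholders, not any vendor's actual pricing.
scenarios = {
    "pass-through": total_monthly_cost(upstream),
    "marked up 10%": total_monthly_cost(upstream, markup=0.10),
    "5% credit fee": total_monthly_cost(upstream, credit_fee=0.05),
    "pass-through + $200 plan": total_monthly_cost(upstream, plan_fee=200.0),
}
for name, total in scenarios.items():
    print(f"{name}: ${total:,.2f}")
```

The same token rates produce different totals in each scenario, which is the reason to compare the full monthly figure rather than the per-token price alone.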

Example decision paths

Chatbot looks cheap until output grows

A product team may estimate a chatbot on short prompts, then discover the real cost driver is long assistant answers and replayed history across every conversation turn.

RAG system hides prompt cost

A retrieval app may focus on the model price and miss the fact that long retrieved context and repeated system instructions are adding most of the bill.

DeepSeek-V4 changes the cost baseline

DeepSeek-V4-Flash and the currently discounted DeepSeek-V4-Pro challenge the older assumption that only third-party open-model routes are cheap. An official API can now sit in the low-cost tier too, so include it in the baseline before comparing routers.

Cheap provider still loses on total cost

A lower token rate can still cost more overall if retries, malformed output, or longer responses add token spend and engineering overhead.
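
The retry effect can be approximated with a small calculation. This sketch assumes each attempt succeeds independently with a fixed probability and is retried until it succeeds, so the expected number of attempts is 1 / success rate. The per-attempt costs and success rates are invented for illustration.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per usable response when failed attempts are retried.

    Assumes independent attempts with a constant success rate, so the
    expected number of attempts per success is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical providers: B is 20% cheaper per attempt but fails more often.
provider_a = cost_per_success(cost_per_attempt=0.010, success_rate=0.98)
provider_b = cost_per_success(cost_per_attempt=0.008, success_rate=0.75)
print(f"A: ${provider_a:.5f}  B: ${provider_b:.5f}")
```

Under these assumptions the cheaper provider ends up slightly more expensive per usable response, before counting the engineering time spent on retry handling and output repair.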

Provider examples to compare

Provider	Category	Supported models	OpenAI-compatible	Starting price	Context	Tool calling	Vision	Streaming	Status	Trust
OpenAI	Official APIs	GPT, reasoning models, embeddings, image	Yes	Budget to premium GPT tiers	Short to very long, model based	Yes	Yes	Yes	Available	12/15
Anthropic	Official APIs	Claude, Claude Haiku, Claude Sonnet, Claude Opus	No	Mid to premium Claude tiers	Long context options	Yes	Yes	Yes	Available	10/15
DeepSeek	Official APIs	DeepSeek-V4-Flash, DeepSeek-V4-Pro	Yes	Low-cost flash to discounted pro tiers	1M context, up to 384K output	Yes	No	Yes	Available	11/15
Google Gemini	Official APIs	Gemini, embedding models, multimodal models	Yes	Low-cost flash to premium tiers	Short to million-token-class options	Yes	Yes	Yes	Available	11/15
DeepInfra	Inference Providers	Llama, Qwen, DeepSeek-V4, Mistral	Yes	Often low for open models	Broad open-model range, model specific	No	Yes	Yes	Available	10/15
OpenRouter	LLM API Aggregators	GPT, Claude, Gemini, DeepSeek-V4	Yes	Varies by model route	Model dependent across upstream routes	No	Yes	Yes	Available	11/15

Compare next

How to estimate LLM API costs: best if you want a simpler operational modeling flow

OpenAI vs DeepSeek: best if you are comparing premium default versus cheaper long-context official API pricing

Best cheap LLM API providers: best if cost reduction is already the main goal

Groq vs DeepInfra: best if you are deciding between speed and cheaper open-model breadth

Checklist

Estimate input and output tokens separately.
Include hidden prompt tokens: system prompts, tool schemas, retrieved context, and history.
Model peak usage, not just average daily traffic.
Review cache discounts, batch pricing, and volume commitments carefully.
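
For the cache-discount item, a rough sketch of how a discounted cached-input rate changes input spend is shown below. The function name, the cached fraction, and both rates are assumptions for illustration. Real providers differ in how cache hits are defined, how long entries live, and whether cache writes carry their own charge.

```python
def input_cost_with_cache(
    input_tokens: int,
    cached_fraction: float,
    input_rate: float,
    cached_rate: float,
) -> float:
    """Input-side cost in USD when part of the prompt is billed at a cached rate.

    Rates are per million tokens. cached_fraction is the share of input
    tokens served from cache, between 0 and 1.
    """
    if not 0 <= cached_fraction <= 1:
        raise ValueError("cached_fraction must be in [0, 1]")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return (uncached * input_rate + cached * cached_rate) / 1_000_000


# Placeholder numbers: 10M input tokens per month, 60% of them cache hits.
without_cache = input_cost_with_cache(10_000_000, 0.0, input_rate=1.00, cached_rate=0.25)
with_cache = input_cost_with_cache(10_000_000, 0.6, input_rate=1.00, cached_rate=0.25)
print(f"no cache: ${without_cache:.2f}, with cache: ${with_cache:.2f}")
```

The saving depends almost entirely on the cached fraction, so it is worth measuring how much of each prompt is actually stable across requests before counting on the discount.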

Recommended next step

Use the calculator with realistic token sizes, then compare cheaper alternatives only if quality remains acceptable.

FAQ

Why are output tokens often more expensive?

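Output tokens are generated one at a time, each requiring its own pass through the model, while input tokens can be processed in parallel. Generation therefore consumes more inference time and capacity per token, and providers commonly price generated tokens higher than prompt tokens.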

Is price per million tokens enough to compare providers?

No. You also need to weigh quality, latency, retry rate, context needs, support, and platform fees.

How can I reduce cost without hurting quality?

Shorten prompts, cache repeated context, route simple tasks to cheaper models, and measure quality with evals.
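
As a rough illustration of the routing idea, the sketch below computes a blended per-request cost when a share of traffic goes to a cheaper model. The share and both per-request costs are hypothetical, and the calculation says nothing about quality, which still has to be checked with evals.

```python
def blended_cost_per_request(
    simple_share: float, cheap_cost: float, premium_cost: float
) -> float:
    """Average cost per request when simple_share of traffic uses the cheap model."""
    if not 0 <= simple_share <= 1:
        raise ValueError("simple_share must be in [0, 1]")
    return simple_share * cheap_cost + (1 - simple_share) * premium_cost


# Assumed: 70% of requests are simple; costs are placeholder USD per request.
all_premium = blended_cost_per_request(0.0, cheap_cost=0.001, premium_cost=0.010)
routed = blended_cost_per_request(0.7, cheap_cost=0.001, premium_cost=0.010)
saving = 1 - routed / all_premium
print(f"all premium: ${all_premium:.4f}, routed: ${routed:.4f}, saving: {saving:.0%}")
```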