Last updated 2026-05-13. Pricing, model names, and provider policies change frequently.
Quick answer
An LLM endpoint is the API address your application calls to send prompts, files, tool requests, or embeddings work to a language model provider. In practice, choosing an endpoint means choosing a provider, model catalog, API format, pricing model, reliability posture, and data handling policy.
Use this guide when
You are new to LLM infrastructure
Use this guide when terms like provider, model, endpoint, gateway, and compatibility are still blending together.
You are reading provider marketing pages
This guide helps when several providers sound similar on the homepage and you need a clearer mental model before comparing price, speed, or trust.
You are planning a migration
It is especially useful before moving from one provider to another, because migration problems often come from misunderstanding what layer is actually changing.
Endpoint vs model vs provider
Developers often use these terms interchangeably, but they are different layers. The model is the system that generates or embeds content. The provider operates the API, billing, limits, documentation, and support. The endpoint is the network interface your code calls.
Why the endpoint choice matters
Two endpoints can expose similar models but behave differently under load, errors, streaming, tool calls, structured output, and rate limits. A clean demo can hide production details that matter once real users create bursty traffic.
Where OpenAI compatibility fits
An OpenAI-compatible endpoint usually means the provider accepts an OpenAI-style request shape. It can reduce migration work, but it does not guarantee identical model behavior, error codes, feature support, or safety settings.
Example decision paths
Replacing the endpoint but not the app shape
A team using OpenAI chat completions might switch to an OpenAI-compatible provider like OpenRouter or Together AI. The endpoint and billing path change first, while the application code changes less.
Changing the provider because the workflow changed
A startup may begin on one official API, then add Groq for faster real-time chat or DeepInfra for cheaper open-model experiments once the product and traffic pattern become clearer.
Adding a gateway instead of changing the model vendor
Sometimes the real issue is not the model itself but fallback, observability, and governance. In that case, a gateway like Portkey can matter more than swapping one model brand for another.
Provider examples to compare
| Provider | Category | Supported models | OpenAI-compatible | Starting price | Context | Tool calling | Vision | Streaming | Status | Trust | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenAI | Official APIs | GPT, reasoning models, embeddings, image | Yes | Budget to premium GPT tiers | Short to very long, model based | Yes | Yes | Yes | Available | 12/15 | |
| Anthropic | Official APIs | Claude, Claude Haiku, Claude Sonnet, Claude Opus | No | Mid to premium Claude tiers | Long context options | Yes | Yes | Yes | Available | 10/15 | |
| Google Gemini | Official APIs | Gemini, embedding models, multimodal models | Yes | Low-cost flash to premium tiers | Short to million-token-class options | Yes | Yes | Yes | Available | 11/15 | |
| OpenRouter | LLM API Aggregators | GPT, Claude, Gemini, DeepSeek-V4 | Yes | Varies by model route | Model dependent across upstream routes | No | Yes | Yes | Available | 11/15 | |
| Together AI | Inference Providers | Llama, Qwen, DeepSeek-V4, Mistral | Yes | Often competitive for open models | Broad open-model range | No | Yes | Yes | Available | 11/15 | |
| Groq | Inference Providers | Llama, Mixtral, Gemma, Whisper-like speech models | Yes | Speed-oriented model tiers | Selected fast-serving model range, model specific | Yes | No | Yes | Available | 11/15 |
Compare next
Checklist
- Identify the exact model and endpoint your app will call.
- Check whether streaming, tool calling, structured output, vision, and embeddings are supported.
- Review pricing, rate limits, status page, support channel, terms, and privacy policy.
- Run a small eval set before sending production traffic.
Recommended next step
Use the directory to compare endpoint categories, then use the finder to shortlist providers for your use case.
FAQ
Is an LLM endpoint the same as an LLM API?
Usually, yes in everyday developer language. More precisely, the API is the interface contract and the endpoint is the URL your application calls.
Can one provider offer multiple endpoints?
Yes. A provider may expose chat, embeddings, images, audio, batch, realtime, and OpenAI-compatible endpoints.
Should I start with the cheapest endpoint?
Only if your evals, latency, support, and transparency requirements still pass. Cheapest is not the same as production-ready.