What Is an LLM Endpoint?

Last updated 2026-05-13. Pricing, model names, and provider policies change frequently.

Quick answer

An LLM endpoint is the API address your application calls to send prompts, files, tool requests, or embeddings work to a language model provider. In practice, choosing an endpoint means choosing a provider, model catalog, API format, pricing model, reliability posture, and data handling policy.

Browse the directory Open endpoint finder

Use this guide when

You are new to LLM infrastructure

Use this guide when terms like provider, model, endpoint, gateway, and compatibility are still blending together.

You are reading provider marketing pages

This guide helps when several providers sound similar on the homepage and you need a clearer mental model before comparing price, speed, or trust.

You are planning a migration

It is especially useful before moving from one provider to another, because migration problems often come from misunderstanding what layer is actually changing.

Endpoint vs model vs provider

Developers often use these terms interchangeably, but they are different layers. The model is the system that generates or embeds content. The provider operates the API, billing, limits, documentation, and support. The endpoint is the network interface your code calls.

Why the endpoint choice matters

Two endpoints can expose similar models but behave differently under load, errors, streaming, tool calls, structured output, and rate limits. A clean demo can hide production details that matter once real users create bursty traffic.

Where OpenAI compatibility fits

An OpenAI-compatible endpoint usually means the provider accepts an OpenAI-style request shape. It can reduce migration work, but it does not guarantee identical model behavior, error codes, feature support, or safety settings.

Example decision paths

Replacing the endpoint but not the app shape

A team using OpenAI chat completions might switch to an OpenAI-compatible provider like OpenRouter or Together AI. The endpoint and billing path change first, while the application code changes less.

Changing the provider because the workflow changed

A startup may begin on one official API, then add Groq for faster real-time chat or DeepInfra for cheaper open-model experiments once the product and traffic pattern become clearer.

Adding a gateway instead of changing the model vendor

Sometimes the real issue is not the model itself but fallback, observability, and governance. In that case, a gateway like Portkey can matter more than swapping one model brand for another.

Provider examples to compare

Provider	Category	Supported models	OpenAI-compatible	Starting price	Context	Tool calling	Vision	Streaming	Status	Trust	Links
OpenAI	Official APIs	GPT, reasoning models, embeddings, image	Yes	Budget to premium GPT tiers	Short to very long, model based	Yes	Yes	Yes	Available	12/15	Review Docs Compare
Anthropic	Official APIs	Claude, Claude Haiku, Claude Sonnet, Claude Opus	No	Mid to premium Claude tiers	Long context options	Yes	Yes	Yes	Available	10/15	Review Docs Compare
Google Gemini	Official APIs	Gemini, embedding models, multimodal models	Yes	Low-cost flash to premium tiers	Short to million-token-class options	Yes	Yes	Yes	Available	11/15	Review Docs Compare
OpenRouter	LLM API Aggregators	GPT, Claude, Gemini, DeepSeek-V4	Yes	Varies by model route	Model dependent across upstream routes	No	Yes	Yes	Available	11/15	Review Docs Compare
Together AI	Inference Providers	Llama, Qwen, DeepSeek-V4, Mistral	Yes	Often competitive for open models	Broad open-model range	No	Yes	Yes	Available	11/15	Review Docs Compare
Groq	Inference Providers	Llama, Mixtral, Gemma, Whisper-like speech models	Yes	Speed-oriented model tiers	Selected fast-serving model range, model specific	Yes	No	Yes	Available	11/15	Review Docs Compare

Open directory Use endpoint finder

Compare next

Official APIs vs OpenAI-compatible providersBest next read if you already understand the basic terms

OpenAI-compatible category pageUse this if migration-friendly endpoints are the main goal

Provider directoryUse this if you are ready to scan actual vendors

Checklist

Identify the exact model and endpoint your app will call.
Check whether streaming, tool calling, structured output, vision, and embeddings are supported.
Review pricing, rate limits, status page, support channel, terms, and privacy policy.
Run a small eval set before sending production traffic.

Recommended next step

Use the directory to compare endpoint categories, then use the finder to shortlist providers for your use case.

Browse the directory Open endpoint finder

FAQ

Is an LLM endpoint the same as an LLM API?

Usually, yes in everyday developer language. More precisely, the API is the interface contract and the endpoint is the URL your application calls.

Can one provider offer multiple endpoints?

Yes. A provider may expose chat, embeddings, images, audio, batch, realtime, and OpenAI-compatible endpoints.

Should I start with the cheapest endpoint?

Only if your evals, latency, support, and transparency requirements still pass. Cheapest is not the same as production-ready.