LLM cost calculator and model directory

Per-token pricing, context windows, capabilities, and public benchmark scores for 2,518 models across 54 providers — chat, embeddings, image, audio, video, and rerank. Pricing verified weekly. Submit edits in one click. Route any model via Future AGI's Agent Command Center for unified caching, fallback, and observability.

Get the gateway free → GitHub ↗
Models
2,518
With pricing
2,183
Providers
54
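
The calculator's core arithmetic is simple: input and output tokens are billed at separate per-1M-token rates. A minimal sketch, using hypothetical rates of $5.00/M input and $25.00/M output:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Cost in USD for one request at per-1M-token rates."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# 12K prompt tokens + 1.5K completion tokens at $5.00/M in, $25.00/M out:
cost = request_cost(12_000, 1_500, 5.00, 25.00)
print(f"${cost:.4f}")  # → $0.0975
```

Note that output tokens usually cost several times more than input tokens, so completion length often dominates the bill.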

Browse by provider

OpenAI
192
GPT-4o, GPT-4, o-series reasoning models, embeddings, audio, image, and Realtime API.
Anthropic
27
Claude family of frontier reasoning models with industry-leading context length and tool use.
Google AI
93
Gemini chat, embeddings, image and multimodal models served via the Google AI API.
Google Vertex AI
194
Enterprise GCP endpoint for Gemini, Anthropic, Meta, and Mistral models with VPC-SC and IAM.
Amazon Bedrock
382
AWS-managed endpoints for Anthropic, Meta, Mistral, Cohere, and Amazon Nova models.
Azure OpenAI
201
OpenAI models on Azure with regional endpoints, RBAC, and content-filtering policies.
Azure AI Foundry
76
Azure marketplace catalog for Mistral, Cohere, Meta, Phi, and other partner models.
Mistral AI
51
European frontier lab — Mistral Large, Codestral, Pixtral, Mistral Small.
Cohere
22
Command R+ chat, Embed v3 multilingual embeddings, Rerank, and enterprise RAG models.
Groq
14
LPU inference for Llama, Mixtral, Whisper — sub-second TTFT at high throughput.
Cerebras
7
Wafer-scale inference for Llama and Qwen models — among the fastest tokens/sec available.
xAI
38
Grok family of frontier models with native real-time search and tool use.
Perplexity
46
Sonar models with grounded web search and citations built in.
Fireworks AI
268
Fast inference for Llama, Qwen, DeepSeek, Mixtral and 200+ open-source models.
Together AI
42
Open-model inference cloud — Llama, DeepSeek, Qwen, Mixtral, FLUX with serverless and dedicated endpoints.
DeepInfra
67
GPU inference for Llama, Qwen, DeepSeek, FLUX with per-token pricing.
Replicate
40
Hosted inference for Llama, FLUX, Stable Diffusion, Whisper, and community models.
OpenRouter
92
Unified gateway across 300+ models from 60+ providers with a single API key.
Novita AI
85
Pay-as-you-go GPU inference for Llama, Qwen, FLUX, and image/video models.
Databricks
28
Mosaic AI Model Serving — Llama, DBRX, Claude on Databricks with Unity Catalog governance.
Snowflake Cortex
24
In-warehouse LLM functions — Arctic, Llama, Mistral, with data residency in Snowflake.
IBM watsonx
29
Enterprise foundation models — Granite, Llama, Mistral with AI governance.
Alibaba DashScope
34
Qwen family of frontier models — Qwen-Max, Qwen-VL, Qwen-Audio.
Moonshot AI
22
Kimi K2 series — long-context frontier models from a leading Chinese lab.
DeepSeek
6
DeepSeek-R1 reasoning model and DeepSeek-V3 MoE — open-weight frontier-class inference.
Deepgram
36
Nova/Aura speech-to-text and TTS — production-grade audio AI.
AssemblyAI
2
Universal-2 speech-to-text with diarization, sentiment, and entity detection.
ElevenLabs
4
Multilingual text-to-speech and voice cloning — production voice AI.
Stability AI
23
Stable Diffusion, Stable Image Ultra, Stable Audio for image and audio generation.
Lambda
20
Lambda Inference API — Llama, DeepSeek, Qwen on dedicated GPU clusters.
Nebius
30
European GPU cloud with Inference Studio for Llama, Qwen, DeepSeek, FLUX.
AI21 Labs
12
Jamba models — hybrid Mamba-Transformer with 256K context.
Voyage AI
19
voyage-3 family of state-of-the-art retrieval embeddings and rerankers.
Jina AI
1
Multilingual embeddings, rerankers, and Reader API for grounding.
GitHub Copilot
31
GitHub Copilot Chat completions — Claude, GPT, and Gemini routed through Copilot.
SambaNova
17
Reconfigurable Dataflow Unit inference for Llama, DeepSeek, Qwen.
Hyperbolic
16
Decentralized GPU inference for Llama, Qwen, DeepSeek, and FLUX.
Nscale
16
Nordic GPU cloud with serverless inference for Llama and DeepSeek.
OVHcloud AI
15
EU-sovereign AI endpoints — Llama, Mistral, Qwen, Mixtral.
W&B Inference
16
Weights & Biases serverless inference for Llama and DeepSeek.
DigitalOcean Gradient
13
DigitalOcean Gradient AI Platform — managed inference for open-source models.
Oracle Cloud
37
Oracle Cloud Generative AI — Llama and Cohere models on OCI.
Anyscale
12
Ray-powered serverless inference for Llama, Mistral, and Qwen.
fal.ai
12
Real-time inference for FLUX, Stable Diffusion, Whisper, and audio models.
Z.ai
11
GLM family of frontier reasoning and chat models from a leading Chinese lab.
MiniMax
9
MiniMax M2 reasoning, abab chat, and speech models.
Black Forest Labs
8
FLUX family — state-of-the-art image generation and editing.
Volcano Engine
12
ByteDance cloud — Doubao chat, image, and audio models.
Aleph Alpha
6
European sovereign AI — Pharia and Luminous foundation models.
AIMLAPI
12
Unified API for 200+ AI models — chat, image, audio, video.
LlamaGate
16
Llama-family inference gateway with OpenAI-compatible endpoints.
GMI Cloud
17
Asia-Pacific GPU cloud with serverless inference for Llama and Qwen.
PublicAI
9
Open inference network — Llama, DeepSeek, Qwen with public-good routing.
GigaChat
6
Sber GigaChat — Russia's leading frontier chat models.

Browse by category

One click opens the directory above, scoped to a single modality.

Highest-ranked chat models on Chatbot Arena

Sorted by Chatbot Arena Elo score (from human pairwise preference votes). Open the full directory →

| Model | Provider | Input / 1M | Output / 1M | Context |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Azure AI Foundry | $5.00 | $25.00 | 200,000 |
| Gemini 3.1 Pro preview | Google Vertex AI | $2.00 | $12.00 | 1,048,576 |
| Claude Opus 4.7 | Azure AI Foundry | $5.00 | $25.00 | 200,000 |
| Gemini 3 Pro Preview | Google Vertex AI | $2.00 | $12.00 | 1,048,576 |
| GPT 5.5 | OpenAI | $5.00 | $30.00 | 1,050,000 |
| GPT 5.2 | Azure OpenAI | $1.75 | $14.00 | 272,000 |
| GPT 5.2 Chat latest | OpenAI | $1.75 | $14.00 | 128,000 |
| GPT 5.4 | Azure OpenAI | $2.50 | $15.00 | 1,050,000 |
| Grok 4.20 beta 0309 Reasoning | xAI | $2.00 | $6.00 | 2,000,000 |
| Grok 4.20 Multi Agent beta 0309 | xAI | $2.00 | $6.00 | 2,000,000 |
| Claude Opus 4.5 | Azure AI Foundry | $5.00 | $25.00 | 200,000 |
| Claude Sonnet 4.6 | Azure AI Foundry | $3.00 | $15.00 | 1,000,000 |
| Kimi K2.6 | Moonshot AI | $0.95 | $4.00 | 262,144 |
| Grok 4 | xAI | $3.00 | $15.00 | 256,000 |
| GPT 5.4 mini | Azure OpenAI | $0.75 | $4.50 | 1,050,000 |
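
Rates like those above make workload-level comparisons straightforward. A sketch comparing two models from the table under a hypothetical workload (100K requests/month, 2K input / 500 output tokens each); the model identifiers are illustrative, not official API ids:

```python
# $/1M tokens (input, output), taken from the table above.
rates = {
    "claude-opus-4.6": (5.00, 25.00),
    "gpt-5.2": (1.75, 14.00),
}

def monthly_cost(model: str, requests: int, in_tok: int, out_tok: int) -> float:
    """Projected monthly spend for a uniform workload on one model."""
    in_rate, out_rate = rates[model]
    per_request = in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate
    return per_request * requests

for model in rates:
    print(f"{model}: ${monthly_cost(model, 100_000, 2_000, 500):,.2f}")
# claude-opus-4.6: $2,250.00
# gpt-5.2: $1,050.00
```

For this workload the cheaper model cuts spend by more than half, which is the kind of gap the calculator is meant to surface before you commit.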

FAQ

How many models does the directory cover?

2,518 models across 54 providers — chat, embeddings, image generation, image edit, speech-to-text, text-to-speech, video generation, rerank, moderation, OCR, and realtime audio. Maxim and most competitors only cover chat models on a handful of providers.

How often is pricing updated?

Pricing is verified weekly from litellm + models.dev + OpenRouter, with the per-model "Last verified" timestamp shown on every page. If you spot an outdated rate, click "suggest edit" on the row — submissions land in our review queue and roll out on the next refresh.

How accurate are these prices?

We aggregate from public, version-controlled sources (BerriAI/litellm, models.dev, OpenRouter) and pull the most-recently-published number. We do NOT contractually guarantee accuracy — for production billing decisions, always confirm against the provider's own pricing page (linked on every model page).

Do you cover deprecated and upcoming models?

Yes — every model carries a `deprecation_date` when one is published by the provider. Models retiring in the next 180 days show a countdown badge; deprecated models still render with a clear notice and migration recommendations to a current alternative.

Can I get pricing for a model that isn't listed?

Click "Request a model" on the hub — submissions land in a public queue, ranked by community votes. Models with active demand get added within a few days. No sign-up required.

How do I route across providers without changing my code?

Use Future AGI's Agent Command Center — an open-source OpenAI-compatible gateway. One config supports cost-optimized routing, latency-aware retries, model fallback, shadow / mirror traffic, exact + semantic caching, and 18 built-in guardrail scanners. Free for the first 100K requests/month. The "Production recipe" block on every model page has a ready-to-paste config.
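
Because the gateway is OpenAI-compatible, switching to it is essentially a base-URL change; the request body keeps the standard Chat Completions shape. A sketch of such a request — the URL, model ids, and the extra routing fields are all illustrative assumptions, so check the "Production recipe" block on a model page for the real config keys:

```python
import json

# Hypothetical local gateway endpoint; only the base URL differs from
# calling the provider directly.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

payload = {
    # Standard OpenAI-style fields:
    "model": "gpt-5.2",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    # Hypothetical gateway extensions for fallback and caching, carried
    # alongside the standard body:
    "fallback_models": ["claude-sonnet-4.6", "kimi-k2.6"],
    "cache": "semantic",
}

print(json.dumps(payload, indent=2))
```

Keeping routing concerns (fallbacks, caching) in the request or gateway config, rather than in application code, is what lets you swap models without touching call sites.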

Route every model via Agent Command Center →
One OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.