LLM cost calculator and model directory

Per-token pricing, context windows, capabilities, and public benchmark scores for 2,518 models across 54 providers — chat, embeddings, image, audio, video, and rerank. Pricing verified weekly. Submit edits in one click. Route any model via Future AGI's Agent Command Center for unified caching, fallback, and observability.

Get the gateway free → GitHub ↗
Models
2,518
With pricing
2,183
Providers
54
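
The calculator's core arithmetic is simple: input and output tokens are billed at separate per-1M-token rates. A minimal sketch, using hypothetical rates of $5.00/M input and $25.00/M output:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Cost in USD for one request at per-1M-token rates."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)

# 12K prompt tokens + 1.5K completion tokens at $5.00/M in, $25.00/M out:
cost = request_cost(12_000, 1_500, 5.00, 25.00)
print(f"${cost:.4f}")  # → $0.0975
```

Note that output tokens usually cost several times more than input tokens, so completion length often dominates the bill.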

Browse by provider

OpenAI
192
GPT-4o, GPT-4, o-series reasoning models, embeddings, audio, image, and Realtime API.
Anthropic
27
Claude family of frontier reasoning models with industry-leading context length and tool use.
Google AI
93
Gemini chat, embeddings, image and multimodal models served via the Google AI API.
Google Vertex AI
194
Enterprise GCP endpoint for Gemini, Anthropic, Meta, and Mistral models with VPC-SC and IAM.
Amazon Bedrock
382
AWS-managed endpoints for Anthropic, Meta, Mistral, Cohere, and Amazon Nova models.
Azure OpenAI
201
OpenAI models on Azure with regional endpoints, RBAC, and content-filtering policies.
Azure AI Foundry
76
Azure marketplace catalog for Mistral, Cohere, Meta, Phi, and other partner models.
Mistral AI
51
European frontier lab — Mistral Large, Codestral, Pixtral, Mistral Small.
Cohere
22
Command R+ chat, Embed v3 multilingual embeddings, Rerank, and enterprise RAG models.
Groq
14
LPU inference for Llama, Mixtral, Whisper — sub-second TTFT at high throughput.
Cerebras
7
Wafer-scale inference for Llama and Qwen models — among the fastest tokens/sec available.
xAI
38
Grok family of frontier models with native real-time search and tool use.
Perplexity
46
Sonar models with grounded web search and citations built in.
Fireworks AI
268
Fast inference for Llama, Qwen, DeepSeek, Mixtral and 200+ open-source models.
Together AI
42
Open-model inference cloud — Llama, DeepSeek, Qwen, Mixtral, FLUX with serverless and dedicated endpoints.
DeepInfra
67
GPU inference for Llama, Qwen, DeepSeek, FLUX with per-token pricing.
Replicate
40
Hosted inference for Llama, FLUX, Stable Diffusion, Whisper, and community models.
OpenRouter
92
Unified gateway across 300+ models from 60+ providers with a single API key.
Novita AI
85
Pay-as-you-go GPU inference for Llama, Qwen, FLUX, and image/video models.
Databricks
28
Mosaic AI Model Serving — Llama, DBRX, Claude on Databricks with Unity Catalog governance.
Snowflake Cortex
24
In-warehouse LLM functions — Arctic, Llama, Mistral, with data residency in Snowflake.
IBM watsonx
29
Enterprise foundation models — Granite, Llama, Mistral with AI governance.
Alibaba DashScope
34
Qwen family of frontier models — Qwen-Max, Qwen-VL, Qwen-Audio.
Moonshot AI
22
Kimi K2 series — long-context frontier models from a leading Chinese lab.
DeepSeek
6
DeepSeek-R1 reasoning model and DeepSeek-V3 MoE — open-weight frontier-class inference.
Deepgram
36
Nova/Aura speech-to-text and TTS — production-grade audio AI.
AssemblyAI
2
Universal-2 speech-to-text with diarization, sentiment, and entity detection.
ElevenLabs
4
Multilingual text-to-speech and voice cloning — production voice AI.
Stability AI
23
Stable Diffusion, Stable Image Ultra, Stable Audio for image and audio generation.
Lambda
20
Lambda Inference API — Llama, DeepSeek, Qwen on dedicated GPU clusters.
Nebius
30
European GPU cloud with Inference Studio for Llama, Qwen, DeepSeek, FLUX.
AI21 Labs
12
Jamba models — hybrid Mamba-Transformer with 256K context.
Voyage AI
19
voyage-3 family of state-of-the-art retrieval embeddings and rerankers.
Jina AI
1
Multilingual embeddings, rerankers, and Reader API for grounding.
GitHub Copilot
31
GitHub Copilot Chat completions — Claude, GPT, and Gemini routed through Copilot.
SambaNova
17
Reconfigurable Dataflow Unit inference for Llama, DeepSeek, Qwen.
Hyperbolic
16
Decentralized GPU inference for Llama, Qwen, DeepSeek, and FLUX.
Nscale
16
Nordic GPU cloud with serverless inference for Llama and DeepSeek.
OVHcloud AI
15
EU-sovereign AI endpoints — Llama, Mistral, Qwen, Mixtral.
W&B Inference
16
Weights & Biases serverless inference for Llama and DeepSeek.
DigitalOcean Gradient
13
DigitalOcean Gradient AI Platform — managed inference for open-source models.
Oracle Cloud
37
Oracle Cloud Generative AI — Llama and Cohere models on OCI.
Anyscale
12
Ray-powered serverless inference for Llama, Mistral, and Qwen.
fal.ai
12
Real-time inference for FLUX, Stable Diffusion, Whisper, and audio models.
Z.ai
11
GLM family of frontier reasoning and chat models from a leading Chinese lab.
MiniMax
9
MiniMax M2 reasoning, abab chat, and speech models.
Black Forest Labs
8
FLUX family — state-of-the-art image generation and editing.
Volcano Engine
12
ByteDance cloud — Doubao chat, image, and audio models.
Aleph Alpha
6
European sovereign AI — Pharia and Luminous foundation models.
AIMLAPI
12
Unified API for 200+ AI models — chat, image, audio, video.
LlamaGate
16
Llama-family inference gateway with OpenAI-compatible endpoints.
GMI Cloud
17
Asia-Pacific GPU cloud with serverless inference for Llama and Qwen.
PublicAI
9
Open inference network — Llama, DeepSeek, Qwen with public-good routing.
GigaChat
6
Sber GigaChat — Russia's leading frontier chat models.

Browse by category

One click opens the directory above, scoped to a single modality.

Highest-ranked chat models on Chatbot Arena

Sorted by Chatbot Arena Elo score (from human pairwise preference votes). Open the full directory →

| Model | Provider | Input / 1M | Output / 1M | Context |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | Azure AI Foundry | $5.00 | $25.00 | 200,000 |
| Gemini 3.1 Pro preview | Google Vertex AI | $2.00 | $12.00 | 1,048,576 |
| Claude Opus 4.7 | Azure AI Foundry | $5.00 | $25.00 | 200,000 |
| Gemini 3 Pro Preview | Google Vertex AI | $2.00 | $12.00 | 1,048,576 |
| GPT 5.5 | OpenAI | $5.00 | $30.00 | 1,050,000 |
| GPT 5.2 | Azure OpenAI | $1.75 | $14.00 | 272,000 |
| GPT 5.2 Chat latest | OpenAI | $1.75 | $14.00 | 128,000 |
| GPT 5.4 | Azure OpenAI | $2.50 | $15.00 | 1,050,000 |
| Grok 4.20 beta 0309 Reasoning | xAI | $2.00 | $6.00 | 2,000,000 |
| Grok 4.20 Multi Agent beta 0309 | xAI | $2.00 | $6.00 | 2,000,000 |
| Claude Opus 4.5 | Azure AI Foundry | $5.00 | $25.00 | 200,000 |
| Claude Sonnet 4.6 | Azure AI Foundry | $3.00 | $15.00 | 1,000,000 |
| Kimi K2.6 | Moonshot AI | $0.95 | $4.00 | 262,144 |
| Grok 4 | xAI | $3.00 | $15.00 | 256,000 |
| GPT 5.4 mini | Azure OpenAI | $0.75 | $4.50 | 1,050,000 |
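
Rates like those above make workload-level comparisons straightforward. A sketch comparing two models from the table under a hypothetical workload (100K requests/month, 2K input / 500 output tokens each); the model identifiers are illustrative, not official API ids:

```python
# $/1M tokens (input, output), taken from the table above.
rates = {
    "claude-opus-4.6": (5.00, 25.00),
    "gpt-5.2": (1.75, 14.00),
}

def monthly_cost(model: str, requests: int, in_tok: int, out_tok: int) -> float:
    """Projected monthly spend for a uniform workload on one model."""
    in_rate, out_rate = rates[model]
    per_request = in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate
    return per_request * requests

for model in rates:
    print(f"{model}: ${monthly_cost(model, 100_000, 2_000, 500):,.2f}")
# claude-opus-4.6: $2,250.00
# gpt-5.2: $1,050.00
```

For this workload the cheaper model cuts spend by more than half, which is the kind of gap the calculator is meant to surface before you commit.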

FAQ

How many models does the directory cover?

2,518 models across 54 providers — chat, embeddings, image generation, image edit, speech-to-text, text-to-speech, video generation, rerank, moderation, OCR, and realtime audio. Maxim and most competitors only cover chat models on a handful of providers.

How often is pricing updated?

Pricing is verified weekly from litellm + models.dev + OpenRouter, with the per-model "Last verified" timestamp shown on every page. If you spot an outdated rate, click "suggest edit" on the row — submissions land in our review queue and roll out on the next refresh.

How accurate are these prices?

We aggregate from public, version-controlled sources (BerriAI/litellm, models.dev, OpenRouter) and pull the most-recently-published number. We do NOT contractually guarantee accuracy — for production billing decisions, always confirm against the provider's own pricing page (linked on every model page).

Do you cover deprecated and upcoming models?

Yes — every model carries a `deprecation_date` when one is published by the provider. Models retiring in the next 180 days show a countdown badge; deprecated models still render with a clear notice and migration recommendations to a current alternative.

Can I get pricing for a model that isn't listed?

Click "Request a model" on the hub — submissions land in a public queue, ranked by community votes. Models with active demand get added within a few days. No sign-up required.

How do I route across providers without changing my code?

Use Future AGI's Agent Command Center — an open-source OpenAI-compatible gateway. One config supports cost-optimized routing, latency-aware retries, model fallback, shadow / mirror traffic, exact + semantic caching, and 18 built-in guardrail scanners. Free for the first 100K requests/month. The "Production recipe" block on every model page has a ready-to-paste config.
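
Because the gateway is OpenAI-compatible, switching to it is essentially a base-URL change; the request body keeps the standard Chat Completions shape. A sketch of such a request — the URL, model ids, and the extra routing fields are all illustrative assumptions, so check the "Production recipe" block on a model page for the real config keys:

```python
import json

# Hypothetical local gateway endpoint; only the base URL differs from
# calling the provider directly.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"

payload = {
    # Standard OpenAI-style fields:
    "model": "gpt-5.2",
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    # Hypothetical gateway extensions for fallback and caching, carried
    # alongside the standard body:
    "fallback_models": ["claude-sonnet-4.6", "kimi-k2.6"],
    "cache": "semantic",
}

print(json.dumps(payload, indent=2))
```

Keeping routing concerns (fallbacks, caching) in the request or gateway config, rather than in application code, is what lets you swap models without touching call sites.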

Route every model via Agent Command Center →
One OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.