Replicate models & pricing

Replicate hosts 40 models (40 with public pricing) covering 1 modalities. Hosted inference for Llama, FLUX, Stable Diffusion, Whisper, and community models. Cheapest input starts at $0.0300/M tokens; the most premium goes up to $15.00/M. Use Future AGI's Agent Command Center to route any Replicate model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗
40

chat 40

Model Input / 1M Output / 1M Context Caps
IBM Granite Granite 3.3 8B Instruct $0.0300/M $0.250/M tools
Meta Llama 2.7B $0.0500/M $0.250/M 4,096
Meta Llama 2.7B Chat $0.0500/M $0.250/M 4,096
Meta Llama 3.8B $0.0500/M $0.250/M 8,086
Meta Llama 3.8B Instruct $0.0500/M $0.250/M 8,086
Mistralai Mistral 7B Instruct v0.2 $0.0500/M $0.250/M 4,096
Mistralai Mistral 7B v0.1 $0.0500/M $0.250/M 4,096
OpenAI GPT 5 nano $0.0500/M $0.400/M tools
GPT Oss 20B $0.0900/M $0.360/M tools
Meta Llama 2.13B $0.1000/M $0.500/M 4,096
Meta Llama 2.13B Chat $0.1000/M $0.500/M 4,096
OpenAI GPT 4.1 nano $0.1000/M $0.400/M tools
OpenAI GPT 4o mini $0.150/M $0.600/M tools · vision
OpenAI GPT Oss 120B $0.180/M $0.720/M tools
OpenAI GPT 5 mini $0.250/M $2.00/M tools · vision
Qwen Qwen3.235b A22b Instruct 2507 $0.264/M $1.06/M tools
Mistralai Mixtral 8×7B Instruct v0.1 $0.300/M $1.00/M 4,096
OpenAI GPT 4.1 mini $0.400/M $1.60/M tools · vision
Meta Llama 2.70B $0.650/M $2.75/M 4,096
Meta Llama 2.70B Chat $0.650/M $2.75/M 4,096
Meta Llama 3.70B $0.650/M $2.75/M 8,192
Meta Llama 3.70B Instruct $0.650/M $2.75/M 8,192
DeepSeek AI DeepSeek v3.1 $0.672/M $2.02/M 163,840 tools · reasoning
Anthropic Claude 3.5 Haiku $1.00/M $5.00/M tools · vision · cache
Anthropic Claude 4.5 Haiku $1.00/M $5.00/M tools · vision · cache
OpenAI o4 mini $1.00/M $4.00/M reasoning
OpenAI o1 mini $1.10/M $4.40/M reasoning
OpenAI GPT 5 $1.25/M $10.00/M tools · vision
DeepSeek AI DeepSeek v3 $1.45/M $1.45/M 65,536 tools
Google Gemini 3 Pro $2.00/M $12.00/M tools · vision
OpenAI GPT 4.1 $2.00/M $8.00/M tools · vision
Google Gemini 2.5 Flash $2.50/M $2.50/M tools · vision
OpenAI GPT 4o $2.50/M $10.00/M tools · vision · audio
Anthropic Claude 3.7 Sonnet $3.00/M $15.00/M tools · vision · cache
Anthropic Claude 4.5 Sonnet $3.00/M $15.00/M tools · vision · cache
Anthropic Claude 4 Sonnet $3.00/M $15.00/M tools · vision · cache
Anthropic Claude 3.5 Sonnet $3.75/M $18.75/M tools · vision · cache
DeepSeek AI DeepSeek R1 $3.75/M $10.00/M 65,536 reasoning
Xai Grok 4 $7.20/M $36.00/M tools
OpenAI o1 $15.00/M $60.00/M reasoning

FAQ

How many Replicate models are there?

40 Replicate models are listed across 1 modality on this page. 40 have public per-token pricing.

How is Replicate pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which Replicate model is cheapest?

Input pricing on Replicate starts at $0.0300 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to Replicate via an OpenAI-compatible API?

Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a Replicate target, and call Replicate models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.

Route any Replicate model via Agent Command Center →
OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.