Replicate models & pricing
Replicate hosts 40 models (40 with public pricing) covering 1 modalities. Hosted inference for Llama, FLUX, Stable Diffusion, Whisper, and community models. Cheapest input starts at $0.0300/M tokens; the most premium goes up to $15.00/M. Use Future AGI's Agent Command Center to route any Replicate model with cost-optimized fallback and unified observability.
chat 40
| Model ↕ | Input / 1M ↕ | Output / 1M ↕ | Context ↕ | Caps |
|---|---|---|---|---|
| IBM Granite Granite 3.3 8B Instruct | $0.0300/M | $0.250/M | — | tools |
| Meta Llama 2.7B | $0.0500/M | $0.250/M | 4,096 | |
| Meta Llama 2.7B Chat | $0.0500/M | $0.250/M | 4,096 | |
| Meta Llama 3.8B | $0.0500/M | $0.250/M | 8,086 | |
| Meta Llama 3.8B Instruct | $0.0500/M | $0.250/M | 8,086 | |
| Mistralai Mistral 7B Instruct v0.2 | $0.0500/M | $0.250/M | 4,096 | |
| Mistralai Mistral 7B v0.1 | $0.0500/M | $0.250/M | 4,096 | |
| OpenAI GPT 5 nano | $0.0500/M | $0.400/M | — | tools |
| GPT Oss 20B | $0.0900/M | $0.360/M | — | tools |
| Meta Llama 2.13B | $0.1000/M | $0.500/M | 4,096 | |
| Meta Llama 2.13B Chat | $0.1000/M | $0.500/M | 4,096 | |
| OpenAI GPT 4.1 nano | $0.1000/M | $0.400/M | — | tools |
| OpenAI GPT 4o mini | $0.150/M | $0.600/M | — | tools · vision |
| OpenAI GPT Oss 120B | $0.180/M | $0.720/M | — | tools |
| OpenAI GPT 5 mini | $0.250/M | $2.00/M | — | tools · vision |
| Qwen Qwen3.235b A22b Instruct 2507 | $0.264/M | $1.06/M | — | tools |
| Mistralai Mixtral 8×7B Instruct v0.1 | $0.300/M | $1.00/M | 4,096 | |
| OpenAI GPT 4.1 mini | $0.400/M | $1.60/M | — | tools · vision |
| Meta Llama 2.70B | $0.650/M | $2.75/M | 4,096 | |
| Meta Llama 2.70B Chat | $0.650/M | $2.75/M | 4,096 | |
| Meta Llama 3.70B | $0.650/M | $2.75/M | 8,192 | |
| Meta Llama 3.70B Instruct | $0.650/M | $2.75/M | 8,192 | |
| DeepSeek AI DeepSeek v3.1 | $0.672/M | $2.02/M | 163,840 | tools · reasoning |
| Anthropic Claude 3.5 Haiku | $1.00/M | $5.00/M | — | tools · vision · cache |
| Anthropic Claude 4.5 Haiku | $1.00/M | $5.00/M | — | tools · vision · cache |
| OpenAI o4 mini | $1.00/M | $4.00/M | — | reasoning |
| OpenAI o1 mini | $1.10/M | $4.40/M | — | reasoning |
| OpenAI GPT 5 | $1.25/M | $10.00/M | — | tools · vision |
| DeepSeek AI DeepSeek v3 | $1.45/M | $1.45/M | 65,536 | tools |
| Google Gemini 3 Pro | $2.00/M | $12.00/M | — | tools · vision |
| OpenAI GPT 4.1 | $2.00/M | $8.00/M | — | tools · vision |
| Google Gemini 2.5 Flash | $2.50/M | $2.50/M | — | tools · vision |
| OpenAI GPT 4o | $2.50/M | $10.00/M | — | tools · vision · audio |
| Anthropic Claude 3.7 Sonnet | $3.00/M | $15.00/M | — | tools · vision · cache |
| Anthropic Claude 4.5 Sonnet | $3.00/M | $15.00/M | — | tools · vision · cache |
| Anthropic Claude 4 Sonnet | $3.00/M | $15.00/M | — | tools · vision · cache |
| Anthropic Claude 3.5 Sonnet | $3.75/M | $18.75/M | — | tools · vision · cache |
| DeepSeek AI DeepSeek R1 | $3.75/M | $10.00/M | 65,536 | reasoning |
| Xai Grok 4 | $7.20/M | $36.00/M | — | tools |
| OpenAI o1 | $15.00/M | $60.00/M | — | reasoning |
FAQ
How many Replicate models are there?
40 Replicate models are listed across 1 modality on this page. 40 have public per-token pricing.
How is Replicate pricing verified?
Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.
Which Replicate model is cheapest?
Input pricing on Replicate starts at $0.0300 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.
Can I route to Replicate via an OpenAI-compatible API?
Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a Replicate target, and call Replicate models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.