DeepInfra models & pricing
DeepInfra hosts 67 models (67 with public pricing) covering 1 modalities. GPU inference for Llama, Qwen, DeepSeek, FLUX with per-token pricing. Cheapest input starts at $0.0200/M tokens; the most premium goes up to $16.50/M. Use Future AGI's Agent Command Center to route any DeepInfra model with cost-optimized fallback and unified observability.
chat 67
| Model ↕ | Input / 1M ↕ | Output / 1M ↕ | Context ↕ | Caps |
|---|---|---|---|---|
| Meta Llama Llama 3.2 3B Instruct | $0.0200/M | $0.0200/M | 131,072 | tools |
| Meta Llama Meta Llama 3.1 8B Instruct Turbo | $0.0200/M | $0.0300/M | 131,072 | tools |
| Mistralai Mistral Nemo Instruct 2407 | $0.0200/M | $0.0400/M | 131,072 | tools |
| Meta Llama Meta Llama 3.1 8B Instruct | $0.0300/M | $0.0500/M | 131,072 | tools |
| Meta Llama Meta Llama 3.8B Instruct | $0.0300/M | $0.0600/M | 8,192 | tools |
| Google Gemma 3.4B It | $0.0400/M | $0.0800/M | 131,072 | tools |
| Nvidia Nvidia Nemotron nano 9B v2 | $0.0400/M | $0.160/M | 131,072 | tools |
| OpenAI GPT Oss 20B | $0.0400/M | $0.150/M | 131,072 | tools |
| Qwen Qwen2.5 7B Instruct | $0.0400/M | $0.1000/M | 32,768 | |
| Sao10k L3.8b Lunaris V1 Turbo | $0.0400/M | $0.0500/M | 8,192 | |
| Meta Llama Llama 3.2 11B Vision Instruct | $0.0490/M | $0.0490/M | 131,072 | |
| Google Gemma 3.12B It | $0.0500/M | $0.1000/M | 131,072 | tools |
| Mistralai Mistral Small 24B Instruct 2501 | $0.0500/M | $0.0800/M | 32,768 | tools |
| OpenAI GPT Oss 120B | $0.0500/M | $0.450/M | 131,072 | tools |
| Meta Llama Llama Guard 3.8B | $0.0550/M | $0.0550/M | 131,072 | |
| Qwen Qwen3.14b | $0.0600/M | $0.240/M | 40,960 | tools |
| Microsoft Phi 4 | $0.0700/M | $0.140/M | 16,384 | tools |
| Mistralai Mistral Small 3.2 24B Instruct 2506 | $0.0750/M | $0.200/M | 128,000 | tools |
| Gryphe Mythomax L2.13b | $0.0800/M | $0.0900/M | 4,096 | tools |
| Meta Llama Llama 4 Scout 17B 16e Instruct | $0.0800/M | $0.300/M | 327,680 | tools |
| Qwen Qwen3.30b A3b | $0.0800/M | $0.290/M | 40,960 | tools |
| Google Gemma 3.27B It | $0.0900/M | $0.160/M | 131,072 | tools |
| Qwen Qwen3.235b A22b Instruct 2507 | $0.0900/M | $0.600/M | 262,144 | tools |
| Google Gemini 2.0 Flash 001 Deprecates in 19d | $0.1000/M | $0.400/M | 1,000,000 | tools |
| Meta Llama Meta Llama 3.1 70B Instruct Turbo | $0.1000/M | $0.280/M | 131,072 | tools |
| Nvidia Llama 3.3 Nemotron Super 49B v1.5 | $0.1000/M | $0.400/M | 131,072 | tools |
| Qwen Qwen3.32b | $0.1000/M | $0.280/M | 40,960 | tools |
| Qwen Qwen2.5 72B Instruct | $0.120/M | $0.390/M | 32,768 | tools |
| Meta Llama Llama 3.3 70B Instruct Turbo | $0.130/M | $0.390/M | 131,072 | tools |
| Qwen Qwen3 Next 80B A3b Instruct | $0.140/M | $1.40/M | 262,144 | tools |
| Qwen Qwen3 Next 80B A3b Thinking | $0.140/M | $1.40/M | 262,144 | tools |
| Meta Llama Llama 4 Maverick 17B 128e Instruct Fp8 | $0.150/M | $0.600/M | 1,048,576 | tools |
| Qwen Qwq 32B | $0.150/M | $0.400/M | 131,072 | tools |
| Meta Llama Llama Guard 4.12B | $0.180/M | $0.180/M | 163,840 | |
| Qwen Qwen3.235b A22b | $0.180/M | $0.540/M | 40,960 | tools |
| DeepSeek AI DeepSeek R1 Distill Llama 70B | $0.200/M | $0.600/M | 131,072 | |
| Qwen Qwen2.5 VL 32B Instruct | $0.200/M | $0.600/M | 128,000 | tools · vision |
| Meta Llama Llama 3.3 70B Instruct | $0.230/M | $0.400/M | 131,072 | tools |
| DeepSeek AI DeepSeek v3.0324 | $0.250/M | $0.880/M | 163,840 | tools |
| Allenai Olmocr 7B 0725 Fp8 | $0.270/M | $1.50/M | 16,384 | |
| DeepSeek AI DeepSeek R1 Distill Qwen 32B | $0.270/M | $0.270/M | 131,072 | tools |
| DeepSeek AI DeepSeek v3.1 | $0.270/M | $1.00/M | 163,840 | tools · reasoning |
| DeepSeek AI DeepSeek V3.1 Terminus | $0.270/M | $1.00/M | 163,840 | tools |
| Qwen Qwen3 Coder 480B A35b Instruct Turbo | $0.290/M | $1.20/M | 262,144 | tools |
| Google Gemini 2.5 Flash | $0.300/M | $2.50/M | 1,000,000 | tools |
| Nousresearch Hermes 3 Llama 3.1 70B | $0.300/M | $0.300/M | 131,072 | |
| Qwen Qwen3.235b A22b Thinking 2507 | $0.300/M | $2.90/M | 262,144 | tools |
| DeepSeek AI DeepSeek v3 | $0.380/M | $0.890/M | 163,840 | tools |
| Meta Llama Meta Llama 3.1 70B Instruct | $0.400/M | $0.400/M | 131,072 | tools |
| Mistralai Mixtral 8×7B Instruct v0.1 | $0.400/M | $0.400/M | 32,768 | tools |
| Qwen Qwen3 Coder 480B A35b Instruct | $0.400/M | $1.60/M | 262,144 | tools |
| Zai Org Glm 4.5 | $0.400/M | $1.60/M | 131,072 | tools |
| Microsoft Wizardlm 2.8x22b | $0.480/M | $0.480/M | 65,536 | |
| DeepSeek AI DeepSeek R1.0528 | $0.500/M | $2.15/M | 163,840 | tools |
| Moonshotai Kimi K2 Instruct | $0.500/M | $2.00/M | 131,072 | tools |
| Moonshotai Kimi K2 Instruct 0905 | $0.500/M | $2.00/M | 262,144 | tools |
| Nvidia Llama 3.1 Nemotron 70B Instruct | $0.600/M | $0.600/M | 131,072 | tools |
| Sao10k L3.1 70B Euryale v2.2 | $0.650/M | $0.750/M | 131,072 | |
| Sao10k L3.3 70B Euryale v2.3 | $0.650/M | $0.750/M | 131,072 | |
| DeepSeek AI DeepSeek R1 | $0.700/M | $2.40/M | 163,840 | tools |
| DeepSeek AI DeepSeek R1.0528 Turbo | $1.00/M | $3.00/M | 32,768 | tools |
| DeepSeek AI DeepSeek R1 Turbo | $1.00/M | $3.00/M | 40,960 | tools |
| Nousresearch Hermes 3 Llama 3.1 405B | $1.00/M | $1.00/M | 131,072 | tools |
| Google Gemini 2.5 Pro | $1.25/M | $10.00/M | 1,000,000 | tools |
| Anthropic Claude 3.7 Sonnet latest | $3.30/M | $16.50/M | 200,000 | tools |
| Anthropic Claude 4 Sonnet | $3.30/M | $16.50/M | 200,000 | tools |
| Anthropic Claude 4 Opus | $16.50/M | $82.50/M | 200,000 | tools |
FAQ
How many DeepInfra models are there?
67 DeepInfra models are listed across 1 modality on this page. 67 have public per-token pricing.
How is DeepInfra pricing verified?
Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.
Which DeepInfra model is cheapest?
Input pricing on DeepInfra starts at $0.0200 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.
Can I route to DeepInfra via an OpenAI-compatible API?
Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a DeepInfra target, and call DeepInfra models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.