DeepInfra models & pricing

DeepInfra hosts 67 models (67 with public pricing) covering 1 modalities. GPU inference for Llama, Qwen, DeepSeek, FLUX with per-token pricing. Cheapest input starts at $0.0200/M tokens; the most premium goes up to $16.50/M. Use Future AGI's Agent Command Center to route any DeepInfra model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗
67

chat 67

Model Input / 1M Output / 1M Context Caps
Meta Llama Llama 3.2 3B Instruct $0.0200/M $0.0200/M 131,072 tools
Meta Llama Meta Llama 3.1 8B Instruct Turbo $0.0200/M $0.0300/M 131,072 tools
Mistralai Mistral Nemo Instruct 2407 $0.0200/M $0.0400/M 131,072 tools
Meta Llama Meta Llama 3.1 8B Instruct $0.0300/M $0.0500/M 131,072 tools
Meta Llama Meta Llama 3.8B Instruct $0.0300/M $0.0600/M 8,192 tools
Google Gemma 3.4B It $0.0400/M $0.0800/M 131,072 tools
Nvidia Nvidia Nemotron nano 9B v2 $0.0400/M $0.160/M 131,072 tools
OpenAI GPT Oss 20B $0.0400/M $0.150/M 131,072 tools
Qwen Qwen2.5 7B Instruct $0.0400/M $0.1000/M 32,768
Sao10k L3.8b Lunaris V1 Turbo $0.0400/M $0.0500/M 8,192
Meta Llama Llama 3.2 11B Vision Instruct $0.0490/M $0.0490/M 131,072
Google Gemma 3.12B It $0.0500/M $0.1000/M 131,072 tools
Mistralai Mistral Small 24B Instruct 2501 $0.0500/M $0.0800/M 32,768 tools
OpenAI GPT Oss 120B $0.0500/M $0.450/M 131,072 tools
Meta Llama Llama Guard 3.8B $0.0550/M $0.0550/M 131,072
Qwen Qwen3.14b $0.0600/M $0.240/M 40,960 tools
Microsoft Phi 4 $0.0700/M $0.140/M 16,384 tools
Mistralai Mistral Small 3.2 24B Instruct 2506 $0.0750/M $0.200/M 128,000 tools
Gryphe Mythomax L2.13b $0.0800/M $0.0900/M 4,096 tools
Meta Llama Llama 4 Scout 17B 16e Instruct $0.0800/M $0.300/M 327,680 tools
Qwen Qwen3.30b A3b $0.0800/M $0.290/M 40,960 tools
Google Gemma 3.27B It $0.0900/M $0.160/M 131,072 tools
Qwen Qwen3.235b A22b Instruct 2507 $0.0900/M $0.600/M 262,144 tools
Google Gemini 2.0 Flash 001 Deprecates in 19d $0.1000/M $0.400/M 1,000,000 tools
Meta Llama Meta Llama 3.1 70B Instruct Turbo $0.1000/M $0.280/M 131,072 tools
Nvidia Llama 3.3 Nemotron Super 49B v1.5 $0.1000/M $0.400/M 131,072 tools
Qwen Qwen3.32b $0.1000/M $0.280/M 40,960 tools
Qwen Qwen2.5 72B Instruct $0.120/M $0.390/M 32,768 tools
Meta Llama Llama 3.3 70B Instruct Turbo $0.130/M $0.390/M 131,072 tools
Qwen Qwen3 Next 80B A3b Instruct $0.140/M $1.40/M 262,144 tools
Qwen Qwen3 Next 80B A3b Thinking $0.140/M $1.40/M 262,144 tools
Meta Llama Llama 4 Maverick 17B 128e Instruct Fp8 $0.150/M $0.600/M 1,048,576 tools
Qwen Qwq 32B $0.150/M $0.400/M 131,072 tools
Meta Llama Llama Guard 4.12B $0.180/M $0.180/M 163,840
Qwen Qwen3.235b A22b $0.180/M $0.540/M 40,960 tools
DeepSeek AI DeepSeek R1 Distill Llama 70B $0.200/M $0.600/M 131,072
Qwen Qwen2.5 VL 32B Instruct $0.200/M $0.600/M 128,000 tools · vision
Meta Llama Llama 3.3 70B Instruct $0.230/M $0.400/M 131,072 tools
DeepSeek AI DeepSeek v3.0324 $0.250/M $0.880/M 163,840 tools
Allenai Olmocr 7B 0725 Fp8 $0.270/M $1.50/M 16,384
DeepSeek AI DeepSeek R1 Distill Qwen 32B $0.270/M $0.270/M 131,072 tools
DeepSeek AI DeepSeek v3.1 $0.270/M $1.00/M 163,840 tools · reasoning
DeepSeek AI DeepSeek V3.1 Terminus $0.270/M $1.00/M 163,840 tools
Qwen Qwen3 Coder 480B A35b Instruct Turbo $0.290/M $1.20/M 262,144 tools
Google Gemini 2.5 Flash $0.300/M $2.50/M 1,000,000 tools
Nousresearch Hermes 3 Llama 3.1 70B $0.300/M $0.300/M 131,072
Qwen Qwen3.235b A22b Thinking 2507 $0.300/M $2.90/M 262,144 tools
DeepSeek AI DeepSeek v3 $0.380/M $0.890/M 163,840 tools
Meta Llama Meta Llama 3.1 70B Instruct $0.400/M $0.400/M 131,072 tools
Mistralai Mixtral 8×7B Instruct v0.1 $0.400/M $0.400/M 32,768 tools
Qwen Qwen3 Coder 480B A35b Instruct $0.400/M $1.60/M 262,144 tools
Zai Org Glm 4.5 $0.400/M $1.60/M 131,072 tools
Microsoft Wizardlm 2.8x22b $0.480/M $0.480/M 65,536
DeepSeek AI DeepSeek R1.0528 $0.500/M $2.15/M 163,840 tools
Moonshotai Kimi K2 Instruct $0.500/M $2.00/M 131,072 tools
Moonshotai Kimi K2 Instruct 0905 $0.500/M $2.00/M 262,144 tools
Nvidia Llama 3.1 Nemotron 70B Instruct $0.600/M $0.600/M 131,072 tools
Sao10k L3.1 70B Euryale v2.2 $0.650/M $0.750/M 131,072
Sao10k L3.3 70B Euryale v2.3 $0.650/M $0.750/M 131,072
DeepSeek AI DeepSeek R1 $0.700/M $2.40/M 163,840 tools
DeepSeek AI DeepSeek R1.0528 Turbo $1.00/M $3.00/M 32,768 tools
DeepSeek AI DeepSeek R1 Turbo $1.00/M $3.00/M 40,960 tools
Nousresearch Hermes 3 Llama 3.1 405B $1.00/M $1.00/M 131,072 tools
Google Gemini 2.5 Pro $1.25/M $10.00/M 1,000,000 tools
Anthropic Claude 3.7 Sonnet latest $3.30/M $16.50/M 200,000 tools
Anthropic Claude 4 Sonnet $3.30/M $16.50/M 200,000 tools
Anthropic Claude 4 Opus $16.50/M $82.50/M 200,000 tools

FAQ

How many DeepInfra models are there?

67 DeepInfra models are listed across 1 modality on this page. 67 have public per-token pricing.

How is DeepInfra pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which DeepInfra model is cheapest?

Input pricing on DeepInfra starts at $0.0200 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to DeepInfra via an OpenAI-compatible API?

Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a DeepInfra target, and call DeepInfra models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.

Route any DeepInfra model via Agent Command Center →
OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.