DeepInfra models & pricing

DeepInfra hosts 67 models (67 with public pricing) covering 1 modalities. GPU inference for Llama, Qwen, DeepSeek, FLUX with per-token pricing. Cheapest input starts at $0.0200/M tokens; the most premium goes up to $16.50/M. Use Future AGI's Agent Command Center to route any DeepInfra model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗

chat 67

Model ↕	Input / 1M ↕	Output / 1M ↕	Blended ↕	Context ↕	Caps
Meta Llama Llama 3.2 3B Instruct	$0.0200/M	$0.0200/M	$0.0200/M	131,072	tools
Meta Llama Meta Llama 3.1 8B Instruct Turbo	$0.0200/M	$0.0300/M	$0.0225/M	131,072	tools
Mistralai Mistral Nemo Instruct 2407	$0.0200/M	$0.0400/M	$0.0250/M	131,072	tools
Meta Llama Meta Llama 3.1 8B Instruct	$0.0300/M	$0.0500/M	$0.0350/M	131,072	tools
Meta Llama Meta Llama 3.8B Instruct	$0.0300/M	$0.0600/M	$0.0375/M	8,192	tools
Google Gemma 3.4B It	$0.0400/M	$0.0800/M	$0.0500/M	131,072	tools
Nvidia Nvidia Nemotron nano 9B v2	$0.0400/M	$0.160/M	$0.0700/M	131,072	tools
OpenAI GPT Oss 20B	$0.0400/M	$0.150/M	$0.0675/M	131,072	tools
Qwen Qwen2.5 7B Instruct	$0.0400/M	$0.1000/M	$0.0550/M	32,768
Sao10k L3.8b Lunaris V1 Turbo	$0.0400/M	$0.0500/M	$0.0425/M	8,192
Meta Llama Llama 3.2 11B Vision Instruct	$0.0490/M	$0.0490/M	$0.0490/M	131,072
Google Gemma 3.12B It	$0.0500/M	$0.1000/M	$0.0625/M	131,072	tools
Mistralai Mistral Small 24B Instruct 2501	$0.0500/M	$0.0800/M	$0.0575/M	32,768	tools
OpenAI GPT Oss 120B	$0.0500/M	$0.450/M	$0.150/M	131,072	tools
Meta Llama Llama Guard 3.8B	$0.0550/M	$0.0550/M	$0.0550/M	131,072
Qwen Qwen3.14b	$0.0600/M	$0.240/M	$0.105/M	40,960	tools
Microsoft Phi 4	$0.0700/M	$0.140/M	$0.0875/M	16,384	tools
Mistralai Mistral Small 3.2 24B Instruct 2506	$0.0750/M	$0.200/M	$0.106/M	128,000	tools
Gryphe Mythomax L2.13b	$0.0800/M	$0.0900/M	$0.0825/M	4,096	tools
Meta Llama Llama 4 Scout 17B 16e Instruct	$0.0800/M	$0.300/M	$0.135/M	327,680	tools
Qwen Qwen3.30b A3b	$0.0800/M	$0.290/M	$0.133/M	40,960	tools
Google Gemma 3.27B It	$0.0900/M	$0.160/M	$0.108/M	131,072	tools
Qwen Qwen3.235b A22b Instruct 2507	$0.0900/M	$0.600/M	$0.218/M	262,144	tools
Google Gemini 2.0 Flash 001 Deprecated	$0.1000/M	$0.400/M	$0.175/M	1,000,000	tools
Meta Llama Meta Llama 3.1 70B Instruct Turbo	$0.1000/M	$0.280/M	$0.145/M	131,072	tools
Nvidia Llama 3.3 Nemotron Super 49B v1.5	$0.1000/M	$0.400/M	$0.175/M	131,072	tools
Qwen Qwen3.32b	$0.1000/M	$0.280/M	$0.145/M	40,960	tools
Qwen Qwen2.5 72B Instruct	$0.120/M	$0.390/M	$0.188/M	32,768	tools
Meta Llama Llama 3.3 70B Instruct Turbo	$0.130/M	$0.390/M	$0.195/M	131,072	tools
Qwen Qwen3 Next 80B A3b Instruct	$0.140/M	$1.40/M	$0.455/M	262,144	tools
Qwen Qwen3 Next 80B A3b Thinking	$0.140/M	$1.40/M	$0.455/M	262,144	tools
Meta Llama Llama 4 Maverick 17B 128e Instruct Fp8	$0.150/M	$0.600/M	$0.262/M	1,048,576	tools
Qwen Qwq 32B	$0.150/M	$0.400/M	$0.212/M	131,072	tools
Meta Llama Llama Guard 4.12B	$0.180/M	$0.180/M	$0.180/M	163,840
Qwen Qwen3.235b A22b	$0.180/M	$0.540/M	$0.270/M	40,960	tools
DeepSeek AI DeepSeek R1 Distill Llama 70B	$0.200/M	$0.600/M	$0.300/M	131,072
Qwen Qwen2.5 VL 32B Instruct	$0.200/M	$0.600/M	$0.300/M	128,000	tools · vision
Meta Llama Llama 3.3 70B Instruct	$0.230/M	$0.400/M	$0.272/M	131,072	tools
DeepSeek AI DeepSeek v3.0324	$0.250/M	$0.880/M	$0.408/M	163,840	tools
Allenai Olmocr 7B 0725 Fp8	$0.270/M	$1.50/M	$0.578/M	16,384
DeepSeek AI DeepSeek R1 Distill Qwen 32B	$0.270/M	$0.270/M	$0.270/M	131,072	tools
DeepSeek AI DeepSeek v3.1	$0.270/M	$1.00/M	$0.453/M	163,840	tools · reasoning
DeepSeek AI DeepSeek V3.1 Terminus	$0.270/M	$1.00/M	$0.453/M	163,840	tools
Qwen Qwen3 Coder 480B A35b Instruct Turbo	$0.290/M	$1.20/M	$0.518/M	262,144	tools
Google Gemini 2.5 Flash	$0.300/M	$2.50/M	$0.850/M	1,000,000	tools
Nousresearch Hermes 3 Llama 3.1 70B	$0.300/M	$0.300/M	$0.300/M	131,072
Qwen Qwen3.235b A22b Thinking 2507	$0.300/M	$2.90/M	$0.950/M	262,144	tools
DeepSeek AI DeepSeek v3	$0.380/M	$0.890/M	$0.508/M	163,840	tools
Meta Llama Meta Llama 3.1 70B Instruct	$0.400/M	$0.400/M	$0.400/M	131,072	tools
Mistralai Mixtral 8×7B Instruct v0.1	$0.400/M	$0.400/M	$0.400/M	32,768	tools
Qwen Qwen3 Coder 480B A35b Instruct	$0.400/M	$1.60/M	$0.700/M	262,144	tools
Zai Org Glm 4.5	$0.400/M	$1.60/M	$0.700/M	131,072	tools
Microsoft Wizardlm 2.8x22b	$0.480/M	$0.480/M	$0.480/M	65,536
DeepSeek AI DeepSeek R1.0528	$0.500/M	$2.15/M	$0.913/M	163,840	tools
Moonshotai Kimi K2 Instruct	$0.500/M	$2.00/M	$0.875/M	131,072	tools
Moonshotai Kimi K2 Instruct 0905	$0.500/M	$2.00/M	$0.875/M	262,144	tools
Nvidia Llama 3.1 Nemotron 70B Instruct	$0.600/M	$0.600/M	$0.600/M	131,072	tools
Sao10k L3.1 70B Euryale v2.2	$0.650/M	$0.750/M	$0.675/M	131,072
Sao10k L3.3 70B Euryale v2.3	$0.650/M	$0.750/M	$0.675/M	131,072
DeepSeek AI DeepSeek R1	$0.700/M	$2.40/M	$1.13/M	163,840	tools
DeepSeek AI DeepSeek R1.0528 Turbo	$1.00/M	$3.00/M	$1.50/M	32,768	tools
DeepSeek AI DeepSeek R1 Turbo	$1.00/M	$3.00/M	$1.50/M	40,960	tools
Nousresearch Hermes 3 Llama 3.1 405B	$1.00/M	$1.00/M	$1.00/M	131,072	tools
Google Gemini 2.5 Pro	$1.25/M	$10.00/M	$3.44/M	1,000,000	tools
Anthropic Claude 3.7 Sonnet latest	$3.30/M	$16.50/M	$6.60/M	200,000	tools
Anthropic Claude 4 Sonnet	$3.30/M	$16.50/M	$6.60/M	200,000	tools
Anthropic Claude 4 Opus	$16.50/M	$82.50/M	$33.00/M	200,000	tools

FAQ

How many DeepInfra models are there?

67 DeepInfra models are listed across 1 modality on this page. 67 have public per-token pricing.

How is DeepInfra pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which DeepInfra model is cheapest?

Input pricing on DeepInfra starts at $0.0200 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to DeepInfra via an OpenAI-compatible API?

Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a DeepInfra target, and call DeepInfra models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.

Route any DeepInfra model via Agent Command Center →

OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.