Nebius models & pricing

Nebius hosts 30 models (30 with public pricing) covering 2 modalities. European GPU cloud with Inference Studio for Llama, Qwen, DeepSeek, FLUX. Cheapest input starts at $0.0100/M tokens; the most premium goes up to $1.00/M. Use Future AGI's Agent Command Center to route any Nebius model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗

chat 27

Model ↕	Input / 1M ↕	Output / 1M ↕	Blended ↕	Context ↕	Caps
Qwen Qwen2.5 Coder 7B	$0.0100/M	$0.0300/M	$0.0150/M	32,768	tools
Meta Llama Llama Guard 3.8B	$0.0200/M	$0.0600/M	$0.0300/M	128,000
Meta Llama Meta Llama 3.1 8B Instruct	$0.0200/M	$0.0600/M	$0.0300/M	128,000	tools
Qwen Qwen2 VL 7B Instruct	$0.0200/M	$0.0600/M	$0.0300/M	131,072	vision
Mistralai Mistral Nemo Instruct 2407	$0.0400/M	$0.120/M	$0.0600/M	128,000	tools
Google Gemma 3.27B It	$0.0600/M	$0.200/M	$0.0950/M	128,000	tools · vision
Qwen Qwen2.5 32B Instruct	$0.0600/M	$0.200/M	$0.0950/M	128,000	tools
Qwen Qwen3.14b	$0.0800/M	$0.240/M	$0.120/M	32,768	tools
Qwen Qwen3.4b	$0.0800/M	$0.240/M	$0.120/M	32,768	tools
Nvidia Llama 3.3 Nemotron Super 49B v1	$0.1000/M	$0.400/M	$0.175/M	131,072	tools
Qwen Qwen3.30b A3b	$0.1000/M	$0.300/M	$0.150/M	32,768	tools
Qwen Qwen3.32b	$0.1000/M	$0.300/M	$0.150/M	32,768	tools
Meta Llama Llama 3.3 70B Instruct	$0.130/M	$0.400/M	$0.198/M	128,000	tools
Meta Llama Meta Llama 3.1 70B Instruct	$0.130/M	$0.400/M	$0.198/M	128,000	tools
Qwen Qwen2.5 72B Instruct	$0.130/M	$0.400/M	$0.198/M	128,000	tools
Qwen Qwen2.5 VL 72B Instruct	$0.130/M	$0.400/M	$0.198/M	131,072	tools · vision
Qwen Qwen2 VL 72B Instruct	$0.130/M	$0.400/M	$0.198/M	131,072	tools · vision
Qwen Qwq 32B	$0.150/M	$0.450/M	$0.225/M	32,768	tools · reasoning
Qwen Qwen3.235b A22b	$0.200/M	$0.600/M	$0.300/M	262,144	tools
DeepSeek AI DeepSeek R1 Distill Llama 70B	$0.250/M	$0.750/M	$0.375/M	128,000	tools
DeepSeek AI DeepSeek v3	$0.500/M	$1.50/M	$0.750/M	128,000	tools
DeepSeek AI DeepSeek v3.0324	$0.500/M	$1.50/M	$0.750/M	128,000	tools
Nvidia Llama 3.1 Nemotron Ultra 253B v1	$0.600/M	$1.80/M	$0.900/M	128,000	tools
DeepSeek AI DeepSeek R1	$0.800/M	$2.40/M	$1.20/M	128,000	tools · reasoning
DeepSeek AI DeepSeek R1.0528	$0.800/M	$2.40/M	$1.20/M	164,000	tools · reasoning
Meta Llama Meta Llama 3.1 405B Instruct	$1.00/M	$3.00/M	$1.50/M	128,000	tools
Nousresearch Hermes 3 Llama 3.1 405B	$1.00/M	$3.00/M	$1.50/M	128,000	tools

embedding 3

Model ↕	Input / 1M ↕	Output / 1M ↕	Blended ↕	Context ↕
Baai Bge En Icl	$0.0100/M	—	—	32,768
Baai Bge Multilingual Gemma2	$0.0100/M	—	—	8,192
Intfloat E5 Mistral 7B Instruct	$0.0100/M	—	—	32,768

FAQ

How many Nebius models are there?

30 Nebius models are listed across 2 modalities on this page. 30 have public per-token pricing.

How is Nebius pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which Nebius model is cheapest?

Input pricing on Nebius starts at $0.0100 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to Nebius via an OpenAI-compatible API?

Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a Nebius target, and call Nebius models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.

Route any Nebius model via Agent Command Center →

OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.