W&B Inference models & pricing

W&B Inference hosts 16 models, all with public pricing, in a single modality: chat. Weights & Biases offers serverless inference for open-weight models including Llama, DeepSeek, Qwen, and GPT-OSS. Input pricing starts at $0.300 per 1M tokens for the cheapest model and reaches $135,000 per 1M for the most expensive. Use Future AGI's Agent Command Center to route any W&B Inference model with cost-optimized fallback and unified observability.


Chat models (16)

| Model | Input / 1M | Output / 1M | Context | Capabilities |
| --- | --- | --- | --- | --- |
| MiniMaxAI MiniMax M2.5 | $0.300/M | $1.20/M | 197,000 | tools · reasoning |
| MoonshotAI Kimi K2.5 | $0.600/M | $3.00/M | 262,144 | tools · vision · reasoning |
| MoonshotAI Kimi K2 Instruct | $0.600/M | $2.50/M | 128,000 | — |
| OpenAI GPT-OSS 20B | $5,000/M | $20,000/M | 131,072 | — |
| Microsoft Phi-4 Mini Instruct | $8,000/M | $35,000/M | 128,000 | — |
| Qwen Qwen3 235B A22B Instruct 2507 | $10,000/M | $10,000/M | 262,144 | — |
| Qwen Qwen3 235B A22B Thinking 2507 | $10,000/M | $10,000/M | 262,144 | — |
| OpenAI GPT-OSS 120B | $15,000/M | $60,000/M | 131,072 | — |
| Meta Llama 4 Scout 17B-16E Instruct | $17,000/M | $66,000/M | 64,000 | — |
| Meta Llama 3.1 8B Instruct | $22,000/M | $22,000/M | 128,000 | — |
| DeepSeek AI DeepSeek V3.1 | $55,000/M | $165,000/M | 128,000 | — |
| Z.ai GLM-4.5 | $55,000/M | $200,000/M | 131,072 | — |
| Meta Llama 3.3 70B Instruct | $71,000/M | $71,000/M | 128,000 | — |
| Qwen Qwen3 Coder 480B A35B Instruct | $100,000/M | $150,000/M | 262,144 | — |
| DeepSeek AI DeepSeek V3 0324 | $114,000/M | $275,000/M | 161,000 | — |
| DeepSeek AI DeepSeek R1 0528 | $135,000/M | $540,000/M | 161,000 | — |

FAQ

How many W&B Inference models are there?

This page lists 16 W&B Inference models, all in a single modality (chat). All 16 have public per-token pricing.

How is W&B Inference pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which W&B Inference model is cheapest?

Input pricing on W&B Inference starts at $0.300 per 1M tokens (MiniMax M2.5). Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.
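Per-token prices are quoted per 1M tokens, so estimating a request's cost is a simple proration. A minimal sketch (the token counts below are illustrative, not taken from this page):

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Estimate cost in dollars given per-1M-token input/output rates."""
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m

# Example: 10K input + 2K output tokens on the cheapest listed model
# ($0.300 input / $1.20 output per 1M tokens):
cost = request_cost(10_000, 2_000, 0.300, 1.20)  # 0.003 + 0.0024 = 0.0054
```

The same helper can be mapped over the whole table to rank models by expected cost for your typical input/output ratio, rather than by input price alone.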

Can I route to W&B Inference via an OpenAI-compatible API?

Yes: point your OpenAI client at Future AGI's Agent Command Center, configure a W&B Inference target, and call W&B Inference models through the standard /v1/chat/completions endpoint. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.
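Because the gateway exposes the standard OpenAI request shape, any HTTP client works. A stdlib-only sketch of building such a request (the base URL, API key, and model id below are placeholders, not real endpoints from this page):

```python
import json
import urllib.request

# Placeholders: substitute your gateway's base URL and credentials.
GATEWAY_BASE = "https://gateway.example.com/v1"
API_KEY = "YOUR_API_KEY"

def build_chat_request(model, messages):
    """Build a standard OpenAI-style POST to /chat/completions."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{GATEWAY_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "deepseek-ai/DeepSeek-V3.1",  # model id format is an assumption
    [{"role": "user", "content": "Hello"}],
)
# urllib.request.urlopen(req) would send it; omitted here because the
# endpoint above is a placeholder.
```

With the official `openai` Python client, the equivalent is passing the gateway URL as `base_url` when constructing the client; no other code changes are needed.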

Route any W&B Inference model via Agent Command Center →
OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.