W&B Inference models & pricing

W&B Inference hosts 16 models, all with public pricing, in a single modality: chat. Weights & Biases offers serverless inference for open-weight models including Llama, DeepSeek, Qwen, and GPT-OSS. Input pricing starts at $0.300 per 1M tokens for the cheapest model and reaches $135,000 per 1M for the most expensive. Use Future AGI's Agent Command Center to route any W&B Inference model with cost-optimized fallback and unified observability.


Chat models (16)

| Model | Input / 1M | Output / 1M | Context | Capabilities |
| --- | --- | --- | --- | --- |
| MiniMaxAI MiniMax M2.5 | $0.300/M | $1.20/M | 197,000 | tools · reasoning |
| MoonshotAI Kimi K2.5 | $0.600/M | $3.00/M | 262,144 | tools · vision · reasoning |
| MoonshotAI Kimi K2 Instruct | $0.600/M | $2.50/M | 128,000 | — |
| OpenAI GPT-OSS 20B | $5,000/M | $20,000/M | 131,072 | — |
| Microsoft Phi-4 Mini Instruct | $8,000/M | $35,000/M | 128,000 | — |
| Qwen Qwen3 235B A22B Instruct 2507 | $10,000/M | $10,000/M | 262,144 | — |
| Qwen Qwen3 235B A22B Thinking 2507 | $10,000/M | $10,000/M | 262,144 | — |
| OpenAI GPT-OSS 120B | $15,000/M | $60,000/M | 131,072 | — |
| Meta Llama 4 Scout 17B-16E Instruct | $17,000/M | $66,000/M | 64,000 | — |
| Meta Llama 3.1 8B Instruct | $22,000/M | $22,000/M | 128,000 | — |
| DeepSeek AI DeepSeek V3.1 | $55,000/M | $165,000/M | 128,000 | — |
| Z.ai GLM-4.5 | $55,000/M | $200,000/M | 131,072 | — |
| Meta Llama 3.3 70B Instruct | $71,000/M | $71,000/M | 128,000 | — |
| Qwen Qwen3 Coder 480B A35B Instruct | $100,000/M | $150,000/M | 262,144 | — |
| DeepSeek AI DeepSeek V3 0324 | $114,000/M | $275,000/M | 161,000 | — |
| DeepSeek AI DeepSeek R1 0528 | $135,000/M | $540,000/M | 161,000 | — |

FAQ

How many W&B Inference models are there?

This page lists 16 W&B Inference models, all in a single modality (chat). All 16 have public per-token pricing.

How is W&B Inference pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which W&B Inference model is cheapest?

Input pricing on W&B Inference starts at $0.300 per 1M tokens (MiniMax M2.5). Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.
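Per-token prices are quoted per 1M tokens, so estimating a request's cost is a simple proration. A minimal sketch (the token counts below are illustrative, not taken from this page):

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Estimate cost in dollars given per-1M-token input/output rates."""
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m

# Example: 10K input + 2K output tokens on the cheapest listed model
# ($0.300 input / $1.20 output per 1M tokens):
cost = request_cost(10_000, 2_000, 0.300, 1.20)  # 0.003 + 0.0024 = 0.0054
```

The same helper can be mapped over the whole table to rank models by expected cost for your typical input/output ratio, rather than by input price alone.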

Can I route to W&B Inference via an OpenAI-compatible API?

Yes: point your OpenAI client at Future AGI's Agent Command Center, configure a W&B Inference target, and call W&B Inference models through the standard /v1/chat/completions endpoint. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.
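Because the gateway exposes the standard OpenAI request shape, any HTTP client works. A stdlib-only sketch of building such a request (the base URL, API key, and model id below are placeholders, not real endpoints from this page):

```python
import json
import urllib.request

# Placeholders: substitute your gateway's base URL and credentials.
GATEWAY_BASE = "https://gateway.example.com/v1"
API_KEY = "YOUR_API_KEY"

def build_chat_request(model, messages):
    """Build a standard OpenAI-style POST to /chat/completions."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{GATEWAY_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "deepseek-ai/DeepSeek-V3.1",  # model id format is an assumption
    [{"role": "user", "content": "Hello"}],
)
# urllib.request.urlopen(req) would send it; omitted here because the
# endpoint above is a placeholder.
```

With the official `openai` Python client, the equivalent is passing the gateway URL as `base_url` when constructing the client; no other code changes are needed.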

Route any W&B Inference model via Agent Command Center →
OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.