Cerebras models & pricing

Cerebras hosts 7 models (all 7 with public pricing) covering one modality: chat. Wafer-scale inference for Llama and Qwen models delivers among the fastest tokens/sec available. The cheapest input pricing starts at $0.10/M tokens; the priciest model runs $2.25/M input and $2.75/M output. Use Future AGI's Agent Command Center to route any Cerebras model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗

Chat (7)

| Model | Input / 1M | Output / 1M | Context | Caps |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | $0.10 | $0.10 | 128,000 | tools |
| GPT-OSS 120B | $0.35 | $0.75 | 131,072 | tools · reasoning |
| Qwen 3 32B | $0.40 | $0.80 | 128,000 | tools · reasoning |
| Llama 3.1 70B | $0.60 | $0.60 | 128,000 | tools |
| Llama 3.3 70B | $0.85 | $1.20 | 128,000 | tools |
| Z.ai GLM 4.6 (deprecated) | $2.25 | $2.75 | 128,000 | tools · reasoning |
| Z.ai GLM 4.7 | $2.25 | $2.75 | 128,000 | tools · reasoning |
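Per-request cost from this table is simple arithmetic: tokens divided by one million, times the per-1M price, with input and output summed. A minimal sketch (the model slugs are illustrative, not official Cerebras identifiers; prices are the ones listed above):

```python
# USD per 1M tokens (input, output), taken from the pricing table above.
# Keys are illustrative slugs, not official Cerebras model IDs.
PRICES = {
    "llama-3.1-8b": (0.10, 0.10),
    "gpt-oss-120b": (0.35, 0.75),
    "qwen-3-32b": (0.40, 0.80),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD: (tokens / 1M) * per-1M price, input + output."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# 1M input + 200K output on GPT-OSS 120B:
print(round(estimate_cost("gpt-oss-120b", 1_000_000, 200_000), 4))
```

Swap in your own token counts to compare models before committing to one.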

FAQ

How many Cerebras models are there?

All 7 Cerebras models listed on this page fall under one modality (chat), and all 7 have public per-token pricing.

How is Cerebras pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which Cerebras model is cheapest?

Input pricing on Cerebras starts at $0.10 per 1M tokens (Llama 3.1 8B, which also charges $0.10/M for output). Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to Cerebras via an OpenAI-compatible API?

Yes: point your OpenAI client at Future AGI's Agent Command Center, configure a Cerebras target, and call Cerebras models through the standard /v1/chat/completions endpoint. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.
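Because the gateway speaks the OpenAI wire format, any HTTP client works. A minimal stdlib sketch, assuming a hypothetical gateway URL, API key, and model slug (substitute your real Agent Command Center values):

```python
import json
import urllib.request

# Hypothetical values -- replace with your real gateway URL and key.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"
API_KEY = "fagi-..."

def build_request(model: str, messages: list[dict]) -> urllib.request.Request:
    """Build a standard OpenAI-style chat-completion request for the gateway."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "cerebras/llama-3.1-8b",  # illustrative slug, not a confirmed model ID
    [{"role": "user", "content": "Hello"}],
)
# with urllib.request.urlopen(req) as resp:   # uncomment with real credentials
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

An official OpenAI SDK client works the same way: set its `base_url` to the gateway and keep the rest of your code unchanged.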

Route any Cerebras model via Agent Command Center →
OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.