Cerebras models & pricing

Cerebras hosts 7 models (all 7 with public pricing) covering one modality: chat. Wafer-scale inference for Llama and Qwen models delivers among the fastest tokens/sec available. The cheapest input pricing starts at $0.10/M tokens; the priciest model runs $2.25/M input and $2.75/M output. Use Future AGI's Agent Command Center to route any Cerebras model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗

Chat (7)

| Model | Input / 1M | Output / 1M | Context | Caps |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | $0.10 | $0.10 | 128,000 | tools |
| GPT-OSS 120B | $0.35 | $0.75 | 131,072 | tools · reasoning |
| Qwen 3 32B | $0.40 | $0.80 | 128,000 | tools · reasoning |
| Llama 3.1 70B | $0.60 | $0.60 | 128,000 | tools |
| Llama 3.3 70B | $0.85 | $1.20 | 128,000 | tools |
| Z.ai GLM 4.6 (deprecated) | $2.25 | $2.75 | 128,000 | tools · reasoning |
| Z.ai GLM 4.7 | $2.25 | $2.75 | 128,000 | tools · reasoning |
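Per-request cost from this table is simple arithmetic: tokens divided by one million, times the per-1M price, with input and output summed. A minimal sketch (the model slugs are illustrative, not official Cerebras identifiers; prices are the ones listed above):

```python
# USD per 1M tokens (input, output), taken from the pricing table above.
# Keys are illustrative slugs, not official Cerebras model IDs.
PRICES = {
    "llama-3.1-8b": (0.10, 0.10),
    "gpt-oss-120b": (0.35, 0.75),
    "qwen-3-32b": (0.40, 0.80),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD: (tokens / 1M) * per-1M price, input + output."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# 1M input + 200K output on GPT-OSS 120B:
print(round(estimate_cost("gpt-oss-120b", 1_000_000, 200_000), 4))
```

Swap in your own token counts to compare models before committing to one.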

FAQ

How many Cerebras models are there?

All 7 Cerebras models listed on this page fall under one modality (chat), and all 7 have public per-token pricing.

How is Cerebras pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which Cerebras model is cheapest?

Input pricing on Cerebras starts at $0.10 per 1M tokens (Llama 3.1 8B, which also charges $0.10/M for output). Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to Cerebras via an OpenAI-compatible API?

Yes: point your OpenAI client at Future AGI's Agent Command Center, configure a Cerebras target, and call Cerebras models through the standard /v1/chat/completions endpoint. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.
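Because the gateway speaks the OpenAI wire format, any HTTP client works. A minimal stdlib sketch, assuming a hypothetical gateway URL, API key, and model slug (substitute your real Agent Command Center values):

```python
import json
import urllib.request

# Hypothetical values -- replace with your real gateway URL and key.
GATEWAY_URL = "https://gateway.example.com/v1/chat/completions"
API_KEY = "fagi-..."

def build_request(model: str, messages: list[dict]) -> urllib.request.Request:
    """Build a standard OpenAI-style chat-completion request for the gateway."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "cerebras/llama-3.1-8b",  # illustrative slug, not a confirmed model ID
    [{"role": "user", "content": "Hello"}],
)
# with urllib.request.urlopen(req) as resp:   # uncomment with real credentials
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

An official OpenAI SDK client works the same way: set its `base_url` to the gateway and keep the rest of your code unchanged.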

Route any Cerebras model via Agent Command Center →
OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.