Cerebras models & pricing
Cerebras hosts 7 models (7 with public pricing) covering 1 modalities. Wafer-scale inference for Llama and Qwen models — among the fastest tokens/sec available. Cheapest input starts at $0.1000/M tokens; the most premium goes up to $2.25/M. Use Future AGI's Agent Command Center to route any Cerebras model with cost-optimized fallback and unified observability.
chat 7
| Model ↕ | Input / 1M ↕ | Output / 1M ↕ | Context ↕ | Caps |
|---|---|---|---|---|
| Llama3.1 8B | $0.1000/M | $0.1000/M | 128,000 | tools |
| GPT Oss 120B | $0.350/M | $0.750/M | 131,072 | tools · reasoning |
| Qwen 3.32B | $0.400/M | $0.800/M | 128,000 | tools · reasoning |
| Llama3.1 70B | $0.600/M | $0.600/M | 128,000 | tools |
| Llama 3.3 70B | $0.850/M | $1.20/M | 128,000 | tools |
| Zai Glm 4.6 Deprecated | $2.25/M | $2.75/M | 128,000 | tools · reasoning |
| Zai Glm 4.7 | $2.25/M | $2.75/M | 128,000 | tools · reasoning |
FAQ
How many Cerebras models are there?
7 Cerebras models are listed across 1 modality on this page. 7 have public per-token pricing.
How is Cerebras pricing verified?
Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.
Which Cerebras model is cheapest?
Input pricing on Cerebras starts at $0.1000 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.
Can I route to Cerebras via an OpenAI-compatible API?
Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a Cerebras target, and call Cerebras models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.