Anyscale models & pricing

Anyscale hosts 12 models (12 with public pricing) covering 1 modalities. Ray-powered serverless inference for Llama, Mistral, and Qwen. Cheapest input starts at $0.150/M tokens; the most premium goes up to $1.00/M. Use Future AGI's Agent Command Center to route any Anyscale model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗

chat 12

Model ↕	Input / 1M ↕	Output / 1M ↕	Context ↕	Caps
Google Gemma 7B It	$0.150/M	$0.150/M	8,192
Huggingfaceh4 Zephyr 7B beta	$0.150/M	$0.150/M	16,384
Meta Llama Llama 2.7B Chat Hf	$0.150/M	$0.150/M	4,096
Meta Llama Meta Llama 3.8B Instruct	$0.150/M	$0.150/M	8,192
Mistralai Mistral 7B Instruct v0.1	$0.150/M	$0.150/M	16,384	tools
Mistralai Mixtral 8×7B Instruct v0.1	$0.150/M	$0.150/M	16,384	tools
Meta Llama Llama 2.13B Chat Hf	$0.250/M	$0.250/M	4,096
Mistralai Mixtral 8×22B Instruct v0.1	$0.900/M	$0.900/M	65,536	tools
Codellama Codellama 34B Instruct Hf	$1.00/M	$1.00/M	4,096
Codellama Codellama 70B Instruct Hf	$1.00/M	$1.00/M	4,096
Meta Llama Llama 2.70B Chat Hf	$1.00/M	$1.00/M	4,096
Meta Llama Meta Llama 3.70B Instruct	$1.00/M	$1.00/M	8,192

FAQ

How many Anyscale models are there?

12 Anyscale models are listed across 1 modality on this page. 12 have public per-token pricing.

How is Anyscale pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which Anyscale model is cheapest?

Input pricing on Anyscale starts at $0.150 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to Anyscale via an OpenAI-compatible API?

Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a Anyscale target, and call Anyscale models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.

Route any Anyscale model via Agent Command Center →

OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.