LlamaGate models & pricing

LlamaGate hosts 16 models (16 with public pricing) covering 2 modalities. Llama-family inference gateway with OpenAI-compatible endpoints. Cheapest input starts at $0.0200/M tokens; the most premium goes up to $0.150/M. Use Future AGI's Agent Command Center to route any LlamaGate model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗

chat 14

Model ↕	Input / 1M ↕	Output / 1M ↕	Blended ↕	Context ↕	Caps
Gemma3.4b	$0.0300/M	$0.0800/M	$0.0425/M	128,000	tools · vision
Llama 3.1 8B	$0.0300/M	$0.0500/M	$0.0350/M	131,072	tools
Llama 3.2 3B	$0.0400/M	$0.0800/M	$0.0500/M	131,072	tools
Qwen3.8b	$0.0400/M	$0.140/M	$0.0650/M	32,768	tools
Codellama 7B	$0.0600/M	$0.120/M	$0.0750/M	16,384	tools
DeepSeek Coder 6.7B	$0.0600/M	$0.120/M	$0.0750/M	16,384	tools
Qwen2.5 Coder 7B	$0.0600/M	$0.120/M	$0.0750/M	32,768	tools
DeepSeek R1.7b Qwen	$0.0800/M	$0.150/M	$0.0975/M	131,072	tools · reasoning
Dolphin3.8b	$0.0800/M	$0.150/M	$0.0975/M	128,000	tools
Openthinker 7B	$0.0800/M	$0.150/M	$0.0975/M	32,768	tools · reasoning
DeepSeek R1.8b	$0.1000/M	$0.200/M	$0.125/M	65,536	tools · reasoning
Llava 7B	$0.1000/M	$0.200/M	$0.125/M	4,096	vision
Mistral 7B v0.3	$0.1000/M	$0.150/M	$0.112/M	32,768	tools
Qwen3 VL 8B	$0.150/M	$0.550/M	$0.250/M	32,768	tools · vision

embedding 2

Model ↕	Input / 1M ↕	Output / 1M ↕	Blended ↕	Context ↕	Caps
Nomic Embed Text	$0.0200/M	—	—	8,192
Qwen3 Embedding 8B	$0.0200/M	—	—	40,960

FAQ

How many LlamaGate models are there?

16 LlamaGate models are listed across 2 modalities on this page. 16 have public per-token pricing.

How is LlamaGate pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which LlamaGate model is cheapest?

Input pricing on LlamaGate starts at $0.0200 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to LlamaGate via an OpenAI-compatible API?

Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a LlamaGate target, and call LlamaGate models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.

Route any LlamaGate model via Agent Command Center →

OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.