LlamaGate models & pricing
LlamaGate hosts 16 models (16 with public pricing) covering 2 modalities. Llama-family inference gateway with OpenAI-compatible endpoints. Cheapest input starts at $0.0200/M tokens; the most premium goes up to $0.150/M. Use Future AGI's Agent Command Center to route any LlamaGate model with cost-optimized fallback and unified observability.
chat 14
| Model ↕ | Input / 1M ↕ | Output / 1M ↕ | Context ↕ | Caps |
|---|---|---|---|---|
| Gemma3.4b | $0.0300/M | $0.0800/M | 128,000 | tools · vision |
| Llama 3.1 8B | $0.0300/M | $0.0500/M | 131,072 | tools |
| Llama 3.2 3B | $0.0400/M | $0.0800/M | 131,072 | tools |
| Qwen3.8b | $0.0400/M | $0.140/M | 32,768 | tools |
| Codellama 7B | $0.0600/M | $0.120/M | 16,384 | tools |
| DeepSeek Coder 6.7B | $0.0600/M | $0.120/M | 16,384 | tools |
| Qwen2.5 Coder 7B | $0.0600/M | $0.120/M | 32,768 | tools |
| DeepSeek R1.7b Qwen | $0.0800/M | $0.150/M | 131,072 | tools · reasoning |
| Dolphin3.8b | $0.0800/M | $0.150/M | 128,000 | tools |
| Openthinker 7B | $0.0800/M | $0.150/M | 32,768 | tools · reasoning |
| DeepSeek R1.8b | $0.1000/M | $0.200/M | 65,536 | tools · reasoning |
| Llava 7B | $0.1000/M | $0.200/M | 4,096 | vision |
| Mistral 7B v0.3 | $0.1000/M | $0.150/M | 32,768 | tools |
| Qwen3 VL 8B | $0.150/M | $0.550/M | 32,768 | tools · vision |
embedding 2
| Model ↕ | Input / 1M ↕ | Output / 1M ↕ | Context ↕ | Caps |
|---|---|---|---|---|
| Nomic Embed Text | $0.0200/M | — | 8,192 | |
| Qwen3 Embedding 8B | $0.0200/M | — | 40,960 |
FAQ
How many LlamaGate models are there?
16 LlamaGate models are listed across 2 modalities on this page. 16 have public per-token pricing.
How is LlamaGate pricing verified?
Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.
Which LlamaGate model is cheapest?
Input pricing on LlamaGate starts at $0.0200 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.
Can I route to LlamaGate via an OpenAI-compatible API?
Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a LlamaGate target, and call LlamaGate models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.