LlamaGate models & pricing

LlamaGate hosts 16 models (16 with public pricing) covering 2 modalities. Llama-family inference gateway with OpenAI-compatible endpoints. Cheapest input starts at $0.0200/M tokens; the most premium goes up to $0.150/M. Use Future AGI's Agent Command Center to route any LlamaGate model with cost-optimized fallback and unified observability.

Homepage ↗ Docs ↗

chat 14

Model Input / 1M Output / 1M Context Caps
Gemma3.4b $0.0300/M $0.0800/M 128,000 tools · vision
Llama 3.1 8B $0.0300/M $0.0500/M 131,072 tools
Llama 3.2 3B $0.0400/M $0.0800/M 131,072 tools
Qwen3.8b $0.0400/M $0.140/M 32,768 tools
Codellama 7B $0.0600/M $0.120/M 16,384 tools
DeepSeek Coder 6.7B $0.0600/M $0.120/M 16,384 tools
Qwen2.5 Coder 7B $0.0600/M $0.120/M 32,768 tools
DeepSeek R1.7b Qwen $0.0800/M $0.150/M 131,072 tools · reasoning
Dolphin3.8b $0.0800/M $0.150/M 128,000 tools
Openthinker 7B $0.0800/M $0.150/M 32,768 tools · reasoning
DeepSeek R1.8b $0.1000/M $0.200/M 65,536 tools · reasoning
Llava 7B $0.1000/M $0.200/M 4,096 vision
Mistral 7B v0.3 $0.1000/M $0.150/M 32,768 tools
Qwen3 VL 8B $0.150/M $0.550/M 32,768 tools · vision

embedding 2

Model Input / 1M Output / 1M Context Caps
Nomic Embed Text $0.0200/M 8,192
Qwen3 Embedding 8B $0.0200/M 40,960

FAQ

How many LlamaGate models are there?

16 LlamaGate models are listed across 2 modalities on this page. 16 have public per-token pricing.

How is LlamaGate pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which LlamaGate model is cheapest?

Input pricing on LlamaGate starts at $0.0200 per 1M tokens. Sort the table by price (or use the in-page filter at the top) to find the cheapest model that matches your capability requirements.

Can I route to LlamaGate via an OpenAI-compatible API?

Yes — point your OpenAI client at Future AGI's Agent Command Center, configure a LlamaGate target, and call LlamaGate models with the standard /v1/chat/completions surface. The same gateway can route to other providers as fallback. Free for the first 100K requests/month.

Route any LlamaGate model via Agent Command Center →
OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.