Groq models & pricing

Groq hosts 14 models (11 with public per-token pricing) across three modalities: chat, audio transcription, and audio speech. Its LPU hardware serves models such as Llama, Mixtral, and Whisper with sub-second time-to-first-token (TTFT) at high throughput. Input pricing starts at $0.05 per 1M tokens; the most expensive model runs $1.00/M input and $3.00/M output. Use Future AGI's Agent Command Center to route any Groq model with cost-optimized fallback and unified observability.


Chat (11 models)

| Model | Input / 1M | Output / 1M | Context | Caps |
|---|---|---|---|---|
| Gemma 7B IT | $0.0500 | $0.0800 | 8,192 | tools |
| Llama 3.1 8B Instant | $0.0500 | $0.0800 | 128,000 | tools |
| OpenAI GPT-OSS 20B | $0.0750 | $0.300 | 131,072 | tools · reasoning |
| OpenAI GPT-OSS Safeguard 20B | $0.0750 | $0.300 | 131,072 | tools · reasoning |
| Meta Llama 4 Scout 17B 16E Instruct | $0.110 | $0.340 | 131,072 | tools · vision |
| OpenAI GPT-OSS 120B | $0.150 | $0.600 | 131,072 | tools · reasoning |
| Meta Llama 4 Maverick 17B 128E Instruct | $0.200 | $0.600 | 131,072 | tools · vision |
| Meta Llama Guard 4 12B | $0.200 | $0.200 | 8,192 | |
| Qwen3 32B | $0.290 | $0.590 | 131,000 | tools · reasoning |
| Llama 3.3 70B Versatile | $0.590 | $0.790 | 128,000 | tools |
| Moonshot AI Kimi K2 Instruct 0905 | $1.00 | $3.00 | 262,144 | tools |
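
Per-token rates convert to per-request cost with simple arithmetic. A minimal sketch, using the Llama 3.1 8B Instant rates from the table above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD; prices are quoted per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Llama 3.1 8B Instant: $0.05/M input, $0.08/M output
cost = request_cost(10_000, 2_000, 0.05, 0.08)
print(f"${cost:.5f}")  # $0.00066
```

Note that output tokens usually dominate cost on reasoning models, where the output rate is 3-4x the input rate.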

Audio transcription (2 models)

| Model | Input / 1M | Output / 1M | Context | Caps |
|---|---|---|---|---|
| Whisper Large v3 | – | – | – | |
| Whisper Large v3 Turbo | – | – | – | |

Per-token pricing for these models is not publicly listed.

Audio speech (1 model)

| Model | Input / 1M | Output / 1M | Context | Caps |
|---|---|---|---|---|
| PlayAI TTS | – | – | 10,000 | |

FAQ

How many Groq models are there?

This page lists 14 Groq models across three modalities; 11 of them have public per-token pricing.

How is Groq pricing verified?

Pricing is aggregated from BerriAI/litellm, models.dev, and OpenRouter and refreshed weekly. Each row shows a per-model "verified" date. If a price is wrong, click the row to open the model page and use the inline "suggest edit" link — submissions go into a public review queue.

Which Groq model is cheapest?

Input pricing on Groq starts at $0.05 per 1M tokens (Gemma 7B IT and Llama 3.1 8B Instant). Sort the table by price, or use the in-page filter at the top, to find the cheapest model that matches your capability requirements.

Can I route to Groq via an OpenAI-compatible API?

Yes: point your OpenAI client at Future AGI's Agent Command Center, configure a Groq target, and call Groq models through the standard /v1/chat/completions endpoint. The same gateway can route to other providers as fallbacks. Free for the first 100K requests/month.
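
A minimal sketch of the request body an OpenAI-compatible client sends; the model id and the gateway URL mentioned in the comments are illustrative assumptions, not confirmed values:

```python
import json

def chat_request(model: str, prompt: str) -> dict:
    """Build a standard /v1/chat/completions request body.

    The same body works against any OpenAI-compatible gateway;
    only the base URL and API key change per provider.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# The model id below follows Groq's naming convention; treat it as an assumption.
body = chat_request("llama-3.1-8b-instant", "Say hello")
# POST json.dumps(body) to <your-gateway-base-url>/v1/chat/completions
print(json.dumps(body, indent=2))
```

Because the body is unchanged across providers, swapping the gateway's upstream target from Groq to another provider requires no client-side code changes.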

Route any Groq model via Agent Command Center →
OpenAI-compatible endpoint. Caching, fallback, guardrails, observability. Free for 100K requests/month.