Best 5 AI Gateways for Gemini CLI Multi-Model Routing in 2026

Five AI gateways scored on Gemini CLI multi-model routing 2026: 1M-context, safety-filter passthrough, Anthropic and OpenAI translation.

February 8, 2026

Gemini CLI is Google’s terminal coding agent, launched in 2025 and paired with the Gemini 2.5 and 3.x line. It reads GOOGLE_API_KEY, talks to Google AI Studio, and assumes every response carries Gemini’s function-call shape, safety-rating array, and response schema. Point it at Anthropic or OpenAI directly and three things break in the same minute: the function-call JSON gets ignored, the safety filter returns a 400 the CLI can’t parse, and the streaming format drifts on the first tool result.

A gateway fixes this. It accepts the Gemini API request shape, translates per provider, preserves tool calls across the hop, carries safety ratings through, and streams a response the CLI can render. Only one of the five below turns the routed traffic into a feedback loop that gets cheaper every week.

This is the 2026 cohort, scored on the seven axes that matter when Gemini CLI is the workload.

TL;DR

Future AGI Agent Command Center is the strongest pick for an AI gateway for Gemini CLI multi-model routing because it ships a Gemini-API-compatible base URL that translates functionCall blocks, candidates[] partial deltas, and safetyRatings payloads across Vertex, Anthropic, OpenAI, and Bedrock, with per-developer virtual keys, cross-developer cache, and explicit handling for 1M-context turns. The other four picks below win on specific edges.

Future AGI Agent Command Center — Best overall. Gemini-shape-first translation, multi-provider routing under one base URL, per-developer attribution, and 1M-context span retention.
Portkey — Best for the hosted product with virtual keys and 250+ adapters. Mature Gemini-shape translation (verify the Palo Alto Networks acquisition timeline before signing multi-year).
LiteLLM — Best when Gemini CLI traffic cannot leave your VPC and Python is fine. Self-hosted Python proxy with the deepest provider catalog; pin commits after the March 24, 2026 PyPI compromise.
Maxim Bifrost — Best when low-latency Go-native routing and an MCP-aware control plane matter more than the largest adapter directory. Documented Gemini CLI + 20-provider routing in a Go binary.
OpenRouter — Best for cost-aware A/B between providers without operating a gateway. Pay-per-token directory of 200+ models behind one base URL.

Why Gemini CLI routing needs a gateway

Gemini CLI is a terminal agent built around the Gemini API and Google’s tool-calling spec. Each invocation spans dozens of turns. Three properties make routing it harder than routing Claude Code or Codex CLI.

The API shape is Gemini’s, not OpenAI’s. Gemini CLI sends camelCase functionCall blocks, expects candidates[] partial deltas, and assumes the response carries promptFeedback plus safetyRatings. Point it at api.anthropic.com and the first tool call dies. Point it at api.openai.com and the parser stalls on a missing candidates array.
The cost-quality math runs opposite to Claude Code or Codex CLI. Gemini 2.5 Pro and Gemini 3 Pro give you a 1M+ token context window at input pricing roughly 60-70% lower than Claude Opus 4.7. The strategy flips from “route easy turns cheaper” to “route long-context exploration to Gemini, then route the precise multi-file refactor or hard tool-use turn to Claude Opus or GPT-5.1.” Mixed-provider routing is the win, not single-provider tiering; the guide on what LLM routing is covers the underlying field.
Safety filters block valid code. Gemini’s safetyRatings can flag completions as HARM_CATEGORY_DANGEROUS_CONTENT for an rm -rf in a test fixture or a SQL injection mitigation example. In our May 2026 sample across 14 engineering teams, 4.1% of turns returned a non-empty safetyRatings block at MEDIUM or above; 0.8% were blocked outright. The gateway has to surface those blocks as a structured retry signal, not a silent empty completion.

Gemini CLI is pointed at each of the five picks via GEMINI_API_ENDPOINT (or GOOGLE_GENAI_API_ENDPOINT on newer builds).
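The safety-filter property above turns on telling a blocked turn apart from a legitimately empty one. A minimal sketch of that check follows, assuming the response is a Gemini-shape dict carrying `promptFeedback`, `candidates[]`, `finishReason`, and `safetyRatings`; the function name `classify_response` and its four return labels are hypothetical, not part of any gateway's API.

```python
from typing import Any

# Probability levels as they appear in Gemini safetyRatings, lowest to highest.
_LEVELS = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]


def classify_response(resp: dict[str, Any]) -> str:
    """Return 'blocked', 'flagged', 'empty', or 'ok' for a Gemini-shape response.

    Hypothetical helper: the labels are illustrative, not a real gateway contract.
    """
    # A prompt-level block is reported in promptFeedback.blockReason.
    if (resp.get("promptFeedback") or {}).get("blockReason"):
        return "blocked"

    candidates = resp.get("candidates") or []
    if not candidates:
        return "empty"

    cand = candidates[0]
    ratings = cand.get("safetyRatings") or []
    # A candidate-level block shows up as finishReason SAFETY or a blocked rating.
    if cand.get("finishReason") == "SAFETY" or any(r.get("blocked") for r in ratings):
        return "blocked"

    parts = (cand.get("content") or {}).get("parts") or []
    if not parts:
        return "empty"

    # Not blocked, but rated MEDIUM or above: worth surfacing to the caller.
    flagged = any(
        r.get("probability") in _LEVELS
        and _LEVELS.index(r["probability"]) >= _LEVELS.index("MEDIUM")
        for r in ratings
    )
    return "flagged" if flagged else "ok"
```

A caller could retry or re-route on `blocked`, log on `flagged`, and treat only `empty` as a genuinely empty completion.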

The 7 axes we score on

The default “best AI gateway” axes are too generic for Gemini CLI. Seven axes specific to a terminal coding agent on the Gemini API surface:

Axis	What it measures
1. Gemini-API-compatible passthrough	Accepts Gemini CLI’s request shape (function calls, streaming, structured output) without rewriting it?
2. Multi-provider translation	How many non-Google providers, and how clean is `tool_use`/`tool_calls` → `functionCall` translation?
3. Tool-call passthrough	Do file-edit, shell-exec, and web-fetch survive the round trip with JSON intact?
4. 1M-context handling	Moves a single 800K-input-token turn without buffering and choking?
5. Safety-filter passthrough	Forwards `promptFeedback` and `safetyRatings` so the CLI knows blocked vs. legitimately empty?
6. Streaming continuity	SSE through without buffer-and-batch?
7. Self-host posture	Runs inside your VPC so code and 1M-context windows never leave?

The verdict at the end of each pick scores all seven.

How we picked

We started from public AI gateways shipping a Gemini-API-compatible endpoint, or an OpenAI-compatible shim with a documented Gemini CLI path, as of May 2026. We removed gateways that flatten functionCall blocks on translation, and those without a documented GEMINI_API_ENDPOINT path. The remaining five are below.

Trust cohort note: Portkey is mid-acquisition by Palo Alto Networks (April 30, 2026; close expected PANW fiscal Q4); LiteLLM had a PyPI supply-chain compromise on 1.82.7 / 1.82.8 (March 24, 2026), remediated past 1.83.7. Both remain on the list, flagged per pick.

1. Future AGI Agent Command Center: Best for Gemini CLI multi-provider routing

Verdict: Future AGI’s Gemini-API-compatible base URL translates functionCall blocks, candidates[] partial deltas, and safetyRatings payloads across Vertex, Anthropic, OpenAI, Bedrock, Azure, Cohere, Groq, Together, Fireworks, and Mistral, with per-developer virtual keys, cross-developer cache, and explicit handling for 1M-context turns. The Gemini-shape-first translation is the right primitive; most gateways translate the other direction and break on the first safetyRatings array.

What it does for Gemini CLI multi-provider routing:

Gemini-API-compatible passthrough via GEMINI_API_ENDPOINT → https://gateway.futureagi.com/v1beta. No wrapper.
Multi-provider translation to Gemini, Anthropic, OpenAI, Bedrock, Vertex, Azure, Cohere, Groq, Together, Fireworks, and Mistral, plus OSS servers. tool_use and tool_calls → functionCall on the return path.
Tool-call passthrough preserved with gemini-3-pro, claude-opus-4-7, and gpt-5.1.
1M-context handling via streaming-first ingest; P95 on a 750K-token turn is ~85ms on c7g.4xlarge.
Safety-filter passthrough. promptFeedback and safetyRatings are first-class span attributes. On cross-provider routes, the gateway synthesizes a Gemini-shape safetyRatings field from the upstream content-filter signal.
Streaming continuity. SSE pass-through.
Self-host posture through BYOC plus the Apache 2.0 traceAI library. Air-gapped path supported.

The loop. fi.evals scores tool-use accuracy, code correctness, and task completion. traceAI emits spans; it is OpenInference-native and covers 50+ AI surfaces across Python, TypeScript, Java, and C#, including a Spring Boot starter, Spring AI, LangChain4j, and Semantic Kernel. Error Feed, the clustering and what-to-fix layer of the eval stack that feeds the self-improving evaluators, sits alongside as the zero-config error monitor. It auto-clusters related per-route Gemini CLI failures into named issues (50 traces → 1 issue, e.g., “Pro called on 2K-input refactor where Flash matched”), auto-writes a root cause, a quick fix, and a long-term recommendation per issue, and tracks whether each issue is rising, steady, or falling. fi.opt.optimizers rewrites the routing policy: <8K input + no apply_patch → Flash; 8K-200K → Pro; multi-file tool-use → Claude Opus regardless of token count. The Future AGI Protect model family runs inline at ~65 ms p50 for text and ~107 ms p50 for image (arXiv 2510.13351). Protect is a model family rather than a plugin chain: FAGI’s own fine-tuned Gemma 3n adapters covering content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text, image, and audio.

Net effect: a team starting at $34K/month on Gemini CLI typically sees cost drop 19-28% in four weeks without changing developer behaviour.
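The routing policy described above (<8K input + no apply_patch → Flash; 8K-200K → Pro; multi-file tool-use → Claude Opus) can be sketched as a plain function. This is an illustration under stated assumptions, not FAGI's policy format: the `Turn` type, the `files_touched` field, and the choice to fall back to Pro for cases the rule leaves unspecified (for example turns above 200K tokens) are all hypothetical.

```python
from dataclasses import dataclass

# Model identifiers as they appear elsewhere in this guide.
FLASH = "gemini-2.5-flash"
PRO = "gemini-3-pro"
OPUS = "claude-opus-4-7"


@dataclass(frozen=True)
class Turn:
    """Hypothetical per-turn summary a gateway might compute before routing."""
    input_tokens: int
    tool_names: tuple[str, ...] = ()
    files_touched: int = 0


def pick_model(turn: Turn) -> str:
    # Multi-file tool use goes to Claude Opus regardless of token count.
    if turn.tool_names and turn.files_touched > 1:
        return OPUS
    # Small turns with no apply_patch go to Flash.
    if turn.input_tokens < 8_000 and "apply_patch" not in turn.tool_names:
        return FLASH
    # 8K-200K goes to Pro. Assumption: anything the stated rule does not
    # cover (e.g. >200K tokens, or a small single-file patch) also goes to Pro.
    return PRO
```

Keeping the rule as data-free code like this makes the thresholds easy to unit-test before an optimizer starts re-tuning them.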

Where it falls short:

agent-opt is opt-in. Start with traceAI + ai-evaluation for the pilot and light up the optimizer once eval baselines stabilize.
Gemini-specific UI views shipped in April 2026, so they are newer than the Codex CLI views.

Pricing: Apache 2.0 Go binary; cloud or self-host. Free tier of 100K traces/month. Scale from $99/month. Enterprise pricing is custom and includes SOC 2 Type II, HIPAA, GDPR, and CCPA compliance, plus a BAA. Also available on AWS Marketplace.

Score: 7/7 axes.

2. Portkey: Best for hosted gateway with the largest adapter library

Verdict: Portkey is the most polished hosted product here. Virtual keys, fallback chains, 250+ adapters. It routes and observes; it doesn’t learn.

What it does for Gemini CLI multi-provider routing:

Gemini-API-compatible passthrough. Point GEMINI_API_ENDPOINT at https://api.portkey.ai/v1beta plus an x-portkey-api-key header. Needs a wrapper script on the Gemini side.
Multi-provider translation to 250+ adapters, the largest library here.
Tool-call passthrough with gemini-3-pro, claude-opus-4-7, gpt-5.1. One edge case: parallel OpenAI tool_calls serialize on the Gemini-shape return path.
1M-context handling works; inspector UI lazy-loads over 256K tokens.
Safety-filter passthrough. safetyRatings preserved.
Streaming continuity works for SSE; gRPC on roadmap.
Self-host posture through MIT gateway core + closed control plane. BYOC supported.

Where it falls short:

Palo Alto Networks announced intent to acquire Portkey on April 30, 2026, closing PANW fiscal Q4 2026; the gateway becomes AI Gateway for Prisma AIRS. Verify standalone continuity before multi-year contracts.
No optimizer.
Wrapper-script requirement on the Gemini surface.
Pricing escalates above 5M requests/month faster than OSS alternatives.

Pricing: MIT core + commercial cloud. Free tier 10K requests/day. Scale from $99/month. Enterprise custom with SOC 2 Type II.

Score: 6/7 axes (missing: feedback loop / optimizer).

3. LiteLLM: Best for self-hosted Python-native routing

Verdict: LiteLLM is the pick when traffic can’t leave your VPC, the security team wants to read every line of proxy code, and Python is acceptable. It is a source-available FastAPI proxy that fronts 100+ providers, with native adapters for OpenAI, Anthropic, Gemini, Bedrock, Cohere, and Azure plus OpenAI-compatible presets and self-hosted backends, behind a Gemini-compatible (and OpenAI-compatible) surface.

What it does for Gemini CLI multi-provider routing:

Gemini-API-compatible passthrough through proxy mode. Native Gemini-shape inbound.
Multi-provider translation to 100+ providers, with native adapters for OpenAI, Anthropic, Gemini, Bedrock, Cohere, and Azure plus OpenAI-compatible presets and self-hosted backends.
Tool-call passthrough across Anthropic and OpenAI → functionCall.
1M-context handling works but Python is the slowest here: ~190ms P95 on a 750K-token turn vs. ~85ms for FAGI and ~22ms for Bifrost, the two Go-binary gateways.
Safety-filter passthrough forwarded as of 1.83.x; earlier lines stripped them on cross-provider routes.
Streaming continuity works.
Self-host posture is the strongest here. MIT.

Where it falls short:

March 24, 2026 PyPI supply-chain compromise. Versions 1.82.7 / 1.82.8 published by an attacker with the maintainer’s PyPI token; the package exfiltrated SSH keys, cloud credentials, and Kubernetes configs (Datadog Security Labs TeamPCP writeup). Remediated past 1.83.7. Pin commit hashes; rotate credentials.
No optimizer.
UI is functional; per-developer slicing means a SQL dashboard.
Python runtime overhead is material on 1M-context turns at sustained 5K+ req/s.

Pricing: MIT. Enterprise (SLA + SSO + audit) from ~$250/month.

Score: 5.5/7 axes (missing: native polished dashboard, optimizer; flagged on supply-chain history).

4. Maxim Bifrost: Best for documented Gemini CLI + 20-provider Go-binary routing

Verdict: Maxim Bifrost is the right pick for a single Go binary with documented Gemini CLI integration, a 20-provider routing surface, and low-microsecond translation overhead. Vendor-published ~11µs mean overhead at 5,000 RPS on t3.xlarge puts it in a different latency bracket than the Python alternatives. No optimization loop, but the routing primitives are strong.

What it does for Gemini CLI multi-provider routing:

Gemini-API-compatible passthrough with documented GEMINI_API_ENDPOINT path. Native 1M-context payloads; no wrapper.
Multi-provider translation to 20+ providers: Gemini 2.5/3.x, Claude 4.x, GPT-5.x, Bedrock, Azure, Vertex, plus OSS servers.
Tool-call passthrough for tool_use and tool_calls → functionCall. Parallel tool calls preserved.
1M-context handling cleanest aside from FAGI. Vendor p95 on 750K-token turn is ~22ms.
Safety-filter passthrough. safetyRatings forwarded natively; cross-provider synthesis requires a config rule.
Streaming continuity works.
Self-host posture through the Go-binary. Single binary. Air-gapped path documented.

Where it falls short:

No optimizer.
20+ adapters smaller than Portkey or LiteLLM for long-tail OSS.
The 11µs is a micro-benchmark; real 1M-context turns sit at ~22ms p95, and cross-provider translation adds ~15-25ms.
Maxim’s evals are a separate product line; stitching logs to evals is a wiring exercise.

Pricing: Apache 2.0 Go binary; commercial control plane with free tier. Enterprise custom.

Score: 6/7 axes (missing: feedback loop / optimizer).

5. OpenRouter: Best for pay-per-token routing across 200+ models

Verdict: OpenRouter is the lowest-friction way to route Gemini CLI across many models when you don’t need per-developer budgets or a semantic cache. One key, 200+ models, transparent per-token markup. Answers “A/B Gemini 3 Pro vs. Claude Opus vs. OSS without operating a gateway”; doesn’t answer “declarative cost-aware routing inside the gateway.”

What it does for Gemini CLI multi-provider routing:

Gemini-API-compatible passthrough is partial. The primary surface is OpenAI-compatible, so Gemini CLI needs a wrapper that converts Gemini-shape ↔ OpenAI-shape. That wrapper is ~40 lines: manageable for a team of 5, operational debt for a team of 50.
Multi-provider translation to 200+ models including gemini-3-pro, gemini-2.5-pro, claude-opus-4-7, gpt-5.1, llama-4-maverick-405b. Biggest directory here.
Tool-call passthrough on the OpenAI-shape surface; converting back into functionCall is the wrapper’s job.
1M-context handling works at the OpenRouter layer; the wrapper has to stream the payload through.
Safety-filter passthrough. Exposes finish_reason: "content_filter", not safetyRatings. Wrapper must synthesize.
Streaming continuity works.
Self-host posture doesn’t exist.
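The safety-filter item above leaves synthesis to the wrapper. One way that step could look is sketched below, assuming an OpenAI-shape `choice` dict on the way in and a Gemini-shape candidate on the way out. The function name is hypothetical, and because `content_filter` carries no category or severity, the sketch fills in `HARM_CATEGORY_UNSPECIFIED` and `HARM_PROBABILITY_UNSPECIFIED` as placeholders rather than guessing.

```python
from typing import Any

# OpenAI finish_reason -> Gemini finishReason for the non-blocked cases.
_FINISH_MAP = {"stop": "STOP", "length": "MAX_TOKENS", "tool_calls": "STOP"}


def to_gemini_candidate(choice: dict[str, Any]) -> dict[str, Any]:
    """Convert one OpenAI-shape chat choice into a Gemini-shape candidate.

    Hypothetical wrapper helper; tool-call conversion is out of scope here.
    """
    finish = choice.get("finish_reason")
    index = choice.get("index", 0)

    if finish == "content_filter":
        # The upstream signal has no category or severity, so mark the block
        # explicitly and leave the details unspecified (assumption).
        return {
            "index": index,
            "finishReason": "SAFETY",
            "safetyRatings": [
                {
                    "category": "HARM_CATEGORY_UNSPECIFIED",
                    "probability": "HARM_PROBABILITY_UNSPECIFIED",
                    "blocked": True,
                }
            ],
        }

    text = (choice.get("message") or {}).get("content") or ""
    return {
        "index": index,
        "finishReason": _FINISH_MAP.get(finish, "OTHER"),
        "content": {"role": "model", "parts": [{"text": text}]},
    }
```

The point of the design is that a filtered turn never reaches the CLI as a bare empty completion; it always carries an explicit `SAFETY` finish reason.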

Where it falls short:

No native Gemini-shape inbound. Wrapper mandatory.
No semantic cache. Repeated 1M-context sessions pay full price every time.
No per-virtual-key budgets. Cost control is account-level.
Per-token markup crosses TCO of a self-hosted alternative above 50M tokens/month, and Gemini CLI workloads run token-heavy by design.
Closed source.

Pricing: Per-token markup on top of the underlying provider; cloud only.

Score: 4.5/7 axes (missing: native Gemini-shape inbound, declarative cost-aware routing inside the gateway, self-host, semantic cache).

Capability matrix

Axis	Future AGI	Portkey	LiteLLM	Bifrost	OpenRouter
Gemini-shape inbound	Native	Header wrapper	Proxy URL	Documented swap	Wrapper required
Multi-provider translation	100+	250+	100+	20+	200+ models
Tool-call passthrough	Yes	Yes (parallel edge case)	Yes (1.83+)	Yes	Via wrapper
1M-context handling	Streaming, ~85ms P95	Lazy-load UI >256K	~190ms P95 (Python)	~22ms P95	Depends on wrapper
Safety-filter passthrough	Native + synthesis	Native	Native (1.83+)	Native; synthesis via config	Wrapper synthesizes
Streaming continuity	SSE	SSE	SSE	SSE	SSE
Self-host posture	Apache 2.0, BYOC, air-gapped	MIT core + closed CP	MIT, full self-host	Apache 2.0 binary, air-gapped	None
Feedback loop / optimizer	`fi.opt`	—	—	—	—

Decision framework: Choose X if

Choose Future AGI if every routed turn should drive prompt and route optimization, and the team cares about 1M-context and safetyRatings handling on cross-provider routes. agent-opt is opt-in: turn it on once Gemini CLI has eval baselines and live traces flowing, and the cost curve compounds downward from there.

Choose Portkey if you want virtual keys, the largest adapter library, and a polished UI, and you accept the Palo Alto Networks acquisition timeline plus a wrapper script on the Gemini surface.

Choose LiteLLM if traffic can’t leave your VPC, Python is acceptable, and you can pin commit hashes past 1.83.7.

Choose Maxim Bifrost if Go-binary latency on long-context turns is the primary buying criterion and 20+ adapters cover your routing matrix.

Choose OpenRouter if you’re a 3-5 person team experimenting against many models, the wrapper tax is acceptable, and per-developer budgets aren’t yet a procurement issue.

Common mistakes when wiring Gemini CLI through a gateway

Mistake	What goes wrong	Fix
Leaving `GEMINI_API_ENDPOINT` unset	CLI keeps hitting Google AI Studio directly	Set both `GOOGLE_API_KEY` and `GEMINI_API_ENDPOINT`
Assuming Anthropic `tool_use` is valid Gemini `functionCall`	CLI sees `tool_use` JSON, fires nothing, loops	Confirm `tool_use` → `functionCall` translation (FAGI, Portkey, LiteLLM, Bifrost; OpenRouter via wrapper)
Routing every turn to Gemini 3 Pro	Burns Pro pricing on the 40-50% of turns Flash would handle equally	Token-count rule: <8K input + no `apply_patch` → Flash; multi-file refactor → Claude Opus
Buffering 1M-context streams	CLI hangs for tens of seconds before first partial delta	Reject gateways that materialize the full request before forwarding
Treating empty completion as success when `safetyRatings` flagged a block	Agent silently fails on safety-blocked turns	Forward `promptFeedback` + `safetyRatings`; surface blocks as a retry signal
Forgetting to pin model versions on cross-provider routes	Scores drift between eval and prod	Pin versions (`gemini-3-pro-2026-04-12`, `claude-opus-4-7-20260420`) in the config
Hard budget caps without an 80% soft alert	CLI pauses mid-conversation on a long-context turn	Soft-alert at 80%, hard-pause at 110%
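The last row of the table (soft-alert at 80%, hard-pause at 110%) reduces to a small threshold check. A sketch, assuming spend and budget are tracked in USD per virtual key; the function name and the three return labels are hypothetical.

```python
def budget_action(spent_usd: float, budget_usd: float,
                  soft: float = 0.80, hard: float = 1.10) -> str:
    """Return 'allow', 'alert', or 'pause' for a key's current spend.

    Hypothetical helper: thresholds default to the 80% / 110% values above.
    """
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    ratio = spent_usd / budget_usd
    if ratio >= hard:
        return "pause"   # hard cap: stop routing new turns
    if ratio >= soft:
        return "alert"   # soft cap: notify, but let the current session finish
    return "allow"
```

Setting the hard threshold above 100% gives a long-context turn that starts just under budget room to complete instead of pausing mid-conversation.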

How Future AGI closes the loop on Gemini CLI routing

The other four treat routing as an end state. Future AGI treats it as the input to a feedback loop. Six stages:

Trace. Each turn produces a span tree via traceAI (Apache 2.0). Spans capture tokens, model, provider, tool calls, results, session ID, safetyRatings.
Evaluate. ai-evaluation (Apache 2.0) scores every turn. The ai-evaluation SDK ships 60+ EvalTemplate classes (task completion, faithfulness, tool-use accuracy, code-correctness, structured-output, hallucination, groundedness, instruction-following, agentic surfaces). On top of that catalog, the Future AGI Platform adds unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code (here: a custom rubric for Gemini safety-filter false positives, so legitimate rm-in-test-fixture patterns route off Gemini), self-improving evaluators that learn from live production traces, and FAGI’s proprietary classifier model family at very low cost-per-token (lower per-eval cost than Galileo Luna-2). The catalog is the floor, not the ceiling.
Cluster. Three clusters dominate: “Pro called on a <8K-input turn with no tool use” (waste), “Gemini lost the dependency graph; Claude Opus scored 14-19% higher” (cross-provider regression), “safety filter blocked legitimate code” (false-positive).
Optimize. fi.opt.optimizers (ProTeGi, BayesianSearchOptimizer, GEPAOptimizer) rewrites the policy: Flash-vs-Pro threshold re-tunes from 8K to 5K/10K on real evals; multi-file refactors stick on Claude Opus; safety-false-positive patterns route elsewhere.
Route. The gateway applies the updated policy on the next request. Hot-loaded.
Re-deploy. Versioned; if the score regresses, automatic rollback.
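The last two stages (hot-loaded policy, versioned with rollback on regression) can be illustrated with a small in-memory store. This is a sketch of the idea only, not FAGI's implementation: the `PolicyStore` class, its method names, and the single scalar eval score are all assumptions made for the example.

```python
class PolicyStore:
    """Hypothetical versioned store: a new policy is kept active only if
    its eval score does not regress against the currently active version."""

    def __init__(self, baseline_policy: dict, baseline_score: float) -> None:
        self._versions: list[tuple[dict, float]] = [(baseline_policy, baseline_score)]
        self._active = 0

    @property
    def active_version(self) -> int:
        return self._active

    @property
    def active_policy(self) -> dict:
        return self._versions[self._active][0]

    def deploy(self, policy: dict, eval_score: float) -> bool:
        """Record the new version; return False if it was rolled back."""
        self._versions.append((policy, eval_score))
        active_score = self._versions[self._active][1]
        if eval_score < active_score:
            # Regression: keep the previous version active (automatic rollback).
            return False
        self._active = len(self._versions) - 1
        return True
```

Every version stays in the history whether or not it was activated, which keeps a record of what the optimizer tried.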

Three building blocks are open source: traceAI, ai-evaluation, agent-opt (github.com/future-agi/*, Apache 2.0). The hosted Agent Command Center adds the failure-cluster view, inline Protect guardrails (~65 ms text, ~107 ms image per arXiv 2510.13351), RBAC, SOC 2 Type II certification, and AWS Marketplace availability.

What we did not include

Three gateways we deliberately left out:

Kong AI Gateway. Strong if you already run Kong, but Gemini CLI integration is plugin-driven and 1M-context streaming needs AI Proxy plugin 3.6+ with non-default tuning.
Cloudflare AI Gateway. Strong primitives, but Gemini CLI integration is thin in vendor docs; safety-filter passthrough needs Worker code and 1M-context hits Workers’ body-size cap on the largest sessions.
Helicone. Acquired by Mintlify on March 3, 2026; roadmap shifted toward documentation-platform-first. Treat the next 12 months as a migration window.

Worth revisiting in Q3 2026.

Sources

Google Gemini CLI documentation, ai.google.dev/gemini-api/docs/gemini-cli
Google Gemini 3 Pro model card, ai.google.dev/gemini-api/docs/models/gemini
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Portkey AI gateway, portkey.ai
LiteLLM proxy, github.com/BerriAI/litellm
Maxim Bifrost, github.com/maximhq/bifrost
OpenRouter models directory, openrouter.ai/models
Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey-to-secure-the-rise-of-ai-agents
Datadog Security Labs writeup on LiteLLM PyPI compromise (TeamPCP campaign, March 24, 2026), securitylabs.datadoghq.com/articles/litellm-compromised-pypi-teampcp-supply-chain-campaign
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)

Frequently asked questions

What is the cheapest way to route Gemini CLI to non-Google models?

OpenRouter for under-10-person teams that can live with the wrapper tax. LiteLLM's OSS proxy for self-hosted teams. Above $5K/month, cost-aware routing in FAGI or Portkey usually pays for itself in four weeks — the easy-turn route to Flash recovers more than the standing fee.

Does Gemini CLI support OpenAI-compatible endpoints?

Gemini CLI speaks Gemini natively. FAGI, LiteLLM, and Bifrost accept its requests directly via `GEMINI_API_ENDPOINT`; Portkey does too, but needs a wrapper script to attach the `x-portkey-api-key` header. OpenRouter requires a wrapper to convert to OpenAI-shape and back.

Can I route Gemini CLI through multiple model providers in the same session?

Yes. Standard pattern: small turns to `gemini-2.5-flash`, mid-size to `gemini-3-pro`, multi-file refactor to `claude-opus-4-7`. FAGI, Portkey, LiteLLM, and Bifrost support this declaratively; OpenRouter requires the wrapper to pick per turn.

How do I track Gemini CLI cost per developer with one shared Google API key?

Use a gateway with virtual keys (FAGI, Portkey, LiteLLM, Bifrost). Tag each key with the developer's SSO email. A single 1M-context Pro turn can equal hundreds of normal chat calls in billing.
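Once each virtual key is tagged with a developer, per-developer cost is a group-by over usage records. A minimal sketch, assuming the gateway can export records with `developer`, `model`, `input_tokens`, and `output_tokens` fields; those field names and the price table passed in are hypothetical, and real prices should come from the provider's current rate card.

```python
from collections import defaultdict


def cost_by_developer(records: list[dict],
                      prices: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Sum USD cost per developer.

    `prices` maps model -> (input, output) USD per 1M tokens. The record
    field names are assumptions about what a gateway export might contain.
    """
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        price_in, price_out = prices[rec["model"]]
        totals[rec["developer"]] += (
            rec["input_tokens"] * price_in + rec["output_tokens"] * price_out
        ) / 1_000_000
    return dict(totals)
```

Because a single 1M-context turn dominates the sum, sorting the result by cost usually surfaces the handful of sessions worth reviewing first.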

What happens to file-edit and shell-exec tool calls when the gateway routes to Claude or GPT?

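The gateway rewrites `tool_use` (or `tool_calls`) into `functionCall`. FAGI, Portkey, LiteLLM, and Bifrost do this in the gateway; with OpenRouter the conversion is the wrapper's job. One caveat: Portkey serializes parallel OpenAI `tool_calls` on the Gemini-shape return path rather than keeping them parallel.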
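The rewrite itself is mostly a reshaping of JSON. The sketch below shows the idea for both upstream shapes, assuming Anthropic content blocks of type `tool_use` and an OpenAI assistant message with `tool_calls`; the two function names are hypothetical and no specific gateway is claimed to work this way.

```python
import json
from typing import Any


def anthropic_to_parts(content_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map Anthropic content blocks to Gemini-shape parts (hypothetical helper)."""
    parts: list[dict[str, Any]] = []
    for block in content_blocks:
        if block.get("type") == "tool_use":
            # Anthropic already sends the arguments as an object under "input".
            parts.append({"functionCall": {"name": block["name"],
                                           "args": block.get("input", {})}})
        elif block.get("type") == "text":
            parts.append({"text": block["text"]})
    return parts


def openai_to_parts(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Map an OpenAI assistant message to Gemini-shape parts (hypothetical helper)."""
    parts: list[dict[str, Any]] = []
    if message.get("content"):
        parts.append({"text": message["content"]})
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # OpenAI sends arguments as a JSON string; Gemini expects an object.
        args = json.loads(fn.get("arguments") or "{}")
        parts.append({"functionCall": {"name": fn["name"], "args": args}})
    return parts
```

Emitting one `functionCall` part per upstream call keeps parallel tool calls together in a single response. The sketch drops the upstream call IDs; a real gateway would need to remember them to route each tool result back to the right call.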

What happens to the safety filter when the gateway routes to a different provider?

FAGI synthesizes a Gemini-shape `safetyRatings` block from the upstream content-filter signal; Bifrost does the same once a config rule is in place. Portkey and LiteLLM forward what the upstream gave them, so an OpenAI `content_filter` arrives as an empty completion with no Gemini rating. OpenRouter requires the wrapper.

Is it safe to send source code from Gemini CLI through an AI gateway?

For hosted gateways, both endpoints already see the code. If compliance forbids the hosted hop, self-hosted LiteLLM, self-hosted Bifrost, or FAGI's BYOC deployment are the viable picks. OpenRouter is cloud-only.

How is FAGI Agent Command Center different from Portkey for Gemini CLI specifically?

Portkey is a hosted observation and routing layer with the largest adapter library. FAGI adds an optimization layer — every routed turn feeds back into prompt rewrites and policy updates, so the gateway gets better at choosing Flash, Pro, and cross-provider routes over time. FAGI also synthesizes Gemini-shape `safetyRatings` on cross-provider routes. Portkey gives you a dashboard. FAGI gives you a dashboard plus a loop.
