Guides

Best 5 AI Gateways to Route Codex CLI to Any Model in 2026

Q: What is the cheapest way to route Codex CLI to non-OpenAI models?

OpenRouter for under-10-person teams (no operational cost, per-token markup). LiteLLM's open-source proxy for self-hosted teams. For teams over $5K/month on Codex CLI, cost-aware routing in Future AGI or Portkey usually pays for itself within four weeks — the easy-turn route to a cheaper model recovers more than the gateway's standing fee.

Q: Does Codex CLI support OpenAI-compatible endpoints other than OpenAI's?

Yes, via `OPENAI_BASE_URL` (or `OPENAI_API_BASE` on older builds). Codex CLI does not care what is on the other side of the URL as long as it speaks the OpenAI Responses API and returns OpenAI-shaped tool calls. All five gateways above translate to that shape on the way back.

Q: Can I route Codex CLI through multiple model providers in the same session?

Yes. The standard pattern is a routing rule keyed on input-token count or tool-call presence, so easy turns go to a cheaper model and hard turns go to the flagship. Future AGI, Portkey, and LiteLLM support this declaratively. Cloudflare requires Worker code; OpenRouter requires the calling app to pick a model per turn.

Q: How do I track Codex CLI cost per developer when everyone shares one OpenAI key?

Use a gateway with virtual keys (Future AGI, Portkey, LiteLLM). Each developer gets a virtual key that fans out to the team's underlying provider keys, preserving bulk pricing. Tag the virtual key with the developer's SSO email and the gateway dashboard groups cost by developer.

Q: What happens to Codex CLI's `bash` and `apply_patch` tool calls when the gateway routes to Claude or Gemini?

The gateway rewrites Anthropic's `tool_use` (or Gemini's `function_call`) back into OpenAI's `tool_calls` shape on the response. All five gateways above do this correctly as of May 2026. Older versions of some proxies flattened tool calls into text, silently breaking Codex CLI — confirm the gateway's tool-call test matrix before adopting.

Q: Is it safe to send source code from Codex CLI through an AI gateway?

For hosted gateways, the data flow is gateway → provider; both endpoints already see the code. If compliance forbids the hosted hop, the safe pick is self-hosted LiteLLM or Future AGI's BYOC deployment, with provider traffic egressing through your own network. Cloudflare AI Gateway and OpenRouter are cloud-only.

Q: How is Future AGI Agent Command Center different from Portkey for Codex CLI specifically?

Portkey is a hosted observation and routing layer. Future AGI adds an optimization layer on top — every routed Codex CLI turn feeds back into prompt rewrites and routing-policy updates, so the gateway gets better at choosing the cheaper model for easy turns and the stronger model for hard turns over time. Portkey gives you a dashboard. Future AGI gives you a dashboard plus a loop.

Five AI gateways scored on Codex CLI multi-provider routing in 2026: OpenAI-compatible passthrough, tool-call fidelity, cost-aware routing, latency.

January 5, 2026

19 min read

ai-gateway 2026 codex-cli llm-routing

Table of Contents

Codex CLI is OpenAI’s terminal coding agent. It reads OPENAI_API_KEY, talks to api.openai.com, and assumes every model on the other side speaks the OpenAI chat-completions and Responses API. Point it at a Claude or Gemini endpoint directly and the tool calls collapse, the function-call JSON, the SSE format, and the response_format semantics all drift the moment you leave OpenAI’s surface.

A gateway in front of Codex CLI fixes this. It accepts the OpenAI-shaped request, translates per provider, preserves the bash and apply_patch tool calls across the translation hop, and streams a response the CLI can render. The five gateways below all do that. Only one turns the same routed traffic into a feedback loop that gets cheaper and more accurate every week.

This is the 2026 cohort, scored on the seven routing axes that matter when Codex CLI is the workload.

TL;DR

Future AGI Agent Command Center is the strongest pick for an AI gateway for Codex CLI multi-provider routing because it ships an OPENAI_BASE_URL swap that exposes OpenAI, Anthropic, Bedrock, Vertex, Together, Groq, and Fireworks behind one Responses-API endpoint with parallel tool-call translation preserved, per-developer virtual keys, cross-developer cache, and OTel-native cost telemetry. The other four picks below win on specific edges.

Future AGI Agent Command Center — Best overall. Multi-provider Responses-API translation, per-developer attribution, cross-developer cache, and ~18 ms p95 same-provider routing overhead.
Portkey — Best for the hosted product with virtual keys and 250+ adapters. Fastest hosted setup with the broadest adapter library (verify the Palo Alto Networks acquisition timeline before signing multi-year).
LiteLLM — Best when Codex CLI traffic cannot leave your VPC and Python is fine. Self-hosted Python proxy with the deepest provider catalog; pin commits after the March 24, 2026 PyPI compromise.
OpenRouter — Best for cost-aware A/B between providers without operating a gateway. Pay-per-token directory of 200+ models behind one base URL.
Cloudflare AI Gateway — Best if Codex CLI runs from many regions and you want POPs near your developers. Edge-deployed cache + retry layer on Cloudflare’s network.

Why Codex CLI routing needs a gateway

Codex CLI is a terminal agent built around the OpenAI Responses API and tool calling. Each invocation spans dozens of turns, with bash, apply_patch, file-read, and shell-exec dispatched on most of them. Three properties make routing it painful.

It’s hard-wired to OpenAI’s API shape. Codex CLI uses OPENAI_API_KEY, sends OpenAI tool-call JSON, and expects OpenAI’s streaming format. Pointing it at api.anthropic.com returns a 401 the moment the first tool call goes out. A gateway has to translate the API surface, not the model alone.
The cost-quality mismatch is huge per turn. In our internal usage data across 18 engineering teams in Q1 2026, 62% of Codex CLI turns had input contexts under 8K tokens, easy turns where gpt-5.1-mini or claude-haiku-4-5 does the job. The remaining 38% are multi-file refactors where you want gpt-5.1 or claude-opus-4-7. Routing every turn to the flagship wastes $14K-$22K per month per team.
Tool calls break on lazy gateways. Codex CLI’s bash tool sends tool_calls in the OpenAI function-call format. A gateway that re-serializes that block as text silently breaks the agent, the CLI sees a string where it expected a structured call, fires nothing, loops. Tool-call passthrough is the single feature that decides whether the gateway is usable for Codex CLI at all.

All five picks below are pointed at via OPENAI_BASE_URL.

The 7 axes we score on

The default “best AI gateway” axes (provider breadth, routing, fallback, observability) are too generic for Codex CLI. We scored each pick on seven axes specific to terminal coding agents pointed at multi-provider routing.

Axis	What it measures
1. OpenAI-compatible passthrough fidelity	Does the gateway accept Codex CLI’s exact request shape (Responses API, tool calls, streaming) without rewriting it?
2. Multi-provider translation	How many non-OpenAI providers does it speak natively, and how clean is the tool-call translation?
3. Tool-call passthrough	Do `bash`, `apply_patch`, and file-edit tools survive the round trip with their JSON shape intact?
4. Cost-aware turn routing	Can it route easy turns to a cheaper model and hard turns to a flagship, without per-call Python?
5. Streaming continuity	Does SSE pass through without buffer-and-batch, so the CLI’s progress UI stays smooth?
6. Latency overhead per turn	How much extra ms per turn does the translation hop add at P95?
7. Self-host posture	Can the gateway run inside your VPC so code never leaves the perimeter?

The verdict line at the end of each pick scores all seven.

How we picked

We started from the universe of public AI gateways that ship an OpenAI-compatible endpoint as of May 2026. We removed gateways that don’t preserve OpenAI function-call JSON on translation (this excluded two early proxies that flattened tool calls into text). We removed gateways without an OPENAI_BASE_URL configuration path. The remaining five are below.

A note on the 2026 trust cohort: Portkey is mid-acquisition by Palo Alto Networks (announced April 30, 2026, close expected in PANW fiscal Q4); LiteLLM had a PyPI supply-chain compromise on 1.82.7 / 1.82.8 (March 24, 2026), remediated past 1.83.7. Both are still in the list, both still in production at large teams. But the procurement story now includes acquisition independence and dependency pinning. Flagged per pick.

1. Future AGI Agent Command Center: Best for multi-provider routing under one OpenAI-compatible base URL

Verdict: Future AGI’s gateway exposes OpenAI, Anthropic, Bedrock, Vertex, Together, Groq, and Fireworks behind one OpenAI-compatible Responses-API endpoint, with parallel tool-call translation preserved across providers. Per-developer virtual keys, cross-developer cache, and OpenTelemetry-native cost telemetry sit on top so finance gets the per-developer chargeback table directly out of the dashboard.

What it does for Codex CLI multi-provider routing:

OpenAI-compatible passthrough via a base_url swap to https://gateway.futureagi.com/v1. Codex CLI sees its native Responses API surface; no SDK or wrapper changes.
Multi-provider translation to OpenAI, Anthropic, Gemini, Bedrock, Azure, Cohere, Groq, Together, Fireworks, Mistral, plus OpenAI-compatible OSS servers (Ollama, vLLM, LM Studio). Anthropic tool_use blocks get rewritten to OpenAI tool_calls on the way back.
Tool-call passthrough preserved, the gateway parses each provider’s native tool-use block and re-emits it in OpenAI’s exact JSON. bash, apply_patch, shell, and file-edit tools survive intact as of May 2026 testing with gpt-5.1, claude-opus-4-7, and gemini-2.5-pro.
Cost-aware turn routing through a declarative config: under 10K input tokens → claude-haiku-4-5 or gpt-5.1-mini; over → claude-opus-4-7 or gpt-5.1. Configured once, applied on every turn, no per-call Python.
Streaming continuity. SSE pass-through, not buffer-and-batch.
Latency overhead averages ~18ms P95 same-provider and ~42ms P95 cross-provider in our internal load tests.
Self-host posture through BYOC deployment plus the Apache 2.0 traceAI library. Air-gapped path supported.

The loop. Every Codex CLI turn produces a span tree via traceAI (50+ AI surfaces across Python, TypeScript, Java, and C# (including Spring Boot starter, Spring AI, LangChain4j, Semantic Kernel), OpenInference-native). fi.evals scores each turn on tool-use accuracy, code correctness, and task completion. Error Feed (the part of the eval stack, the clustering and what-to-fix layer that feeds the self-improving evaluators) sits alongside as the zero-config error monitor: auto-clusters related low-scoring turns into named issues (50 traces → 1 issue, e.g., “Opus called on turns under 8K input”), auto-writes the root cause from the span evidence plus a quick fix plus a long-term recommendation per issue, and tracks rising/steady/falling trend per issue. fi.opt.optimizers (ProTeGi, BayesianSearchOptimizer, GEPAOptimizer) rewrites the routing policy against the clustered failures. Next deploy uses the updated route. The Future AGI Protect model family runs on the same hop at ~65 ms p50 text and ~107 ms p50 image (arXiv 2510.13351). FAGI’s own fine-tuned Gemma 3n adapters across content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text/image/audio, a model family rather than a plugin chain.

Net effect: a team starting at $28K/month on Codex CLI typically sees cost drop 22-34% within four weeks without changing developer behaviour, because the router gets better at choosing the cheaper model for easy turns and the optimizer stops over-prompting on the long ones.

Where it falls short:

agent-opt is opt-in, for one-week pilots with small teams running OpenAI-compatible fanout, start with traceAI + ai-evaluation and turn the optimizer on once eval baselines stabilize.
The control-plane UI for per-developer Codex CLI usage is newer than Portkey’s. If a polished out-of-the-box dashboard is the primary buying criterion, Portkey has the head start.

Pricing: Apache 2.0 single Go binary; cloud at gateway.futureagi.com/v1 or self-host. Free tier with 100K traces/month. Scale tier from $99/month. Enterprise custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, plus a BAA. AWS Marketplace listing.

Score: 7/7 axes.

2. Portkey: Best for hosted gateway with the largest adapter library

Verdict: Portkey is the most polished hosted product in this category for Codex CLI routing. If your team is using Codex CLI through a shared workspace and wants virtual keys, fallback chains, and 250+ provider adapters out of the box, Portkey is the fastest path. It doesn’t learn from the routed traffic; it routes and observes.

What it does for Codex CLI multi-provider routing:

OpenAI-compatible passthrough through Portkey’s universal API. Set OPENAI_BASE_URL=https://api.portkey.ai/v1 plus an x-portkey-api-key header.
Multi-provider translation to 250+ adapters, the largest library on this list.
Tool-call passthrough confirmed working as of May 2026 with gpt-5.1, claude-opus-4-7, and gemini-2.5-pro.
Cost-aware turn routing through YAML configs (conditions on token count, model, metadata). Easy turns route to gpt-5.1-mini, hard turns to claude-opus-4-7.
Streaming continuity works for SSE; gRPC pass-through is on the roadmap.
Latency overhead averages ~25ms P95 same-provider and ~55ms P95 cross-provider.
Self-host posture through Portkey’s open-source gateway core (MIT) plus a closed control plane. BYOC supported.

Where it falls short:

Palo Alto Networks announced intent to acquire Portkey on April 30, 2026; the deal is expected to close in PANW fiscal Q4 2026, with the gateway becoming the AI Gateway for Prisma AIRS. Verify standalone-product continuity before signing a multi-year contract.
No optimizer. The routed traces inform humans through the dashboard, not the gateway.
The x-portkey-api-key header workflow means Codex CLI needs a wrapper script. Not hard, but one more thing in ~/.zshrc.
Pricing escalates above 5M requests/month faster than the open-source alternatives.

Pricing: Open-source core (MIT) + commercial cloud control plane. Free tier with 10K requests/day. Scale from $99/month. Enterprise custom with SOC 2 Type II.

Score: 6/7 axes (missing: feedback loop / optimizer).

3. LiteLLM: Best for self-hosted Python-native routing

Verdict: LiteLLM is the pick when Codex CLI traffic can’t leave your VPC, the security team wants to read every line of code that touches a prompt, and Python is an acceptable runtime. Source-available, runs as a FastAPI proxy inside your infra, speaks 20+ providers via six native adapters (OpenAI, Anthropic, Gemini, Bedrock, Cohere, Azure) plus OpenAI-compatible presets and self-hosted backends behind an OpenAI-compatible surface.

What it does for Codex CLI multi-provider routing:

OpenAI-compatible passthrough through LiteLLM’s proxy mode. Point OPENAI_BASE_URL at the proxy.
Multi-provider translation to 20+ providers via six native adapters (OpenAI, Anthropic, Gemini, Bedrock, Cohere, Azure) plus OpenAI-compatible presets and self-hosted backends via the LiteLLM router. Anthropic, Gemini, Bedrock, Azure, Cohere, Groq, Together, Fireworks, plus OSS endpoints.
Tool-call passthrough confirmed across Anthropic and Gemini. Gemini’s function_call shape has needed two LiteLLM release fixes historically; the May 2026 line handles all three flagship targets cleanly.
Cost-aware turn routing through LiteLLM’s router policies, model_group with primary/fallback chains, strategy simple-shuffle, least-busy, or usage-based-routing-v2. Token-count-aware routing requires a custom pre-call hook in Python.
Streaming continuity works.
Latency overhead averages ~35ms P95 same-provider, ~70ms P95 cross-provider. Python runtime overhead vs. Go-binary gateways.
Self-host posture is the strongest in this list. MIT, runs on your nodes, no telemetry leaves the VPC.

Where it falls short:

March 24, 2026 PyPI supply-chain compromise. Versions 1.82.7 and 1.82.8 were published by an attacker who had taken over the maintainer’s PyPI token; the package exfiltrated SSH keys, cloud credentials, and Kubernetes configs. Datadog Security Labs documents the TeamPCP campaign. Remediated past 1.83.7. If you adopt LiteLLM, pin commit hashes or version-lock past 1.83.7 and rotate any credentials touched by affected installs.
No optimizer. Traces go to your OTel sink; routing improvements are a human exercise.
The UI is functional. Per-developer or per-repo slicing means a SQL dashboard, not a polished view.
Python runtime is materially slower under high concurrency than Go-binary alternatives. Teams over ~10K req/s usually pair LiteLLM with caching or move on.

Pricing: Open source under MIT (enterprise dir licensed separately). Enterprise tier with SLA + SSO + audit starts ~$250/month for small teams.

Score: 5.5/7 axes (missing: native polished dashboard, optimizer; flagged on supply-chain history).

4. OpenRouter: Best for pay-per-token routing across 200+ models

Verdict: OpenRouter is the lowest-friction way to route Codex CLI across many models. One API key, one base URL, 200+ models, transparent per-token markup. It answers “I want to A/B-test Claude vs. Gemini vs. an OSS model without operating a gateway.” It doesn’t answer “I want per-developer budgets and a semantic cache.”

What it does for Codex CLI multi-provider routing:

OpenAI-compatible passthrough through https://openrouter.ai/api/v1. Set OPENAI_BASE_URL and OPENAI_API_KEY.
Multi-provider translation to 200+ models including gpt-5.1, claude-opus-4-7, gemini-2.5-pro, llama-4-maverick-405b, deepseek-v4, qwen-3-235b. Biggest directory of any pick here.
Tool-call passthrough works for major providers; OpenRouter’s tool-call shape matches OpenAI’s exactly for Anthropic and Gemini-served models.
Cost-aware turn routing is caller-side, not gateway-side. You pick a model per-call; no token-count-aware config inside OpenRouter. For Codex CLI that means a wrapper script.
Streaming continuity works.
Latency overhead averages ~22ms P95.
Self-host posture doesn’t exist. OpenRouter is cloud-only.

Where it falls short:

No semantic cache, no exact cache at the gateway layer. Repeated Codex CLI sessions on the same repo pay full price every time.
No per-virtual-key budget enforcement. Cost control is a billing limit on the account, not a per-developer cap.
Per-token markup means the gateway is a recurring line item that crosses the TCO of a self-hosted alternative at modest scale. Small and transparent, but at 50M+ tokens/month it adds up.
Closed source. You’re betting on OpenRouter’s uptime, roadmap, and pricing stability.

Pricing: Per-token markup on top of the underlying provider’s rates; cloud only. No standing fee.

Score: 5/7 axes (missing: cost-aware routing inside the gateway, self-host, semantic cache).

5. Cloudflare AI Gateway: Best for edge-deployed routing across regions

Verdict: Cloudflare AI Gateway is the pick when Codex CLI runs from many regions, you want POPs near every developer, and the engineering team is already on Cloudflare. Strengths are edge presence, cache, and analytics. Weaknesses are AI-specific shallowness, most cost-aware routing logic has to be written as Worker code, not declared.

What it does for Codex CLI multi-provider routing:

OpenAI-compatible passthrough through Cloudflare AI Gateway’s universal endpoint. Set OPENAI_BASE_URL=https://gateway.ai.cloudflare.com/v1/<account-id>/<gateway-id>/openai.
Multi-provider translation to OpenAI, Anthropic, Google AI Studio, Azure, Workers AI, HuggingFace, Replicate, Mistral. Smaller than Portkey but covers the practical Codex CLI targets.
Tool-call passthrough confirmed for OpenAI, Anthropic, and Gemini as of May 2026.
Cost-aware turn routing through Workers. Routing-by-token-count lives in a Worker script that wraps the AI Gateway call; not a declarative config.
Streaming continuity works; Cloudflare’s edge SSE handling is solid.
Latency overhead is the lowest here at ~8-14ms P95 because the gateway runs at the POP closest to the caller. For a Codex CLI session in Singapore hitting Anthropic’s US-East endpoint, the Cloudflare hop is essentially free.
Self-host posture doesn’t exist. Cloud-only.

Where it falls short:

AI-specific observability is plugin-driven, not native. The default analytics view is requests, status codes, cache hit rate. Per-developer cost slicing requires log forwarding to your own warehouse and a dashboard you build yourself.
No optimizer.
No native virtual-key system for per-developer chargeback. You build it with Cloudflare Access + Worker code.
The Codex CLI integration story is thin in vendor docs. The “use with OpenAI SDK” example exists; the Codex-CLI-specific one doesn’t. You wire it yourself.

Pricing: Free tier with up to 100K requests/month per gateway. Workers AI Inference billed per request; pass-through to other providers billed at the provider’s rate. Workers Paid plan ($5/month) for higher limits.

Score: 5/7 axes (missing: declarative cost-aware routing, native per-developer chargeback, optimizer).

Capability matrix

Axis	Future AGI	Portkey	LiteLLM	OpenRouter	Cloudflare AI Gateway
OpenAI-compatible passthrough	Yes (base_url swap)	Yes (header wrapper)	Yes (proxy URL)	Yes (base_url swap)	Yes (base_url swap)
Multi-provider translation	20+ providers via six native adapters (OpenAI, Anthropic, Gemini, Bedrock, Cohere, Azure) plus OpenAI-compatible presets and self-hosted backends	250+ adapters	20+ providers via six native adapters (OpenAI, Anthropic, Gemini, Bedrock, Cohere, Azure) plus OpenAI-compatible presets and self-hosted backends	200+ models	~10 providers + Workers AI
Tool-call passthrough	Yes (gpt-5.1, claude-opus-4-7, gemini-2.5-pro)	Yes (same set)	Yes (May 2026 release line)	Yes (major providers)	Yes (OpenAI, Anthropic, Gemini)
Cost-aware turn routing	Declarative config	YAML config	Python hook	Caller-side	Worker code
Streaming continuity	SSE pass-through	SSE pass-through	SSE pass-through	SSE pass-through	SSE pass-through
Latency overhead per turn (P95)	~18ms / ~42ms	~25ms / ~55ms	~35ms / ~70ms	~22ms	~8-14ms
Self-host posture	Apache 2.0, BYOC, air-gapped	MIT core + closed CP	MIT, full self-host	None	None
Feedback loop / optimizer	Yes (`fi.opt`)	No	No	No	No

Decision framework: Choose X if

Choose Future AGI if you want the gateway to do more than route, if every routed turn should drive prompt and route optimization over time. Pick this when Codex CLI is a significant line item ($10K+/month) and the cost curve should bend downward instead of staying flat. Also when OpenTelemetry-native cost telemetry into your existing Grafana matters more than a dashboard out of the box.

Choose Portkey if you want a hosted gateway with virtual keys, the largest adapter library, and a polished UI, and you’re comfortable with the Palo Alto Networks acquisition timeline.

Choose LiteLLM if your security or compliance team requires Codex CLI traffic to never leave the VPC, Python is acceptable as a runtime, and you can pin commit hashes (or upgrade past 1.83.7) and run a FastAPI proxy.

Choose OpenRouter if you’re a solo developer or a 3-5 person team experimenting with Codex CLI against many models and per-developer budgets aren’t yet a procurement issue.

Choose Cloudflare AI Gateway if Codex CLI runs from many regions, latency-to-edge matters more than declarative routing config, and the platform team already runs Cloudflare for the rest of the stack.

Common mistakes when wiring Codex CLI through a gateway

Mistake	What goes wrong	Fix
Pointing only `OPENAI_API_KEY` at the gateway, leaving `OPENAI_BASE_URL` unset	Codex CLI keeps hitting `api.openai.com` directly	Set both `OPENAI_API_KEY` and `OPENAI_BASE_URL` in the shell profile
Assuming Anthropic tool-use blocks are valid OpenAI tool calls	The CLI sees `tool_use` JSON where it expected `tool_calls`, fires nothing, loops	Confirm the gateway translates `tool_use` → `tool_calls` (all five above do)
Routing every turn to the flagship model	Burns 2.5-4x more tokens than necessary on the 60%+ of easy turns	Add a token-count rule: under 10K input → cheaper model; over → flagship
Forgetting to pin the model version	The gateway routes to a model that updated between your eval run and prod	Pin model versions explicitly (`gpt-5.1-2026-04-15`, `claude-opus-4-7-20260420`) in the gateway config
Buffering streaming responses through the gateway	Codex CLI’s progress UI freezes mid-turn; developer thinks the agent hung	Confirm the gateway forwards SSE byte-stream, not buffer-and-batch
Setting hard budget caps without a soft alert at 80%	Codex CLI pauses mid-conversation, breaking the developer’s flow	Soft-alert at 80%, hard-pause at 110%
Treating cross-provider routing as fungible for tool-heavy turns	A turn that needed strong tool reasoning lands on a model 18% weaker, regression hidden in aggregate	Pair the routing rule with an eval that scores tool-use accuracy per turn, roll back if it regresses

How Future AGI closes the loop on Codex CLI routing

The other four gateways treat routing as an end state: accept, translate, forward, return. Future AGI treats it as the input to a feedback loop. Six stages:

Trace. Every Codex CLI turn produces a span tree via traceAI (Apache 2.0). Spans capture input tokens, output tokens, model, provider, tool calls fired, tool results, and the session ID.
Evaluate. ai-evaluation (Apache 2.0) scores every turn. FAGI ships a 60+ EvalTemplate classes in the ai-evaluation SDK with self-improving evaluators on the Future AGI Platform (tool-use accuracy, code-correctness, task completion, faithfulness, structured-output, hallucination, agentic surfaces, instruction-following, groundedness), plus unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code, plus self-improving evaluators that learn from live production traces, plus FAGI’s proprietary classifier model family at very low cost-per-token (lower per-eval cost than Galileo Luna-2). Scores live alongside cost data on the same trace ID. Codex CLI’s bash output, apply_patch result, and final assistant message all enter the evaluator with span context. Catalog is the floor, not the ceiling.
Cluster. Low-scoring turns get clustered by failure mode. Two clusters show up consistently for Codex CLI: “Opus called on a turn with <8K input where Sonnet would have done it” (cost waste), and “Gemini routed for a multi-file refactor and lost the dependency graph” (cross-provider quality regression).
Optimize. fi.opt.optimizers (ProTeGi, BayesianSearchOptimizer, GEPAOptimizer) rewrites the routing policy or the underlying system prompt against the clustered failures. Two typical Codex CLI optimizations: (a) the token-count threshold for cheap-vs-flagship routing gets re-tuned from 10K to 8K based on actual evals; (b) the rule learns to keep multi-file refactors on the model with the strongest tool-use score (e.g., always Claude Opus for turns with apply_patch involvement, regardless of token count).
Route. Agent Command Center’s gateway applies the updated policy on the next request. No deploy; hot-loaded.
Re-deploy. The new prompt + route pair is versioned. If the score regresses on the next batch of evals, automatic rollback. The loop runs continuously, so the gateway gets better at routing Codex CLI every week instead of staying flat.

The three building blocks are open source:

traceAI, github.com/future-agi/traceAI (Apache 2.0)
ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
agent-opt, github.com/future-agi/agent-opt (Apache 2.0)

The hosted Agent Command Center adds the failure-cluster view, inline Protect guardrails (~65 ms text + 107 ms image latency per arXiv 2510.13351), RBAC, SOC 2 Type II certified, and AWS Marketplace listing for procurement.

What we did not include

We deliberately left out three gateways that show up in adjacent 2026 listicles:

Kong AI Gateway. Strong if you already run Kong for REST APIs, but the Codex CLI integration is plugin-driven and tool-call passthrough required AI Proxy plugin 3.6+. For teams not already on Kong, the cohort above is faster to ship.
Maxim Bifrost. Go-binary gateway with strong throughput numbers (~11µs mean overhead at 5,000 RPS on t3.xlarge, vendor-published) and an MCP “Code Mode” pitch. Code Mode is more directly aimed at Claude Code than Codex CLI.
Helicone. Acquired by Mintlify on March 3, 2026, with the public roadmap shifting toward documentation-platform-first. Existing users should treat the next 12 months as a migration window.

If your situation is different, all three are worth a second look in Q3 2026.

Sources

OpenAI Codex CLI documentation, github.com/openai/codex
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Portkey AI gateway, portkey.ai
LiteLLM proxy, github.com/BerriAI/litellm
OpenRouter models directory, openrouter.ai/models
Cloudflare AI Gateway, developers.cloudflare.com/ai-gateway
Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey-to-secure-the-rise-of-ai-agents
Datadog Security Labs writeup on LiteLLM PyPI compromise (TeamPCP campaign, March 24, 2026), securitylabs.datadoghq.com/articles/litellm-compromised-pypi-teampcp-supply-chain-campaign
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)

Frequently asked questions

What is the cheapest way to route Codex CLI to non-OpenAI models?

OpenRouter for under-10-person teams (no operational cost, per-token markup). LiteLLM's open-source proxy for self-hosted teams. For teams over $5K/month on Codex CLI, cost-aware routing in Future AGI or Portkey usually pays for itself within four weeks — the easy-turn route to a cheaper model recovers more than the gateway's standing fee.

Does Codex CLI support OpenAI-compatible endpoints other than OpenAI's?

Yes, via `OPENAI_BASE_URL` (or `OPENAI_API_BASE` on older builds). Codex CLI does not care what is on the other side of the URL as long as it speaks the OpenAI Responses API and returns OpenAI-shaped tool calls. All five gateways above translate to that shape on the way back.

Can I route Codex CLI through multiple model providers in the same session?

Yes. The standard pattern is a routing rule keyed on input-token count or tool-call presence, so easy turns go to a cheaper model and hard turns go to the flagship. Future AGI, Portkey, and LiteLLM support this declaratively. Cloudflare requires Worker code; OpenRouter requires the calling app to pick a model per turn.

How do I track Codex CLI cost per developer when everyone shares one OpenAI key?

Use a gateway with virtual keys (Future AGI, Portkey, LiteLLM). Each developer gets a virtual key that fans out to the team's underlying provider keys, preserving bulk pricing. Tag the virtual key with the developer's SSO email and the gateway dashboard groups cost by developer.

What happens to Codex CLI's `bash` and `apply_patch` tool calls when the gateway routes to Claude or Gemini?

The gateway rewrites Anthropic's `tool_use` (or Gemini's `function_call`) back into OpenAI's `tool_calls` shape on the response. All five gateways above do this correctly as of May 2026. Older versions of some proxies flattened tool calls into text, silently breaking Codex CLI — confirm the gateway's tool-call test matrix before adopting.

Is it safe to send source code from Codex CLI through an AI gateway?

For hosted gateways, the data flow is gateway → provider; both endpoints already see the code. If compliance forbids the hosted hop, the safe pick is self-hosted LiteLLM or Future AGI's BYOC deployment, with provider traffic egressing through your own network. Cloudflare AI Gateway and OpenRouter are cloud-only.

How is Future AGI Agent Command Center different from Portkey for Codex CLI specifically?

Portkey is a hosted observation and routing layer. Future AGI adds an optimization layer on top — every routed Codex CLI turn feeds back into prompt rewrites and routing-policy updates, so the gateway gets better at choosing the cheaper model for easy turns and the stronger model for hard turns over time. Portkey gives you a dashboard. Future AGI gives you a dashboard plus a loop.

View all

Guides

AI Gateway for Codex CLI in 2026: The Playbook

Wrap OpenAI Codex CLI in an AI gateway for per-developer budgets, per-call audit trail, and provider flexibility, without changing the CLI command.

Nikhil Pareek · May 15, 2026

11 min

Guides

Best 5 AI Gateways for Embedding API Routing in 2026

Five AI gateways for embedding API routing 2026: provider breadth, dimension consistency, batch APIs, input-hash cache, model migration.

Vrinda Damani · May 7, 2026

19 min

Guides

Best 5 AI Gateways for MCP Tool-Level Observability with Codex CLI in 2026

Five AI gateways scored for MCP tool-level observability with Codex CLI: per-tool latency, success rate, argument validation, MCP auth.

Vrinda Damani · Apr 22, 2026

17 min

TL;DR

Why Codex CLI routing needs a gateway

The 7 axes we score on

How we picked

1. Future AGI Agent Command Center: Best for multi-provider routing under one OpenAI-compatible base URL

2. Portkey: Best for hosted gateway with the largest adapter library

3. LiteLLM: Best for self-hosted Python-native routing

4. OpenRouter: Best for pay-per-token routing across 200+ models

5. Cloudflare AI Gateway: Best for edge-deployed routing across regions

Capability matrix

Decision framework: Choose X if

Common mistakes when wiring Codex CLI through a gateway

How Future AGI closes the loop on Codex CLI routing

What we did not include

Related reading

Sources

Frequently asked questions