Best 5 AI Gateways to Route Codex CLI to Any Model in 2026
Five AI gateways scored on Codex CLI multi-provider routing in 2026: OpenAI-compatible passthrough, tool-call fidelity, cost-aware turn routing, latency overhead, and what each gateway breaks.
Table of Contents
Codex CLI is OpenAI’s terminal coding agent. It reads OPENAI_API_KEY, talks to api.openai.com, and assumes every model on the other side speaks the OpenAI chat-completions and Responses API. Point it at a Claude or Gemini endpoint directly and the tool calls collapse, the function-call JSON, the SSE format, and the response_format semantics all drift the moment you leave OpenAI’s surface.
A gateway in front of Codex CLI fixes this. It accepts the OpenAI-shaped request, translates per provider, preserves the bash and apply_patch tool calls across the translation hop, and streams a response the CLI can render. The five gateways below all do that. Only one turns the same routed traffic into a feedback loop that gets cheaper and more accurate every week.
This is the 2026 cohort, scored on the seven routing axes that matter when Codex CLI is the workload.
TL;DR
Future AGI Agent Command Center is the strongest pick for an AI gateway for Codex CLI multi-provider routing because it ships an OPENAI_BASE_URL swap that exposes OpenAI, Anthropic, Bedrock, Vertex, Together, Groq, and Fireworks behind one Responses-API endpoint with parallel tool-call translation preserved, per-developer virtual keys, cross-developer cache, and OTel-native cost telemetry. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Multi-provider Responses-API translation, per-developer attribution, cross-developer cache, and ~18 ms p95 same-provider routing overhead.
- Portkey — Best for the hosted product with virtual keys and 250+ adapters. Fastest hosted setup with the broadest adapter library (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- LiteLLM — Best when Codex CLI traffic cannot leave your VPC and Python is fine. Self-hosted Python proxy with the deepest provider catalog; pin commits after the March 24, 2026 PyPI compromise.
- OpenRouter — Best for cost-aware A/B between providers without operating a gateway. Pay-per-token directory of 200+ models behind one base URL.
- Cloudflare AI Gateway — Best if Codex CLI runs from many regions and you want POPs near your developers. Edge-deployed cache + retry layer on Cloudflare’s network.
Why Codex CLI routing needs a gateway
Codex CLI is a terminal agent built around the OpenAI Responses API and tool calling. Each invocation spans dozens of turns, with bash, apply_patch, file-read, and shell-exec dispatched on most of them. Three properties make routing it painful.
-
It’s hard-wired to OpenAI’s API shape. Codex CLI uses
OPENAI_API_KEY, sends OpenAI tool-call JSON, and expects OpenAI’s streaming format. Pointing it atapi.anthropic.comreturns a 401 the moment the first tool call goes out. A gateway has to translate the API surface, not the model alone. -
The cost-quality mismatch is huge per turn. In our internal usage data across 18 engineering teams in Q1 2026, 62% of Codex CLI turns had input contexts under 8K tokens, easy turns where
gpt-5.1-miniorclaude-haiku-4-5does the job. The remaining 38% are multi-file refactors where you wantgpt-5.1orclaude-opus-4-7. Routing every turn to the flagship wastes $14K-$22K per month per team. -
Tool calls break on lazy gateways. Codex CLI’s
bashtool sendstool_callsin the OpenAI function-call format. A gateway that re-serializes that block as text silently breaks the agent, the CLI sees a string where it expected a structured call, fires nothing, loops. Tool-call passthrough is the single feature that decides whether the gateway is usable for Codex CLI at all.
All five picks below are pointed at via OPENAI_BASE_URL.
The 7 axes we score on
The default “best AI gateway” axes (provider breadth, routing, fallback, observability) are too generic for Codex CLI. We scored each pick on seven axes specific to terminal coding agents pointed at multi-provider routing.
| Axis | What it measures |
|---|---|
| 1. OpenAI-compatible passthrough fidelity | Does the gateway accept Codex CLI’s exact request shape (Responses API, tool calls, streaming) without rewriting it? |
| 2. Multi-provider translation | How many non-OpenAI providers does it speak natively, and how clean is the tool-call translation? |
| 3. Tool-call passthrough | Do bash, apply_patch, and file-edit tools survive the round trip with their JSON shape intact? |
| 4. Cost-aware turn routing | Can it route easy turns to a cheaper model and hard turns to a flagship, without per-call Python? |
| 5. Streaming continuity | Does SSE pass through without buffer-and-batch, so the CLI’s progress UI stays smooth? |
| 6. Latency overhead per turn | How much extra ms per turn does the translation hop add at P95? |
| 7. Self-host posture | Can the gateway run inside your VPC so code never leaves the perimeter? |
The verdict line at the end of each pick scores all seven.
How we picked
We started from the universe of public AI gateways that ship an OpenAI-compatible endpoint as of May 2026. We removed gateways that don’t preserve OpenAI function-call JSON on translation (this excluded two early proxies that flattened tool calls into text). We removed gateways without an OPENAI_BASE_URL configuration path. The remaining five are below.
A note on the 2026 trust cohort: Portkey is mid-acquisition by Palo Alto Networks (announced April 30, 2026, close expected in PANW fiscal Q4); LiteLLM had a PyPI supply-chain compromise on 1.82.7 / 1.82.8 (March 24, 2026), remediated past 1.83.7. Both are still in the list, both still in production at large teams. But the procurement story now includes acquisition independence and dependency pinning. Flagged per pick.
1. Future AGI Agent Command Center: Best for multi-provider routing under one OpenAI-compatible base URL
Verdict: Future AGI’s gateway exposes OpenAI, Anthropic, Bedrock, Vertex, Together, Groq, and Fireworks behind one OpenAI-compatible Responses-API endpoint, with parallel tool-call translation preserved across providers. Per-developer virtual keys, cross-developer cache, and OpenTelemetry-native cost telemetry sit on top so finance gets the per-developer chargeback table directly out of the dashboard.
What it does for Codex CLI multi-provider routing:
- OpenAI-compatible passthrough via a
base_urlswap tohttps://gateway.futureagi.com/v1. Codex CLI sees its native Responses API surface; no SDK or wrapper changes. - Multi-provider translation to OpenAI, Anthropic, Gemini, Bedrock, Azure, Cohere, Groq, Together, Fireworks, Mistral, plus OpenAI-compatible OSS servers (Ollama, vLLM, LM Studio). Anthropic
tool_useblocks get rewritten to OpenAItool_callson the way back. - Tool-call passthrough preserved, the gateway parses each provider’s native tool-use block and re-emits it in OpenAI’s exact JSON.
bash,apply_patch,shell, and file-edit tools survive intact as of May 2026 testing withgpt-5.1,claude-opus-4-7, andgemini-2.5-pro. - Cost-aware turn routing through a declarative config: under 10K input tokens →
claude-haiku-4-5orgpt-5.1-mini; over →claude-opus-4-7orgpt-5.1. Configured once, applied on every turn, no per-call Python. - Streaming continuity. SSE pass-through, not buffer-and-batch.
- Latency overhead averages ~18ms P95 same-provider and ~42ms P95 cross-provider in our internal load tests.
- Self-host posture through BYOC deployment plus the Apache 2.0
traceAIlibrary. Air-gapped path supported.
The loop. Every Codex CLI turn produces a span tree via traceAI (35+ framework integrations, OpenInference-native). fi.evals scores each turn on tool-use accuracy, code correctness, and task completion. Error Feed (FAGI’s “Sentry for AI agents”) sits alongside as the zero-config error monitor: auto-clusters related low-scoring turns into named issues (50 traces → 1 issue, e.g., “Opus called on turns under 8K input”), auto-writes the root cause from the span evidence plus a quick fix plus a long-term recommendation per issue, and tracks rising/steady/falling trend per issue. fi.opt.optimizers (ProTeGi, BayesianSearchOptimizer, GEPAOptimizer) rewrites the routing policy against the clustered failures. Next deploy uses the updated route. The Future AGI Protect model family runs on the same hop at ~67 ms p50 text and ~109 ms p50 image (arXiv 2510.13351). FAGI’s own fine-tuned Gemma 3n adapters across content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text/image/audio, a model family rather than a plugin chain.
Net effect: a team starting at $28K/month on Codex CLI typically sees cost drop 22-34% within four weeks without changing developer behaviour, because the router gets better at choosing the cheaper model for easy turns and the optimizer stops over-prompting on the long ones.
Where it falls short:
-
agent-opt is opt-in, for one-week pilots with small teams running OpenAI-compatible fanout, start with traceAI + ai-evaluation and turn the optimizer on once eval baselines stabilize.
-
The control-plane UI for per-developer Codex CLI usage is newer than Portkey’s. If a polished out-of-the-box dashboard is the primary buying criterion, Portkey has the head start.
Pricing: Apache 2.0 single Go binary; cloud at gateway.futureagi.com/v1 or self-host. Free tier with 100K traces/month. Scale tier from $99/month. Enterprise custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, plus a BAA. AWS Marketplace listing.
Score: 7/7 axes.
2. Portkey: Best for hosted gateway with the largest adapter library
Verdict: Portkey is the most polished hosted product in this category for Codex CLI routing. If your team is using Codex CLI through a shared workspace and wants virtual keys, fallback chains, and 250+ provider adapters out of the box, Portkey is the fastest path. It doesn’t learn from the routed traffic; it routes and observes.
What it does for Codex CLI multi-provider routing:
- OpenAI-compatible passthrough through Portkey’s universal API. Set
OPENAI_BASE_URL=https://api.portkey.ai/v1plus anx-portkey-api-keyheader. - Multi-provider translation to 250+ adapters, the largest library on this list.
- Tool-call passthrough confirmed working as of May 2026 with
gpt-5.1,claude-opus-4-7, andgemini-2.5-pro. - Cost-aware turn routing through YAML configs (conditions on token count, model, metadata). Easy turns route to
gpt-5.1-mini, hard turns toclaude-opus-4-7. - Streaming continuity works for SSE; gRPC pass-through is on the roadmap.
- Latency overhead averages ~25ms P95 same-provider and ~55ms P95 cross-provider.
- Self-host posture through Portkey’s open-source gateway core (MIT) plus a closed control plane. BYOC supported.
Where it falls short:
- Palo Alto Networks announced intent to acquire Portkey on April 30, 2026; the deal is expected to close in PANW fiscal Q4 2026, with the gateway becoming the AI Gateway for Prisma AIRS. Verify standalone-product continuity before signing a multi-year contract.
- No optimizer. The routed traces inform humans through the dashboard, not the gateway.
- The
x-portkey-api-keyheader workflow means Codex CLI needs a wrapper script. Not hard, but one more thing in~/.zshrc. - Pricing escalates above 5M requests/month faster than the open-source alternatives.
Pricing: Open-source core (MIT) + commercial cloud control plane. Free tier with 10K requests/day. Scale from $99/month. Enterprise custom with SOC 2 Type II.
Score: 6/7 axes (missing: feedback loop / optimizer).
3. LiteLLM: Best for self-hosted Python-native routing
Verdict: LiteLLM is the pick when Codex CLI traffic can’t leave your VPC, the security team wants to read every line of code that touches a prompt, and Python is an acceptable runtime. Source-available, runs as a FastAPI proxy inside your infra, speaks 100+ providers behind an OpenAI-compatible surface.
What it does for Codex CLI multi-provider routing:
- OpenAI-compatible passthrough through LiteLLM’s proxy mode. Point
OPENAI_BASE_URLat the proxy. - Multi-provider translation to 100+ providers via the LiteLLM router. Anthropic, Gemini, Bedrock, Azure, Cohere, Groq, Together, Fireworks, plus OSS endpoints.
- Tool-call passthrough confirmed across Anthropic and Gemini. Gemini’s
function_callshape has needed two LiteLLM release fixes historically; the May 2026 line handles all three flagship targets cleanly. - Cost-aware turn routing through LiteLLM’s router policies,
model_groupwith primary/fallback chains, strategysimple-shuffle,least-busy, orusage-based-routing-v2. Token-count-aware routing requires a custom pre-call hook in Python. - Streaming continuity works.
- Latency overhead averages ~35ms P95 same-provider, ~70ms P95 cross-provider. Python runtime overhead vs. Go-binary gateways.
- Self-host posture is the strongest in this list. MIT, runs on your nodes, no telemetry leaves the VPC.
Where it falls short:
- March 24, 2026 PyPI supply-chain compromise. Versions
1.82.7and1.82.8were published by an attacker who had taken over the maintainer’s PyPI token; the package exfiltrated SSH keys, cloud credentials, and Kubernetes configs. Datadog Security Labs documents the TeamPCP campaign. Remediated past1.83.7. If you adopt LiteLLM, pin commit hashes or version-lock past1.83.7and rotate any credentials touched by affected installs. - No optimizer. Traces go to your OTel sink; routing improvements are a human exercise.
- The UI is functional. Per-developer or per-repo slicing means a SQL dashboard, not a polished view.
- Python runtime is materially slower under high concurrency than Go-binary alternatives. Teams over ~10K req/s usually pair LiteLLM with caching or move on.
Pricing: Open source under MIT (enterprise dir licensed separately). Enterprise tier with SLA + SSO + audit starts ~$250/month for small teams.
Score: 5.5/7 axes (missing: native polished dashboard, optimizer; flagged on supply-chain history).
4. OpenRouter: Best for pay-per-token routing across 200+ models
Verdict: OpenRouter is the lowest-friction way to route Codex CLI across many models. One API key, one base URL, 200+ models, transparent per-token markup. It answers “I want to A/B-test Claude vs. Gemini vs. an OSS model without operating a gateway.” It doesn’t answer “I want per-developer budgets and a semantic cache.”
What it does for Codex CLI multi-provider routing:
- OpenAI-compatible passthrough through
https://openrouter.ai/api/v1. SetOPENAI_BASE_URLandOPENAI_API_KEY. - Multi-provider translation to 200+ models including
gpt-5.1,claude-opus-4-7,gemini-2.5-pro,llama-4-maverick-405b,deepseek-v4,qwen-3-235b. Biggest directory of any pick here. - Tool-call passthrough works for major providers; OpenRouter’s tool-call shape matches OpenAI’s exactly for Anthropic and Gemini-served models.
- Cost-aware turn routing is caller-side, not gateway-side. You pick a model per-call; no token-count-aware config inside OpenRouter. For Codex CLI that means a wrapper script.
- Streaming continuity works.
- Latency overhead averages ~22ms P95.
- Self-host posture doesn’t exist. OpenRouter is cloud-only.
Where it falls short:
- No semantic cache, no exact cache at the gateway layer. Repeated Codex CLI sessions on the same repo pay full price every time.
- No per-virtual-key budget enforcement. Cost control is a billing limit on the account, not a per-developer cap.
- Per-token markup means the gateway is a recurring line item that crosses the TCO of a self-hosted alternative at modest scale. Small and transparent, but at 50M+ tokens/month it adds up.
- Closed source. You’re betting on OpenRouter’s uptime, roadmap, and pricing stability.
Pricing: Per-token markup on top of the underlying provider’s rates; cloud only. No standing fee.
Score: 5/7 axes (missing: cost-aware routing inside the gateway, self-host, semantic cache).
5. Cloudflare AI Gateway: Best for edge-deployed routing across regions
Verdict: Cloudflare AI Gateway is the pick when Codex CLI runs from many regions, you want POPs near every developer, and the engineering team is already on Cloudflare. Strengths are edge presence, cache, and analytics. Weaknesses are AI-specific shallowness, most cost-aware routing logic has to be written as Worker code, not declared.
What it does for Codex CLI multi-provider routing:
- OpenAI-compatible passthrough through Cloudflare AI Gateway’s universal endpoint. Set
OPENAI_BASE_URL=https://gateway.ai.cloudflare.com/v1/<account-id>/<gateway-id>/openai. - Multi-provider translation to OpenAI, Anthropic, Google AI Studio, Azure, Workers AI, HuggingFace, Replicate, Mistral. Smaller than Portkey but covers the practical Codex CLI targets.
- Tool-call passthrough confirmed for OpenAI, Anthropic, and Gemini as of May 2026.
- Cost-aware turn routing through Workers. Routing-by-token-count lives in a Worker script that wraps the AI Gateway call; not a declarative config.
- Streaming continuity works; Cloudflare’s edge SSE handling is solid.
- Latency overhead is the lowest here at ~8-14ms P95 because the gateway runs at the POP closest to the caller. For a Codex CLI session in Singapore hitting Anthropic’s US-East endpoint, the Cloudflare hop is essentially free.
- Self-host posture doesn’t exist. Cloud-only.
Where it falls short:
- AI-specific observability is plugin-driven, not native. The default analytics view is requests, status codes, cache hit rate. Per-developer cost slicing requires log forwarding to your own warehouse and a dashboard you build yourself.
- No optimizer.
- No native virtual-key system for per-developer chargeback. You build it with Cloudflare Access + Worker code.
- The Codex CLI integration story is thin in vendor docs. The “use with OpenAI SDK” example exists; the Codex-CLI-specific one doesn’t. You wire it yourself.
Pricing: Free tier with up to 100K requests/month per gateway. Workers AI Inference billed per request; pass-through to other providers billed at the provider’s rate. Workers Paid plan ($5/month) for higher limits.
Score: 5/7 axes (missing: declarative cost-aware routing, native per-developer chargeback, optimizer).
Capability matrix
| Axis | Future AGI | Portkey | LiteLLM | OpenRouter | Cloudflare AI Gateway |
|---|---|---|---|---|---|
| OpenAI-compatible passthrough | Yes (base_url swap) | Yes (header wrapper) | Yes (proxy URL) | Yes (base_url swap) | Yes (base_url swap) |
| Multi-provider translation | 100+ providers | 250+ adapters | 100+ providers | 200+ models | ~10 providers + Workers AI |
| Tool-call passthrough | Yes (gpt-5.1, claude-opus-4-7, gemini-2.5-pro) | Yes (same set) | Yes (May 2026 release line) | Yes (major providers) | Yes (OpenAI, Anthropic, Gemini) |
| Cost-aware turn routing | Declarative config | YAML config | Python hook | Caller-side | Worker code |
| Streaming continuity | SSE pass-through | SSE pass-through | SSE pass-through | SSE pass-through | SSE pass-through |
| Latency overhead per turn (P95) | ~18ms / ~42ms | ~25ms / ~55ms | ~35ms / ~70ms | ~22ms | ~8-14ms |
| Self-host posture | Apache 2.0, BYOC, air-gapped | MIT core + closed CP | MIT, full self-host | None | None |
| Feedback loop / optimizer | Yes (fi.opt) | No | No | No | No |
Decision framework: Choose X if
Choose Future AGI if you want the gateway to do more than route, if every routed turn should drive prompt and route optimization over time. Pick this when Codex CLI is a significant line item ($10K+/month) and the cost curve should bend downward instead of staying flat. Also when OpenTelemetry-native cost telemetry into your existing Grafana matters more than a dashboard out of the box.
Choose Portkey if you want a hosted gateway with virtual keys, the largest adapter library, and a polished UI, and you’re comfortable with the Palo Alto Networks acquisition timeline.
Choose LiteLLM if your security or compliance team requires Codex CLI traffic to never leave the VPC, Python is acceptable as a runtime, and you can pin commit hashes (or upgrade past 1.83.7) and run a FastAPI proxy.
Choose OpenRouter if you’re a solo developer or a 3-5 person team experimenting with Codex CLI against many models and per-developer budgets aren’t yet a procurement issue.
Choose Cloudflare AI Gateway if Codex CLI runs from many regions, latency-to-edge matters more than declarative routing config, and the platform team already runs Cloudflare for the rest of the stack.
Common mistakes when wiring Codex CLI through a gateway
| Mistake | What goes wrong | Fix |
|---|---|---|
Pointing only OPENAI_API_KEY at the gateway, leaving OPENAI_BASE_URL unset | Codex CLI keeps hitting api.openai.com directly | Set both OPENAI_API_KEY and OPENAI_BASE_URL in the shell profile |
| Assuming Anthropic tool-use blocks are valid OpenAI tool calls | The CLI sees tool_use JSON where it expected tool_calls, fires nothing, loops | Confirm the gateway translates tool_use → tool_calls (all five above do) |
| Routing every turn to the flagship model | Burns 2.5-4x more tokens than necessary on the 60%+ of easy turns | Add a token-count rule: under 10K input → cheaper model; over → flagship |
| Forgetting to pin the model version | The gateway routes to a model that updated between your eval run and prod | Pin model versions explicitly (gpt-5.1-2026-04-15, claude-opus-4-7-20260420) in the gateway config |
| Buffering streaming responses through the gateway | Codex CLI’s progress UI freezes mid-turn; developer thinks the agent hung | Confirm the gateway forwards SSE byte-stream, not buffer-and-batch |
| Setting hard budget caps without a soft alert at 80% | Codex CLI pauses mid-conversation, breaking the developer’s flow | Soft-alert at 80%, hard-pause at 110% |
| Treating cross-provider routing as fungible for tool-heavy turns | A turn that needed strong tool reasoning lands on a model 18% weaker, regression hidden in aggregate | Pair the routing rule with an eval that scores tool-use accuracy per turn, roll back if it regresses |
How Future AGI closes the loop on Codex CLI routing
The other four gateways treat routing as an end state: accept, translate, forward, return. Future AGI treats it as the input to a feedback loop. Six stages:
-
Trace. Every Codex CLI turn produces a span tree via
traceAI(Apache 2.0). Spans capture input tokens, output tokens, model, provider, tool calls fired, tool results, and the session ID. -
Evaluate.
ai-evaluation(Apache 2.0) scores every turn. FAGI ships a 50+ built-in rubric catalog (tool-use accuracy, code-correctness, task completion, faithfulness, structured-output, hallucination, agentic surfaces, instruction-following, groundedness), plus unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code, plus self-improving evaluators that learn from live production traces, plus FAGI’s proprietary classifier model family at very low cost-per-token (Galileo Luna-2 cost economics, rubric-flexible). Scores live alongside cost data on the same trace ID. Codex CLI’s bash output, apply_patch result, and final assistant message all enter the evaluator with span context. Catalog is the floor, not the ceiling. -
Cluster. Low-scoring turns get clustered by failure mode. Two clusters show up consistently for Codex CLI: “Opus called on a turn with <8K input where Sonnet would have done it” (cost waste), and “Gemini routed for a multi-file refactor and lost the dependency graph” (cross-provider quality regression).
-
Optimize.
fi.opt.optimizers(ProTeGi, BayesianSearchOptimizer, GEPAOptimizer) rewrites the routing policy or the underlying system prompt against the clustered failures. Two typical Codex CLI optimizations: (a) the token-count threshold for cheap-vs-flagship routing gets re-tuned from 10K to 8K based on actual evals; (b) the rule learns to keep multi-file refactors on the model with the strongest tool-use score (e.g., always Claude Opus for turns withapply_patchinvolvement, regardless of token count). -
Route. Agent Command Center’s gateway applies the updated policy on the next request. No deploy; hot-loaded.
-
Re-deploy. The new prompt + route pair is versioned. If the score regresses on the next batch of evals, automatic rollback. The loop runs continuously, so the gateway gets better at routing Codex CLI every week instead of staying flat.
The three building blocks are open source:
traceAI, github.com/future-agi/traceAI (Apache 2.0)ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
The hosted Agent Command Center adds the failure-cluster view, inline Protect guardrails (~67ms text + 109ms image latency per arXiv 2510.13351), RBAC, SOC 2 Type II certified, and AWS Marketplace listing for procurement.
What we did not include
We deliberately left out three gateways that show up in adjacent 2026 listicles:
- Kong AI Gateway. Strong if you already run Kong for REST APIs, but the Codex CLI integration is plugin-driven and tool-call passthrough required AI Proxy plugin 3.6+. For teams not already on Kong, the cohort above is faster to ship.
- Maxim Bifrost. Go-binary gateway with strong throughput numbers (~11µs mean overhead at 5,000 RPS on
t3.xlarge, vendor-published) and an MCP “Code Mode” pitch. Code Mode is more directly aimed at Claude Code than Codex CLI. - Helicone. Acquired by Mintlify on March 3, 2026, with the public roadmap shifting toward documentation-platform-first. Existing users should treat the next 12 months as a migration window.
If your situation is different, all three are worth a second look in Q3 2026.
Related reading
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- Best 5 AI Gateways for LLM Cost Optimization in 2026
- What Is an AI Gateway? The 2026 Definition
- Best AI Gateways for Agentic AI in 2026
Sources
- OpenAI Codex CLI documentation, github.com/openai/codex
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Portkey AI gateway, portkey.ai
- LiteLLM proxy, github.com/BerriAI/litellm
- OpenRouter models directory, openrouter.ai/models
- Cloudflare AI Gateway, developers.cloudflare.com/ai-gateway
- Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey-to-secure-the-rise-of-ai-agents
- Datadog Security Labs writeup on LiteLLM PyPI compromise (TeamPCP campaign, March 24, 2026), securitylabs.datadoghq.com/articles/litellm-compromised-pypi-teampcp-supply-chain-campaign
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
Frequently asked questions
What is the cheapest way to route Codex CLI to non-OpenAI models?
Does Codex CLI support OpenAI-compatible endpoints other than OpenAI's?
Can I route Codex CLI through multiple model providers in the same session?
How do I track Codex CLI cost per developer when everyone shares one OpenAI key?
What happens to Codex CLI's `bash` and `apply_patch` tool calls when the gateway routes to Claude or Gemini?
Is it safe to send source code from Codex CLI through an AI gateway?
How is Future AGI Agent Command Center different from Portkey for Codex CLI specifically?
A Director of Engineering Productivity buyer's brief for the AI gateway in front of Codex CLI at 1000+ engineer scale. Three pillars — governance, cost, provider flexibility — scored across seven axes with five picks.
Five AI gateways for embedding API routing in 2026 scored on provider breadth, dimension consistency, batch-API support, input-hash cache, model-migration tooling, per-tenant attribution, and online p95 latency.
Five AI gateways scored for MCP tool-level observability with Codex CLI in 2026: per-tool latency, tool-call success rate, argument validation, MCP server auth, and where each one falls short.