Guides

AI Gateway for Codex CLI in 2026: Governance, Cost Control, and Provider Flexibility at Scale

A Director of Engineering Productivity buyer's brief for the AI gateway in front of Codex CLI at 1000+ engineer scale. Three pillars — governance, cost, provider flexibility — scored across seven axes with five picks.

·
15 min read
ai-gateway 2026 codex-cli
Editorial cover image for AI Gateway for Codex CLI in 2026: Governance, Cost Control, and Provider Flexibility at Sc
Table of Contents

The first board deck I had to write about Codex CLI was three slides. Slide one showed terminal-agent adoption inside our 1,200-engineer org climb from 4% to 71% in two quarters. Slide two was an OpenAI invoice the CFO printed with red highlights. Slide three was a Slack screenshot where an engineer asked which model apply_patch was supposed to be calling, and got three different answers from three principal engineers.

That’s the moment a Director of Engineering Productivity learns “we adopted Codex CLI” and “we’re running Codex CLI responsibly at scale” are different projects. The agent ships fast and developers love it. What doesn’t ship on its own is the layer deciding which models it’s allowed to call, how much each cost-center can spend, how to fail over when OpenAI throttles, and how to make all of it legible to security, finance, and developers at once.

That layer is an AI gateway in front of Codex CLI. At 1000+ engineer scale the buyer’s brief is no longer a single pillar. It’s three at once: governance (model whitelists, approval workflows, audit trail), cost control (per-developer caps, budget-aware routing, cache observability), and provider flexibility (OpenAI-shape passthrough, Claude and Gemini fallback, tool-call translation, provider breadth). Miss any one and the gateway is the thing the platform team rips out at the eight-month mark.

Five gateways, seven axes across the three pillars, with explicit “Where it falls short” blocks.


TL;DR: pick by which pillar is most broken

Squeakiest wheelPickWhy
All three pillars need to compound; routing policy improves weeklyFuture AGI Agent Command CenterOnly entry wiring trace → eval → optimize → route as core primitive
Hosted polish, mature virtual keys, 250+ adaptersPortkeyFastest hosted for year-one if the optimizer can wait
Go-binary with aggressive throughput on the slide deckMaxim BifrostHighest published throughput; MCP “Code Mode” aimed at terminal coding agents
Platform team already runs Kong for RESTKong AI GatewayReuses Kong ops; AI Proxy plugin handles tool-call passthrough (3.6+)
Compliance forbids Codex CLI traffic leaving the VPCLiteLLMMost defensible self-host; largest provider catalog inside the perimeter

Why a single-pillar gateway is the wrong purchase at 1000+ engineers

At small scale the pillars look separable: a 30-engineer team picks Helicone for per-request cost; a 100-engineer team picks Portkey for chargeback. The decision is “which dashboard am I missing.”

At 1000+ engineers the pillars are three control planes that interact.

Governance affects cost. A whitelist forbidding gpt-5.1 on the finance squad drives that team to gpt-5.1-mini, roughly 1/8th the per-token cost. Allow vs. forbid is a budget decision.

Cost affects provider flexibility. A rule downgrading gpt-5.1 to gpt-5.1-mini at 95% of the daily cap only works if the gateway can translate the same apply_patch tool call into both providers’ shape and back. Drop translation fidelity and downgrade-on-breach is unusable.

Provider flexibility affects governance. Adding Claude as a fallback is a procurement question (AUP and DPA), an audit question (every cross-provider hop logged with reason), and a security question (does the prompt-injection guardrail run before or after the hop).

A single-pillar gateway makes each interaction an integration project outside the gateway. A three-pillar gateway makes it a configuration question inside it.


The 7 axes we score on, grouped by pillar

PillarAxis
Governance1. Whitelist enforcement — per-cost-center allowed-models, blocked at the gateway not the IDE
Governance2. Approval workflow + audit log — signed audit log for whitelist changes
Cost3. Per-developer caps + downgrade-on-breach — three-stage thresholds + OpenAI prompt-cache visibility
Cost4. Budget-aware routing — declarative, not per-call Python
Provider5. OpenAI-shape translation fidelity — bash and apply_patch survive to Anthropic, Gemini, Bedrock
Provider6. Deterministic fallback — defined chain with sticky session affinity
Provider7. Provider breadth — how many providers, how active the catalog

The verdict under each pick scores all seven.


How we picked

We started from AI gateways with an OpenAI-compatible endpoint live in mid-May 2026. We cut the ones that fail the basics, tool-call breakage, no virtual-key model, no audit log a SOC 2 reviewer accepts. The five below are the ones an enterprise procurement team would put in front of Codex CLI for over a thousand engineers.

Trust-cohort notes: Portkey is mid-acquisition by Palo Alto Networks (April 30, 2026, expected close PANW fiscal Q4 as AI Gateway for Prisma AIRS). LiteLLM had a PyPI supply-chain compromise on 1.82.7 and 1.82.8 (March 24, 2026, remediated past 1.83.7, Datadog Security Labs TeamPCP). Both still run production at large orgs; flagged per pick.


1. Future AGI Agent Command Center: Best for compounding all three pillars

Verdict: The only entry where governance, cost, and provider flexibility are wired to a self-improving loop instead of three separate dashboards. The other four enforce the policy you write. Agent Command Center enforces the policy you write and proposes the next policy diff for the approver to sign.

Governance, strong. Per-cost-center allow-lists block disallowed models at the gateway with a structured 403; a finance-squad key invoking gpt-5.1 fails before the OpenAI hop, violation in the audit log. Whitelist-change is pull-request-shaped, justification, RBAC-gated approver, signed policy diff exportable to your SIEM.

Cost control, strong. Virtual keys carry three-stage thresholds: 80% alert, 95% downgrade to a cheaper allowed model, 110% hard pause with structured 429. The downgrade respects the cost-center’s whitelist, so cap-firing-into-a-blocked-model never happens. OpenAI prompt-cache hit rate surfaces per developer, repo, session, sessions at 30% hit-rate (paying ~15% more than they should against the 60% baseline) are visible without SQL.

Provider flexibility, strong. Anthropic tool_use, Gemini function_call, and OSS variants get rewritten to OpenAI’s tool_calls. bash, apply_patch, shell, file-edit survive across gpt-5.1, claude-opus-4-7, gemini-2.5-pro, Bedrock. Fallback chains per cost-center with sticky session affinity. Cross-provider P95 hop ~42ms.

The loop. Every turn produces a span via traceAI (Apache 2.0). fi.evals scores tool-use accuracy, code correctness, task completion. fi.opt.optimizers (ProTeGi, Bayesian, GEPA in agent-opt) emits a policy diff with math attached, “for finance-squad, route turns under 8K input tokens to gpt-5.1-mini between 9am and 5pm, regression 0.4%, monthly saving $3,840.” Protect guardrails (~67ms text, 109ms image per arXiv 2510.13351) add no perceptible latency.

Where it falls short:

  • Three-stage cap and the loop assume layered control. A team wanting one hard cap and a CSV will find it over-engineered for year one.

  • Loop needs ~three weeks of traffic per cost-center before stable diffs emerge; day one is static rules with the optimizer in dry-run.

  • Dashboard polish newer than Portkey’s; if a console is the primary criterion, Portkey has the head start.

Pricing. Apache 2.0 Go binary; cloud or self-host. Free 100K traces / month. Scale from $99 / month. Enterprise custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, BAA available, AWS Marketplace.

Score: 7 / 7 axes.


2. Portkey: Best for year-one hosted polish

Verdict: The most polished hosted product for year-one rollout. Virtual-key budgets are the cleanest in the cohort, Slack and Teams alerting is first-class, the 250+ adapter library is the largest. Governance leans on YAML rather than a turnkey console; the deal-breaker is the acquisition timeline.

Governance. YAML-shaped. Whitelist enforcement through metadata-conditional rules: a key carrying metadata.cost_center=finance-squad against a config allowing only gpt-5.1-mini and claude-sonnet-4-6 blocks other models with a structured error. Approval flow is wired through your existing change-management system (GitOps, Backstage, ServiceNow) with Portkey’s audit log as the gateway-side record.

Cost control, strong. Virtual keys with daily and monthly caps and soft-alert thresholds. Downgrade-on-breach wireable through metadata-conditional routing; the three-stage YAML is platform-team-owned. Portkey’s semantic cache is dashboard-native; OpenAI’s prompt-cache hit rate is recoverable from raw traces but not first-class.

Provider flexibility, strong. Translation fidelity confirmed across gpt-5.1, claude-opus-4-7, gemini-2.5-pro, major Bedrock. Fallback chains with sticky-session affinity. Latency ~25ms P95 same-provider, ~55ms P95 cross-provider. 250+ adapters.

Where it falls short:

  • PANW announced intent to acquire Portkey on April 30, 2026, expected close PANW fiscal Q4 2026 as AI Gateway for Prisma AIRS. Multi-year buyers verify standalone-product continuity and pricing tied to close.
  • No self-improving loop.
  • Three-stage cap and whitelist-change are wireable, not turnkey.
  • Pricing escalates above 5M requests / month faster than self-host.

Pricing. Open-source core (MIT) plus commercial cloud control plane. Free 10K / day. Scale from $99 / month. Enterprise custom with SOC 2 Type II.

Score: 5 / 7 axes.


3. Maxim Bifrost: Best for the throughput-on-the-slide-deck buyer

Verdict: A Go-binary gateway optimised for throughput with an “MCP Code Mode” pitch aimed at terminal coding agents. Vendor-published ~11µs mean overhead at 5,000 RPS on t3.xlarge is aggressive enough to win the latency slide. Strong on the inference path; younger than the cohort on policy.

Governance, evolving. Policy primitives similar in shape to Portkey, virtual key carries cost-center metadata, policy gate blocks disallowed models. Control-plane UI is newer; wire some through config files. Approval flow follows the external-change-management pattern.

Cost control, maturing. Virtual keys, budget caps, rate-limit policies first-class. Three-stage cap with downgrade-on-breach is wireable; composed, not a toggle.

Provider flexibility, strong. MCP Code Mode emphasizes preserving tool-call shape across providers with explicit Claude Code testing; the same work pays off for Codex CLI. Fallback with sticky session affinity. Adapter catalog smaller than Portkey’s but covers Codex CLI targets.

Where it falls short:

  • ~11µs at 5,000 RPS on t3.xlarge is directional. Pin against your own instance with representative apply_patch payloads before this hits a slide deck.
  • Policy surface less mature than Portkey or Future AGI; plan more configuration work.
  • No self-improving loop.
  • MCP Code Mode framing leans toward Claude Code in vendor materials. Codex CLI pattern works; worked examples are thinner.
  • Smaller ecosystem and customer reference list than Portkey or Kong.

Pricing. Open-source Bifrost core; cloud and self-host. Enterprise via the vendor.

Score: 4.5 / 7 axes.


4. Kong AI Gateway: Best when the org-chart cost of a second gateway is unacceptable

Verdict: The pick when the platform team already runs Kong for REST and the path of least resistance is to extend the stack with the AI Proxy plugin. Strengths are operational familiarity, SLA, and the audit and access primitives REST already enjoys. Weaknesses are AI-specific shallowness, most cost-control work happens through plugins.

Governance, strong foundations, plugin-driven specifics. Kong’s RBAC, consumer-policy, and route-policy primitives map cleanly to a cost-center model. AI-specific enforcement, “block gpt-5.1 for cost-center finance-squad”, wires through the AI Proxy plugin (3.6+) and Lua. Approval workflow inherits Kong’s existing change-management story; Konnect’s audit log is mature, signed, SOC 2-ready.

Cost control, wireable, not native. Kong’s rate-limiting plugins are RPM- and bandwidth-first. Dollar caps require a plugin aggregating token cost from AI Proxy response data against a state store (Redis). Three-stage cap with downgrade-on-breach is a plugin stack, two-week sprint. OpenAI prompt-cache hit-rate observability isn’t first-class.

Provider flexibility, solid in 3.6+. Translation fidelity confirmed; tool calls preserved. Fallback through Kong’s existing routing. Provider breadth covers major targets, smaller than Portkey or LiteLLM.

Where it falls short:

  • AI-specific observability is plugin-driven. The LLM-cost view is something you build with OTel plus your warehouse and Grafana. Two weeks for the chargeback view finance accepts.
  • Per-developer cost slicing isn’t turnkey.
  • No self-improving loop.
  • AI Proxy plugin requires Kong 3.6+; older installs need an upgrade.

Pricing. Kong open source. Konnect (managed) starts free. Enterprise from ~$1.5K / month.

Score: 4 / 7 axes.


5. LiteLLM: Best when source code cannot leave the VPC

Verdict: The pick when compliance forbids Codex CLI traffic leaving the VPC and security wants to read every line of code touching a prompt. Python, FastAPI, MIT. Largest provider catalog of any self-host pick. Deal-breaker is the March 2026 PyPI supply-chain compromise, remediated past 1.83.7.

Governance, code-driven. Whitelist enforcement through team and user budgets plus model_group config; a pre_call_check hook blocks disallowed models with a structured error. Policy in YAML and Python; approval is GitOps plus your existing approver chain.

Cost control, strong inside the VPC. Team and user budgets first-class; hard caps return 429. Downgrade-on-breach is a 25-line pre_call_check hook rewriting model="gpt-5.1" to model="gpt-5.1-mini" based on remaining budget. OpenAI prompt-cache headers pass through but aren’t aggregated, pair with traceAI.

Provider flexibility, strong inside the VPC. Translation fidelity confirmed for Anthropic and Gemini in the May 2026 release line; tool calls preserved. Fallback through model_group chains. Latency averages ~35ms P95 same-provider, ~70ms P95 cross-provider. 100+ providers.

Where it falls short:

  • March 24, 2026 PyPI supply-chain compromise. 1.82.7 and 1.82.8 were published by an attacker with the maintainer’s PyPI token; the package exfiltrated SSH keys, cloud credentials, and Kubernetes configs (Datadog Security Labs TeamPCP). Remediated past 1.83.7. Pin commit hashes and rotate touched credentials.
  • No self-improving loop.
  • UI is functional, not polished; per-developer slicing means a SQL dashboard.
  • Python runtime is materially slower than Go-binary gateways under high concurrency; teams over ~10K req/s pair with caching.

Pricing. Open source under MIT. Enterprise from ~$250 / month for small teams.

Score: 4.5 / 7 axes.


Capability matrix

AxisFAGIPortkeyMaximKongLiteLLM
Whitelist enforcementNativeYAMLConfigLua pluginYAML + hook
Approval + auditConsole + signed diffGitOps + auditAudit + externalGitOps + KonnectGitOps + audit
Caps + downgradeNative 3-stage cross-providerCap + YAMLCap + compositionPlugin + RedisBudgets + Python hook
Budget-aware routingDeclarative + optimizerYAML conditionalCompositionPlugin compositionpre_call_check
Translation fidelityAll + OSSAll + OSSAllAll (3.6+)All + OSS
Deterministic fallbackCascade + stickyChainRoutingKong routingmodel_group
Provider breadth100+250+SolidMajor + plugins100+
Self-improving loopfi.opt

Decision framework: Choose X if

Choose Future AGI Agent Command Center when the brief is the full three-pillar story and the gateway needs to compound. Codex CLI is a serious line item ($25K+ / month typical floor), governance has to satisfy a SOC 2 reviewer, the routing policy itself should improve every week.

Choose Portkey when year-one rollout is the concern, hosted polish beats the loop, the team can wire the three-stage cap and whitelist change-management themselves, and the PANW acquisition timeline is acceptable through the contract term.

Choose Maxim Bifrost when raw throughput is the buying criterion and the team will compose policy as configuration. Re-test the vendor’s numbers against your apply_patch traffic before they hit a slide deck.

Choose Kong AI Gateway when the platform team already runs Kong and the governance foundations inherited from REST are worth more than turnkey AI cost dashboards. Plan a two-week sprint for the chargeback view and three-stage cap.

Choose LiteLLM when compliance forbids Codex CLI traffic leaving the VPC, Python is acceptable, the platform team writes the pre_call_check hook, and the team can pin commit hashes past 1.83.7.


Common mistakes when wiring Codex CLI through a three-pillar gateway

MistakeFix
Treating governance, cost, and provider flex as separate projects — the cap firing at 95% routes to a model the whitelist forbidsPick a gateway where the pillars compose; verify cap → downgrade → whitelist in dry-run
Whitelist only in the IDE config — bypassed by changing config or using a different CLIEnforce at the gateway with a structured 403
Single hard cap at 100% — Codex CLI pauses mid-conversation; developers route around with ChatGPT.com or a personal keyThree-stage cap: alert 80%, downgrade 95%, hard pause 110% with structured 429
Adding a fallback provider without an audit hook — Anthropic called for a finance-squad session under an OpenAI-only whitelist; security finds out months laterLog every cross-provider hop; alert on provider-boundary hops
Tool-use retries counted only per-request — agent loops 14 times on a bash error, burns $80 in 4 minutesCap retries per session (default 3) with structured 429

How Future AGI compounds across the three pillars

The other four gateways enforce the policy you write. When policy changes, a human reads the dashboard, writes the diff, opens a PR, ships the new config. Compounding stops at the dashboard.

Agent Command Center treats every pillar as input to a closed loop. Every Codex CLI turn produces a traceAI span (Apache 2.0); every cross-provider hop logs original-model, target-model, reason; every whitelist event logs the policy version. fi.evals (Apache 2.0) scores tool-use accuracy, code correctness, task completion alongside cost. fi.opt.optimizers (ProTeGi, Bayesian, GEPA in agent-opt) emits a policy diff with math attached. The gateway hot-loads it signed against the approver’s IdP claim; eval regression triggers automatic rollback, recorded as an audit event.

Net effect at 1000+ engineer scale: cost trends down 22-34% within four weeks without changing developer behaviour. Governance log gets cleaner because whitelist-violating fallbacks become rule changes the optimizer proposes, not incidents security finds. Fallback chains get tuned by tool-use accuracy data, not guesswork.

traceAI, ai-evaluation, agent-opt are open source at github.com/future-agi (Apache 2.0). Agent Command Center adds signed approver workflows, inline Protect guardrails (~67ms text, 109ms image per arXiv 2510.13351), RBAC scoped to cost-center, SOC 2 Type II, HIPAA, GDPR, and CCPA certified, BAA available, AWS Marketplace. BYOC for compliance regimes requiring the gateway inside the VPC.


What we did not include

Three adjacent picks that didn’t fit the three-pillar brief at this scale: OpenRouter (right for a 3-5 person team A/B-testing; not built for per-virtual-key budgets or on-prem), Cloudflare AI Gateway (lowest latency in any cohort, but cost-aware routing and per-developer governance still live in Worker code), and Helicone (acquired by Mintlify on March 3, 2026; roadmap pivoting documentation-platform-first).



Sources

  • OpenAI Codex CLI, github.com/openai/codex
  • OpenAI prompt caching, platform.openai.com/docs/guides/prompt-caching
  • Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
  • Future AGI traceAI / ai-evaluation / agent-opt, github.com/future-agi (Apache 2.0)
  • Future AGI Protect latency, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
  • Portkey, portkey.ai
  • PANW intent to acquire Portkey (April 30, 2026), paloaltonetworks.com/company/press/2026
  • Maxim Bifrost, getmaxim.ai
  • Kong AI Gateway, konghq.com/products/kong-ai-gateway
  • LiteLLM, github.com/BerriAI/litellm
  • Datadog Security Labs on LiteLLM PyPI compromise (TeamPCP, March 24, 2026), securitylabs.datadoghq.com

Frequently asked questions

How do I enforce a model whitelist on Codex CLI when developers can edit their shell config?
At the gateway, not the IDE. Issue per-developer virtual keys with cost-center metadata; the gateway reads the key and blocks the disallowed model with a structured 403 regardless of shell config.
Can the gateway downgrade `gpt-5.1` to `gpt-5.1-mini` and across providers when a cost-center is over budget?
Future AGI ships this as a declarative primitive respecting the cost-center's whitelist. Portkey through YAML conditional routing. Maxim and Kong through policy composition. LiteLLM through a `pre_call_check` hook.
What happens to Codex CLI's tool calls when the gateway routes to Claude or Gemini?
The gateway rewrites Anthropic `tool_use` or Gemini `function_call` into OpenAI's `tool_calls` shape. All five picks do this correctly as of May 2026 for `gpt-5.1`, `claude-opus-4-7`, and `gemini-2.5-pro`. Older proxies flattened tool calls into text and silently broke the agent loop.
How does the audit log on cross-provider hops satisfy SOC 2 or HIPAA?
For any source-code-bearing request in the last 90 days, can you produce the original requested model, served model, provider, cost-center, developer's IdP claim, policy version, and approver? Future AGI's audit log answers in a single query. Portkey, Kong, and LiteLLM answer after a SQL join across gateway audit log and your IdP and change-management logs.
How is Future AGI Agent Command Center different from Portkey here?
Portkey is a hosted enforcement layer with mature virtual-key budgets, the largest adapter library, and Slack-native alerts. Future AGI adds signed approver workflows, cross-provider downgrade as a declarative primitive, first-class cache-hit and retry-budget surface, and the optimizer that proposes the next policy diff. Portkey enforces the policy you write. Future AGI enforces the policy you write *and* writes the next policy for you.
Related Articles
View all
Stay updated on AI observability

Get weekly insights on building reliable AI systems. No spam.