Best AI Gateway to Manage Codex CLI Token Spend in 2026
Five AI gateways scored on Codex CLI token spend management in 2026: per-session attribution, per-developer caps, alert routing, model downgrade on breach, cache observability, retry-budget interaction, and burndown forecasting.
The first time Codex CLI broke a monthly budget at our company, finance pinged engineering on the 19th: “We’re at 142% of the OpenAI line item. Why is Codex CLI $38,000 ahead of plan?” The engineering director opened the OpenAI usage dashboard. Aggregate spend, no per-developer split, no per-repo split, no way to tell whether the overrun came from the platform team running multi-file refactors or one engineer running a nightly cron that re-summarised the monorepo every six hours.
That was the day the team understood the difference between token tracking and spend management. Tracking tells finance what happened after the bill arrives. Spend management is the layer that prevents the breach: per-developer caps with auto-pause, alerts routed to the manager who can act, requests downgraded from gpt-5.1 to gpt-5.1-mini when a team is over budget, cache-hit-rate dashboards, and a monthly burndown forecast that surfaces the overrun on the 11th.
Codex CLI is OpenAI’s terminal coding agent. It reads OPENAI_API_KEY, hits the Responses API on api.openai.com, and produces no native chargeback view, no native cap, no native downgrade. Everything comes from the gateway in front. This post scores the 2026 cohort on the seven axes that matter when Codex CLI is rolled out across 30+ engineers and the FinOps lead’s job is to stop the bleeding without turning developers against the tool.
TL;DR
Future AGI Agent Command Center is the strongest pick for an AI gateway for Codex CLI token spend management because it ships per-developer virtual keys with three-stage budget thresholds, an automatic gpt-5.1 → gpt-5.1-mini downgrade on breach, per-session attribution that maps each Codex CLI session to one chargeback line item, and OpenAI / Bedrock / Anthropic all reachable behind one OpenAI-compatible Responses-API base URL. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Three-stage budget thresholds, per-session attribution, virtual-key fan-out that preserves OpenAI tiered-pricing discounts, and dry-run policy testing.
- Portkey — Best for cleanest virtual-key budgets with Slack/Teams alerts out of the box. Mature hosted-only product (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- Helicone — Best for 10-developer Codex CLI rollouts where minimal infra wins. Lightweight per-key rate limiting (treat as planned migration after the March 3, 2026 Mintlify acquisition).
- LiteLLM — Best when Codex CLI traffic cannot leave the VPC. Self-hosted Python proxy with team-level budgets and webhooks; pin commits after the March 24, 2026 PyPI compromise.
- OpenRouter — Best for 3-5 person teams A/B-testing Codex CLI against many models before the FinOps lead gets involved. Pay-per-token directory with caller-side model selection.
Why Codex CLI spend management is harder than spend tracking
Tracking is read-only: dump cost rows into a spreadsheet, finance produces the chargeback table, conversation ends until next month. Spend management is write-side: it decides what happens next, in real time, when a budget is at risk. Three properties of Codex CLI make managing spend harder than tracking it.
First, Codex CLI sessions are long and lumpy. A single multi-file refactor can run 80 to 120 turns, with input spiking to 180K tokens on the merge-conflict turn near the end. A requests-per-minute rate limit is the wrong tool: one $54 session was only three requests. The cap that matters is dollar-denominated, at the developer or cost-center level.
Second, the cost-quality mismatch is concrete and large. In our Q1 2026 data across 18 engineering teams, 62% of Codex CLI turns had input contexts under 8K tokens: file renames, lint fixes, completion-style turns. They run fine on gpt-5.1-mini at ~1/8th the per-token cost of gpt-5.1. The remaining 38% are multi-file refactors where gpt-5.1 earns its keep. Routing every turn to the flagship is the second-most-common reason a Codex CLI rollout blows budget. The most common is not capping spend at all.
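The saving from routing small turns to the cheaper model depends on their share of token volume, not their share of turns. The arithmetic can be sketched as below, assuming the ~1/8 price ratio quoted above. The 20% token share is a hypothetical figure for illustration, since the data above reports turn share (62%) only.

```python
def blended_cost_ratio(downgraded_token_share: float, price_ratio: float = 1 / 8) -> float:
    """Spend after routing, expressed as a fraction of all-flagship spend.

    downgraded_token_share: fraction of token volume sent to the cheaper model.
    price_ratio: cheaper model's per-token price relative to the flagship.
    """
    if not 0.0 <= downgraded_token_share <= 1.0:
        raise ValueError("downgraded_token_share must be between 0 and 1")
    return (1.0 - downgraded_token_share) + downgraded_token_share * price_ratio


# Hypothetical: small turns are 62% of turns but only 20% of token volume.
print(round(blended_cost_ratio(0.20), 3))   # 0.825

# Upper bound on the saving: token share equal to the 62% turn share.
print(round(blended_cost_ratio(0.62), 4))   # 0.4575
```

Because small turns carry few tokens by definition, the realistic saving sits closer to the first figure than the second.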
Third, auto-pause mid-session destroys developer trust faster than over-spend destroys the budget. The pattern that survives is three-stage: soft alert at 80%, automatic downgrade at 95%, structured 429 with a “your EM has been alerted” body at 110%. Anything cruder generates a Slack thread the FinOps lead loses.
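The three-stage pattern reduces to a small decision function. The sketch below uses the 80 / 95 / 110 thresholds described above. The function names and the shape of the 429 body are illustrative assumptions, not any gateway's actual API.

```python
from enum import Enum


class Stage(Enum):
    OK = "ok"
    SOFT_ALERT = "soft_alert"   # notify the EM, keep serving
    DOWNGRADE = "downgrade"     # route easy turns to the cheaper model
    HARD_PAUSE = "hard_pause"   # stop serving and return a structured 429


def budget_stage(spent_usd: float, budget_usd: float) -> Stage:
    """Map spend against a dollar budget to one of the three stages."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    ratio = spent_usd / budget_usd
    if ratio >= 1.10:
        return Stage.HARD_PAUSE
    if ratio >= 0.95:
        return Stage.DOWNGRADE
    if ratio >= 0.80:
        return Stage.SOFT_ALERT
    return Stage.OK


def pause_response(spent_usd: float, budget_usd: float) -> tuple[int, dict]:
    """Status code and JSON body for a hard pause (hypothetical body shape)."""
    body = {
        "error": {
            "type": "budget_exceeded",
            "message": "Daily budget exhausted. Your EM has been alerted.",
            "spent_usd": round(spent_usd, 2),
            "budget_usd": round(budget_usd, 2),
        }
    }
    return 429, body
```

With a $40 daily cap, `budget_stage(32, 40)` returns the soft-alert stage, `budget_stage(38, 40)` the downgrade stage, and `budget_stage(44, 40)` the hard pause.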
All five picks below sit in front of Codex CLI via OPENAI_BASE_URL.
The 7 axes we score on
The generic “best AI gateway” axis list is the wrong shape for a FinOps lead with a Codex CLI overrun. These seven axes are specific to spend management for the OpenAI terminal coding agent:
| Axis | What it measures |
|---|---|
| 1. Per-session token attribution for Codex CLI | Can it group token spend by Codex CLI session ID so one long refactor is a single line item? |
| 2. Per-developer budget caps | Can it enforce a hard dollar cap per developer per day, with virtual keys that fan out to one underlying OpenAI key? |
| 3. Alert routing | When budget is at 80% / 95% / 110%, can it page the right person — the EM, not #general? |
| 4. Model downgrade on budget breach (gpt-5.1 → gpt-5.1-mini) | When a wallet is exhausted, can the gateway auto-route easy turns to a cheaper model without code changes? |
| 5. Cache hit-rate observability | Does it surface OpenAI prompt-cache hit rate per developer / per repo? |
| 6. Retry-budget interaction | When tool-use retries fail, does it count retries against budget and cap retries-per-session? |
| 7. Monthly burndown forecasting | Does it project end-of-month spend from the first 10 days, so overrun surfaces on the 11th instead of the 19th? |
Verdict line at the end of each pick scores all seven.
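Axis 1 amounts to a group-by on session ID. A minimal sketch follows, assuming the gateway can export one row per turn. The keys `session_id`, `input_tokens`, and `cost_usd` are hypothetical field names, not a specific gateway's export schema.

```python
from collections import defaultdict


def cost_by_session(rows):
    """Collapse per-turn cost rows into one line item per Codex CLI session."""
    totals = defaultdict(lambda: {"turns": 0, "input_tokens": 0, "cost_usd": 0.0})
    for row in rows:
        item = totals[row["session_id"]]
        item["turns"] += 1
        item["input_tokens"] += row["input_tokens"]
        item["cost_usd"] += row["cost_usd"]
    return dict(totals)


# Hypothetical export: two turns from one refactor session, one from another.
rows = [
    {"session_id": "refactor-a", "input_tokens": 6_000, "cost_usd": 0.25},
    {"session_id": "refactor-a", "input_tokens": 180_000, "cost_usd": 4.5},
    {"session_id": "lint-b", "input_tokens": 2_000, "cost_usd": 0.125},
]
for session_id, item in cost_by_session(rows).items():
    print(session_id, item)
```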
How we picked
We started from public AI gateways with an OpenAI-compatible endpoint as of May 2026, then cut four groups: gateways that only do RPM rate limiting (no dollar caps); gateways without per-virtual-key budget enforcement; gateways whose auto-pause returns a 500 instead of a structured 429 (Codex CLI hangs on 500s); and gateways that do not surface OpenAI’s prompt-cache hit rate. Five survived.
A note on the 2026 trust cohort: Portkey is mid-acquisition by Palo Alto Networks (announced April 30, 2026, close expected PANW fiscal Q4 2026); LiteLLM had a PyPI supply-chain compromise on 1.82.7 / 1.82.8 (March 24, 2026, remediated past 1.83.7 per Datadog Security Labs); Helicone was acquired by Mintlify on March 3, 2026 with the roadmap pivoting toward documentation-platform-first. All three remain in the cohort; the procurement story is now different. Flagged per pick.
1. Future AGI Agent Command Center: Best for per-developer Codex CLI budget caps with auto-downgrade routing
Verdict: Future AGI ships three-stage budget thresholds (soft alert at 80%, automatic downgrade at 95%, hard pause at 110%), per-session attribution that maps every Codex CLI session to one chargeback line item, virtual keys that preserve OpenAI tiered-pricing discounts via fan-out to one underlying key, and Bedrock and OpenAI both reachable behind one OpenAI-compatible Responses-API base URL, so a budget breach downgrades from gpt-5.1 to gpt-5.1-mini mid-session without an SDK change.
What it does for Codex CLI token spend management:
- Per-session attribution through fi.attributes.session.id, set when Codex CLI’s session header is forwarded. Each session is one chargeback line item; each turn a child span. The 180K-token turn that broke the budget is one click away.
- Per-developer budget caps through virtual keys with native three-stage thresholds (80% soft alert, 95% downgrade, 110% hard pause with structured 429). Each virtual key fans out to one underlying OpenAI key, preserving tiered-pricing discounts.
- Alert routing through fi.alerts with per-cost-center rules. Platform squad’s breach DMs the platform EM; data squad’s DMs the data EM. FinOps gets the end-of-day digest, not a 9pm firehose.
- Model downgrade (gpt-5.1 → gpt-5.1-mini) through the budget-aware routing policy. At 95%, turns under 10K input tokens auto-route to gpt-5.1-mini; flagship stays for hard turns until 110%. Past 110%, both drop to mini until midnight UTC.
- Cache hit-rate observability through the prompt-cache span attribute. OpenAI’s cached-input pricing (~50% of full input cost on hit) surfaces per developer, per repo, per session, no SQL needed.
- Retry-budget interaction through the retry-counter attribute. Each retry counts against budget and a per-session cap (default 3). A session looping 14 times on a bash error pauses after the third with a structured 429.
- Monthly burndown forecasting through the spend-projection view. On day 11, three bands (P10 / P50 / P90). If P50 crosses budget, FinOps has nine days to act.
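To make the P10 / P50 / P90 bands concrete, here is one way such a projection could be computed. It is a bootstrap sketch and not Future AGI's implementation: it resamples the observed daily totals with replacement to fill the rest of the month, which assumes days are independent and that the first days are representative. The function name and parameters are hypothetical.

```python
import random
import statistics


def forecast_month_end(daily_spend, days_in_month, simulations=2000, seed=0):
    """Project month-end spend as P10 / P50 / P90 bands from observed daily totals."""
    if not daily_spend or len(daily_spend) > days_in_month:
        raise ValueError("need between 1 and days_in_month observed days")
    rng = random.Random(seed)  # fixed seed keeps the forecast reproducible
    spent_so_far = sum(daily_spend)
    remaining_days = days_in_month - len(daily_spend)
    totals = [
        spent_so_far + sum(rng.choices(daily_spend, k=remaining_days))
        for _ in range(simulations)
    ]
    cuts = statistics.quantiles(totals, n=10)  # nine cut points: P10 .. P90
    return {"p10": cuts[0], "p50": cuts[4], "p90": cuts[8]}


# Hypothetical first ten days of a 30-day month, weekends included as low days.
first_ten_days = [100, 300, 50, 400, 0, 0, 250, 300, 200, 100]
bands = forecast_month_end(first_ten_days, days_in_month=30)
budget_usd = 5000
print(bands, "act now" if bands["p50"] > budget_usd else "on track")
```

Resampling whole days, weekends included, is what keeps the bands from collapsing into a single linear extrapolation.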
The loop. Every turn produces a span via traceAI (Apache 2.0). fi.evals (Apache 2.0) scores tool-use accuracy, code correctness, task completion. On a breach, fi.opt.optimizers (ProTeGi, Bayesian, GEPA, all Apache 2.0 in agent-opt) reads breach + trace + eval history and emits a policy diff: “for platform-squad, route turns under 8K input tokens to gpt-5.1-mini between 9am and 5pm, regression 0.4%, under 2% tolerance. Estimated monthly saving: $3,840.” Math shown. Protect guardrails (~67ms text, 109ms image per arXiv 2510.13351) add no perceptible latency.
Where it falls short:
- The three-stage threshold (80 / 95 / 110) is opinionated. Teams wanting a single hard cap will find the default over-engineered. Configurable.
- The “learn which turns to downgrade” loop needs ~three weeks of traffic before stable policy diffs emerge. Day-one uses static rules.
Pricing: Apache 2.0 single Go binary; cloud or self-host. Free tier 100K traces / month. Scale from $99 / month. Enterprise custom, SOC 2 Type II, HIPAA, GDPR, and CCPA certified, BAA available, AWS Marketplace listing.
Score: 7 / 7 axes.
2. Portkey: Best for hosted Codex CLI spend management with Slack-native alerts
Verdict: Portkey is the most polished hosted product for the enforce-budgets-and-alert slice. Virtual-key budgets are the cleanest in the cohort, Slack and Teams integrations are first-class, the dashboard is the one FinOps leads log into. It doesn’t close the loop on routing policy. Verify the PANW acquisition timeline before signing multi-year.
What it does for Codex CLI token spend management:
- Per-session attribution through Portkey’s trace_id header. The Codex CLI wrapper has to set it; without it, sessions blend. A 12-line shell wrapper is the standard pattern.
- Per-developer budget caps through virtual keys with daily or monthly resets. Soft-alert thresholds configurable per key.
- Alert routing through native Slack and Teams. Per-key alerts target channels or DM specific users. “Platform-EM versus #general” is first-class.
- Model downgrade on budget breach is partial. Metadata-conditional routing exists, but “at 95%, route easy turns to gpt-5.1-mini” requires a YAML rule. Wireable, not turnkey. Plan a sprint for the three-stage pattern.
- Cache hit-rate observability is partial. Portkey surfaces its own semantic cache hit rate; OpenAI’s prompt-cache hit rate is recoverable from raw trace, not from the dashboard.
- Retry-budget interaction isn’t natively modeled. Each retry counts as a separate request with no per-session cap. The “agent loops on a bash error and burns $80 in 4 minutes” failure mode happens.
- Monthly burndown forecasting isn’t a dashboard feature.
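Where per-session retry caps are not modeled natively, the counter itself is small enough to live in a wrapper in front of the gateway. The sketch below is an in-memory illustration under that assumption and is not a Portkey feature. The class and method names are hypothetical, and a real deployment would need shared state across processes.

```python
class RetryBudget:
    """Track retries per session and refuse further retries once the cap is used up."""

    def __init__(self, max_retries_per_session: int = 3):
        self.max_retries_per_session = max_retries_per_session
        self._used = {}  # session_id -> retries already granted

    def allow_retry(self, session_id: str) -> bool:
        """Return True and count the retry, or False once the cap is exhausted."""
        used = self._used.get(session_id, 0)
        if used >= self.max_retries_per_session:
            return False
        self._used[session_id] = used + 1
        return True


budget = RetryBudget(max_retries_per_session=3)
decisions = [budget.allow_retry("session-42") for _ in range(5)]
print(decisions)  # [True, True, True, False, False]
```

When `allow_retry` returns False, the caller would respond with a structured 429 rather than forwarding the request.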
Where it falls short:
- Palo Alto Networks announced intent to acquire Portkey on April 30, 2026, close expected PANW fiscal Q4 2026. The gateway is slated to become the AI Gateway for Prisma AIRS. Verify standalone-product continuity before signing multi-year.
- No optimizer. This month’s breach pattern doesn’t influence next month’s policy unless the EM manually re-tunes.
- Route-by-budget is wireable but not turnkey. Day-one is caps + alerts; the downgrade rule is a follow-on sprint.
Pricing: Open-source core (MIT) + commercial cloud control plane. Free tier 10K requests / day. Scale from $99 / month. Enterprise custom with SOC 2 Type II.
Score: 5 / 7 axes (missing: turnkey downgrade-on-breach, retry-budget cap, burndown forecast, self-improving policy).
3. Helicone: Best for lightweight Codex CLI spend management on small teams
Verdict: Helicone is the right pick when the story is “10 developers on Codex CLI, $5K / month, daily cap per developer plus a Slack ping when it trips.” Drop in the proxy, set the rate-limit policies, the overhead matches the budget. Past 20 developers or once FinOps asks for downgrade-on-breach, the cracks show. Acquired by Mintlify in March 2026 with the roadmap pivoting toward documentation-platform-first.
What it does for Codex CLI token spend management:
- Per-session attribution through Helicone-Session-Id. Wrapper has to set it.
- Per-developer budget caps through rate-limit policies. Helicone is RPM-first; dollar caps work through usage alerts plus a webhook that flips the key off. Less clean than Portkey’s virtual-key budgets.
- Alert routing through usage alerts; Slack hook is standard. Per-EM routing requires per-policy channel config.
- Model downgrade on budget breach isn’t present out of the box. Failover fires on errors, not budget proximity. The downgrade has to be coded in a wrapper upstream.
- Cache hit-rate observability through Helicone’s own cache. OpenAI’s native prompt-cache hit rate isn’t first-class.
- Retry-budget interaction isn’t modeled. Each retry counts as a request; no per-session cap.
- Monthly burndown forecasting isn’t present.
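When a gateway does not surface OpenAI's prompt-cache hit rate, it can be recovered from logged responses. The sketch below assumes each logged turn keeps the response's usage object, with cached input reported under `input_tokens_details.cached_tokens` as in the Responses API usage payload. Verify the field names against your own logs. The `session_id` key and the 30% review floor are illustrative.

```python
from collections import defaultdict


def cache_hit_rate_by_session(turns):
    """Return {session_id: cached input tokens / total input tokens}."""
    cached = defaultdict(int)
    total = defaultdict(int)
    for turn in turns:
        usage = turn["usage"]
        session_id = turn["session_id"]
        total[session_id] += usage["input_tokens"]
        # Missing details are treated as zero cached tokens.
        cached[session_id] += usage.get("input_tokens_details", {}).get("cached_tokens", 0)
    return {sid: cached[sid] / total[sid] for sid in total if total[sid] > 0}


def sessions_to_review(rates, floor=0.30):
    """Sessions whose hit rate falls below the floor, for prompt-shape review."""
    return sorted(sid for sid, rate in rates.items() if rate < floor)


# Hypothetical logged turns.
turns = [
    {"session_id": "s1", "usage": {"input_tokens": 1000, "input_tokens_details": {"cached_tokens": 800}}},
    {"session_id": "s1", "usage": {"input_tokens": 1000, "input_tokens_details": {"cached_tokens": 600}}},
    {"session_id": "s2", "usage": {"input_tokens": 2000, "input_tokens_details": {"cached_tokens": 200}}},
]
rates = cache_hit_rate_by_session(turns)
print(rates, sessions_to_review(rates))
```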
Where it falls short:
- Acquired by Mintlify (March 3, 2026) with the public roadmap shifting toward documentation-platform-first. Treat 2026 as a migration-evaluation window.
- No optimizer.
- Policy expressiveness is below Portkey and Future AGI. The three-stage pattern is hand-wired through webhooks.
- Self-host beyond a few hundred RPS gets operational.
Pricing: Free tier 10K requests / month. Pro from $25 / month. Enterprise custom.
Score: 3.5 / 7 axes (missing: downgrade-on-breach, retry-budget cap, burndown forecast, self-improving policy).
4. LiteLLM: Best for self-hosted Codex CLI spend management inside the VPC
Verdict: LiteLLM is the pick when Codex CLI traffic can’t leave the VPC and security wants source code they can read. Budget primitives are real: team budgets, user budgets, virtual keys, webhook alerts. Polish is below the hosted alternatives; the downgrade story is Python, not a toggle. Pin commit hashes past 1.83.7 per the March 2026 PyPI compromise.
What it does for Codex CLI token spend management:
- Per-session attribution through metadata pass-through. Wire metadata.session_id from the Codex CLI header into the proxy config.
- Per-developer budget caps through team and user budgets; hard cap returns 429. team_id maps to the SSO claim if you wire your IdP.
- Alert routing through webhook hooks. PagerDuty, Opsgenie, Slack, bring your own destination.
- Model downgrade on budget breach through pre_call_check hooks. A 25-line Python hook that checks remaining budget and rewrites model="gpt-5.1" to model="gpt-5.1-mini". Real engineering time, not a toggle.
- Cache hit-rate observability is thin. LiteLLM forwards OpenAI’s cache headers but doesn’t aggregate them. Pair with traceAI or build the dashboard yourself.
- Retry-budget interaction through num_retries (per-request only). Per-session caps live in the pre-call hook.
- Monthly burndown forecasting isn’t present. Export the spend table to your analytics warehouse.
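The core of such a downgrade hook is a pure function over the request payload. The sketch below is framework-agnostic and does not call LiteLLM. Wiring it into the proxy's hook interface is left out on purpose, because the exact hook signature should come from the LiteLLM documentation for the pinned version. The function name, the cascade list, and the 10K-token threshold are assumptions for illustration.

```python
# Ordered from flagship to cheapest; update when a model is deprecated.
CASCADE = ["gpt-5.1", "gpt-5.1-mini"]


def downgrade_if_over_budget(request, spent_usd, budget_usd, input_tokens,
                             downgrade_at=0.95, small_turn_tokens=10_000):
    """Return the request unchanged, or a copy with the next model in the cascade.

    A turn is downgraded only when spend has reached the threshold AND the
    turn is small; large turns keep the flagship model.
    """
    model = request.get("model")
    over_threshold = budget_usd > 0 and spent_usd / budget_usd >= downgrade_at
    small_turn = input_tokens < small_turn_tokens
    if not (over_threshold and small_turn) or model not in CASCADE:
        return request
    position = CASCADE.index(model)
    if position == len(CASCADE) - 1:
        return request  # already on the cheapest model
    return {**request, "model": CASCADE[position + 1]}


request = {"model": "gpt-5.1", "input": "rename foo to bar"}
print(downgrade_if_over_budget(request, spent_usd=39, budget_usd=40, input_tokens=4_000))
```

Returning a copy rather than mutating the payload keeps the original request available for logging what the caller actually asked for.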
Where it falls short:
- March 24, 2026 PyPI supply-chain compromise. Versions 1.82.7 and 1.82.8 were published by an attacker with the maintainer’s PyPI token; the package exfiltrated SSH keys, cloud credentials, and Kubernetes configs (Datadog Security Labs TeamPCP writeup). Remediated past 1.83.7. Pin commit hashes and rotate touched credentials.
- No optimizer.
- Observability is thinner than Portkey or Future AGI. Most enterprises pair LiteLLM with traceAI for breach forensics.
- Day-one setup is heavier than hosted alternatives. Plan a platform-team sprint.
Pricing: Open source under MIT (enterprise dir licensed separately). Enterprise from ~$250 / month for small teams.
Score: 4 / 7 axes (missing: turnkey downgrade-on-breach, cache hit-rate UI, retry-session cap, burndown forecast, self-improving policy).
5. OpenRouter: Best for early-stage Codex CLI A/B testing before FinOps gets involved
Verdict: OpenRouter is the lowest-friction way to route Codex CLI across many models. One API key, one base URL, 200+ models, per-token markup. It answers “three engineers A/B-testing models without operating a gateway.” It doesn’t answer “cap each developer at $40 / day and downgrade easy turns on breach.” Past 5 developers, the wrong shape.
What it does for Codex CLI token spend management:
- Per-session attribution is shallow. Aggregates by API key. Session ID requires CSV export.
- Per-developer budget caps are account-level only. One developer’s spend isn’t isolated. Bulk-pricing discounts aren’t preserved since OpenRouter is the marketplace.
- Alert routing is account-level usage alerts. No per-EM routing.
- Model downgrade on budget breach is caller-side. No budget-aware routing rule.
- Cache hit-rate observability isn’t present.
- Retry-budget interaction isn’t modeled.
- Monthly burndown forecasting isn’t present.
Where it falls short:
- No per-virtual-key budget enforcement. Cost control is a single account billing limit.
- No semantic or exact cache at the gateway layer.
- Per-token markup adds up at 50M+ tokens / month.
- Closed source, no self-host.
Pricing: Per-token markup on provider rates; cloud only.
Score: 2 / 7 axes (missing: per-virtual-key caps, alert routing, downgrade-on-breach, cache UI, retry cap, burndown forecast, self-improving policy).
Capability matrix
| Axis | Future AGI | Portkey | Helicone | LiteLLM | OpenRouter |
|---|---|---|---|---|---|
| Per-session token attribution for Codex CLI | Native via session-ID span attr | Header (trace_id) | Header (Helicone-Session-Id) | Metadata pass-through | CSV export only |
| Per-developer budget caps | Native virtual key, 3-stage | Virtual key, hard cap | Rate limit + webhook | Team + user budgets | Account-level only |
| Alert routing | Per-cost-center routing UI | Slack/Teams native | Webhook (Slack standard) | Webhook (BYO destination) | Account-level alerts |
| Model downgrade on budget breach (gpt-5.1 → gpt-5.1-mini) | Native budget-aware policy | Wireable conditional (sprint) | Not present | Wireable via pre_call_check hook | Caller-side only |
| Cache hit-rate observability | First-class span attr, dashboard view | Partial (own cache only) | Partial (own cache only) | Header pass-through; bring own dashboard | Not present |
| Retry-budget interaction | Per-session retry cap + budget counter | Per-request only | Per-request only | Router num_retries only | Not modeled |
| Monthly burndown forecasting | P10 / P50 / P90 projection on day 11 | Not present | Not present | Not present | Not present |
| Self-improving budget policy | fi.opt auto-tunes thresholds | Not present | Not present | Not present | Not present |
Decision framework: Choose X if
Choose Future AGI Agent Command Center if budget breaches are a recurring monthly conversation, the FinOps lead is tired of paging the EM at 9pm on the 23rd, and the goal is to make next month’s policy automatically tighter than this month’s. agent-opt is opt-in; turn it on once Codex CLI has eval baselines and live traces flowing, and the cost curve compounds downward from there.
Choose Portkey if you want hosted polish, virtual-key budgets are enough for the year-one story, you can wire the downgrade-on-breach rules yourself, and you have a read on the Palo Alto Networks acquisition timeline.
Choose Helicone if the team is under 10 developers, the story is “daily cap + Slack ping,” policy gets re-tuned manually in standup, and the Mintlify acquisition timeline doesn’t bother you.
Choose LiteLLM if compliance forbids Codex CLI traffic leaving the VPC, Python is acceptable as a runtime, you can pin commit hashes past 1.83.7, and the platform team will write the downgrade-on-breach hook.
Choose OpenRouter if you’re a 3-5 person team A/B-testing Codex CLI against many models and per-developer budgets aren’t yet a procurement issue. Re-evaluate the moment a FinOps lead joins the conversation.
Common mistakes when wiring Codex CLI spend management
| Mistake | What goes wrong | Fix |
|---|---|---|
| Single hard cap at 100% | Codex CLI pauses mid-conversation; engineers route around by pasting prompts into ChatGPT.com | Three-stage cap: alert 80%, downgrade 95%, hard pause 110% |
| Capping at the OpenAI-key level instead of per-developer | One heavy engineer pauses the entire team’s key | Issue virtual keys per developer; underlying OpenAI key keeps the bulk discount |
| Alerts to a shared channel | Notifications get muted; the next breach goes unnoticed until day 19 | Per-cost-center routing to the EM responsible; FinOps gets the daily digest, not the per-event firehose |
| Not surfacing cache hit rate per session | Sessions with drifted cache keys pay full price while aggregate looks fine | Surface hit rate per session and per repo; flag any session under 30% for prompt-shape review |
| Not capping retries per session | Agent loops on a bash error and burns $80 in 4 minutes; per-request budget never sees it | Cap retries per session (default 3) with structured 429 “retry budget exhausted” body |
| Forecasting burndown linearly | Spend curves are not linear; weekends pull the average down; forecast undershoots | Forecast in P10 / P50 / P90 bands; act when P50 crosses budget |
| Static downgrade list | Fallback model gets deprecated; gateway errors at 95% threshold | Wire the cascade as a list (gpt-5.1 → gpt-5.1-mini → gpt-4.1-mini); update quarterly |
| Skipping the dry-run before shipping a new cap | First production day pauses three sessions the EM did not predict; trust drops | Run the proposed cap against the previous 30 days of traffic; iterate offline |
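The dry-run in the last row can be as simple as replaying historical cost rows against the proposed cap and listing which developer-days would have been paused. The sketch below assumes a 30-day export with one row per request. The field names `developer`, `day`, and `cost_usd` are hypothetical, and the 110% pause threshold follows the three-stage pattern.

```python
from collections import defaultdict


def dry_run_daily_cap(rows, cap_usd, pause_at=1.10):
    """List (developer, day) pairs whose spend would have triggered a hard pause."""
    if cap_usd <= 0:
        raise ValueError("cap_usd must be positive")
    daily = defaultdict(float)
    for row in rows:
        daily[(row["developer"], row["day"])] += row["cost_usd"]
    return sorted(key for key, spent in daily.items() if spent / cap_usd >= pause_at)


# Hypothetical history: alice exceeds a $40 cap on one day, bob never does.
history = [
    {"developer": "alice", "day": "2026-04-01", "cost_usd": 30.0},
    {"developer": "alice", "day": "2026-04-01", "cost_usd": 20.0},
    {"developer": "alice", "day": "2026-04-02", "cost_usd": 12.0},
    {"developer": "bob", "day": "2026-04-01", "cost_usd": 12.0},
]
print(dry_run_daily_cap(history, cap_usd=40))  # [('alice', '2026-04-01')]
```

If the list is longer than the EM expects, the cap can be tuned offline before any session is paused in production.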
How Future AGI closes the loop on Codex CLI spend management
The other four gateways treat spend management as a static policy: write the cap, enforce it, re-tune when reality drifts. Future AGI treats the cap as the input to a self-improving policy. Six stages:
- Trace. Every Codex CLI turn produces a span via traceAI (Apache 2.0) capturing tokens, cost, model, session ID, cache state, retry count, and budget state.
- Evaluate. fi.evals scores every turn for code-correctness, faithfulness, and tool-use accuracy. The gateway knows which turns were “gpt-5.1 called when gpt-5.1-mini would have done it.”
- Cluster. High-cost-low-difficulty sessions cluster by failure mode. The spend-management cluster is the routine work that ate the budget, small inputs, simple tool calls, trivial diffs at flagship prices.
- Optimize. fi.opt.optimizers (ProTeGi, Bayesian, GEPA) reads breach + trace + eval history and emits a policy diff: “for platform-squad, route turns under 8K input tokens to gpt-5.1-mini between 9am and 5pm, regression 0.4%, under 2% tolerance. Estimated monthly saving: $3,840.”
- Route. Gateway applies the policy on the next request, versioned. The squad goes from “we breach by 40% every month” to “87% with headroom.”
- Re-deploy. New rule is signed; if eval scores regress, automatic rollback. Each deploy is an audit event tied to the approver’s IdP claim.
Net effect: a 30-engineer team starting at $28,000 / month with monthly breaches typically settles into a steady-state where the cap holds, monthly spend trends down 22-30% within four weeks, and the FinOps lead stops getting paged at 9pm on the 23rd.
Building blocks are open source: traceAI, ai-evaluation, agent-opt (all Apache 2.0 at github.com/future-agi). Agent Command Center adds the budget-policy UI, dry-run mode, per-cost-center alert routing, cache-hit dashboard, retry-budget primitives, burndown forecast, Protect guardrails (~67ms text, 109ms image per arXiv 2510.13351), RBAC, SOC 2 Type II certified, BAA available, and AWS Marketplace listing.
What we did not include
- Cloudflare AI Gateway. Strong edge primitives but budget-cap UX is rate-limit-first; dollar-denominated caps with automatic downgrade still require Worker code as of May 2026. Better as an edge-routing pick than a spend-management pick.
- Kong AI Gateway. Strong if you already run Kong for REST APIs, but AI-specific spend-management is plugin-driven and requires AI Proxy plugin 3.6+ plus a state-store wiring sprint. For teams not already on Kong, the cohort above is faster to ship.
- TrueFoundry. Solid ML-platform gateway with tenant-level cost centers, but the actuator side (auto-pause, downgrade-on-breach) is less mature than the tracking side. Better as a tracking pick than a management pick.
All three are worth a second look in Q3 2026.
Related reading
- Best 5 AI Gateways to Route Codex CLI to Any Model in 2026
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- Best AI Gateway for Claude Code Cost Management in 2026
- What Is an AI Gateway? The 2026 Definition
- Best LLM Cost Tracking Tools in 2026
Sources
- OpenAI Codex CLI documentation, github.com/openai/codex
- OpenAI Responses API + prompt caching, platform.openai.com/docs/guides/prompt-caching
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI traceAI, github.com/future-agi/traceAI
- Future AGI ai-evaluation, github.com/future-agi/ai-evaluation
- Future AGI agent-opt, github.com/future-agi/agent-opt
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Portkey AI gateway, portkey.ai
- Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey-to-secure-the-rise-of-ai-agents
- Helicone proxy, helicone.ai
- Mintlify acquisition of Helicone (March 3, 2026), mintlify.com/blog/helicone-joins-mintlify
- LiteLLM proxy, github.com/BerriAI/litellm
- Datadog Security Labs writeup on LiteLLM PyPI compromise (TeamPCP campaign, March 24, 2026), securitylabs.datadoghq.com/articles/litellm-compromised-pypi-teampcp-supply-chain-campaign
- OpenRouter models directory, openrouter.ai/models
Frequently asked questions
How is Codex CLI token spend management different from tracking?
How do I set a budget cap on Codex CLI without breaking developer flow?
Can the gateway automatically downgrade Codex CLI from gpt-5.1 to gpt-5.1-mini when over budget?
How do I track Codex CLI cost per developer when everyone shares one OpenAI key?
What is OpenAI's prompt-cache hit rate, and why does it matter?
Does the gateway count Codex CLI's tool-use retries against the budget?
Is it safe to send Codex CLI source code through an AI gateway?
How is Future AGI different from Portkey for Codex CLI spend management?