Best 5 AI Gateways for MCP Tool-Level Observability with Codex CLI in 2026
Five AI gateways scored for MCP tool-level observability with Codex CLI in 2026: per-tool latency, tool-call success rate, argument validation, MCP server auth, and where each one falls short.
Table of Contents
A Codex CLI session wired to four MCP servers (filesystem, git, postgres, custom) and running forty minutes issues several hundred tool calls. Half are the model probing the filesystem. A handful actually moved the work forward. One, in April 2026, was the call that handed an attacker AWS keys through the STDIO transport.
Codex CLI’s own logs tell you the model said something. They don’t tell you which MCP tool was called, what arguments it received, whether it returned in 40 ms or 4 seconds, or whether the argument string contained a prompt-injection payload. Tool-level observability is the missing layer, and after the April 15, 2026 OX Security disclosure it’s also a security requirement.
An AI gateway fixes this when, and only when, it speaks MCP on the tool side. Most AI gateways only see the LLM call and treat the tool round-trip as opaque. The five in this post parse MCP, attach per-tool span attributes, and emit tool-level traces. This is the 2026 cohort, scored on the seven axes that matter when MCP traffic is the workload and Codex CLI is the client.
TL;DR
Future AGI Agent Command Center is the strongest pick for an AI gateway for Codex CLI MCP tool observability because it captures every MCP invocation as a structured OpenTelemetry span (tool, server, full argument payload, start/end/duration), runs Protect’s MCP Security scanner inline on every tools/call blocking the prompt-injection, secret-leak, and tool-poisoning patterns the April 2026 OX Security disclosure surfaced, enforces OAuth 2.1 at the boundary, and ships per-agent tool allowlists. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Per-tool spans, inline MCP Security scanner, OAuth 2.1 boundary, and foundation-aligned (Linux Foundation Agentic Trust).
- Portkey — Best for hosted MCP observability with virtual keys and a polished UI. Mature MCP tool-call traces plus RBAC (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- LiteLLM — Best when Codex CLI traffic cannot leave the VPC and the security team wants to read the source. Self-hosted Python-native proxy with MCP gateway mode; pin commits after the March 24, 2026 PyPI compromise.
- Kong AI Gateway — Best if you already run Kong. The MCP extension fits the existing operational story with API-gateway-grade SLA.
- Maxim Bifrost — Best when tool-call concurrency is the binding constraint. Vendor-published ~11 µs P50 at 5,000 RPS with MCP plus LLM routing in one binary.
Why Codex CLI needs an MCP gateway in front of it
Codex CLI is OpenAI’s terminal coding agent. MCP support lives in ~/.codex/config.toml; registered servers become tools the model can call.
Three properties make Codex CLI MCP traffic hard to monitor without a gateway:
-
Tool calls outnumber model calls by an order of magnitude. A single prompt to fix a flaky test issues 30 to 80 MCP tool calls. The model call count is single digits. Instrument only the LLM call and the failure mode sits in the other ninety percent.
-
The CLI’s own logging is text, not structured spans. Codex CLI prints a one-line summary per call. No built-in span tree, no per-tool latency histogram, no per-argument validation log.
-
MCP servers are an authentication surface. Every server is its own auth boundary. Without a gateway, the Codex CLI process holds direct OAuth tokens for every server it touches, and audit logs sit on each server separately. The April 15, 2026 STDIO RCE class (OX Security) made centralizing this at a gateway the practical production requirement.
An MCP gateway sits between Codex CLI and the downstream servers, intercepts every discovery and invocation, attaches span attributes, and forwards. That interception point is where observability, OAuth 2.1 enforcement, tool allowlists, and argument-level guardrails all happen. Codex CLI points at the gateway via OPENAI_BASE_URL plus the MCP federation endpoint.
The 7 axes we score on
The default MCP gateway axes (transport, OAuth, policy, federation, audit, deployment, license) are the right starting point. For Codex CLI tool-level observability, we tightened them into seven coding-agent-aware axes.
| Axis | What it measures |
|---|---|
| 1. Per-tool latency capture | Span per MCP tool call with start, end, duration |
| 2. Tool-call success-rate aggregation | Slice success or failure by tool, MCP server, session |
| 3. Argument validation + guardrails | Inline scanning for prompt injection, secrets, PII |
| 4. MCP server registration + auth | OAuth 2.1 boundary plus per-agent tool allowlists |
| 5. Codex CLI session correlation | MCP spans linked to the OpenAI Responses API call |
| 6. STDIO mitigation posture | Block or sanitize the April 2026 STDIO RCE class |
| 7. Self-host posture | Runs in your VPC so prompts, code, tool outputs never leave |
Verdict line at the end of each pick scores all seven.
How we picked
We started from the universe of AI gateways that, as of May 2026, advertise both an OpenAI-compatible endpoint and an MCP gateway surface. We removed gateways that proxy only the LLM call and treat MCP as opaque, and gateways whose MCP support is STDIO-only without sanitization (the April 2026 RCE class makes this disqualifying). Helicone is excluded after its March 3, 2026 Mintlify acquisition shifted its roadmap toward documentation. The five that remain ship MCP gateway features production teams can use today.
1. Future AGI Agent Command Center: Best for per-tool MCP observability with inline security scanning
Verdict: Future AGI captures every MCP invocation as a structured OpenTelemetry span (tool, server, full argument payload, start/end/duration) and runs Protect’s MCP Security scanner inline on every tools/call, blocking the prompt-injection, secret-leak, and tool-poisoning patterns the April 2026 OX Security disclosure surfaced. OAuth 2.1 at the boundary, per-agent tool allowlists, and scope rewriting keep Codex CLI from ever holding raw downstream tokens.
What it gives Codex CLI users:
- Per-tool latency capture through
traceAI(Apache 2.0). Each MCP invocation becomes an OTel span with start, end, duration, tool, server, full argument payload. Drops into Grafana, Datadog, any OTLP backend. - Tool-call success-rate aggregation through native span-attribute slicing. Group by
mcp.tool.name,mcp.server.id,session.id. “filesystem.read failed 4.2 percent of the time on paths over 280 characters” is a fix; “the agent is flaky” isn’t. - Argument validation inline through the Future AGI Protect model family. Protect is FAGI’s own fine-tuned model family built on Google’s Gemma 3n with specialized adapters across four safety dimensions (content moderation, bias detection, security/prompt-injection, data privacy/PII), natively multi-modal across text, image, and audio, a model family, not a plugin chain. Scans for prompt injection, secrets, PII, and the MCP tool-poisoning patterns the April 2026 disclosure surfaced at ~67 ms p50 text and ~109 ms p50 image (arXiv 2510.13351); same dimensions reusable as offline eval metrics so the prod policy and the eval rubric stay in sync.
- MCP server auth through OAuth 2.1 at the boundary, per-agent tool allowlists, scope rewriting. Codex CLI never holds raw downstream tokens.
- Session correlation through
span_idlinking the OpenAI Responses API call to every MCP call. The full session is one tree. - STDIO mitigation through the dedicated MCP Security scanner inside the 18+ scanner library.
- Self-host posture through the Apache 2.0 single Go binary; Docker, Kubernetes, AWS, GCP, Azure, air-gapped, on-prem.
The loop. Captured traces are scored by ai-evaluation (faithfulness, tool-call accuracy, code-correctness). traceAI instruments 35+ frameworks OpenInference-natively, and Error Feed (FAGI’s “Sentry for AI agents”) sits alongside as the zero-config error monitor: auto-clusters related MCP tool-call failures into named issues (50 traces → 1 issue), auto-writes the root cause plus a quick fix plus a long-term recommendation per issue, and tracks rising/steady/falling trend per issue so a flaky tool surfaces like an exception rather than buried in span search. Low-scoring sessions cluster by failure mode. agent-opt (Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or tool-routing policy. Typical Codex CLI rewrite: stop calling filesystem.list recursively from the project root when 91 percent of returned bytes are ignored. No other gateway here implements this loop.
Where it falls short:
-
agent-opt is opt-in, for one-week pilots where the brief is per-tool dashboards only, start with traceAI + ai-evaluation and turn the optimizer on once eval baselines stabilize.
-
Managed MCP catalog is smaller than Composio’s. Pair with Composio for 250+ pre-built servers.
Pricing: Free tier 100K traces / month. Scale tier $99/month. Enterprise custom with SOC 2 Type II, HIPAA BAA, AWS Marketplace.
Score: 7/7 axes.
2. Portkey: Best for hosted MCP gateway with mature RBAC
Verdict: Portkey is the most polished hosted-only product in this category. It speaks MCP on the tool side, ships per-tool span attributes through its trace API, and has the cleanest virtual-key story for per-developer chargeback. No optimizer loop. The April 30, 2026 Palo Alto Networks acquisition (close PANW fiscal Q4 2026; roadmap merging into Prisma AIRS) is a procurement signal.
What it gives Codex CLI users:
- Per-tool latency capture through Portkey’s MCP trace surface. Per-tool latency histogram rendered natively in the dashboard.
- Tool-call success-rate aggregation through the analytics view extended to MCP. Group-by tool and server works out of the box.
- Argument validation and guardrails through Portkey’s Guardrails plugin set. Narrower than Future AGI’s 18+ scanner library.
- MCP server registration and auth through Portkey’s MCP virtual servers. OAuth 2.1 at the gateway boundary; per-agent allowlists in the console.
- Codex CLI session correlation through
trace_idheader, the wrapper has to set it. - STDIO mitigation via Streamable HTTP default. Correct post-April 2026 posture.
- Self-host posture through Portkey’s BYOC option. Good for compliance, not air-gapped.
Where it falls short:
- No optimizer.
- Palo Alto Networks acquisition merges the AI gateway roadmap into Prisma AIRS. Verify MCP feature continuity.
- Pricing escalates above 5M MCP calls/month faster than self-hosted alternatives.
Pricing: Free tier 10K requests/day. Scale tier $99/month. Enterprise custom with SOC 2 Type II.
Score: 6/7 axes (missing: feedback loop / optimizer).
3. LiteLLM: Best for self-hosted Python-native MCP gateway
Verdict: LiteLLM is the pick when Codex CLI traffic can’t leave the VPC and the security team wants to read the source. It ships MCP gateway mode plus the LLM proxy. Two 2026 caveats are non-negotiable: the March 24, 2026 PyPI supply-chain incident (1.82.7 and 1.82.8 compromised; pin to 1.82.6 or upgrade past 1.83.7) and CVE-2026-30623 authenticated MCP STDIO command injection (fixed in 1.83.7-stable).
What it gives Codex CLI users:
- Per-tool latency capture through metadata pass-through. Standard OTLP spans; pipe into Datadog or internal Tempo.
- Tool-call success-rate aggregation through spend-tracking and metrics tables in PostgreSQL. Slicing by tool typically means a SQL dashboard.
- Argument validation and guardrails through LiteLLM’s guardrails interface. Extensible but most scanners are wired yourself, or paired with Future AGI Protect.
- MCP server registration and auth through virtual keys extended to MCP. OAuth 2.1 supported but configured in code.
- Codex CLI session correlation through
metadata.trace_idandmetadata.session_id. - STDIO mitigation is the operationally fraught axis. Pin to 1.83.7-stable or later before exposing MCP to autonomous agents.
- Self-host posture is the strongest in this list. MIT-licensed Python proxy; auditable end-to-end in a week.
Where it falls short:
- No optimizer.
- MCP tool-call UI is functional, not polished. Wire
traceAIor another OTel sink for the dashboard engineering will actually use. - 2026 trust events make version hygiene a permanent operational task.
Pricing: MIT open source. Enterprise (SLA, SSO, audit) starts ~$250/month.
Score: 5.5/7 axes (missing: polished dashboard, optimizer).
4. Kong AI Gateway: Best if you already run Kong
Verdict: Kong AI Gateway is the pick when the platform team already runs Kong for REST APIs and the path of least resistance is to extend the stack with AI plus MCP plugins. Strengths: SLA, plugin ecosystem, ops familiarity. Weakness: MCP observability is plugin-driven (AI Proxy since Kong 3.6 plus MCP-aware additions), not native.
What it gives Codex CLI users:
- Per-tool latency capture through the OpenTelemetry plugin plus AI Proxy MCP extension. Lives in your OTel backend, not Kong’s console.
- Tool-call success-rate aggregation through consumer + tag patterns. The view lives in Grafana on top of the OTel sink.
- Argument validation and guardrails through Kong’s AI plugin library. MCP tool-poisoning detection is assembled from plugins.
- MCP server registration and auth through Kong consumers and the OAuth 2.0 plugin. Operational story is mature.
- Codex CLI session correlation through OTel trace context propagation.
- STDIO mitigation through transport policy. Plugin configuration, not a built-in scanner.
- Self-host posture is the entire point of Kong.
Where it falls short:
- Observability is plugin-driven, not native. Plan two weeks of platform-team time for the per-tool MCP dashboard.
- No optimizer.
- MCP scanner depth is shallower than Future AGI or Portkey.
Pricing: Open source. Kong Konnect starts free. Enterprise from ~$1.5K/month.
Score: 5/7 axes (missing: native MCP dashboard, optimizer, polished tool-call view).
5. Maxim Bifrost: Best for Go-native MCP throughput
Verdict: Bifrost is the Apache 2.0 Go binary that ships LLM routing and MCP gateway in one process. Maxim’s own benchmark publishes 11 microsecond P50 at 5,000 RPS on t3.xlarge (mock 60 ms upstream, treat as gateway overhead, not end-to-end). For high-concurrency Codex CLI workloads, overhead matters; Bifrost’s is the lowest specific figure on this list.
What it gives Codex CLI users:
- Per-tool latency capture through OTel exporter on the MCP path. Per-tool spans emitted; dashboard polish is thinner than Future AGI’s.
- Tool-call success-rate aggregation through per-tool metrics. MCP plus LLM unification in one console is genuinely useful.
- Argument validation and guardrails through Bifrost’s guardrail surface. Partial, not a 18+ scanner library and not a dedicated MCP Security scanner.
- MCP server registration and auth through Bifrost’s MCP config. OAuth 2.1 supported; per-agent allowlists documented but less polished than Portkey or Future AGI.
- Codex CLI session correlation through trace context propagation.
- STDIO mitigation through transport config. Default is Streamable HTTP.
- Self-host posture is strong; single Go binary, Apache 2.0.
Bifrost also ships “Code Mode”, vendor-published MCP token-reduction claiming up to 92.8 percent input-token reduction across 508 tools on 16 MCP servers in their own harness. Reproduce on your fleet before underwriting.
Where it falls short:
- Maxim self-ranks Bifrost number one across its own listicles without a published “where it falls short” block. Trust signal worth weighing.
- MCP dashboards are thinner than Future AGI’s; teams end up writing custom OTel exporters.
- No optimizer. Throughput pitch and “contain the blast radius” pitch are different problems.
Pricing: Apache 2.0. Commercial cloud tier on request.
Score: 5.5/7 axes (missing: deep MCP guardrail library, native optimizer).
Capability matrix
| Axis | Future AGI | Portkey | LiteLLM | Kong AI Gateway | Maxim Bifrost |
|---|---|---|---|---|---|
| Per-tool latency capture | Native (traceAI) | Native | OTLP export | Plugin | Native |
| Tool-call success aggregation | Native | Native | SQL view | Grafana | Native |
| Argument validation + guardrails | 18+ scanners + MCP Security | Plugins | Extensible | Plugins | Partial |
| MCP server auth + allowlists | OAuth 2.1 + scope rewrite | OAuth 2.1 + console | OAuth 2.1 in code | Consumer + OAuth plugin | OAuth 2.1 |
| Codex CLI session correlation | span_id linkage | trace_id header | metadata pass-through | OTel context | Trace propagation |
| STDIO mitigation posture | MCP Security scanner | Streamable HTTP default | Patch hygiene required (1.83.7+) | Transport policy | Streamable HTTP default |
| Self-host | Apache 2.0 Go binary | BYOC | MIT Python | OSS Kong | Apache 2.0 Go binary |
| Feedback loop / optimizer | agent-opt (Apache 2.0) | No | No | No | No |
Decision framework: Choose X if
Choose Future AGI if tool-call traces should drive prompt and routing optimization, and if a dedicated MCP Security scanner inline on every call is part of the production requirement.
Choose Portkey if you want a hosted gateway with mature RBAC, virtual keys, and a polished UI for MCP traces, and you don’t need the optimizer. Confirm multi-year roadmap continuity given the Palo Alto Networks acquisition.
Choose LiteLLM if your security team requires Codex CLI traffic to never leave the VPC and you can operate the version hygiene the 2026 incidents made permanent.
Choose Kong AI Gateway if you already operate Kong for REST APIs and platform-team familiarity outweighs AI-specific shallowness. Plan the dashboard work explicitly.
Choose Maxim Bifrost if tool-call concurrency is the binding constraint and gateway overhead matters per call. Trade-off: shallower MCP guardrail depth and a vendor narrative without a published limitations block.
Common mistakes when wiring Codex CLI through an MCP gateway
| Mistake | Fix |
|---|---|
| Pointing only the OpenAI side at the gateway, leaving MCP direct | Wire both OPENAI_BASE_URL and the MCP federation endpoint |
| Leaving STDIO as the default MCP transport | Set Streamable HTTP default; STDIO only for allowlisted local servers |
| Sharing one MCP token across all Codex CLI users | Issue per-agent OAuth 2.1 identities; gateway rewrites scopes downstream |
| Not pinning the LiteLLM version | Pin to 1.82.6 or upgrade past 1.83.7 |
Tagging only user_id, not session_id plus tool_name | Tag user, session, tool, MCP server; the four-tuple makes failure clusters legible |
| Skipping argument-level guardrails because LLM guardrails are on | Tool-poisoning targets the argument string, not the model prompt — scan the MCP path |
| Buffering streaming tool responses | Confirm SSE pass-through on the MCP path without buffer-and-batch |
How Future AGI closes the loop on Codex CLI MCP tool calls
The other four gateways treat tool-level observability as an end state: capture the span, show it in a dashboard, alert on threshold breach. Future AGI treats it as the input to a feedback loop.
- Trace.
traceAI(Apache 2.0). Parent span: OpenAI Responses API call. Children: MCP tool calls. Spans capture inputs, outputs, arguments, model, server, session. - Evaluate.
ai-evaluation(Apache 2.0) scores every span. FAGI ships a 50+ built-in rubric catalog (task-completion, tool-call accuracy, faithfulness, code-correctness, structured-output, hallucination, agentic surfaces, instruction-following, groundedness), plus unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code and MCP schema, plus self-improving evaluators that learn from live production traces, plus FAGI’s proprietary classifier model family at very low cost-per-token (Galileo Luna-2 cost economics, rubric-flexible). Catalog is the floor, not the ceiling. - Cluster. Low-scoring sessions cluster by failure mode. Common Codex CLI pattern: the agent called
filesystem.readon a binary file, got 200KB of unreadable bytes into context, then made three more wasteful tool calls before timing out. - Optimize.
agent-opt(Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or tool-routing policy. Typical rewrite: drop 14 unused tools from the discovery response. - Route. Agent Command Center applies the new policy on the next session. Protect guardrails (~67 ms text, arXiv 2510.13351) run inline.
- Re-deploy. Prompt and allowlist versioned. Eval regression triggers automatic rollback.
Net effect: a Codex CLI team starting with 12 percent of tool calls returning failure or wasted output typically sees that rate drop to 2-4 percent within four weeks.
Building blocks (Apache 2.0): traceAI, ai-evaluation, agent-opt at github.com/future-agi. The hosted Agent Command Center adds the failure-cluster view, live Protect guardrails, the MCP Security scanner, RBAC, SOC 2 Type II certified, and AWS Marketplace listing.
What we did not include
- Helicone. Strong on LLM observability but post-Mintlify (March 3, 2026) the MCP roadmap is downstream of a documentation product.
- Composio. Outstanding managed MCP catalog when integration breadth is the binding constraint; pair with one of the picks above for guardrails.
- Cloudflare AI Gateway. Strong LLM primitives but the MCP surface is still thin as of May 2026.
All three are worth a second look in Q3 2026.
Related reading
- Best 5 MCP Gateways in 2026: Post-RCE Production Picks
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- What Is an AI Gateway? The 2026 Definition
- Best AI Gateways for Agentic AI in 2026
Sources
- OpenAI Codex CLI documentation and config reference, platform.openai.com/docs/codex-cli
- Model Context Protocol specification 2025-11-25, modelcontextprotocol.io/specification/2025-11-25
- OX Security advisory on MCP STDIO RCE class (April 15, 2026), ox.security/blog/mcp-supply-chain-advisory-rce-vulnerabilities-across-the-ai-ecosystem
- Future AGI Agent Command Center docs, docs.futureagi.com/docs/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Portkey AI gateway, portkey.ai
- Palo Alto Networks Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey
- LiteLLM proxy and MCP advisory (CVE-2026-30623), docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026
- LiteLLM PyPI supply-chain advisory (March 2026), docs.litellm.ai/blog/security-update-march-2026
- Kong AI Gateway, konghq.com/products/kong-ai-gateway
- Maxim Bifrost benchmarks, getmaxim.ai/bifrost/resources/benchmarks
Frequently asked questions
What is the cheapest way to monitor Codex CLI MCP tool calls?
Does Codex CLI support MCP through its config file?
Can I route Codex CLI through multiple model providers?
How do I track Codex CLI tool-call cost per developer when everyone shares one MCP token?
What happens to MCP tool calls when Codex CLI runs through a gateway?
Is it safe to send source code through an MCP gateway?
How is Future AGI different from Portkey for Codex CLI MCP traffic?
Did the April 2026 MCP RCE class change how teams wire MCP gateways?
A 2026 architecture essay on why MCP traffic blows up coding-agent token bills in Claude Code and Codex CLI — and the five named mechanisms by which an MCP gateway compresses the cost.
A Director of Engineering Productivity buyer's brief for the AI gateway in front of Codex CLI at 1000+ engineer scale. Three pillars — governance, cost, provider flexibility — scored across seven axes with five picks.
Future AGI vs LangSmith scored on tracing, evaluation, prompt management, deployment, security, and developer experience. Honest verdict, May 2026 pricing, where each one falls short, and why only one closes the loop.