
Best 5 AI Gateways for MCP Tool-Level Observability with Codex CLI in 2026

Five AI gateways scored for MCP tool-level observability with Codex CLI in 2026: per-tool latency, tool-call success rate, argument validation, MCP server auth, and where each one falls short.


A Codex CLI session wired to four MCP servers (filesystem, git, postgres, custom) and running for forty minutes issues several hundred tool calls. Half are the model probing the filesystem. A handful actually move the work forward. One, in April 2026, was the call that handed an attacker AWS keys through the STDIO transport.

Codex CLI’s own logs tell you the model said something. They don’t tell you which MCP tool was called, what arguments it received, whether it returned in 40 ms or 4 seconds, or whether the argument string contained a prompt-injection payload. Tool-level observability is the missing layer, and after the April 15, 2026 OX Security disclosure it’s also a security requirement.

An AI gateway fixes this when, and only when, it speaks MCP on the tool side. Most AI gateways only see the LLM call and treat the tool round-trip as opaque. The five in this post parse MCP, attach per-tool span attributes, and emit tool-level traces. This is the 2026 cohort, scored on the seven axes that matter when MCP traffic is the workload and Codex CLI is the client.


TL;DR

Future AGI Agent Command Center is the strongest pick for an AI gateway for Codex CLI MCP tool observability because it captures every MCP invocation as a structured OpenTelemetry span (tool, server, full argument payload, start/end/duration), runs Protect’s MCP Security scanner inline on every tools/call blocking the prompt-injection, secret-leak, and tool-poisoning patterns the April 2026 OX Security disclosure surfaced, enforces OAuth 2.1 at the boundary, and ships per-agent tool allowlists. The other four picks below win on specific edges.

  1. Future AGI Agent Command Center — Best overall. Per-tool spans, inline MCP Security scanner, OAuth 2.1 boundary, and foundation-aligned (Linux Foundation Agentic Trust).
  2. Portkey — Best for hosted MCP observability with virtual keys and a polished UI. Mature MCP tool-call traces plus RBAC (verify the Palo Alto Networks acquisition timeline before signing multi-year).
  3. LiteLLM — Best when Codex CLI traffic cannot leave the VPC and the security team wants to read the source. Self-hosted Python-native proxy with MCP gateway mode; pin commits after the March 24, 2026 PyPI compromise.
  4. Kong AI Gateway — Best if you already run Kong. The MCP extension fits the existing operational story with API-gateway-grade SLA.
  5. Maxim Bifrost — Best when tool-call concurrency is the binding constraint. Vendor-published ~11 µs P50 at 5,000 RPS with MCP plus LLM routing in one binary.

Why Codex CLI needs an MCP gateway in front of it

Codex CLI is OpenAI’s terminal coding agent. MCP support lives in ~/.codex/config.toml; registered servers become tools the model can call.

Three properties make Codex CLI MCP traffic hard to monitor without a gateway:

  1. Tool calls outnumber model calls by an order of magnitude. A single prompt to fix a flaky test issues 30 to 80 MCP tool calls. The model call count is single digits. Instrument only the LLM call and the failure mode sits in the other ninety percent.

  2. The CLI’s own logging is text, not structured spans. Codex CLI prints a one-line summary per call. No built-in span tree, no per-tool latency histogram, no per-argument validation log.

  3. MCP servers are an authentication surface. Every server is its own auth boundary. Without a gateway, the Codex CLI process holds direct OAuth tokens for every server it touches, and audit logs sit on each server separately. The April 15, 2026 STDIO RCE class (OX Security) made centralizing this at a gateway the practical production requirement.

An MCP gateway sits between Codex CLI and the downstream servers, intercepts every discovery and invocation, attaches span attributes, and forwards. That interception point is where observability, OAuth 2.1 enforcement, tool allowlists, and argument-level guardrails all happen. Codex CLI points at the gateway via OPENAI_BASE_URL plus the MCP federation endpoint.
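
To make the interception point concrete, here is a minimal sketch of the per-call bookkeeping a gateway could do. It assumes the standard MCP JSON-RPC `tools/call` request shape and the `isError` flag on tool results; `ToolSpan`, `forward_with_span`, and the `upstream` callable are hypothetical names for illustration, not any vendor's API.

```python
import json
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpan:
    """One record per MCP tools/call, loosely modeled on a trace span."""
    tool: str
    server: str
    arguments: dict[str, Any]
    start_ns: int
    end_ns: int = 0
    ok: bool = False

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1_000_000


def forward_with_span(
    raw_request: str,
    server_id: str,
    upstream: Callable[[dict[str, Any]], dict[str, Any]],
    sink: list[ToolSpan],
) -> dict[str, Any]:
    """Parse a JSON-RPC request, time the upstream call, and record a span."""
    request = json.loads(raw_request)
    if request.get("method") != "tools/call":
        return upstream(request)  # discovery and other methods pass through

    params = request.get("params", {})
    span = ToolSpan(
        tool=params.get("name", "<unknown>"),
        server=server_id,
        arguments=params.get("arguments", {}),
        start_ns=time.monotonic_ns(),
    )
    try:
        response = upstream(request)
        # A JSON-RPC error or a tool result flagged isError counts as a failure.
        tool_error = response.get("result", {}).get("isError", False)
        span.ok = "error" not in response and not tool_error
        return response
    finally:
        # Record the span even if the upstream call raised.
        span.end_ns = time.monotonic_ns()
        sink.append(span)
```

In a real gateway the `sink` would be an OTLP exporter rather than a list, and the span would also carry session and agent identifiers.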


The 7 axes we score on

The default MCP gateway axes (transport, OAuth, policy, federation, audit, deployment, license) are the right starting point. For Codex CLI tool-level observability, we tightened them into seven coding-agent-aware axes.

| Axis | What it measures |
| --- | --- |
| 1. Per-tool latency capture | Span per MCP tool call with start, end, duration |
| 2. Tool-call success-rate aggregation | Slice success or failure by tool, MCP server, session |
| 3. Argument validation + guardrails | Inline scanning for prompt injection, secrets, PII |
| 4. MCP server registration + auth | OAuth 2.1 boundary plus per-agent tool allowlists |
| 5. Codex CLI session correlation | MCP spans linked to the OpenAI Responses API call |
| 6. STDIO mitigation posture | Block or sanitize the April 2026 STDIO RCE class |
| 7. Self-host posture | Runs in your VPC so prompts, code, tool outputs never leave |

The score line at the end of each pick rates it against all seven.


How we picked

We started from the universe of AI gateways that, as of May 2026, advertise both an OpenAI-compatible endpoint and an MCP gateway surface. We removed gateways that proxy only the LLM call and treat MCP as opaque, and gateways whose MCP support is STDIO-only without sanitization (the April 2026 RCE class makes this disqualifying). Helicone is excluded after its March 3, 2026 Mintlify acquisition shifted its roadmap toward documentation. The five that remain ship MCP gateway features production teams can use today.


1. Future AGI Agent Command Center: Best for per-tool MCP observability with inline security scanning

Verdict: Future AGI captures every MCP invocation as a structured OpenTelemetry span (tool, server, full argument payload, start/end/duration) and runs Protect’s MCP Security scanner inline on every tools/call, blocking the prompt-injection, secret-leak, and tool-poisoning patterns the April 2026 OX Security disclosure surfaced. OAuth 2.1 at the boundary, per-agent tool allowlists, and scope rewriting keep Codex CLI from ever holding raw downstream tokens.

What it gives Codex CLI users:

  • Per-tool latency capture through traceAI (Apache 2.0). Each MCP invocation becomes an OTel span with start, end, duration, tool, server, full argument payload. Drops into Grafana, Datadog, any OTLP backend.
  • Tool-call success-rate aggregation through native span-attribute slicing. Group by mcp.tool.name, mcp.server.id, session.id. “filesystem.read failed 4.2 percent of the time on paths over 280 characters” is a fix; “the agent is flaky” isn’t.
  • Argument validation inline through the Future AGI Protect model family. Protect is FAGI’s own fine-tuned model family, built on Google’s Gemma 3n with specialized adapters across four safety dimensions (content moderation, bias detection, security/prompt-injection, data privacy/PII) and natively multi-modal across text, image, and audio. It is a model family, not a plugin chain. It scans for prompt injection, secrets, PII, and the MCP tool-poisoning patterns the April 2026 disclosure surfaced, at ~67 ms p50 for text and ~109 ms p50 for image (arXiv 2510.13351). The same dimensions are reusable as offline eval metrics, so the prod policy and the eval rubric stay in sync.
  • MCP server auth through OAuth 2.1 at the boundary, per-agent tool allowlists, scope rewriting. Codex CLI never holds raw downstream tokens.
  • Session correlation through span_id linking the OpenAI Responses API call to every MCP call. The full session is one tree.
  • STDIO mitigation through the dedicated MCP Security scanner inside the 18+ scanner library.
  • Self-host posture through the Apache 2.0 single Go binary; Docker, Kubernetes, AWS, GCP, Azure, air-gapped, on-prem.
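
The group-by slicing described in the list above can be sketched in a few lines. This is an illustration only: the attribute names `mcp.tool.name`, `mcp.server.id`, and `session.id` come from the text, while the `ok` field and the `failure_rate_by` helper are assumptions made for the example, not part of traceAI.

```python
from collections import defaultdict
from typing import Iterable


def failure_rate_by(spans: Iterable[dict], key: str) -> dict[str, float]:
    """Return the fraction of failed calls for each distinct value of `key`."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for span in spans:
        group = span[key]
        totals[group] += 1
        if not span["ok"]:
            failures[group] += 1
    return {group: failures[group] / totals[group] for group in totals}


# Hypothetical exported spans, flattened to dicts.
spans = [
    {"mcp.tool.name": "filesystem.read", "mcp.server.id": "filesystem", "session.id": "s1", "ok": True},
    {"mcp.tool.name": "filesystem.read", "mcp.server.id": "filesystem", "session.id": "s1", "ok": False},
    {"mcp.tool.name": "git.status", "mcp.server.id": "git", "session.id": "s1", "ok": True},
    {"mcp.tool.name": "git.status", "mcp.server.id": "git", "session.id": "s2", "ok": True},
]

print(failure_rate_by(spans, "mcp.tool.name"))
# {'filesystem.read': 0.5, 'git.status': 0.0}
```

Passing `"mcp.server.id"` or `"session.id"` as the key gives the other two slices without changing the function.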

The loop. Captured traces are scored by ai-evaluation (faithfulness, tool-call accuracy, code-correctness). traceAI instruments 35+ frameworks OpenInference-natively, and Error Feed (FAGI’s “Sentry for AI agents”) sits alongside as the zero-config error monitor: it auto-clusters related MCP tool-call failures into named issues (50 traces → 1 issue), auto-writes the root cause plus a quick fix plus a long-term recommendation per issue, and tracks a rising/steady/falling trend per issue, so a flaky tool surfaces like an exception rather than staying buried in span search. Low-scoring sessions cluster by failure mode. agent-opt (Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or tool-routing policy. Typical Codex CLI rewrite: stop calling filesystem.list recursively from the project root when 91 percent of returned bytes are ignored. No other gateway here implements this loop.

Where it falls short:

  • agent-opt is opt-in. For one-week pilots where the brief is per-tool dashboards only, start with traceAI + ai-evaluation and turn the optimizer on once eval baselines stabilize.

  • Managed MCP catalog is smaller than Composio’s. Pair with Composio for 250+ pre-built servers.

Pricing: Free tier 100K traces / month. Scale tier $99/month. Enterprise custom with SOC 2 Type II, HIPAA BAA, AWS Marketplace.

Score: 7/7 axes.


2. Portkey: Best for hosted MCP gateway with mature RBAC

Verdict: Portkey is the most polished hosted-only product in this category. It speaks MCP on the tool side, ships per-tool span attributes through its trace API, and has the cleanest virtual-key story for per-developer chargeback. No optimizer loop. The April 30, 2026 Palo Alto Networks acquisition (close PANW fiscal Q4 2026; roadmap merging into Prisma AIRS) is a procurement signal.

What it gives Codex CLI users:

  • Per-tool latency capture through Portkey’s MCP trace surface. Per-tool latency histogram rendered natively in the dashboard.
  • Tool-call success-rate aggregation through the analytics view extended to MCP. Group-by tool and server works out of the box.
  • Argument validation and guardrails through Portkey’s Guardrails plugin set. Narrower than Future AGI’s 18+ scanner library.
  • MCP server registration and auth through Portkey’s MCP virtual servers. OAuth 2.1 at the gateway boundary; per-agent allowlists in the console.
  • Codex CLI session correlation through a trace_id header; the wrapper has to set it.
  • STDIO mitigation via Streamable HTTP default. Correct post-April 2026 posture.
  • Self-host posture through Portkey’s BYOC option. Good for compliance, not air-gapped.
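
A per-agent tool allowlist, as mentioned in the list above, comes down to a deny-by-default lookup at the gateway boundary. The sketch below is generic and does not reflect Portkey's console or API; the agent IDs, the `server.tool` naming, and the `ALLOWLISTS` table are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: agent identity -> allowed "server.tool" patterns.
ALLOWLISTS: dict[str, list[str]] = {
    "codex-ci": ["filesystem.read", "git.*"],
    "codex-dev": ["filesystem.*", "git.*", "postgres.query"],
}


def is_tool_allowed(agent_id: str, server: str, tool: str) -> bool:
    """Deny by default; allow only if a pattern for this agent matches."""
    qualified = f"{server}.{tool}"
    patterns = ALLOWLISTS.get(agent_id, [])
    return any(fnmatchcase(qualified, pattern) for pattern in patterns)


print(is_tool_allowed("codex-ci", "git", "status"))        # True
print(is_tool_allowed("codex-ci", "filesystem", "write"))  # False
```

Unknown agents get an empty pattern list, so a missing entry fails closed rather than open.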

Where it falls short:

  • No optimizer.
  • Palo Alto Networks acquisition merges the AI gateway roadmap into Prisma AIRS. Verify MCP feature continuity.
  • Pricing escalates above 5M MCP calls/month faster than self-hosted alternatives.

Pricing: Free tier 10K requests/day. Scale tier $99/month. Enterprise custom with SOC 2 Type II.

Score: 6/7 axes (missing: feedback loop / optimizer).


3. LiteLLM: Best for self-hosted Python-native MCP gateway

Verdict: LiteLLM is the pick when Codex CLI traffic can’t leave the VPC and the security team wants to read the source. It ships MCP gateway mode plus the LLM proxy. Two 2026 caveats are non-negotiable: the March 24, 2026 PyPI supply-chain incident (1.82.7 and 1.82.8 compromised; pin to 1.82.6 or upgrade to 1.83.7-stable or later) and CVE-2026-30623, an authenticated MCP STDIO command injection (fixed in 1.83.7-stable).

What it gives Codex CLI users:

  • Per-tool latency capture through metadata pass-through. Standard OTLP spans; pipe into Datadog or internal Tempo.
  • Tool-call success-rate aggregation through spend-tracking and metrics tables in PostgreSQL. Slicing by tool typically means a SQL dashboard.
  • Argument validation and guardrails through LiteLLM’s guardrails interface. Extensible but most scanners are wired yourself, or paired with Future AGI Protect.
  • MCP server registration and auth through virtual keys extended to MCP. OAuth 2.1 supported but configured in code.
  • Codex CLI session correlation through metadata.trace_id and metadata.session_id.
  • STDIO mitigation is the operationally fraught axis. Pin to 1.83.7-stable or later before exposing MCP to autonomous agents.
  • Self-host posture is the strongest in this list. MIT-licensed Python proxy; auditable end-to-end in a week.
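
The version hygiene described here is easy to automate as a deploy-time check. The sketch below encodes only the version numbers stated in this article; the status labels and function names are made up for illustration, and any version the article does not mention is reported as unverified rather than guessed at.

```python
COMPROMISED = {(1, 82, 7), (1, 82, 8)}  # March 24, 2026 PyPI incident
SAFE_PIN = (1, 82, 6)                   # pin named for the PyPI incident
FIRST_FIXED = (1, 83, 7)                # CVE-2026-30623 fix (1.83.7-stable)


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'X.Y.Z' or 'X.Y.Z-suffix' into a comparable tuple."""
    core = text.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def litellm_version_status(text: str) -> str:
    """Classify a LiteLLM version string against the two 2026 advisories."""
    version = parse_version(text)
    if version in COMPROMISED:
        return "compromised"
    if version >= FIRST_FIXED:
        return "patched"
    if version == SAFE_PIN:
        # Clean with respect to the PyPI incident; predates the CVE fix.
        return "pinned"
    return "unverified"


for candidate in ("1.82.6", "1.82.7", "1.83.7-stable"):
    print(candidate, litellm_version_status(candidate))
```

A CI job could fail the build on anything other than `patched` for deployments that expose MCP to autonomous agents.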

Where it falls short:

  • No optimizer.
  • MCP tool-call UI is functional, not polished. Wire traceAI or another OTel sink for the dashboard engineering will actually use.
  • 2026 trust events make version hygiene a permanent operational task.

Pricing: MIT open source. Enterprise (SLA, SSO, audit) starts ~$250/month.

Score: 5.5/7 axes (missing: polished dashboard, optimizer).


4. Kong AI Gateway: Best if you already run Kong

Verdict: Kong AI Gateway is the pick when the platform team already runs Kong for REST APIs and the path of least resistance is to extend the stack with AI plus MCP plugins. Strengths: SLA, plugin ecosystem, ops familiarity. Weakness: MCP observability is plugin-driven (AI Proxy since Kong 3.6 plus MCP-aware additions), not native.

What it gives Codex CLI users:

  • Per-tool latency capture through the OpenTelemetry plugin plus AI Proxy MCP extension. Lives in your OTel backend, not Kong’s console.
  • Tool-call success-rate aggregation through consumer + tag patterns. The view lives in Grafana on top of the OTel sink.
  • Argument validation and guardrails through Kong’s AI plugin library. MCP tool-poisoning detection is assembled from plugins.
  • MCP server registration and auth through Kong consumers and the OAuth 2.0 plugin. Operational story is mature.
  • Codex CLI session correlation through OTel trace context propagation.
  • STDIO mitigation through transport policy. Plugin configuration, not a built-in scanner.
  • Self-host posture is the entire point of Kong.

Where it falls short:

  • Observability is plugin-driven, not native. Plan two weeks of platform-team time for the per-tool MCP dashboard.
  • No optimizer.
  • MCP scanner depth is shallower than Future AGI or Portkey.

Pricing: Open source. Kong Konnect starts free. Enterprise from ~$1.5K/month.

Score: 5/7 axes (missing: native MCP dashboard, optimizer, polished tool-call view).


5. Maxim Bifrost: Best for Go-native MCP throughput

Verdict: Bifrost is the Apache 2.0 Go binary that ships LLM routing and MCP gateway in one process. Maxim’s own benchmark reports an 11-microsecond P50 at 5,000 RPS on t3.xlarge (measured against a mock 60 ms upstream, so treat it as gateway overhead, not end-to-end latency). For high-concurrency Codex CLI workloads, overhead matters; Bifrost’s is the lowest specific figure on this list.

What it gives Codex CLI users:

  • Per-tool latency capture through OTel exporter on the MCP path. Per-tool spans emitted; dashboard polish is thinner than Future AGI’s.
  • Tool-call success-rate aggregation through per-tool metrics. MCP plus LLM unification in one console is genuinely useful.
  • Argument validation and guardrails through Bifrost’s guardrail surface. Partial: not an 18+ scanner library and not a dedicated MCP Security scanner.
  • MCP server registration and auth through Bifrost’s MCP config. OAuth 2.1 supported; per-agent allowlists documented but less polished than Portkey or Future AGI.
  • Codex CLI session correlation through trace context propagation.
  • STDIO mitigation through transport config. Default is Streamable HTTP.
  • Self-host posture is strong; single Go binary, Apache 2.0.
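
A transport policy of the kind referenced above (Streamable HTTP by default, STDIO only for explicitly allowlisted local servers) can be expressed as a small registration check. This is a generic sketch and not Bifrost configuration; `ServerRegistration`, `STDIO_ALLOWLIST`, and the transport labels are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical set of local servers permitted to use STDIO.
STDIO_ALLOWLIST = {"filesystem", "git"}


@dataclass(frozen=True)
class ServerRegistration:
    server_id: str
    transport: str  # "streamable-http" or "stdio"


def check_registration(reg: ServerRegistration) -> tuple[bool, str]:
    """Return (allowed, reason) for an MCP server registration request."""
    if reg.transport == "streamable-http":
        return True, "ok"
    if reg.transport == "stdio":
        if reg.server_id in STDIO_ALLOWLIST:
            return True, "stdio allowed for allowlisted local server"
        return False, "stdio denied: server is not on the local allowlist"
    return False, f"unknown transport: {reg.transport}"


print(check_registration(ServerRegistration("postgres", "stdio")))
# (False, 'stdio denied: server is not on the local allowlist')
```

Rejecting at registration time keeps a non-allowlisted STDIO server from ever appearing in the discovery response.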

Bifrost also ships “Code Mode”, a vendor-published MCP token-reduction feature claiming up to 92.8 percent input-token reduction across 508 tools on 16 MCP servers in Maxim’s own harness. Reproduce on your fleet before underwriting.

Where it falls short:

  • Maxim self-ranks Bifrost number one across its own listicles without a published “where it falls short” block. Trust signal worth weighing.
  • MCP dashboards are thinner than Future AGI’s; teams end up writing custom OTel exporters.
  • No optimizer. Throughput pitch and “contain the blast radius” pitch are different problems.

Pricing: Apache 2.0. Commercial cloud tier on request.

Score: 5.5/7 axes (missing: deep MCP guardrail library, native optimizer).


Capability matrix

| Axis | Future AGI | Portkey | LiteLLM | Kong AI Gateway | Maxim Bifrost |
| --- | --- | --- | --- | --- | --- |
| Per-tool latency capture | Native (traceAI) | Native | OTLP export | Plugin | Native |
| Tool-call success aggregation | Native | Native | SQL view | Grafana | Native |
| Argument validation + guardrails | 18+ scanners + MCP Security | Plugins | Extensible | Plugins | Partial |
| MCP server auth + allowlists | OAuth 2.1 + scope rewrite | OAuth 2.1 + console | OAuth 2.1 in code | Consumer + OAuth plugin | OAuth 2.1 |
| Codex CLI session correlation | span_id linkage | trace_id header | metadata pass-through | OTel context | Trace propagation |
| STDIO mitigation posture | MCP Security scanner | Streamable HTTP default | Patch hygiene required (1.83.7+) | Transport policy | Streamable HTTP default |
| Self-host | Apache 2.0 Go binary | BYOC | MIT Python | OSS Kong | Apache 2.0 Go binary |
| Feedback loop / optimizer | agent-opt (Apache 2.0) | No | No | No | No |

Decision framework: Choose X if

Choose Future AGI if tool-call traces should drive prompt and routing optimization, and if a dedicated MCP Security scanner inline on every call is part of the production requirement.

Choose Portkey if you want a hosted gateway with mature RBAC, virtual keys, and a polished UI for MCP traces, and you don’t need the optimizer. Confirm multi-year roadmap continuity given the Palo Alto Networks acquisition.

Choose LiteLLM if your security team requires Codex CLI traffic to never leave the VPC and you can operate the version hygiene the 2026 incidents made permanent.

Choose Kong AI Gateway if you already operate Kong for REST APIs and platform-team familiarity outweighs AI-specific shallowness. Plan the dashboard work explicitly.

Choose Maxim Bifrost if tool-call concurrency is the binding constraint and gateway overhead matters per call. Trade-off: shallower MCP guardrail depth and a vendor narrative without a published limitations block.


Common mistakes when wiring Codex CLI through an MCP gateway

| Mistake | Fix |
| --- | --- |
| Pointing only the OpenAI side at the gateway, leaving MCP direct | Wire both OPENAI_BASE_URL and the MCP federation endpoint |
| Leaving STDIO as the default MCP transport | Set Streamable HTTP default; STDIO only for allowlisted local servers |
| Sharing one MCP token across all Codex CLI users | Issue per-agent OAuth 2.1 identities; gateway rewrites scopes downstream |
| Not pinning the LiteLLM version | Pin to 1.82.6 or upgrade to 1.83.7-stable or later |
| Tagging only user_id, not session_id plus tool_name | Tag user, session, tool, MCP server; the four-tuple makes failure clusters legible |
| Skipping argument-level guardrails because LLM guardrails are on | Tool-poisoning targets the argument string, not the model prompt; scan the MCP path |
| Buffering streaming tool responses | Confirm SSE pass-through on the MCP path without buffer-and-batch |
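
To illustrate what scanning the MCP path means in the argument-guardrail row above, here is a deliberately small regex sketch that inspects the argument payload of a tool call. It is a toy, not Future AGI Protect or any vendor's scanner; the three patterns and the `scan_arguments` name are illustrative assumptions, and real coverage needs far more than this.

```python
import json
import re
from typing import Any

# Illustrative patterns only; a production scanner needs far broader coverage.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "injection_phrase": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
}


def scan_arguments(arguments: dict[str, Any]) -> list[str]:
    """Return the name of every pattern found anywhere in the argument payload."""
    # Serializing first lets one pass cover nested dicts and lists.
    text = json.dumps(arguments)
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


print(scan_arguments({"query": "key=AKIAIOSFODNN7EXAMPLE"}))
# ['aws_access_key_id']
```

A gateway would run a check like this before forwarding `tools/call` and block or redact on a non-empty result.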

How Future AGI closes the loop on Codex CLI MCP tool calls

The other four gateways treat tool-level observability as an end state: capture the span, show it in a dashboard, alert on threshold breach. Future AGI treats it as the input to a feedback loop.

  1. Trace. traceAI (Apache 2.0). Parent span: OpenAI Responses API call. Children: MCP tool calls. Spans capture inputs, outputs, arguments, model, server, session.
  2. Evaluate. ai-evaluation (Apache 2.0) scores every span. FAGI ships a catalog of 50+ built-in rubrics (task-completion, tool-call accuracy, faithfulness, code-correctness, structured-output, hallucination, agentic surfaces, instruction-following, groundedness). On top of that come unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code and MCP schema, self-improving evaluators that learn from live production traces, and FAGI’s proprietary classifier model family at very low cost-per-token (Galileo Luna-2 cost economics, rubric-flexible). The catalog is the floor, not the ceiling.
  3. Cluster. Low-scoring sessions cluster by failure mode. Common Codex CLI pattern: the agent called filesystem.read on a binary file, got 200KB of unreadable bytes into context, then made three more wasteful tool calls before timing out.
  4. Optimize. agent-opt (Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or tool-routing policy. Typical rewrite: drop 14 unused tools from the discovery response.
  5. Route. Agent Command Center applies the new policy on the next session. Protect guardrails (~67 ms text, arXiv 2510.13351) run inline.
  6. Re-deploy. Prompt and allowlist versioned. Eval regression triggers automatic rollback.
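
The last step, versioned policies with rollback on eval regression, can be sketched as follows. This is a conceptual illustration and not how Agent Command Center is implemented; `PolicyVersion`, `PolicyStore`, and the `tolerance` threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PolicyVersion:
    version: int
    system_prompt: str
    tool_allowlist: tuple[str, ...]
    eval_score: Optional[float] = None  # filled in after evaluation


@dataclass
class PolicyStore:
    history: list[PolicyVersion] = field(default_factory=list)
    tolerance: float = 0.02  # hypothetical allowed score drop

    def deploy(self, candidate: PolicyVersion) -> None:
        """Make the candidate the active version."""
        self.history.append(candidate)

    def record_score(self, score: float) -> PolicyVersion:
        """Score the active version; roll back if it regressed past tolerance."""
        active = self.history[-1]
        active.eval_score = score
        if len(self.history) > 1:
            previous = self.history[-2]
            regressed = (
                previous.eval_score is not None
                and score < previous.eval_score - self.tolerance
            )
            if regressed:
                self.history.pop()  # discard the regressed version
        return self.history[-1]


store = PolicyStore()
store.deploy(PolicyVersion(1, "v1 prompt", ("filesystem.read", "git.status")))
store.record_score(0.80)
store.deploy(PolicyVersion(2, "v2 prompt", ("filesystem.read",)))
print(store.record_score(0.70).version)  # 1: v2 regressed, v1 is active again
```

Versioning the prompt and the allowlist together means a rollback restores both in one step.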

Net effect: a Codex CLI team starting with 12 percent of tool calls returning failure or wasted output typically sees that rate drop to 2-4 percent within four weeks.

Building blocks (Apache 2.0): traceAI, ai-evaluation, agent-opt at github.com/future-agi. The hosted Agent Command Center adds the failure-cluster view, live Protect guardrails, the MCP Security scanner, RBAC, SOC 2 Type II certification, and an AWS Marketplace listing.


What we did not include

  • Helicone. Strong on LLM observability but post-Mintlify (March 3, 2026) the MCP roadmap is downstream of a documentation product.
  • Composio. Outstanding managed MCP catalog when integration breadth is the binding constraint; pair with one of the picks above for guardrails.
  • Cloudflare AI Gateway. Strong LLM primitives but the MCP surface is still thin as of May 2026.

All three are worth a second look in Q3 2026.



Sources

  • OpenAI Codex CLI documentation and config reference, platform.openai.com/docs/codex-cli
  • Model Context Protocol specification 2025-11-25, modelcontextprotocol.io/specification/2025-11-25
  • OX Security advisory on MCP STDIO RCE class (April 15, 2026), ox.security/blog/mcp-supply-chain-advisory-rce-vulnerabilities-across-the-ai-ecosystem
  • Future AGI Agent Command Center docs, docs.futureagi.com/docs/command-center
  • Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
  • Portkey AI gateway, portkey.ai
  • Palo Alto Networks Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey
  • LiteLLM proxy and MCP advisory (CVE-2026-30623), docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026
  • LiteLLM PyPI supply-chain advisory (March 2026), docs.litellm.ai/blog/security-update-march-2026
  • Kong AI Gateway, konghq.com/products/kong-ai-gateway
  • Maxim Bifrost benchmarks, getmaxim.ai/bifrost/resources/benchmarks

Frequently asked questions

What is the cheapest way to monitor Codex CLI MCP tool calls?
LiteLLM's open-source proxy in MCP gateway mode is the cheapest self-hosted route (with the version-hygiene caveat). Future AGI's free tier (100K traces / month) is the cheapest hosted route with per-tool span slicing and the MCP Security scanner.
Does Codex CLI support MCP through its config file?
Yes. MCP support lives in `~/.codex/config.toml`. Pointing servers at a gateway is a config change on the URL plus the OAuth identity.
Can I route Codex CLI through multiple model providers?
Codex CLI is tuned for GPT-5.x-Codex; routing to non-OpenAI models often degrades tool use. Route within the OpenAI Codex family based on token budget, and keep the MCP tool side unified at the gateway.
How do I track Codex CLI tool-call cost per developer when everyone shares one MCP token?
Use a gateway with virtual keys or per-agent OAuth identities (Future AGI, Portkey, LiteLLM). Each developer gets an identity that fans out to the team's downstream MCP credentials.
What happens to MCP tool calls when Codex CLI runs through a gateway?
All five preserve calls and emit per-tool spans. The mistake to avoid: wiring only the OpenAI side. Tool calls then bypass the gateway entirely and per-tool observability is empty.
Is it safe to send source code through an MCP gateway?
Hosted gateways see the code. If compliance forbids the hosted path, pick self-hosted Future AGI (Apache 2.0 Go binary), self-hosted LiteLLM, or self-hosted Kong AI Gateway.
How is Future AGI different from Portkey for Codex CLI MCP traffic?
Portkey gives you a dashboard. Future AGI gives you a dashboard plus a loop plus a dedicated MCP Security scanner inline on every call. The Palo Alto Networks acquisition of Portkey (April 30, 2026) is also a procurement signal.
Did the April 2026 MCP RCE class change how teams wire MCP gateways?
Yes. Before April 15, 2026 the gateway was an observability convenience. After OX Security's disclosure and the CVE inventory that followed (CVE-2026-30623 in LiteLLM plus assignments across Agent Zero, Upsonic, Flowise, Windsurf), it became the practical place to enforce least-privilege tool access, OAuth 2.1, and Streamable HTTP transport.