Best 5 MCP Gateways for Claude Code in 2026
Five MCP gateways for Claude Code in 2026, scored on per-tool latency, MCP server auth, tool-description scanning, session correlation, and what each gateway misses after the April STDIO RCE.
Table of Contents
A Claude Code session that registers a dozen MCP servers and runs two hours easily issues 150 tool calls. Most are filesystem reads. A few are git diffs, a few postgres lookups, and one or two actually moved the work forward. The Anthropic dashboard shows tokens. Claude Code’s logs show a one-line summary per call. Neither shows you which server returned a 4-second response, which tool was invoked with a path traversal in its arguments, or whether the description text the agent saw at discovery had been mutated by a poisoned server.
MCP support in Claude Code shipped late 2024 and matured through 2025. By May 2026 it’s the part of the agent most teams under-instrument. Token observability is solved; MCP tool observability isn’t. Add the April 15, 2026 OX Security disclosure of the Anthropic STDIO RCE class and the MCP path stops being an observability convenience and becomes a security boundary production teams enforce at a gateway.
The five gateways below all speak MCP on the tool side and parse tool discovery and invocation rather than treating MCP traffic as opaque HTTP. Only one pipes the per-tool spans it captures into evaluation and routing optimization. This is the 2026 cohort scored on the seven axes that matter when MCP is the workload and Claude Code is the client.
TL;DR: pick by workload
| Workload | Pick | Why |
|---|---|---|
| Tool-level traces wired into eval and self-improving routing | Future AGI Agent Command Center | Only entry that pipes traceAI MCP spans into fi.evals and agent-opt, with a dedicated MCP Security scanner inline |
| Pure-play MCP gateway with Go-native throughput | Maxim Bifrost | Single Go binary that ships MCP gateway plus LLM routing; lowest published gateway overhead in the cohort |
| Hosted MCP gateway with mature virtual keys and RBAC | Portkey | Polished hosted UI for per-developer MCP attribution; verify roadmap after the Palo Alto Networks acquisition |
| API-gateway-grade SLA on top of an existing Kong stack | Kong AI Gateway | If platform already runs Kong, the AI Proxy plus OAuth plugins extend the existing operational story |
| Linux Foundation OSS MCP gateway with declarative policy | agentgateway.dev | The vendor-neutral OSS option for teams that want governance built around a foundation-hosted project |
Why Claude Code needs an MCP gateway in front of it
Claude Code reads MCP servers out of a JSON config (~/.claude/mcp.json plus per-project overrides). At session start the client connects to every registered server, calls tools/list, and pulls each description into the model’s available-tool inventory. Three properties make the workload hard to monitor.
Tool calls outnumber model calls by an order of magnitude. A bug-fix session with filesystem, git, and a custom search server issues 50 to 200 tool calls against 30 to 60 model turns. Token cost is concentrated in model calls; failure modes, latency tails, and security exposure are concentrated in tool calls. Instrument only the model side and 90 percent of where things go wrong is invisible.
Cost is non-obvious. Every MCP call consumes input tokens twice, once when the model serialises the call, and again when Claude Code re-serialises the result into the next turn. A postgres.query returning a 12,000-token table quietly pushes the next turn’s input over budget. Claude Code’s log shows the result returned; it doesn’t show the cost propagates through the rest of the context window.
Each MCP server is its own auth surface. Some need API keys, some OAuth 2.1, some run in-process via STDIO with no auth. Without a gateway the Claude Code process holds direct credentials for every server, audit logs sit on each server separately, and the agent inherits whatever scope the broadest token carries. The April 15, 2026 STDIO RCE class (OX Security; affects the official Python, TypeScript, Java, and Rust SDKs; arbitrary command execution through process names passed to STDIO) made centralizing this at a gateway the practical production requirement.
An MCP gateway sits between Claude Code and the registered servers, intercepts every discovery and invocation, attaches span attributes, and forwards after policy and guardrail checks. Claude Code points at the gateway by rewriting each server URL in mcp.json to the federation endpoint plus an OAuth identity.
The 7 axes we score on
The default MCP gateway axes (transport, OAuth, policy, federation, audit, deployment, license) are the right starting point. For Claude Code specifically we tightened them into seven coding-agent-aware axes.
| Axis | What it measures |
|---|---|
| 1. Per-tool latency capture | Does each MCP tool call get its own span with start, end, duration, server, tool, arguments? |
| 2. Tool-call success and cost aggregation | Can you slice success rate and re-serialised token cost by tool, server, and Claude Code session? |
| 3. Tool-description scanning | Does the gateway scan tool descriptions for the prompt-injection patterns the post-April 2026 RCE class surfaced? |
| 4. MCP server registration and auth | Per-agent OAuth 2.1 boundary, scope rewriting, per-agent tool allowlists |
| 5. Claude Code session correlation | Are MCP spans linked to the parent Anthropic API call via span_id so the session is one tree? |
| 6. STDIO posture | Does the gateway block or sanitize the April 2026 STDIO RCE class by default? |
| 7. Self-host posture | Can the gateway run in your VPC so code, tool arguments, and tool outputs never leave? |
Verdict line at the end of each pick scores all seven.
How we picked
We started from the universe of public MCP gateways that, as of May 2026, ship a documented federation endpoint and at least one of the four post-RCE control planes. We removed gateways whose MCP support is STDIO-only without sanitization (April 2026 made this disqualifying). We removed Helicone after its March 3, 2026 Mintlify acquisition shifted roadmap toward documentation. We removed LiteLLM from the headline list because of the March 24, 2026 PyPI supply-chain incident plus CVE-2026-30623, pinned and patched it remains viable, but the version-hygiene tax disqualifies it for teams wiring something up this week.
Future AGI is first because the loop is the wedge.
1. Future AGI Agent Command Center: Best for closing the loop on Claude Code MCP
Verdict. Future AGI is the only gateway here that takes captured MCP traces and pipes them into evaluation and routing optimization. The other four are observation layers. Agent Command Center is an observation layer wired to a self-improving loop with a dedicated MCP Security scanner inline on every call.
Per-tool latency capture lives in traceAI (Apache 2.0): each MCP invocation becomes an OpenTelemetry span with mcp.tool.name, mcp.server.id, full argument payload, and response, flowing into Grafana, Datadog, or any OTLP backend. Tool-call success and cost aggregate via native span-attribute slicing, group by tool, server, session, developer. Re-serialisation token cost (the input tokens the next turn pays for the previous tool’s result) is captured as a span attribute, so “filesystem.read on paths over 280 characters fails 4.2 percent of the time and adds 18K input tokens to the next turn” is a fix, not a vague gut feeling.
Tool-description scanning runs through the Future AGI Protect model family at discovery and at each invocation. Protect is FAGI’s own fine-tuned model family built on Google’s Gemma 3n with specialized adapters across four safety dimensions (content moderation, bias detection, security/prompt-injection, data privacy/PII), natively multi-modal across text, image, and audio, a model family, not a plugin chain of third-party detectors. The adapters scan descriptions and arguments for prompt injection, secrets, PII, and the MCP tool-poisoning patterns the April 2026 disclosure surfaced. Latency is ~67 ms p50 text and ~109 ms p50 image (arXiv 2510.13351); the same dimensions are reusable as offline eval metrics so the prod policy and the eval rubric stay in sync. MCP server auth uses OAuth 2.1 at the boundary plus per-agent allowlists and scope rewriting; Claude Code never holds raw downstream tokens. Session correlation works through span_id, the Anthropic API call is the parent, every MCP tool call is a child, and the eval verdict is stitched into the same trace. STDIO is allowlisted per agent and routes through the sanitizer; Streamable HTTP is the default. Self-host is the Apache 2.0 Go binary on Docker, Kubernetes, AWS, GCP, Azure, or air-gapped, with a hosted endpoint at gateway.futureagi.com/v1.
The loop. Every trace is scored by fi.evals (faithfulness, tool-call accuracy, code-correctness). traceAI instruments 35+ frameworks OpenInference-natively, and Error Feed (FAGI’s “Sentry for AI agents”) sits alongside as the zero-config error monitor: auto-clusters related MCP tool-call failures (50 traces → 1 issue), auto-writes the root cause from the span evidence plus a quick fix plus a long-term recommendation per issue, and tracks rising/steady/falling trend per issue so a regressing tool surfaces like an exception rather than buried in trace search. Low-scoring sessions cluster by failure mode. agent-opt (Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or adjusts allowlist policy. Typical Claude Code rewrite from our usage: stop registering 14 of 38 tools the agent never calls but each consumes ~180 input tokens at discovery. Saving across a team running 22 sessions a day: roughly 12 percent of input tokens, no developer behaviour change.
Where it falls short. agent-opt is opt-in, start with traceAI + ai-evaluation for one-week pilots and light up the optimizer once eval baselines stabilize and Claude Code is at scale. The managed MCP catalog is smaller than Composio’s, pair with Composio when integration breadth is the binding constraint.
Pricing. Free tier 100K traces/month. Scale tier $99/month. Enterprise custom with SOC 2 Type II, HIPAA BAA, AWS Marketplace.
Score: 7/7 axes.
2. Maxim Bifrost: Best for MCP-native Go throughput
Verdict. Bifrost is the Apache 2.0 Go binary from Maxim that runs LLM routing and MCP gateway in one process. The vendor-published benchmark is roughly 11 microsecond P50 at 5,000 RPS on t3.xlarge (Maxim’s own harness with a mock 60 ms upstream, treat as gateway overhead, not end-to-end). Right pick when MCP tool concurrency is the binding constraint and your platform team is happy operating a Go binary.
Per-tool latency capture is native via Bifrost’s OTel exporter; dashboard polish is thinner than Future AGI’s, so teams pipe spans into Datadog or Tempo. Per-tool metrics in the Bifrost console unify MCP plus LLM in one view. Tool-description scanning is partial, not a 18+ scanner library and not a dedicated MCP Security scanner; poisoning detection is assembled from plugins. OAuth 2.1 is supported; per-agent allowlists are documented but UX is less polished than Portkey or Future AGI. Session correlation rides OTel context. Streamable HTTP is the default. Self-host on Docker or Kubernetes is the point.
Bifrost also ships “Code Mode,” a vendor-claimed MCP token-reduction feature with up to 92.8 percent input-token reduction across 508 tools on 16 MCP servers in Maxim’s own harness. Worth reproducing on your fleet before underwriting; teams report meaningful savings on tool-heavy Claude Code workflows but rarely the upper-bound figure.
Where it falls short. MCP guardrail library is thin, teams needing tool-poisoning detection across many shapes wire more glue. Maxim’s own listicles rank Bifrost number one without publishing a limitations block, a trust signal worth weighing. No optimizer; throughput is one axis, “contain the blast radius and bend the cost curve” is a different problem.
Pricing. Apache 2.0. Commercial cloud tier on request.
Score: 5.5/7 axes (missing: deep MCP-path guardrails, optimizer).
3. Portkey: Best for hosted MCP gateway with mature RBAC
Verdict. Portkey is the most polished hosted-only product in this category. Speaks MCP on the tool side, ships per-tool span attributes through its trace API, and has the cleanest virtual-key story for per-developer MCP attribution. No optimizer. The April 30, 2026 Palo Alto Networks acquisition (close PANW fiscal Q4 2026; AI gateway roadmap merging into Prisma AIRS) is a procurement signal.
Per-tool latency lives in Portkey’s MCP trace surface with a histogram rendered natively, the prettiest dashboard in the cohort. Tool-call success and cost aggregation extend the analytics view to MCP, with group-by-tool, group-by-server, and re-serialisation token cost captured per call. Tool-description scanning runs through Portkey’s Guardrails plugin set, narrower than Future AGI’s 18+ scanner library. MCP server auth uses virtual servers and OAuth 2.1; per-agent allowlists are configured in the console. Session correlation rides a trace_id header the Claude Code wrapper has to set. Streamable HTTP is the default. Self-host through Portkey’s BYOC option is good for compliance, not strictly air-gapped.
Where it falls short. The Palo Alto Networks acquisition is the elephant. AI gateway roadmap is merging into Prisma AIRS, and the cadence of MCP-specific feature work is the open question. Verify continuity before signing past PANW fiscal Q4 2026. No optimizer. Pricing escalates above 5M MCP calls/month faster than self-hosted alternatives.
Pricing. Free tier 10K requests/day. Scale tier $99/month. Enterprise custom with SOC 2 Type II.
Score: 6/7 axes (missing: feedback loop / optimizer).
4. Kong AI Gateway: Best if you already run Kong
Verdict. Kong AI Gateway is the pick when the platform team already operates Kong for REST APIs. Strengths: SLA, plugin ecosystem, ops familiarity. Weakness: MCP-specific shallowness, observability happens via the OTel plugin plus AI Proxy plus a separate dashboard, not natively.
Per-tool latency capture comes through the OTel plugin plus the AI Proxy MCP extension introduced in Kong 3.6. Spans land in your OTel backend; the Kong console shows the API-gateway view, not the MCP-cost view. Tool-call success and cost aggregation use Kong consumer plus tag patterns; the slice-by-tool dashboard lives in Grafana. Tool-description scanning runs through Kong’s AI plugin library, with MCP poisoning detection assembled from plugins. Auth uses Kong consumers plus the OAuth 2.0 plugin. STDIO posture is set by transport policy. Self-host is the entire point of Kong.
Where it falls short. MCP observability is plugin-driven, not native, plan two weeks of platform-team time for the chargeback-grade dashboard. No optimizer. MCP scanner depth is shallower than Future AGI or Portkey; if your CISO wants a named MCP Security scanner, the answer is “we built one out of plugins.”
Pricing. Open source. Kong Konnect starts free. Enterprise from ~$1.5K/month.
Score: 5/7 axes (missing: native MCP dashboard, optimizer, polished tool-call view).
5. agentgateway.dev: Best for Linux Foundation MCP gateway
Verdict. agentgateway.dev is the Linux Foundation-hosted, vendor-neutral MCP gateway. Right pick when your buying constraint is governance, a foundation-hosted project with a transparent maintainer mix, open contribution, and no acquisition risk. Trade-off is feature surface: guardrail library and dashboard are both lighter than Future AGI’s or Portkey’s.
Per-tool latency runs through OTel exporters on the MCP path with standard OTLP spans. Tool-call success and cost aggregation use OTel attributes; the cost-by-tool view is a Grafana dashboard you build. Tool-description scanning lives in agentgateway’s declarative policy engine, policy as code covers allowlists, scope rewriting, and a baseline scanner set, lighter than Future AGI’s named scanners. OAuth 2.1 is enforced at the boundary; per-agent allowlists are first-class. Session correlation rides OTel context. Streamable HTTP is default with STDIO opt-in. Self-host on Apache 2.0 is the project’s identity.
Where it falls short. Dashboard story is thin, most teams pair agentgateway with their own observability stack. No optimizer. Plugin and scanner library is smaller than longer-running OSS MCP gateway projects. Roadmap velocity is foundation-paced; vendor-led projects ship MCP-path features faster.
Pricing. Apache 2.0, Linux Foundation-hosted. Commercial support via member companies.
Score: 5/7 axes (missing: polished dashboard, deep scanner library, optimizer).
Capability matrix
| Axis | Future AGI ACC | Maxim Bifrost | Portkey | Kong AI Gateway | agentgateway.dev |
|---|---|---|---|---|---|
| Per-tool latency capture | Native (traceAI) | Native | Native | OTel plugin | OTel exporter |
| Tool-call success + cost aggregation | Native span slicing | Per-tool metrics | Native analytics | Grafana on OTel | Grafana on OTel |
| Tool-description scanning | 18+ scanners + MCP Security | Partial | Guardrails plugin set | AI plugin library | Declarative policy |
| MCP server auth + allowlists | OAuth 2.1 + scope rewrite | OAuth 2.1 | OAuth 2.1 + console | Consumer + OAuth plugin | Policy as code |
| Claude Code session correlation | span_id linkage | OTel context | trace_id header | OTel context | OTel context |
| STDIO posture | Streamable HTTP default + scanner | Streamable HTTP default | Streamable HTTP default | Transport policy | Streamable HTTP default |
| Self-host | Apache 2.0 Go binary | Apache 2.0 Go binary | BYOC | OSS Kong | Apache 2.0 LF project |
| Feedback loop / optimizer | agent-opt (Apache 2.0) | No | No | No | No |
Decision framework: Choose X if
Choose Future AGI Agent Command Center if the gateway should do more than observe, if MCP traces should drive prompt rewrites and routing-policy updates over time, and a dedicated MCP Security scanner inline on every call is part of the requirement. Pick this when Claude Code’s MCP surface is significant enough that the cost-and-quality curve genuinely matters.
Choose Maxim Bifrost if MCP tool concurrency is the binding constraint and gateway overhead matters per call. Trade-off: shallower MCP guardrail depth and a vendor narrative without a published limitations block.
Choose Portkey if you want a hosted gateway with mature RBAC, virtual keys, and the prettiest UI for MCP traces, and you don’t need the optimizer. Verify multi-year roadmap continuity given the Palo Alto Networks acquisition.
Choose Kong AI Gateway if you already operate Kong for REST APIs and platform-team familiarity outweighs AI-specific shallowness. Plan two weeks for the MCP dashboard work explicitly.
Choose agentgateway.dev if Linux Foundation governance, Apache 2.0 across the board, and acquisition-independence outrank dashboard polish or scanner depth.
Common mistakes when wiring Claude Code through an MCP gateway
| Mistake | Fix |
|---|---|
| Pointing only the Anthropic API at the AI gateway, leaving MCP direct | Wire ANTHROPIC_BASE_URL and rewrite each MCP server URL in mcp.json to the federation endpoint |
| Leaving STDIO as the default MCP transport | Set Streamable HTTP as the default; allowlist STDIO only for known-good local servers |
| Sharing one MCP token across all Claude Code users | Issue per-agent OAuth 2.1 identities; the gateway rewrites scopes downstream |
| Not pinning the LiteLLM version when LiteLLM is in the path | Pin to 1.82.6 or upgrade past 1.83.7-stable; version hygiene is permanent |
Tagging only user_id, not session_id plus tool_name plus mcp.server.id | Tag user, session, tool, server; the four-tuple makes failure clusters legible |
| Skipping argument-level guardrails because LLM-side guardrails are on | Run a scanner on the MCP path inspecting descriptions at discovery and arguments at invocation |
| Buffering streaming tool responses | Confirm SSE pass-through on the MCP path without buffer-and-batch |
| Setting tool allowlists too narrowly on day one | Start in audit-mode (log without blocking), watch for a week, then enforce |
How Future AGI closes the loop on Claude Code MCP
The other four treat MCP observability as an end state: capture the span, render it, alert when an SLO trips. Future AGI treats it as input to a feedback loop.
- Trace.
traceAI(Apache 2.0). Parent: Anthropic API call. Children: every MCP invocation, guardrail check, retry. Spans capture inputs, outputs, arguments, responses, model, server, identity, and re-serialisation token cost. - Evaluate.
ai-evaluation(Apache 2.0) scores each span. FAGI ships a 50+ built-in rubric catalog (task-completion, faithfulness, tool-call accuracy, code-correctness, structured-output, hallucination, agentic surfaces, instruction-following, groundedness), plus unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code and MCP schema, plus self-improving evaluators that learn from live production traces, plus FAGI’s proprietary classifier model family at very low cost-per-token (Galileo Luna-2 cost economics, rubric-flexible). Catalog is the floor, not the ceiling. - Cluster. Low-scoring sessions cluster by failure mode. Common Claude Code patterns: the agent called
filesystem.listrecursively from the project root and dropped 28K tokens ofnode_modulesinto the next turn;postgres.queryretried four times on a transient 5xx because the MCP client didn’t back off;git.diffran against the wrong base branch. - Optimize.
agent-opt(Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or adjusts allowlist policy. Typical rewrites: drop 14 of 38 registered tools the agent never invokes; routepostgres.querycalls under a row-limit to a read-replica; tightenfilesystem.listso the agent stops recursing into vendor directories. - Route. Agent Command Center applies the updated policy on the next session. Protect guardrails (~67 ms text, arXiv 2510.13351) run inline; the MCP Security scanner inspects discovery responses and argument payloads.
- Re-deploy. Prompt and allowlist versions are stored. If the eval score regresses on the next batch, deployment auto-rolls back.
Net effect on a Claude Code team running 22 sessions a day across 12 engineers: input-token spend trends down 12 to 18 percent within four weeks, MCP tool-call failure rate drops from around 12 percent to 3 to 4 percent. No developer behaviour change.
Apache 2.0 building blocks: traceAI, ai-evaluation, agent-opt at github.com/future-agi. Hosted Agent Command Center adds failure-cluster views, live Protect guardrails, the MCP Security scanner, RBAC, SOC 2 Type II, HIPAA, GDPR, and CCPA all certified with BAA available, and AWS Marketplace listing.
What we did not include
- Helicone is strong on LLM observability but post-Mintlify acquisition (March 3, 2026) the MCP roadmap is downstream of a documentation product.
- LiteLLM remains a viable self-hosted MCP gateway pinned to 1.82.6 or past 1.83.7-stable, but the March 2026 PyPI supply-chain incident plus CVE-2026-30623 make version hygiene a permanent operational task. For teams choosing fresh in May 2026, the five above ship with less tax.
- Composio is outstanding when integration-catalog breadth is the binding constraint, 200+ managed MCP servers. Pair with one of the five picks for the guardrail and OAuth 2.1 layer.
All three are worth a second look in Q3 2026.
Related reading
- Best 5 MCP Gateways in 2026: Post-RCE Production Picks
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- Best 5 AI Gateways for MCP Tool-Level Observability with Codex CLI in 2026
- What Is an AI Gateway? The 2026 Definition
Sources
- Anthropic Claude Code MCP documentation, claude.ai/docs/claude-code/mcp
- Model Context Protocol specification 2025-11-25, modelcontextprotocol.io/specification/2025-11-25
- OX Security advisory on MCP STDIO RCE class (April 15, 2026), ox.security/blog/mcp-supply-chain-advisory-rce-vulnerabilities-across-the-ai-ecosystem
- Future AGI Agent Command Center docs, docs.futureagi.com/docs/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Maxim Bifrost benchmarks, getmaxim.ai/bifrost/resources/benchmarks
- Portkey AI gateway, portkey.ai
- Palo Alto Networks Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey
- Kong AI Gateway, konghq.com/products/kong-ai-gateway
- agentgateway.dev, agentgateway.dev (Linux Foundation project page)
- LiteLLM advisory for CVE-2026-30623, docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026
- LiteLLM PyPI supply-chain advisory (March 2026), docs.litellm.ai/blog/security-update-march-2026
Frequently asked questions
What is the cheapest way to monitor Claude Code MCP tool calls?
Does Claude Code support MCP through a config file?
Can I route Claude Code MCP calls through a different transport?
How do I track Claude Code MCP cost per developer when everyone shares one credential?
What happens to tool calls when Claude Code runs through an MCP gateway?
How is Future AGI Agent Command Center different from Portkey for Claude Code MCP?
Did the April 2026 MCP RCE class change how teams wire MCP gateways?
A 2026 architecture essay on why MCP traffic blows up coding-agent token bills in Claude Code and Codex CLI — and the five named mechanisms by which an MCP gateway compresses the cost.
How an MCP gateway in front of Claude Code can cut input-token spend by 50 percent in 2026 — compiled tool execution, semantic caching, selective registration, and description compression, scored across five real gateways.
Practical guide to using an MCP gateway with Claude Code in 2026. Daily workflows, five common operations with code, four production patterns, and gateway picks. Operations-focused.