Guides

Best 5 MCP Gateways for Claude Code in 2026

Five MCP gateways for Claude Code in 2026, scored on per-tool latency, MCP server auth, tool-description scanning, session correlation, and what each gateway misses after the April STDIO RCE.

·
18 min read
ai-gateway 2026 claude-code mcp
Editorial cover image for Best 5 MCP Gateways for Claude Code in 2026
Table of Contents

A Claude Code session that registers a dozen MCP servers and runs two hours easily issues 150 tool calls. Most are filesystem reads. A few are git diffs, a few postgres lookups, and one or two actually moved the work forward. The Anthropic dashboard shows tokens. Claude Code’s logs show a one-line summary per call. Neither shows you which server returned a 4-second response, which tool was invoked with a path traversal in its arguments, or whether the description text the agent saw at discovery had been mutated by a poisoned server.

MCP support in Claude Code shipped late 2024 and matured through 2025. By May 2026 it’s the part of the agent most teams under-instrument. Token observability is solved; MCP tool observability isn’t. Add the April 15, 2026 OX Security disclosure of the Anthropic STDIO RCE class and the MCP path stops being an observability convenience and becomes a security boundary production teams enforce at a gateway.

The five gateways below all speak MCP on the tool side and parse tool discovery and invocation rather than treating MCP traffic as opaque HTTP. Only one pipes the per-tool spans it captures into evaluation and routing optimization. This is the 2026 cohort scored on the seven axes that matter when MCP is the workload and Claude Code is the client.


TL;DR: pick by workload

WorkloadPickWhy
Tool-level traces wired into eval and self-improving routingFuture AGI Agent Command CenterOnly entry that pipes traceAI MCP spans into fi.evals and agent-opt, with a dedicated MCP Security scanner inline
Pure-play MCP gateway with Go-native throughputMaxim BifrostSingle Go binary that ships MCP gateway plus LLM routing; lowest published gateway overhead in the cohort
Hosted MCP gateway with mature virtual keys and RBACPortkeyPolished hosted UI for per-developer MCP attribution; verify roadmap after the Palo Alto Networks acquisition
API-gateway-grade SLA on top of an existing Kong stackKong AI GatewayIf platform already runs Kong, the AI Proxy plus OAuth plugins extend the existing operational story
Linux Foundation OSS MCP gateway with declarative policyagentgateway.devThe vendor-neutral OSS option for teams that want governance built around a foundation-hosted project

Why Claude Code needs an MCP gateway in front of it

Claude Code reads MCP servers out of a JSON config (~/.claude/mcp.json plus per-project overrides). At session start the client connects to every registered server, calls tools/list, and pulls each description into the model’s available-tool inventory. Three properties make the workload hard to monitor.

Tool calls outnumber model calls by an order of magnitude. A bug-fix session with filesystem, git, and a custom search server issues 50 to 200 tool calls against 30 to 60 model turns. Token cost is concentrated in model calls; failure modes, latency tails, and security exposure are concentrated in tool calls. Instrument only the model side and 90 percent of where things go wrong is invisible.

Cost is non-obvious. Every MCP call consumes input tokens twice, once when the model serialises the call, and again when Claude Code re-serialises the result into the next turn. A postgres.query returning a 12,000-token table quietly pushes the next turn’s input over budget. Claude Code’s log shows the result returned; it doesn’t show the cost propagates through the rest of the context window.

Each MCP server is its own auth surface. Some need API keys, some OAuth 2.1, some run in-process via STDIO with no auth. Without a gateway the Claude Code process holds direct credentials for every server, audit logs sit on each server separately, and the agent inherits whatever scope the broadest token carries. The April 15, 2026 STDIO RCE class (OX Security; affects the official Python, TypeScript, Java, and Rust SDKs; arbitrary command execution through process names passed to STDIO) made centralizing this at a gateway the practical production requirement.

An MCP gateway sits between Claude Code and the registered servers, intercepts every discovery and invocation, attaches span attributes, and forwards after policy and guardrail checks. Claude Code points at the gateway by rewriting each server URL in mcp.json to the federation endpoint plus an OAuth identity.


The 7 axes we score on

The default MCP gateway axes (transport, OAuth, policy, federation, audit, deployment, license) are the right starting point. For Claude Code specifically we tightened them into seven coding-agent-aware axes.

AxisWhat it measures
1. Per-tool latency captureDoes each MCP tool call get its own span with start, end, duration, server, tool, arguments?
2. Tool-call success and cost aggregationCan you slice success rate and re-serialised token cost by tool, server, and Claude Code session?
3. Tool-description scanningDoes the gateway scan tool descriptions for the prompt-injection patterns the post-April 2026 RCE class surfaced?
4. MCP server registration and authPer-agent OAuth 2.1 boundary, scope rewriting, per-agent tool allowlists
5. Claude Code session correlationAre MCP spans linked to the parent Anthropic API call via span_id so the session is one tree?
6. STDIO postureDoes the gateway block or sanitize the April 2026 STDIO RCE class by default?
7. Self-host postureCan the gateway run in your VPC so code, tool arguments, and tool outputs never leave?

Verdict line at the end of each pick scores all seven.


How we picked

We started from the universe of public MCP gateways that, as of May 2026, ship a documented federation endpoint and at least one of the four post-RCE control planes. We removed gateways whose MCP support is STDIO-only without sanitization (April 2026 made this disqualifying). We removed Helicone after its March 3, 2026 Mintlify acquisition shifted roadmap toward documentation. We removed LiteLLM from the headline list because of the March 24, 2026 PyPI supply-chain incident plus CVE-2026-30623, pinned and patched it remains viable, but the version-hygiene tax disqualifies it for teams wiring something up this week.

Future AGI is first because the loop is the wedge.


1. Future AGI Agent Command Center: Best for closing the loop on Claude Code MCP

Verdict. Future AGI is the only gateway here that takes captured MCP traces and pipes them into evaluation and routing optimization. The other four are observation layers. Agent Command Center is an observation layer wired to a self-improving loop with a dedicated MCP Security scanner inline on every call.

Per-tool latency capture lives in traceAI (Apache 2.0): each MCP invocation becomes an OpenTelemetry span with mcp.tool.name, mcp.server.id, full argument payload, and response, flowing into Grafana, Datadog, or any OTLP backend. Tool-call success and cost aggregate via native span-attribute slicing, group by tool, server, session, developer. Re-serialisation token cost (the input tokens the next turn pays for the previous tool’s result) is captured as a span attribute, so “filesystem.read on paths over 280 characters fails 4.2 percent of the time and adds 18K input tokens to the next turn” is a fix, not a vague gut feeling.

Tool-description scanning runs through the Future AGI Protect model family at discovery and at each invocation. Protect is FAGI’s own fine-tuned model family built on Google’s Gemma 3n with specialized adapters across four safety dimensions (content moderation, bias detection, security/prompt-injection, data privacy/PII), natively multi-modal across text, image, and audio, a model family, not a plugin chain of third-party detectors. The adapters scan descriptions and arguments for prompt injection, secrets, PII, and the MCP tool-poisoning patterns the April 2026 disclosure surfaced. Latency is ~67 ms p50 text and ~109 ms p50 image (arXiv 2510.13351); the same dimensions are reusable as offline eval metrics so the prod policy and the eval rubric stay in sync. MCP server auth uses OAuth 2.1 at the boundary plus per-agent allowlists and scope rewriting; Claude Code never holds raw downstream tokens. Session correlation works through span_id, the Anthropic API call is the parent, every MCP tool call is a child, and the eval verdict is stitched into the same trace. STDIO is allowlisted per agent and routes through the sanitizer; Streamable HTTP is the default. Self-host is the Apache 2.0 Go binary on Docker, Kubernetes, AWS, GCP, Azure, or air-gapped, with a hosted endpoint at gateway.futureagi.com/v1.

The loop. Every trace is scored by fi.evals (faithfulness, tool-call accuracy, code-correctness). traceAI instruments 35+ frameworks OpenInference-natively, and Error Feed (FAGI’s “Sentry for AI agents”) sits alongside as the zero-config error monitor: auto-clusters related MCP tool-call failures (50 traces → 1 issue), auto-writes the root cause from the span evidence plus a quick fix plus a long-term recommendation per issue, and tracks rising/steady/falling trend per issue so a regressing tool surfaces like an exception rather than buried in trace search. Low-scoring sessions cluster by failure mode. agent-opt (Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or adjusts allowlist policy. Typical Claude Code rewrite from our usage: stop registering 14 of 38 tools the agent never calls but each consumes ~180 input tokens at discovery. Saving across a team running 22 sessions a day: roughly 12 percent of input tokens, no developer behaviour change.

Where it falls short. agent-opt is opt-in, start with traceAI + ai-evaluation for one-week pilots and light up the optimizer once eval baselines stabilize and Claude Code is at scale. The managed MCP catalog is smaller than Composio’s, pair with Composio when integration breadth is the binding constraint.

Pricing. Free tier 100K traces/month. Scale tier $99/month. Enterprise custom with SOC 2 Type II, HIPAA BAA, AWS Marketplace.

Score: 7/7 axes.


2. Maxim Bifrost: Best for MCP-native Go throughput

Verdict. Bifrost is the Apache 2.0 Go binary from Maxim that runs LLM routing and MCP gateway in one process. The vendor-published benchmark is roughly 11 microsecond P50 at 5,000 RPS on t3.xlarge (Maxim’s own harness with a mock 60 ms upstream, treat as gateway overhead, not end-to-end). Right pick when MCP tool concurrency is the binding constraint and your platform team is happy operating a Go binary.

Per-tool latency capture is native via Bifrost’s OTel exporter; dashboard polish is thinner than Future AGI’s, so teams pipe spans into Datadog or Tempo. Per-tool metrics in the Bifrost console unify MCP plus LLM in one view. Tool-description scanning is partial, not a 18+ scanner library and not a dedicated MCP Security scanner; poisoning detection is assembled from plugins. OAuth 2.1 is supported; per-agent allowlists are documented but UX is less polished than Portkey or Future AGI. Session correlation rides OTel context. Streamable HTTP is the default. Self-host on Docker or Kubernetes is the point.

Bifrost also ships “Code Mode,” a vendor-claimed MCP token-reduction feature with up to 92.8 percent input-token reduction across 508 tools on 16 MCP servers in Maxim’s own harness. Worth reproducing on your fleet before underwriting; teams report meaningful savings on tool-heavy Claude Code workflows but rarely the upper-bound figure.

Where it falls short. MCP guardrail library is thin, teams needing tool-poisoning detection across many shapes wire more glue. Maxim’s own listicles rank Bifrost number one without publishing a limitations block, a trust signal worth weighing. No optimizer; throughput is one axis, “contain the blast radius and bend the cost curve” is a different problem.

Pricing. Apache 2.0. Commercial cloud tier on request.

Score: 5.5/7 axes (missing: deep MCP-path guardrails, optimizer).


3. Portkey: Best for hosted MCP gateway with mature RBAC

Verdict. Portkey is the most polished hosted-only product in this category. Speaks MCP on the tool side, ships per-tool span attributes through its trace API, and has the cleanest virtual-key story for per-developer MCP attribution. No optimizer. The April 30, 2026 Palo Alto Networks acquisition (close PANW fiscal Q4 2026; AI gateway roadmap merging into Prisma AIRS) is a procurement signal.

Per-tool latency lives in Portkey’s MCP trace surface with a histogram rendered natively, the prettiest dashboard in the cohort. Tool-call success and cost aggregation extend the analytics view to MCP, with group-by-tool, group-by-server, and re-serialisation token cost captured per call. Tool-description scanning runs through Portkey’s Guardrails plugin set, narrower than Future AGI’s 18+ scanner library. MCP server auth uses virtual servers and OAuth 2.1; per-agent allowlists are configured in the console. Session correlation rides a trace_id header the Claude Code wrapper has to set. Streamable HTTP is the default. Self-host through Portkey’s BYOC option is good for compliance, not strictly air-gapped.

Where it falls short. The Palo Alto Networks acquisition is the elephant. AI gateway roadmap is merging into Prisma AIRS, and the cadence of MCP-specific feature work is the open question. Verify continuity before signing past PANW fiscal Q4 2026. No optimizer. Pricing escalates above 5M MCP calls/month faster than self-hosted alternatives.

Pricing. Free tier 10K requests/day. Scale tier $99/month. Enterprise custom with SOC 2 Type II.

Score: 6/7 axes (missing: feedback loop / optimizer).


4. Kong AI Gateway: Best if you already run Kong

Verdict. Kong AI Gateway is the pick when the platform team already operates Kong for REST APIs. Strengths: SLA, plugin ecosystem, ops familiarity. Weakness: MCP-specific shallowness, observability happens via the OTel plugin plus AI Proxy plus a separate dashboard, not natively.

Per-tool latency capture comes through the OTel plugin plus the AI Proxy MCP extension introduced in Kong 3.6. Spans land in your OTel backend; the Kong console shows the API-gateway view, not the MCP-cost view. Tool-call success and cost aggregation use Kong consumer plus tag patterns; the slice-by-tool dashboard lives in Grafana. Tool-description scanning runs through Kong’s AI plugin library, with MCP poisoning detection assembled from plugins. Auth uses Kong consumers plus the OAuth 2.0 plugin. STDIO posture is set by transport policy. Self-host is the entire point of Kong.

Where it falls short. MCP observability is plugin-driven, not native, plan two weeks of platform-team time for the chargeback-grade dashboard. No optimizer. MCP scanner depth is shallower than Future AGI or Portkey; if your CISO wants a named MCP Security scanner, the answer is “we built one out of plugins.”

Pricing. Open source. Kong Konnect starts free. Enterprise from ~$1.5K/month.

Score: 5/7 axes (missing: native MCP dashboard, optimizer, polished tool-call view).


5. agentgateway.dev: Best for Linux Foundation MCP gateway

Verdict. agentgateway.dev is the Linux Foundation-hosted, vendor-neutral MCP gateway. Right pick when your buying constraint is governance, a foundation-hosted project with a transparent maintainer mix, open contribution, and no acquisition risk. Trade-off is feature surface: guardrail library and dashboard are both lighter than Future AGI’s or Portkey’s.

Per-tool latency runs through OTel exporters on the MCP path with standard OTLP spans. Tool-call success and cost aggregation use OTel attributes; the cost-by-tool view is a Grafana dashboard you build. Tool-description scanning lives in agentgateway’s declarative policy engine, policy as code covers allowlists, scope rewriting, and a baseline scanner set, lighter than Future AGI’s named scanners. OAuth 2.1 is enforced at the boundary; per-agent allowlists are first-class. Session correlation rides OTel context. Streamable HTTP is default with STDIO opt-in. Self-host on Apache 2.0 is the project’s identity.

Where it falls short. Dashboard story is thin, most teams pair agentgateway with their own observability stack. No optimizer. Plugin and scanner library is smaller than longer-running OSS MCP gateway projects. Roadmap velocity is foundation-paced; vendor-led projects ship MCP-path features faster.

Pricing. Apache 2.0, Linux Foundation-hosted. Commercial support via member companies.

Score: 5/7 axes (missing: polished dashboard, deep scanner library, optimizer).


Capability matrix

AxisFuture AGI ACCMaxim BifrostPortkeyKong AI Gatewayagentgateway.dev
Per-tool latency captureNative (traceAI)NativeNativeOTel pluginOTel exporter
Tool-call success + cost aggregationNative span slicingPer-tool metricsNative analyticsGrafana on OTelGrafana on OTel
Tool-description scanning18+ scanners + MCP SecurityPartialGuardrails plugin setAI plugin libraryDeclarative policy
MCP server auth + allowlistsOAuth 2.1 + scope rewriteOAuth 2.1OAuth 2.1 + consoleConsumer + OAuth pluginPolicy as code
Claude Code session correlationspan_id linkageOTel contexttrace_id headerOTel contextOTel context
STDIO postureStreamable HTTP default + scannerStreamable HTTP defaultStreamable HTTP defaultTransport policyStreamable HTTP default
Self-hostApache 2.0 Go binaryApache 2.0 Go binaryBYOCOSS KongApache 2.0 LF project
Feedback loop / optimizeragent-opt (Apache 2.0)NoNoNoNo

Decision framework: Choose X if

Choose Future AGI Agent Command Center if the gateway should do more than observe, if MCP traces should drive prompt rewrites and routing-policy updates over time, and a dedicated MCP Security scanner inline on every call is part of the requirement. Pick this when Claude Code’s MCP surface is significant enough that the cost-and-quality curve genuinely matters.

Choose Maxim Bifrost if MCP tool concurrency is the binding constraint and gateway overhead matters per call. Trade-off: shallower MCP guardrail depth and a vendor narrative without a published limitations block.

Choose Portkey if you want a hosted gateway with mature RBAC, virtual keys, and the prettiest UI for MCP traces, and you don’t need the optimizer. Verify multi-year roadmap continuity given the Palo Alto Networks acquisition.

Choose Kong AI Gateway if you already operate Kong for REST APIs and platform-team familiarity outweighs AI-specific shallowness. Plan two weeks for the MCP dashboard work explicitly.

Choose agentgateway.dev if Linux Foundation governance, Apache 2.0 across the board, and acquisition-independence outrank dashboard polish or scanner depth.


Common mistakes when wiring Claude Code through an MCP gateway

MistakeFix
Pointing only the Anthropic API at the AI gateway, leaving MCP directWire ANTHROPIC_BASE_URL and rewrite each MCP server URL in mcp.json to the federation endpoint
Leaving STDIO as the default MCP transportSet Streamable HTTP as the default; allowlist STDIO only for known-good local servers
Sharing one MCP token across all Claude Code usersIssue per-agent OAuth 2.1 identities; the gateway rewrites scopes downstream
Not pinning the LiteLLM version when LiteLLM is in the pathPin to 1.82.6 or upgrade past 1.83.7-stable; version hygiene is permanent
Tagging only user_id, not session_id plus tool_name plus mcp.server.idTag user, session, tool, server; the four-tuple makes failure clusters legible
Skipping argument-level guardrails because LLM-side guardrails are onRun a scanner on the MCP path inspecting descriptions at discovery and arguments at invocation
Buffering streaming tool responsesConfirm SSE pass-through on the MCP path without buffer-and-batch
Setting tool allowlists too narrowly on day oneStart in audit-mode (log without blocking), watch for a week, then enforce

How Future AGI closes the loop on Claude Code MCP

The other four treat MCP observability as an end state: capture the span, render it, alert when an SLO trips. Future AGI treats it as input to a feedback loop.

  1. Trace. traceAI (Apache 2.0). Parent: Anthropic API call. Children: every MCP invocation, guardrail check, retry. Spans capture inputs, outputs, arguments, responses, model, server, identity, and re-serialisation token cost.
  2. Evaluate. ai-evaluation (Apache 2.0) scores each span. FAGI ships a 50+ built-in rubric catalog (task-completion, faithfulness, tool-call accuracy, code-correctness, structured-output, hallucination, agentic surfaces, instruction-following, groundedness), plus unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code and MCP schema, plus self-improving evaluators that learn from live production traces, plus FAGI’s proprietary classifier model family at very low cost-per-token (Galileo Luna-2 cost economics, rubric-flexible). Catalog is the floor, not the ceiling.
  3. Cluster. Low-scoring sessions cluster by failure mode. Common Claude Code patterns: the agent called filesystem.list recursively from the project root and dropped 28K tokens of node_modules into the next turn; postgres.query retried four times on a transient 5xx because the MCP client didn’t back off; git.diff ran against the wrong base branch.
  4. Optimize. agent-opt (Apache 2.0; ProTeGi, Bayesian, GEPA) rewrites the system prompt or adjusts allowlist policy. Typical rewrites: drop 14 of 38 registered tools the agent never invokes; route postgres.query calls under a row-limit to a read-replica; tighten filesystem.list so the agent stops recursing into vendor directories.
  5. Route. Agent Command Center applies the updated policy on the next session. Protect guardrails (~67 ms text, arXiv 2510.13351) run inline; the MCP Security scanner inspects discovery responses and argument payloads.
  6. Re-deploy. Prompt and allowlist versions are stored. If the eval score regresses on the next batch, deployment auto-rolls back.

Net effect on a Claude Code team running 22 sessions a day across 12 engineers: input-token spend trends down 12 to 18 percent within four weeks, MCP tool-call failure rate drops from around 12 percent to 3 to 4 percent. No developer behaviour change.

Apache 2.0 building blocks: traceAI, ai-evaluation, agent-opt at github.com/future-agi. Hosted Agent Command Center adds failure-cluster views, live Protect guardrails, the MCP Security scanner, RBAC, SOC 2 Type II, HIPAA, GDPR, and CCPA all certified with BAA available, and AWS Marketplace listing.


What we did not include

  • Helicone is strong on LLM observability but post-Mintlify acquisition (March 3, 2026) the MCP roadmap is downstream of a documentation product.
  • LiteLLM remains a viable self-hosted MCP gateway pinned to 1.82.6 or past 1.83.7-stable, but the March 2026 PyPI supply-chain incident plus CVE-2026-30623 make version hygiene a permanent operational task. For teams choosing fresh in May 2026, the five above ship with less tax.
  • Composio is outstanding when integration-catalog breadth is the binding constraint, 200+ managed MCP servers. Pair with one of the five picks for the guardrail and OAuth 2.1 layer.

All three are worth a second look in Q3 2026.



Sources

  • Anthropic Claude Code MCP documentation, claude.ai/docs/claude-code/mcp
  • Model Context Protocol specification 2025-11-25, modelcontextprotocol.io/specification/2025-11-25
  • OX Security advisory on MCP STDIO RCE class (April 15, 2026), ox.security/blog/mcp-supply-chain-advisory-rce-vulnerabilities-across-the-ai-ecosystem
  • Future AGI Agent Command Center docs, docs.futureagi.com/docs/command-center
  • Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
  • Maxim Bifrost benchmarks, getmaxim.ai/bifrost/resources/benchmarks
  • Portkey AI gateway, portkey.ai
  • Palo Alto Networks Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey
  • Kong AI Gateway, konghq.com/products/kong-ai-gateway
  • agentgateway.dev, agentgateway.dev (Linux Foundation project page)
  • LiteLLM advisory for CVE-2026-30623, docs.litellm.ai/blog/mcp-stdio-command-injection-april-2026
  • LiteLLM PyPI supply-chain advisory (March 2026), docs.litellm.ai/blog/security-update-march-2026

Frequently asked questions

What is the cheapest way to monitor Claude Code MCP tool calls?
agentgateway.dev or self-hosted Kong AI Gateway with the OTel plugin are the cheapest self-hosted routes. Future AGI's free tier (100K traces/month) is the cheapest hosted route with per-tool slicing, the MCP Security scanner, and the optimizer loop.
Does Claude Code support MCP through a config file?
Yes. Servers live in `~/.claude/mcp.json` plus per-project overrides. Pointing at a gateway is a URL rewrite plus an OAuth identity.
Can I route Claude Code MCP calls through a different transport?
Streamable HTTP is the 2026 default; STDIO is supported for local servers but is the transport the April 15, 2026 OX Security disclosure targeted. Pin Streamable HTTP and allowlist STDIO only for known-good local processes.
How do I track Claude Code MCP cost per developer when everyone shares one credential?
Use a gateway with virtual keys or per-agent OAuth (Future AGI, Portkey, Maxim Bifrost). Each developer gets an identity that fans out to downstream MCP credentials with scope rewriting.
What happens to tool calls when Claude Code runs through an MCP gateway?
All five preserve tool calls and emit per-tool spans. The mistake to avoid: wiring only the Anthropic side — tool calls bypass the gateway and per-tool observability is empty.
How is Future AGI Agent Command Center different from Portkey for Claude Code MCP?
Portkey gives you a hosted dashboard, virtual keys, and a Guardrails plugin set. Future AGI gives you the dashboard plus a dedicated MCP Security scanner inline on every call plus a closed loop from trace through evaluation and optimization back into routing. The PANW acquisition of Portkey (April 30, 2026) is also a procurement signal.
Did the April 2026 MCP RCE class change how teams wire MCP gateways?
Yes. Before April 15, 2026 the MCP gateway was an observability convenience. After OX Security's disclosure and the CVE inventory that followed (CVE-2026-30623 in LiteLLM plus assignments across Agent Zero, Upsonic, Flowise, Windsurf, MCP Inspector, and LibreChat), it became the practical place to enforce least-privilege tool access, OAuth 2.1, Streamable HTTP transport, and tool-description scanning.
Related Articles
View all
Stay updated on AI observability

Get weekly insights on building reliable AI systems. No spam.