How to Connect Claude Code to an MCP Gateway in 2026
Step-by-step walkthrough for wiring Claude Code to an MCP gateway in 2026: mcp.json config, routing rules, per-server auth scoping, and verification. With production checklist and gateway picks.
A Claude Code session with eight MCP servers registered opens eight long-lived connections, pulls every tool description into the model’s inventory, and is one prompt-injection away from invoking filesystem.write with arguments inherited from a poisoned description. Token spend you can see. MCP tool calls you can’t, not without a gateway between client and servers.
This is the implementation walkthrough. For the scored comparison see the sibling listicle. What you get here: four steps that produce a working connection (mcp.json edits, routing rules, per-server auth scoping, verification) plus the production checklist. Written for Claude Code 1.x as of May 2026. Examples use Future AGI Agent Command Center; the shape is identical for the other four gateways at the end.
The problem: what breaks without an MCP gateway in front of Claude Code
Three things break.
Tool calls disappear from observability. Claude Code logs one summary line per MCP invocation. The Anthropic dashboard records nothing about MCP because execution happens client-side. Every failure (a 4-second postgres.query, a filesystem.list that dropped 28K tokens of node_modules into the next turn, a 500 silently retried) is invisible to any production dashboard unless something between client and server captures it.
Each MCP server holds its own credential: API keys for one, OAuth for another, a STDIO process with no auth for the third. The Claude Code process is the keyring. After OX Security’s April 15, 2026 disclosure of the STDIO RCE class (arbitrary command execution through process names passed to STDIO, across the official Python, TypeScript, Java, and Rust SDKs), centralising credentials stopped being a convenience and became the minimum bar for production.
Cost is non-obvious. Every MCP result is re-serialised into the next turn’s input. A tool returning a 12,000-token table adds 12,000 tokens to every subsequent turn for the rest of the session.
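To see how quickly that compounds, here is a minimal sketch of the arithmetic. It assumes the result stays in context unchanged on every later turn and ignores prompt caching and context compaction; the function name and the 20-turn figure are illustrative, not measured.

```python
def extra_input_tokens(result_tokens: int, turns_remaining: int) -> int:
    """Input tokens a single tool result adds over the rest of a session.

    Assumes the result is re-sent verbatim on every later turn
    (no caching discount, no context compaction).
    """
    return result_tokens * turns_remaining


# A 12,000-token table returned with 20 turns still to go (illustrative).
print(extra_input_tokens(12_000, 20))  # 240000
```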
A gateway fixes all three. Topology: Claude Code → federation endpoint → registered servers. The gateway parses tools/list and tools/call, attaches span attributes, enforces auth, and forwards.
Prereqs
- Claude Code 1.3+. Earlier versions don’t support per-server URL rewrites in mcp.json. Check with claude --version.
- An MCP gateway endpoint. Hosted or self-hosted. Walkthrough uses https://gateway.futureagi.com/v1/mcp.
- Gateway identity credentials. Future AGI uses FI_API_KEY and FI_SECRET_KEY. Others use the equivalent: Portkey virtual key, Bifrost API key, Kong consumer credentials, agentgateway OAuth client.
- Downstream MCP server credentials. Whatever the old mcp.json held directly: Postgres connection string, GitHub PAT, S3 keys. These move to the gateway in step 3.
- Network egress to the gateway. If self-hosted in your VPC, confirm the developer laptop can reach it.
Set the environment variables once in your shell profile.
# ~/.zshrc or ~/.bashrc
export FI_API_KEY="fi_live_..."
export FI_SECRET_KEY="..."
export FI_PROJECT="claude-code-prod"
export ANTHROPIC_BASE_URL="https://gateway.futureagi.com/v1/anthropic"
ANTHROPIC_BASE_URL handles the LLM-side gateway hop. It’s separate from the MCP gateway path; the two wire independently. A common mistake is configuring only one side and missing half the traffic.
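A quick pre-flight check can catch the one-side-only mistake before a session starts. The sketch below is an assumption-laden illustration, not part of any gateway tooling: the variable names come from the shell profile above, while the split into "mcp" and "llm" groups and the `missing_vars` helper are this sketch's own.

```python
import os

# Variable names come from the shell profile above; the grouping into
# "MCP side" and "LLM side" is this sketch's own labelling.
MCP_SIDE = ("FI_API_KEY", "FI_SECRET_KEY", "FI_PROJECT")
LLM_SIDE = ("ANTHROPIC_BASE_URL",)


def missing_vars(env: dict[str, str]) -> dict[str, list[str]]:
    """Return the unset or empty variables for each side of the gateway."""
    return {
        "mcp": [name for name in MCP_SIDE if not env.get(name)],
        "llm": [name for name in LLM_SIDE if not env.get(name)],
    }


if __name__ == "__main__":
    for side, names in missing_vars(dict(os.environ)).items():
        status = "ok" if not names else "missing " + ", ".join(names)
        print(f"{side}: {status}")
```

An empty list on both sides means both hops have what they need; it says nothing about whether the values are valid.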
Architecture overview
Claude Code talks to one federation endpoint; the gateway fans out to each MCP server with the right credential per call.
+-------------------+
|    Claude Code    |
|    (developer     |
|      laptop)      |
+--------+----------+
         |
         |  HTTPS / Streamable HTTP
         |  + OAuth 2.1 bearer (per-agent identity)
         |
         v
+-------------------+
|    MCP Gateway    |   <-- traceAI spans
|    (FAGI ACC,     |       MCP Security scanner
|    Portkey, etc.) |       OAuth 2.1 boundary
+--+----+----+----+-+       per-agent allowlist
   |    |    |    |
   |    |    |    +-------> filesystem MCP server
   |    |    +------------> postgres MCP server
   |    +-----------------> github MCP server
   +----------------------> custom search MCP server
The gateway holds the downstream credentials; Claude Code doesn’t. Each session arrives with a per-agent identity; the gateway rewrites scopes so the search server only sees a search-scoped token, never filesystem writes. Every tool call becomes an OpenTelemetry span keyed by mcp.tool.name, mcp.server.id, user.id, and session.id.
Step 1: register the gateway in Claude Code’s mcp.json
Claude Code reads MCP servers from ~/.claude/mcp.json plus a per-project override. The federation pattern is one entry pointing at the gateway; the gateway returns every downstream server through tools/list.
Edit ~/.claude/mcp.json:
{
  "mcpServers": {
    "fagi-gateway": {
      "transport": "streamable-http",
      "url": "https://gateway.futureagi.com/v1/mcp",
      "headers": {
        "Authorization": "Bearer ${FI_API_KEY}",
        "X-FI-Secret": "${FI_SECRET_KEY}",
        "X-FI-Project": "${FI_PROJECT}",
        "X-FI-Agent": "claude-code",
        "X-FI-User": "${USER}@futureagi.com"
      },
      "timeoutMs": 30000
    }
  }
}
Three things matter.
transport: streamable-http is the 2026 default. STDIO is the transport the April 15, 2026 disclosure targeted; don’t use it across the network. Local STDIO servers (a desktop file picker, for example) get allowlisted server-side after the gateway is wired.
Header substitution with ${VAR}. Claude Code 1.3+ resolves env vars in mcp.json at load time, keeping secrets out of the file. If your team commits mcp.json to the repo, only variable names ship.
X-FI-User and X-FI-Agent. These headers drive per-developer attribution. X-FI-User is the identity claim, usually the SSO email. X-FI-Agent distinguishes Claude Code traffic from Cursor or notebook traffic. Without both, the dashboard collapses to aggregate numbers per project.
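The ${VAR} substitution can be pictured with a short sketch. This is not Claude Code's actual loader, only an illustration of the behaviour described above; `expand_headers` is a hypothetical helper, and failing loudly on an unset variable is this sketch's choice rather than documented behaviour.

```python
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_headers(headers: dict[str, str], env: dict[str, str]) -> dict[str, str]:
    """Replace every ${NAME} in header values with env[NAME].

    Raises KeyError for an unset variable so a missing secret is caught
    at load time rather than sent as a literal "${FI_API_KEY}".
    """
    def lookup(match: re.Match) -> str:
        return env[match.group(1)]

    return {name: _VAR.sub(lookup, value) for name, value in headers.items()}


headers = {
    "Authorization": "Bearer ${FI_API_KEY}",
    "X-FI-User": "${USER}@futureagi.com",
}
# The key value here is a placeholder, not a real credential.
print(expand_headers(headers, {"FI_API_KEY": "fi_live_example", "USER": "dev"}))
```

Because only the variable names live in the file, the same mcp.json can be committed and shared while each developer's shell supplies the values.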
After saving, restart Claude Code (/quit then re-open). On startup it calls tools/list against the gateway, and the gateway returns the merged inventory across every downstream server. To the agent it looks like one big server with filesystem.read, pg_query, gh_create_issue, etc., all in one namespace.
Common pitfall: if downstream servers were registered directly in mcp.json previously, remove those entries before adding the gateway. Otherwise Claude Code holds two paths to the same tools and the dashboard double-counts.
Step 2: configure gateway routing rules
Tell the gateway which downstream MCP servers to federate and how to route tools/call requests. Shape varies by gateway. Future AGI uses YAML, either in the hosted console or in routes.yaml for self-hosted. A three-server federation:
# routes.yaml
mcp:
  federation_id: claude-code-prod
  transport: streamable-http
  servers:
    - name: filesystem
      transport: streamable-http
      url: https://filesystem-mcp.internal.example.com
      tool_prefix: ""
      auth:
        type: bearer
        secret_ref: FILESYSTEM_MCP_TOKEN
      allowlist:
        - filesystem.read
        - filesystem.list
        - filesystem.write
      timeout_ms: 5000
    - name: postgres
      transport: streamable-http
      url: https://postgres-mcp.internal.example.com
      tool_prefix: pg_
      auth:
        type: oauth2_client_credentials
        token_url: https://idp.example.com/oauth/token
        client_id_ref: POSTGRES_MCP_CLIENT_ID
        client_secret_ref: POSTGRES_MCP_CLIENT_SECRET
        scopes: [read]
      allowlist:
        - pg_query
      row_limit: 500
      timeout_ms: 8000
    - name: github
      transport: streamable-http
      url: https://api.github.com/mcp
      tool_prefix: gh_
      auth:
        type: bearer
        secret_ref: GITHUB_PAT
      allowlist:
        - gh_create_issue
        - gh_list_prs
        - gh_get_file
      timeout_ms: 10000
  scanners:
    discovery:
      - prompt_injection
      - tool_poisoning
      - secrets_in_description
    invocation:
      - path_traversal
      - sql_injection
      - command_injection
      - pii_in_arguments
  retries:
    max: 2
    backoff_ms: 250
    on_status: [502, 503, 504]
  rate_limits:
    per_user_per_minute: 120
    per_session_per_minute: 600
Four things to call out.
tool_prefix prevents namespace collisions when two servers expose a query or a list. Postgres query becomes pg_query.
allowlist stops the agent from invoking tools you forgot were registered. Start narrow, audit for a week, expand. Don’t use allow_all: true “just to test”; that’s the setting most teams forget to tighten before prod.
row_limit on pg_query is the kind of per-server guardrail that doesn’t exist in vanilla MCP. The gateway truncates oversized result sets, preventing a 50,000-row table from blowing the next turn’s context. In our usage, this single rule cuts Claude Code input-token spend by about 8% week-over-week.
scanners runs at both discovery and invocation. At discovery, the gateway scans the tool description text the model will see; this is where tool poisoning lives. At invocation, it scans arguments, where path traversal, SQL injection, and command injection live. The Future AGI Protect model family runs inline at ~67 ms p50 for text and ~109 ms p50 for image, per arXiv 2510.13351. Protect is FAGI’s own set of fine-tuned Gemma 3n adapters covering content moderation, bias detection, security/prompt-injection, and data privacy/PII; it is multi-modal across text, image, and audio, and is a model family rather than a plugin chain. Error Feed sits alongside as FAGI’s “Sentry for AI agents”, auto-clustering related scanner-blocked or argument-validation failures into named issues (50 traces → 1 issue), each with an auto-written root cause, a quick fix, and a long-term recommendation.
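The first three rules (tool_prefix, allowlist, row_limit) can be sketched in a few lines. This is a simplified model of the behaviour described above, not the gateway's implementation; the `Server` dataclass and the `route` and `truncate_rows` helpers are hypothetical names, and the sketch assumes allowlist entries carry the prefix, as they do in routes.yaml.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    tool_prefix: str
    allowlist: set[str] = field(default_factory=set)


# Mirrors two of the routes.yaml entries above.
SERVERS = [
    Server("filesystem", "", {"filesystem.read", "filesystem.list", "filesystem.write"}),
    Server("postgres", "pg_", {"pg_query"}),
]


def route(tool: str, servers: list[Server]) -> tuple[str, str]:
    """Map an exposed tool name to (server name, downstream tool name).

    Raises PermissionError when no server allowlists the tool.
    """
    for server in servers:
        if tool in server.allowlist:
            # Strip the prefix to recover the name the downstream server knows.
            return server.name, tool[len(server.tool_prefix):]
    raise PermissionError(f"tool not allowlisted: {tool}")


def truncate_rows(rows: list, row_limit: int) -> list:
    """Keep at most row_limit rows of a result set."""
    return rows[:row_limit]


print(route("pg_query", SERVERS))                    # ('postgres', 'query')
print(len(truncate_rows(list(range(50_000)), 500)))  # 500
```

Anything absent from every allowlist is rejected before a downstream server is contacted, which is the property the "start narrow" advice relies on.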
After saving, reload (fagi gateway reload for self-hosted; automatic for hosted). The next tools/list reflects the new inventory.
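The retries block in routes.yaml reads as a small policy: retry at most twice, wait 250 ms between attempts, and only on 502, 503, or 504. A minimal sketch of that policy follows. Whether the real gateway uses a fixed or a growing delay is not stated here, so the fixed delay is an assumption, and `call_with_retries` and `send` are hypothetical names.

```python
import time
from typing import Callable

RETRY_STATUSES = {502, 503, 504}


def call_with_retries(
    send: Callable[[], int],
    max_retries: int = 2,
    backoff_ms: int = 250,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, int]:
    """Call send() until it returns a non-retryable status.

    Returns (final status, number of attempts). The delay between
    attempts is fixed at backoff_ms, which is an assumption.
    """
    attempts = 0
    while True:
        status = send()
        attempts += 1
        if status not in RETRY_STATUSES or attempts > max_retries:
            return status, attempts
        sleep(backoff_ms / 1000)


# Simulated downstream: two 503s, then success. Sleep is stubbed out.
responses = iter([503, 503, 200])
print(call_with_retries(lambda: next(responses), sleep=lambda _: None))  # (200, 3)
```

A plain 500 is not in on_status, so it is returned to the caller on the first attempt rather than retried.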
Step 3: wire auth scoping per MCP server
Step 2 declared which credential each downstream server uses. Step 3 makes sure those credentials never widen the agent’s effective scope.
The pattern: the developer’s per-agent identity is the only credential that crosses the laptop boundary. The gateway holds every downstream secret. When it calls a downstream server, it presents a credential scoped to exactly the operations the per-agent identity is allowed to perform. The agent never sees the downstream credential.
Three concrete cases.
Case 1, wide bearer token (filesystem). The filesystem MCP token grants read-write to the entire mounted volume. You don’t want every Claude Code session to inherit “delete anything.” Use per-agent allowlist plus path scoping:
- name: filesystem
  auth:
    type: bearer
    secret_ref: FILESYSTEM_MCP_TOKEN
  per_agent_scope:
    "claude-code":
      allowlist: [filesystem.read, filesystem.list, filesystem.write]
      path_prefix: /workspace/
      deny_paths:
        - /workspace/secrets/
        - /workspace/.env
        - /workspace/.git/config
The downstream server still receives a wide token; the gateway enforces the narrow scope before forwarding. If Claude Code asks filesystem.write on /workspace/.env, the gateway returns a structured error and never touches the server.
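The path check itself is easy to get subtly wrong, so a sketch may help. It assumes POSIX-style paths, resolves relative paths against the prefix, and treats deny entries ending in "/" as directories and the rest as exact files; those matching rules are this sketch's assumptions, and `is_path_allowed` is a hypothetical helper, not the gateway's code.

```python
import posixpath

PATH_PREFIX = "/workspace/"
DENY_PATHS = ("/workspace/secrets/", "/workspace/.env", "/workspace/.git/config")


def is_path_allowed(path: str) -> bool:
    """Check a tool-call path against path_prefix and deny_paths.

    Relative paths are resolved against PATH_PREFIX, and ".." segments
    are normalised first so "/workspace/../etc/passwd" cannot slip through.
    """
    resolved = posixpath.normpath(posixpath.join(PATH_PREFIX, path))
    if not (resolved + "/").startswith(PATH_PREFIX):
        return False
    for denied in DENY_PATHS:
        if denied.endswith("/"):
            # Directory rule: block the directory and everything under it.
            if (resolved + "/").startswith(denied):
                return False
        elif resolved == denied:
            # File rule: block the exact path.
            return False
    return True


print(is_path_allowed("package.json"))              # True
print(is_path_allowed("/workspace/.env"))           # False
print(is_path_allowed("/workspace/../etc/passwd"))  # False
```

String normalisation does not follow symlinks, so a real enforcement point would also need to resolve those on the server side.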
Case 2, OAuth 2.1 client credentials with downscoping (Postgres). The IdP supports scope claims. The gateway requests a token with only the scopes the per-agent identity is allowed to use:
- name: postgres
  auth:
    type: oauth2_client_credentials
    token_url: https://idp.example.com/oauth/token
    client_id_ref: POSTGRES_MCP_CLIENT_ID
    client_secret_ref: POSTGRES_MCP_CLIENT_SECRET
    scopes: [read]
  per_agent_scope:
    "claude-code":
      allowlist: [pg_query]
      scope_override: [read]
      schema_allowlist: [public, analytics]
The gateway’s token could be issued with [read, write, admin]. The per-agent override clamps to [read] before the call. Claude Code can’t escalate.
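The clamp amounts to a set intersection. The sketch below assumes scopes are plain strings and that the effective scope is whatever appears in both the issued token and the per-agent override; `clamp_scopes` is a hypothetical name for illustration.

```python
def clamp_scopes(token_scopes: set[str], scope_override: set[str]) -> set[str]:
    """Scopes the gateway may use on behalf of an agent.

    The intersection guarantees the result is never wider than either
    the issued token or the per-agent override.
    """
    return token_scopes & scope_override


print(sorted(clamp_scopes({"read", "write", "admin"}, {"read"})))  # ['read']
```

Using an intersection rather than taking the override as-is means a mistyped override cannot grant a scope the token never had.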
Case 3, STDIO with sanitization (local-only). If you must use STDIO for a local server, run the sanitizer on process invocation:
- name: local_desktop
  transport: stdio
  command: /usr/local/bin/desktop-mcp
  args: ["--readonly"]
  sanitize_stdio: true
  per_agent_scope:
    "claude-code":
      allowlist: [desktop.list_apps]
sanitize_stdio: true blocks the April 15, 2026 RCE class by validating that command and args exactly match the config. Claude Code can’t spawn arbitrary processes through tool arguments.
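In spirit, the exact-match validation looks like the sketch below. It is an illustration of the rule as described, not the gateway's sanitizer; the pinned values are copied from the local_desktop entry and `is_spawn_allowed` is a hypothetical helper.

```python
# Pinned values taken from the local_desktop entry above.
PINNED_COMMAND = "/usr/local/bin/desktop-mcp"
PINNED_ARGS = ["--readonly"]


def is_spawn_allowed(command: str, args: list[str]) -> bool:
    """Allow a STDIO spawn only if it matches the pinned config exactly."""
    return command == PINNED_COMMAND and list(args) == PINNED_ARGS


print(is_spawn_allowed("/usr/local/bin/desktop-mcp", ["--readonly"]))          # True
print(is_spawn_allowed("/usr/local/bin/desktop-mcp", ["--readonly", "; sh"]))  # False
print(is_spawn_allowed("/bin/sh", ["-c", "curl example.com"]))                 # False
```

Exact matching is deliberately stricter than prefix or pattern matching: any extra argument, however harmless it looks, is rejected.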
Audit-mode tip: run with enforce: false for a week before enforcing. The audit log surfaces legitimate calls an over-tight policy would block. Then flip enforce: true. Skipping this is how a Friday deploy breaks half the developers Monday morning.
Step 4: verify tool calls are traced
Three checks. Run them in order.
Check 1, gateway sees the connection. From the Claude Code CLI:
$ claude
> /mcp list
One entry, fagi-gateway, status connected, with merged tools including filesystem.read, pg_query, gh_create_issue. Status error is almost always a missing or expired FI_API_KEY.
Check 2, tool call produces a span. Trigger an MCP call:
> read the package.json in this repo
Claude Code invokes filesystem.read. In the Agent Command Center dashboard, the session appears under “Recent traces” within seconds. The tree:
session_id: abc123
└── anthropic.messages.create (model: claude-opus-4-7)
    └── mcp.tool.call (tool: filesystem.read, server: filesystem)
        duration: 142ms
        attributes:
          mcp.tool.name: filesystem.read
          mcp.server.id: filesystem
          mcp.arguments: {"path": "package.json"}
          user.id: nikhil@futureagi.com
          fi.scanner.verdict: pass
If the model call is there but the MCP child span is missing, the gateway intercepted LLM traffic but not tool traffic. Re-check mcp.json; transport must be streamable-http.
Check 3, failures are captured. Trigger one on purpose:
> read the file at /etc/passwd
Path is outside /workspace/. Gateway returns 403, span records fi.scanner.verdict: blocked. Claude Code surfaces a tool error; agent recovers. The blocked call appears in the audit log with full argument payload.
If all three pass, the gateway is wired. Every MCP call now produces spans you can slice by tool, server, user, session, verdict.
Production checklist
Six operational concerns that bite in production.
| Concern | What to check |
|---|---|
| Latency overhead | Measure p50 and p95 gateway hop. For text-only scanners, Protect adds ~67ms per arXiv 2510.13351; image scanning adds ~109ms. End-to-end MCP call latency should stay under ~250ms p95 for in-region gateway deployments; flag anything over 400ms. |
| Failure isolation | What happens if the gateway is down? Set Claude Code’s timeoutMs to fail fast (30s) and provide a documented fallback flow (“re-enable direct MCP for read-only servers under incident”). Do not silently fall back without auditing. |
| Cost attribution | Confirm X-FI-User and X-FI-Agent are populated on every request. Spot-check the dashboard — group-by-user should show every developer, not “unknown” buckets. |
| Audit log retention | Where do blocked and error spans live? For SOC 2, you want 90 days minimum; for HIPAA-adjacent workloads, 6 years. Configure span export to your long-term store, not just the live dashboard. |
| Cold start after deploy | The first call after a gateway restart will be 200-400ms slower while connection pools warm up. If your CI runs Claude Code on every PR, this matters; pre-warm the gateway with a synthetic tools/list call as part of the deploy script. |
| Rollback in 60 seconds | Have a procedure to disable the gateway hop in one command. For Future AGI, that is unset ANTHROPIC_BASE_URL; mv ~/.claude/mcp.json ~/.claude/mcp.json.gateway; cp ~/.claude/mcp.json.direct ~/.claude/mcp.json. Keep the direct-connection mcp.json checked in so you can restore in under a minute if the gateway has a bad day. |
Tool-call success rate is the headline metric. Without a gateway, teams typically see 88-91% (timeouts, transient 5xx, arg-format issues). With a gateway plus retries and per-server timeouts, the same teams climb to 96-98%. Track it; roll back any config change that drops the number below baseline.
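Both the success rate and the latency percentiles from the checklist can be computed from exported spans. The sketch below uses made-up records; the field names (`tool`, `duration_ms`, `ok`) are assumptions about an export format, not the gateway's actual schema.

```python
import statistics

# Hypothetical exported span records; the field names are this sketch's own.
spans = [
    {"tool": "filesystem.read", "duration_ms": 142, "ok": True},
    {"tool": "pg_query", "duration_ms": 380, "ok": True},
    {"tool": "pg_query", "duration_ms": 8000, "ok": False},
    {"tool": "gh_list_prs", "duration_ms": 210, "ok": True},
]


def success_rate(records: list[dict]) -> float:
    """Fraction of tool calls that completed without error."""
    return sum(1 for r in records if r["ok"]) / len(records)


def percentile(values: list[float], pct: int) -> float:
    """pct-th percentile (1-99) via statistics.quantiles, inclusive method."""
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]


durations = [s["duration_ms"] for s in spans]
print(f"success rate: {success_rate(spans):.0%}")  # success rate: 75%
print(f"p50: {percentile(durations, 50)} ms")
print(f"p95: {percentile(durations, 95)} ms")
```

With only a handful of calls the p95 is dominated by a single slow span, so compare percentiles over a full day or week of traffic before acting on them.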
Brief gateway picks
Five gateways that implement the federation pattern. The walkthrough works against any with small changes to header names and routing-config syntax. For the scored comparison see the sibling listicle.
Future AGI Agent Command Center. Apache 2.0 traceAI plus hosted Agent Command Center. Only entry piping MCP spans into fi.evals and agent-opt for a self-improving loop, with dedicated MCP Security scanner inline at ~67ms text per arXiv 2510.13351.
Maxim Bifrost. Apache 2.0 Go binary doing LLM gateway plus MCP federation in one process. Vendor benchmark ~11µs P50 at 5K RPS (gateway overhead, not end-to-end). Pick when raw throughput is the binding constraint.
Portkey. Hosted with the most polished UI, virtual keys, mature RBAC. The April 30, 2026 Palo Alto Networks acquisition merges the roadmap into Prisma AIRS; verify multi-year continuity before signing.
Kong AI Gateway. Right pick if you already run Kong. AI Proxy plus OAuth plugins extend the existing stack. Plan two weeks of platform-team time for the MCP dashboard.
agentgateway.dev. Linux Foundation-hosted, Apache 2.0, vendor-neutral. Pick when governance is the binding constraint. Dashboard is thinner; pair with your own Grafana stack.
Where this fits in the FAGI loop
The walkthrough above is a one-time configuration. To make it self-improving, wire fi.evals to score every MCP span on tool-call accuracy and task-completion; feed low-score traces into agent-opt (Apache 2.0; ProTeGi, Bayesian, GEPA). Typical Claude Code rewrite: drop 14 of 38 registered tools the agent never invokes; each costs ~180 input tokens per session at discovery. Net effect across a team running 22 sessions a day: input-token spend trends down 12-18% in four weeks, and tool-call failure rate drops from ~12% to 3-4%. No developer behaviour change.
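The discovery-cost part of that saving is simple arithmetic, sketched below with the figures from the paragraph above. It covers only the tokens spent listing unused tools at session start, not the full 12-18% trend, and the function name is illustrative.

```python
def discovery_tokens_saved(tools_dropped: int, tokens_per_tool: int, sessions_per_day: int) -> int:
    """Input tokens saved per day by not listing unused tools at discovery."""
    return tools_dropped * tokens_per_tool * sessions_per_day


# Figures from the paragraph above: 14 tools, ~180 tokens each, 22 sessions a day.
print(discovery_tokens_saved(14, 180, 22))  # 55440
```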
Apache 2.0 building blocks: traceAI, fi.evals, agent-opt at github.com/future-agi. Hosted Agent Command Center adds failure-cluster views, Protect guardrails, MCP Security scanner, RBAC, SOC 2 Type II, HIPAA BAA, AWS Marketplace.
Related reading
- Best 5 MCP Gateways for Claude Code in 2026
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- Best 5 MCP Gateways in 2026: Post-RCE Production Picks
- What Is an AI Gateway? The 2026 Definition
Sources
- Anthropic Claude Code MCP documentation, claude.ai/docs/claude-code/mcp
- Model Context Protocol specification 2025-11-25, modelcontextprotocol.io/specification/2025-11-25
- OX Security advisory on MCP STDIO RCE class (April 15, 2026), ox.security/blog/mcp-supply-chain-advisory-rce-vulnerabilities-across-the-ai-ecosystem
- Future AGI Agent Command Center docs, docs.futureagi.com/docs/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Future AGI traceAI, github.com/future-agi/traceAI (Apache 2.0)
- Future AGI agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
- Maxim Bifrost benchmarks, getmaxim.ai/bifrost/resources/benchmarks
- Portkey AI gateway, portkey.ai
- Kong AI Gateway, konghq.com/products/kong-ai-gateway
- agentgateway.dev, agentgateway.dev (Linux Foundation project page)