Best AI Gateway to Use with Claude Code in 2026
Five AI gateways scored against Claude Code in 2026: provider breadth, routing, fallback, observability, cost, security, deployment. Opinionated picks with tradeoffs.
Claude Code is an agent. The gateway you put under it is plumbing. The plumbing decides whether the agent is auditable, whether it stays online when Anthropic throttles, whether the codebase ever leaves your VPC, and whether the monthly bill is something finance will accept.
This is the generalist buyer’s guide. Not “which gateway watches tokens best”, not “which one caches hardest”. You’re evaluating gateways to sit between Claude Code and api.anthropic.com; you want the full surface (observability, routing, fallback, cost, security, access control, deployment), and you want an opinion. Below: five gateways scored across the seven axes that actually matter.
TL;DR
Future AGI Agent Command Center is the strongest pick for an AI gateway to use with Claude Code because it ships per-developer virtual keys via the ANTHROPIC_BASE_URL swap, cross-developer semantic cache that lifts hit rate across a 25-developer monorepo team, parallel tool-use translation that preserves Claude Code’s tool_use and tool_result blocks intact, and Bedrock alongside Anthropic behind one base URL. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Per-developer attribution, cross-developer cache, parallel tool-use translation, and SOC 2 Type II + HIPAA + GDPR + CCPA certified.
- Portkey — Best for hosted polish with mature RBAC and virtual keys out of the box. Fastest setup if you want a managed product (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- Helicone — Best for the cheapest drop-in if all you need is the dashboard. Lightweight per-call observability with one config line (treat as planned migration after the March 3, 2026 Mintlify acquisition).
- LiteLLM — Best when compliance forbids hosted gateways. Source-available proxy that runs entirely inside your VPC; pin commits after the March 24, 2026 PyPI compromise.
- Kong AI Gateway — Best when the platform team already operates Kong. AI policies layered on an existing Kong stack.
Five gateways, five jobs they do best. The rest of this post explains how each handles Claude Code, where it breaks, and which tradeoff you accept.
Why Claude Code specifically needs a gateway
Claude Code isn’t a chat API client. It’s a long-running coding agent that packs project context, calls Anthropic, parses tool-use blocks, executes bash, edits files, and loops. That shape forces three things on the gateway underneath.
The workload is bursty. A PR-review session might fire 40 turns in 20 minutes, with inputs ranging from 8K to 180K tokens per turn, then nothing for two hours. Static rate limits break under this pattern. An Anthropic 429 mid-session freezes the developer, so the gateway has to retry without losing the conversation.
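A minimal sketch of the retry behavior described above, assuming a hypothetical `send` callable that re-submits the same request and returns `(status, body)`. Because the Messages API is stateless and each request carries the full conversation, re-sending the identical payload is what keeps the session intact. The status set and delay values are illustrative assumptions, not any gateway's defaults.

```python
import random
import time
from typing import Callable, Tuple

# Statuses worth retrying: rate limit plus transient server-side errors (assumed set).
RETRYABLE = {429, 500, 502, 503, 529}


def send_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break
        # Exponential backoff with full jitter, capped at max_delay seconds.
        ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
        status, body = send()
    return status, body
```

Jitter matters for this workload: when a whole team is throttled at once, un-jittered retries arrive together and trigger the next 429.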
Tool calls are first-class. Claude Code’s value comes from calling bash, edit_file, and MCP servers. Gateways that re-serialize Anthropic’s content blocks as plain text (two were in the wild as recently as Q4 2025) silently corrupt tool use. The developer wastes 45 minutes debugging what looks like a model regression.
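One way to catch this failure mode is to compare a message as the client sent it with the same message as a mock upstream received it through the gateway. The sketch below assumes Messages-API-shaped dictionaries; the function names are hypothetical, and how you capture the upstream copy is left to your test setup.

```python
import json
from typing import Any, Dict, List


def structured_blocks(message: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the tool_use / tool_result blocks of one message."""
    content = message.get("content")
    if isinstance(content, str):
        # Plain-string content cannot carry structured blocks at all.
        return []
    return [b for b in content or [] if b.get("type") in ("tool_use", "tool_result")]


def tool_blocks_preserved(sent: Dict[str, Any], received: Dict[str, Any]) -> bool:
    """True if every tool block survived the hop byte-for-byte (key order ignored)."""
    def canon(blocks: List[Dict[str, Any]]) -> List[str]:
        return [json.dumps(b, sort_keys=True) for b in blocks]

    return canon(structured_blocks(sent)) == canon(structured_blocks(received))
```

A gateway that flattens blocks into text fails this check immediately, which is far cheaper than discovering it in a live session.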
The code itself is the prompt. Source code, internal APIs, sometimes secrets. Claude Code packs whatever the developer points it at into context. The gateway sees all of it. Hosted vs self-hosted matters more here than for almost any other workload because the data is the company’s IP.
For the rest of this post, “gateway” means an AI gateway that speaks the Anthropic API. All five picks support ANTHROPIC_BASE_URL redirection.
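As a rough illustration of the redirection, the sketch below builds the environment for a Claude Code process. `ANTHROPIC_BASE_URL` comes from this post; passing the per-developer key through `ANTHROPIC_AUTH_TOKEN` is an assumption to verify against the Claude Code settings documentation and your gateway's auth scheme. The gateway URL and key value are placeholders.

```python
import os
from typing import Dict, Mapping, Optional


def gateway_env(
    base_url: str,
    virtual_key: str,
    parent: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy the parent environment and point Claude Code at the gateway."""
    env = dict(os.environ if parent is None else parent)
    env["ANTHROPIC_BASE_URL"] = base_url.rstrip("/")
    # Assumption: the gateway accepts the developer's virtual key as a bearer token.
    env["ANTHROPIC_AUTH_TOKEN"] = virtual_key
    return env


# Usage (not executed here): launch the CLI with the redirected environment.
#   import subprocess
#   subprocess.run(["claude"], env=gateway_env("https://gateway.example.internal", "vk-..."))
```

Setting the variable in the shell profile has the same effect and also covers terminal sessions that no wrapper launches.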
The seven axes: generalist edition
Sibling posts narrow into one axis (token monitoring, caching, governance). This post stays generalist and scores every pick across all seven.
| Axis | What it measures for Claude Code |
|---|---|
| 1. Provider breadth | Swap Anthropic for Bedrock, Vertex, Azure, or OSS clones without wrapper rewrites? |
| 2. Routing | Route turns to haiku-4-5 vs sonnet-4-6 vs opus-4-7 by context, latency, or cost rule? |
| 3. Fallback | On 429s or 5xxs, does the gateway fail over without dropping the session? |
| 4. Observability | Per-session traces, tool-call visibility, latency percentiles, OTel export? |
| 5. Cost | Per-developer / per-repo attribution, budget caps with auto-pause, chargeback exports? |
| 6. Security | PII redaction, prompt-injection detection, secret scanning, audit logs? |
| 7. Deployment | Hosted, BYOC, self-hosted, air-gapped — and the operational story for each? |
How we picked
We started from public AI gateways shipping an Anthropic-compatible endpoint as of May 2026. Removed: two early proxies that re-serialize content blocks (breaking tool-use), gateways without per-key metadata pass-through, OpenRouter (chargeback doesn’t fit teams), TrueFoundry (Claude Code integration unstable in May 2026). The five left handle tool-use correctly, preserve SSE streaming, and expose session metadata.
1. Future AGI Agent Command Center: Best for closing the loop
Verdict: Future AGI is the only gateway here that uses captured Claude Code traces to improve the routing and prompts it then deploys. Everyone else is an observation layer; Agent Command Center is an observation layer wired to an optimizer and back to the routing path. That difference compounds.
Across the seven axes. Anthropic native, Bedrock-Claude, Vertex, OpenAI, Gemini, and OSS clones sit behind one endpoint. Routing is policy-driven: the default sends turns under 10K tokens to claude-haiku-4-5, 10K-60K to claude-sonnet-4-6, and the rest to claude-opus-4-7, and the router updates the policy automatically from eval scores. Fallback chains fire on 429s and 5xxs within the same session. Traces are per-session, with the session ID as the top-level span, each turn as a child span, and tool calls (bash, edits, MCP) as their own spans; OTel export runs via traceAI (Apache 2.0). Attribution is per-developer via fi.attributes.user.id and per-repo via span attributes, with budget caps that soft-alert at 80% and hard-pause at 110%. Live Protect guardrails run inline: PII, prompt-injection, secret, and toxicity classifiers at ~67ms median for text (arXiv 2510.13351). There is an immutable audit log, SOC 2 Type II certification, and a BAA is available. Deployment is SaaS US/EU, BYOC, or fully self-hosted; traceAI, ai-evaluation, and agent-opt are Apache 2.0.
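The default token-threshold policy can be sketched in a few lines. This is an illustration of the rule as stated, not Agent Command Center's implementation; the function name is hypothetical, and treating exactly 60K tokens as an Opus turn is an assumption, since the text leaves that boundary open.

```python
HAIKU_LIMIT = 10_000   # turns under this go to the cheapest model
SONNET_LIMIT = 60_000  # assumed exclusive upper bound for the middle tier


def pick_model(input_tokens: int) -> str:
    """Map a turn's input size to a model ID using static thresholds."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens < HAIKU_LIMIT:
        return "claude-haiku-4-5"
    if input_tokens < SONNET_LIMIT:
        return "claude-sonnet-4-6"
    return "claude-opus-4-7"
```

An adaptive router would move these thresholds from eval scores; the static version is the baseline it starts from.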
The loop. Traces feed fi.evals (faithfulness, code-correctness, tool-use). Low-scoring sessions cluster by failure mode. fi.opt.optimizers (ProTeGi, Bayesian, GEPA) rewrite the prompt or adjust routing against those failures. Teams typically see 15-30% spend reduction within four weeks because the router gets better at choosing the cheaper model for easy turns.
Where it falls short:
- agent-opt is opt-in. For small teams or low-volume usage, start with traceAI + ai-evaluation and turn the optimizer on once eval baselines stabilize.
- The prompt-library UI is less mature than Portkey’s. If a shared versioned prompt library is your daily tool, Portkey wins on that one feature.
- Protect adds ~67ms to first-token latency. That is invisible for interactive coding, but worth profiling if you care about sub-100ms.
- BYOC requires a Future AGI solutions engineer for the first week. Not pure self-serve at the enterprise tier.
Pricing: Free tier with 100K traces / month. Scale from $99/month. Enterprise custom with SOC 2 Type II, BAA, dedicated support. AWS Marketplace listing for procurement.
Score: 7/7 axes.
2. Portkey: Best for hosted polish + mature RBAC
Verdict: Portkey is the most polished hosted-only gateway here. If your priority is per-developer virtual keys, prompt versioning, and a clean dashboard without standing up infra, it’s the fastest path to production. No optimizer, and the metadata-header model means your wrapper has to set tags. But in its lane it’s the cleanest product on this list.
Across the seven axes. Anthropic, Bedrock, Vertex, OpenAI, Gemini, Mistral, Cohere, and most OSS models. Routing is strategy-based (fallback, load-balancing, conditional, round-robin), with every strategy versioned in the dashboard. Fallback chains are strong (Anthropic 429 → Bedrock-Claude → haiku-4-5 is typical). Per-request traces carry latency, cost, and errors; session grouping requires the wrapper to set trace_id; tool-call visibility is preserved, but the UI flattens nested calls more than Future AGI’s does. The virtual-key system is the strongest here: each developer gets a virtual key fanning out to one Anthropic key, keeping bulk pricing while slicing chargeback per developer. The guardrails plugin marketplace runs third-party PII, injection, and secret scanners inline; SOC 2 Type II; workspace-scoped RBAC. Hosted US/EU; BYOC is heavier than Future AGI’s.
Where it falls short:
- No optimizer. Traces inform humans, not the gateway; routing policy stays static.
- The metadata-header model requires wrapper changes. Without that, you get key-level aggregation, not session-level.
- Pricing escalates above 5M requests/month faster than the lighter alternatives.
- Guardrails come from a plugin marketplace rather than being native: you pay a third party or run their containers, and each one adds latency.
- BYOC is mature but not air-gapped. Isolated environments still need LiteLLM or Future AGI’s full self-host.
Pricing: Free with 10K requests/day. Scale from $99/month. Enterprise custom with SOC 2 Type II and SSO.
Score: 6/7 axes (missing: feedback loop / optimizer).
3. Helicone: Best for lightweight, drop-in observability
Verdict: Helicone is the right pick when you want per-request observability for Claude Code and nothing else, configured before lunch. Replace api.anthropic.com with Helicone’s Anthropic endpoint, add the auth header, done. Tradeoff: everything beyond observation is shallow or absent.
Across the seven axes. Anthropic, OpenAI, Gemini, Bedrock, Vertex, plus OSS via custom integration. Routing is basic: round-robin and failover, with no conditional routing by token count. Cross-provider failover is simple but sufficient for “if Anthropic is down, try Bedrock”. Observability is the strength: per-request traces with cost, latency, and token counts; per-session grouping via Helicone-Session-Id; per-developer via Helicone-User-Id; custom properties for repo or team. Usage alerts and rate-limit policies exist, but budget caps are less expressive than Portkey’s auto-pause, and the chargeback UX is “download CSV, build the dashboard yourself”. Security is basic prompt-tracking with no inline guardrails. Hosted by default; OSS self-host is available, but operational scale-out past a few hundred RPS is real work.
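A small sketch of the headers a wrapper would attach for session and developer attribution. `Helicone-Session-Id` and `Helicone-User-Id` are named above; `Helicone-Auth` and the `Helicone-Property-*` prefix reflect Helicone's documented header conventions as I understand them, so check them against the current Helicone docs. The helper name and example values are hypothetical.

```python
from typing import Dict, Mapping, Optional


def helicone_headers(
    helicone_api_key: str,
    session_id: str,
    user_id: str,
    properties: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the attribution headers for one Claude Code session."""
    headers = {
        "Helicone-Auth": f"Bearer {helicone_api_key}",
        "Helicone-Session-Id": session_id,
        "Helicone-User-Id": user_id,
    }
    # Custom properties (for example repo or team) become one header each.
    for name, value in (properties or {}).items():
        headers[f"Helicone-Property-{name}"] = value
    return headers
```

Generating the session ID once per Claude Code launch, rather than per request, is what makes the per-session view usable.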
Where it falls short:
- No optimizer, no prompt library, no native guardrails.
- Routing intelligence is the weakest of the five. Claude-Code-specific routing has to live in a wrapper your team owns.
- OSS self-host is functional but maintainer cadence is uneven; releases ship in bursts, ops automation is thin.
- Per-session attribution requires the wrapper to set the header. Without it, session grouping is hand-wavy.
- Cohort views by repo or team require dashboard configuration and sometimes SQL exports.
Pricing: Free with 10K requests/month. Pro from $25/month. Enterprise custom.
Score: 4.5/7 axes (missing: feedback loop, mature alerting, native guardrails).
4. LiteLLM: Best for self-hosted, source-available control
Verdict: LiteLLM is the pick when Claude Code traffic can’t leave your VPC and the security team wants to read every line of code that touches a prompt. Python-native, MIT-licensed, runs inside your infra. Widest provider coverage here. Native observability is thin, so plan to wire traceAI, Langfuse, or another OTel sink behind it. But the self-host posture is unmatched.
Across the seven axes. Provider breadth is the highest here: 100+, including every commercial vendor and a long tail of self-hosted inference (vLLM, TGI, Ollama, Together, Modal). The router is strong: fallback, load-balancing, conditional, retry-with-different-model, latency-based, and cost-based rules, defined in YAML or Python and hot-reloaded. Fallback chains are best-in-class and cross-provider (Anthropic → Bedrock-Claude → haiku-4-5 → Vertex-Claude). Native observability is thin (per-request logs, spend tracking, virtual-key usage); anything richer needs an external OTel sink, and most LiteLLM teams pair it with Future AGI traceAI or Langfuse. Budgets are per-key, per-team, and per-user; per-repo needs custom metadata and a downstream dashboard. Security covers configurable PII redaction, community prompt-injection plugins, OIDC SSO, and an audit log. Deployment is the strongest here: container, Kubernetes, fly.io, or Mac for local dev; no telemetry leaves the VPC; air-gapped works.
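The cross-provider chain above boils down to ordered failover. The following is a generic sketch of that logic, not LiteLLM's router or its configuration format; the target names and the `send` callables are hypothetical stand-ins for real provider clients.

```python
from typing import Callable, Optional, Sequence, Tuple

# A target is a label plus a callable that sends the request and returns (status, body).
Target = Tuple[str, Callable[[dict], Tuple[int, dict]]]


def call_with_fallback(request: dict, chain: Sequence[Target]) -> Tuple[str, int, dict]:
    """Try targets in order; return (name, status, body) from the first that does not fail over."""
    if not chain:
        raise ValueError("fallback chain must contain at least one target")
    last: Optional[Tuple[str, int, dict]] = None
    for name, send in chain:
        status, body = send(request)
        last = (name, status, body)
        # Fail over only on throttling and server errors; other 4xx responses
        # are the caller's problem and would fail identically elsewhere.
        if status != 429 and status < 500:
            return last
    assert last is not None
    return last
```

Returning the target name alongside the response lets the caller log which provider actually served the turn, which matters for cost attribution.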
Where it falls short:
- UI is functional, not polished. Slicing by developer or repo means a SQL dashboard.
- “Observability included” is thinner than Portkey or Helicone.
- No optimizer.
- Python-native means a Python process in the critical path. If your platform team is allergic to that, Kong’s Lua-based stack is easier to swallow.
- Plugin ecosystem maturity varies. Some guardrails and provider integrations are community-maintained and lag upstream API changes.
Pricing: Open source under MIT. Enterprise tier with SLA, SSO, audit, managed support; published pricing starts around $250/month for small teams.
Score: 5.5/7 axes (missing: native polished observability, optimizer; partial: mature guardrails).
5. Kong AI Gateway: Best if you already run Kong
Verdict: Kong AI Gateway is the right pick when the platform team already operates Kong for REST APIs and the path of least resistance is extending that stack. Strengths: SLA, plugin ecosystem, ops familiarity. Weakness: AI-specific shallowness; observability and chargeback happen through plugins. If you don’t already run Kong, don’t adopt it for AI alone.
Across the seven axes. Anthropic, OpenAI, Bedrock, Vertex, Azure OpenAI, Cohere, Mistral via AI Proxy plugin (Kong 3.6+). Routing through the admin API is less expressive than Portkey or LiteLLM. Fallback via circuit-breaker and retry plugins, not native to the AI plugin. OpenTelemetry plugin captures the request lifecycle; AI-specific span attributes need Lua. Spend tracking is plugin-driven; consumer + tag patterns give per-developer and per-team attribution with Grafana on top. Security inherited from Kong’s general stack (mTLS, OAuth2, JWT, rate limiting); AI-specific guardrails plugin-marketplace with varying maturity. Deployment: anywhere Kong runs. Konnect, OSS, air-gapped.
Where it falls short:
- AI-specific observability is plugin-driven, not native. Expect two weeks of platform-team time to wire the chargeback view finance will accept.
- No optimizer.
- Default dashboard is API-gateway-shaped (4xx/5xx, latency, throughput). Translating into “Claude Code cost per developer” is a Grafana exercise.
- Spend-tracking requires multiple plugins wired together.
- The AI Proxy plugin is newer than the rest of Kong, so edge cases (multi-turn tool-use, specific Anthropic SDK headers) occasionally surface as plugin bugs needing upstream fixes.
Pricing: Kong is open source. Konnect (managed) free tier exists. Enterprise plans from $1.5K/month.
Score: 5/7 axes (missing: native AI observability, optimizer, polished cost dashboard).
Capability matrix
| Axis | Future AGI | Portkey | Helicone | LiteLLM | Kong |
|---|---|---|---|---|---|
| Provider breadth | Anthropic + 30+ | Anthropic + 25+ | Anthropic + 15+ | Anthropic + 100+ | Anthropic + 20+ via plugin |
| Routing | Policy + adaptive | Strategy-versioned | Round-robin + failover | YAML router | Plugin-driven |
| Fallback | Deterministic chain | Strong chain | Simple failover | Best-in-class chain | Circuit-breaker plugins |
| Observability | Native + OTel | Native + plugin | Native dashboard | Thin; pair with OTel | OTel plugin |
| Cost | Native + auto-pause | Virtual keys + Slack | Custom properties | Per-key budgets | Tags + Grafana |
| Security | Live Protect ~67ms | Guardrails marketplace | Tracking only | Configurable plugins | Kong stack + AI plugins |
| Deployment | Hosted + BYOC + OSS | Hosted + BYOC | Hosted + OSS | OSS + Enterprise | OSS + Konnect + air-gap |
| Feedback loop | Yes via fi.opt | No | No | No | No |
Decision framework
Choose Future AGI when traces should drive routing and prompt updates that compound, Claude Code is a meaningful line item ($10K+/month), and you want the cost-quality curve to bend downward. It also fits if you need inline guardrails with sub-100ms overhead.
Choose Portkey when you want a hosted gateway with mature RBAC, virtual keys, and a polished dashboard, and you don’t yet need an optimizer.
Choose Helicone when you want the lightest drop-in for per-call observability and don’t need routing intelligence or budget enforcement. Good for teams under 10 developers, knowing you’ll likely outgrow it within a year.
Choose LiteLLM when compliance requires Claude Code traffic to never leave the VPC. Source-availability beats hosted polish, and you have engineering capacity to layer an observability sink behind it.
Choose Kong AI Gateway when you already operate Kong for REST APIs. Platform familiarity outweighs AI-specific shallowness, and you have budget for two weeks of plugin work upfront.
Common mistakes when wiring Claude Code through a gateway
| Mistake | What goes wrong | Fix |
|---|---|---|
| Pointing only the IDE plugin at the gateway | Terminal CLI usage hits Anthropic direct; chargeback misses half the traffic | Set ANTHROPIC_BASE_URL in the shell profile, not just IDE |
| Sharing one team key across developers | All sessions look identical to the dashboard | Issue virtual keys per developer |
| Not preserving anthropic-version header | Tool-use behavior silently differs from expected | Pin the version explicitly in the forwarding rule |
| Buffering streaming responses | Claude Code’s progress UI freezes mid-turn | Confirm gateway forwards SSE without buffer-and-batch |
| Tagging only user_id, not session_id | Session-level attribution impossible | Tag both |
| Treating tool-call passthrough as “probably fine” | Two proxies still re-serialize blocks as text | Run a tool-use regression test before rolling out |
| Budget caps too low without alerts | Claude Code pauses mid-conversation | Soft-alert at 80%, hard-pause at 110% |
| Forgetting MCP traffic | MCP server calls bypass the proxy unless wired | Configure MCP endpoints through the same gateway base URL |
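The budget-cap fix in the table (soft-alert at 80%, hard-pause at 110%) reduces to a small threshold check. The sketch below is illustrative only; the function name and return labels are hypothetical, and a real gateway would evaluate this per virtual key on each request.

```python
def budget_state(
    spent_usd: float,
    cap_usd: float,
    soft_ratio: float = 0.80,
    hard_ratio: float = 1.10,
) -> str:
    """Classify spend against a cap as 'ok', 'alert', or 'pause'."""
    if cap_usd <= 0:
        raise ValueError("cap_usd must be positive")
    ratio = spent_usd / cap_usd
    if ratio >= hard_ratio:
        return "pause"   # stop accepting new requests for this key
    if ratio >= soft_ratio:
        return "alert"   # notify, but let the session continue
    return "ok"
```

Putting the hard stop above 100% leaves headroom so a session that crosses the cap mid-conversation can finish instead of freezing.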
How Future AGI closes the loop on Claude Code
The other four gateways treat the trace as an end state: capture, dashboard, alert. Future AGI treats it as input to a six-stage loop.
- Trace. Every turn becomes a span tree via traceAI. Inputs, outputs, tool calls, model, latency, cost, and session ID are all captured.
- Evaluate. fi.evals scores every turn against task-completion, faithfulness, code-correctness, and tool-use rubrics. Scores live alongside cost: not just “$4.20 for this session” but “$4.20 for code-correctness 0.41; it burned tokens to fail.”
- Cluster. Low-scoring sessions cluster by failure mode. Common ones: “Opus called when Sonnet would have been enough” and “Tool-use accuracy collapses past 120K tokens.”
- Optimize. fi.opt.optimizers (ProTeGi, Bayesian, GEPA) rewrite the prompt or adjust routing against the clustered failures.
- Route. Agent Command Center applies the updated policy on the next request: hot-reload, no redeploy.
- Re-deploy. Prompt and route are versioned; if the score regresses on the next eval window, rollback is automatic.
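The rollback condition in the last stage can be illustrated with a simple comparison of mean eval scores across two windows. This is a sketch under stated assumptions, not the fi.opt API: the function name and the 0.02 tolerance are hypothetical, and a production check would likely also look at sample size and variance.

```python
from statistics import mean
from typing import Sequence


def should_roll_back(
    baseline_scores: Sequence[float],
    candidate_scores: Sequence[float],
    tolerance: float = 0.02,
) -> bool:
    """True if the new policy's mean score fell below the baseline by more than tolerance."""
    if not baseline_scores or not candidate_scores:
        raise ValueError("both eval windows need at least one score")
    return mean(candidate_scores) < mean(baseline_scores) - tolerance
```

The tolerance keeps ordinary noise between windows from triggering a rollback on every deploy.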
Compound effect: teams starting at $40K/month on Claude Code typically trend down 15-30% within four weeks without changing developer behavior. The three open-source building blocks (traceAI, ai-evaluation, agent-opt) are all Apache 2.0. Hosted Agent Command Center adds the failure-cluster view, Live Protect, RBAC, SOC 2 Type II certification, a BAA, and an AWS Marketplace listing.
FAQ
What is an AI gateway and why does Claude Code need one? A proxy between Claude Code and Anthropic that intercepts requests, adds metadata, applies routing or fallback, captures observability, and can enforce guardrails. Claude Code benefits because the workload is bursty, tool-use is easy to break, and source code flows through it.
Does Claude Code support OpenAI-compatible endpoints? Claude Code speaks the Anthropic API natively. All five gateways here support ANTHROPIC_BASE_URL, no OpenAI shim needed.
Can I route Claude Code through multiple model providers? Yes, with care. Claude Code is tuned for Claude, and routing to non-Claude models often degrades tool-use. Safe pattern: route between haiku-4-5, sonnet-4-6, and opus-4-7 by token budget, and use cross-provider routing only for fallback.
How do I track Claude Code cost per developer when everyone shares one API key? Use a gateway with virtual keys (Future AGI, Portkey, LiteLLM). Each developer gets a virtual key fanning out to the team key, preserving bulk pricing while letting the gateway aggregate spend per developer.
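To make the fan-out concrete, here is a sketch of the aggregation a gateway performs once each request is tagged with a virtual key. The record fields (`virtual_key`, `cost_usd`) and the key-to-developer mapping are hypothetical shapes chosen for illustration, not any vendor's export format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def spend_per_developer(
    records: Iterable[Mapping[str, object]],
    key_owner: Mapping[str, str],
) -> Dict[str, float]:
    """Sum request cost by developer, using the virtual key each request carried."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        # Requests with an unknown key are kept visible rather than dropped.
        owner = key_owner.get(str(record["virtual_key"]), "unattributed")
        totals[owner] += float(record["cost_usd"])
    return dict(totals)
```

Keeping an explicit "unattributed" bucket surfaces traffic that bypassed the per-developer keys, such as a CLI session still using the shared team key.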
What happens to tool calls when Claude Code runs through a gateway? All five gateways here preserve tool-use blocks intact as of May 2026. Two excluded proxies re-serialized blocks as plain text. Run a regression test before adopting.
Is it safe to send source code through an AI gateway? For hosted gateways, the flow is gateway → Anthropic; both endpoints see the code. If compliance forbids both, the safe options are self-hosted LiteLLM, fully self-hosted Future AGI, or Kong air-gapped, all inside your VPC.
How is Future AGI Agent Command Center different from Portkey or Helicone? Portkey and Helicone observe and route. Future AGI adds an optimization layer: trace data feeds back into prompt rewrites and routing updates, so the gateway improves over time.
What we did not include
Three gateways that show up in other 2026 listicles but didn’t make the cut: OpenRouter (consumer-facing, chargeback doesn’t fit teams), Cloudflare AI Gateway (Claude Code integration thin, observability lacks per-developer slicing without custom code), TrueFoundry (Claude Code integration unstable in our May 2026 testing).
Related reading
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- Best LLM Gateways in 2026
- Best AI Gateways for Agentic AI in 2026
Sources
- Anthropic Claude Code documentation, claude.ai/docs/claude-code
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms median text guardrail latency, 109ms image)
- Portkey AI gateway, portkey.ai
- Helicone proxy, helicone.ai
- LiteLLM proxy, github.com/BerriAI/litellm
- Kong AI Gateway, konghq.com/products/kong-ai-gateway