Best 5 AI Gateways for Routing Claude Code Requests in Production in 2026
Five AI gateways scored on routing Claude Code requests in production: deterministic policy expressiveness, per-region routing, failover semantics, P99 overhead, observability for routing decisions, cache interaction, and burst resilience.
The first time Anthropic returned a sustained wall of 529s on a Friday afternoon, the SRE on call learned three things. One, the Claude Code fleet had no deterministic fallback. Two, the gateway in front was making routing decisions by calling another LLM to classify each turn, and that classifier was itself a billable Anthropic call stuck behind the same 529s. Three, the dashboard reported how many requests had failed but not why each had been routed where it was routed. The post-mortem ran four weeks.
Routing Claude Code in production isn’t monitoring Claude Code in production. Monitoring is attribution after the fact. Routing is decisions on the hot path, every turn, under a P99 budget of 50ms gateway overhead and zero tolerance for the gateway becoming the dependency that takes the workload down. All five gateways here advertise routing. Only some route in a way an SRE will sign off on at 3 AM.
TL;DR
Future AGI Agent Command Center is the strongest pick for routing Claude Code in production because the hot routing path stays deterministic at sub-millisecond decisions (no LLM in the loop), with per-region pools, sticky session routing via consistent-hash on session ID, explicit fallback graphs (primary region → secondary region → Bedrock → fail-fast 503), and OpenTelemetry-native MTTR telemetry per route. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Deterministic hot path (P50 of 8ms, P99 of 34ms), per-region pools, sticky session routing, and explicit fallback graphs.
- Portkey — Best for mature config-driven fallback chains. Config-as-code fallback graphs and per-virtual-key region routing (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- Kong AI Gateway — Best for real per-region routing on top of Kong’s existing data planes. Self-hosted enterprise API gateway with global anycast.
- LiteLLM — Best for source-available Python-native routing that runs in your VPC. Deterministic fallback config; pin commits after the March 24, 2026 PyPI compromise.
- Cloudflare AI Gateway — Best for the lowest gateway-overhead P99 at the edge. Sub-15ms overhead, but routing policy is shallow.
Why routing Claude Code in production is its own problem
A coding agent in production isn’t a chatbot. Each Claude Code invocation opens a session that may span hundreds of turns, with the same session needing claude-haiku-4-5 for a one-line lint and claude-opus-4-7 for a 180K-token refactor. The session must survive a regional Anthropic incident, and each routing decision has to happen in single-digit milliseconds. Four properties make the routing layer hard:
- Routing decisions are hot-path. The gateway has roughly 50ms of overhead before a developer notices it exists. That covers the routing decision, header rewriting, TLS termination, observability emission, and the regional-pool lookup. That leaves no room for an LLM-judge route decision: we have measured Claude-Haiku-classifier routing add 380-450ms to P50 and 900ms+ to P99.
- Failover isn’t retry. When Anthropic’s us-east cluster returns a wall of 529s, retrying the same upstream three times amplifies the incident. The right behavior is a deterministic fallback graph: Anthropic us-east → Anthropic us-west → Bedrock claude-sonnet-4-6 → fail-fast, with an observable event per hop. Three of the five gateways below do this; two pretend to.
- Per-region routing affects cache hit ratio. Claude Code packs project files into each turn’s context, and Anthropic’s prompt cache is region-local. A naive anycast router that sends turn N to us-east and turn N+1 to us-west misses the cache on every turn, and the session runs 4-5x more expensive. Routing has to be sticky.
- Burst traffic looks like a DDoS. A team running Claude Code in CI/CD can burst from 5 RPS to 200 RPS in a minute when a merge gate triggers across 40 branches. How the gateway’s circuit breaker and queue behave under burst determines whether the next merge succeeds or CI has to be paused.
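The fallback graph described in the second property can be sketched in a few lines. This is a minimal illustration, not any gateway's actual implementation: `Hop`, `FallbackGraph`, the `FAILOVER_STATUSES` set, and the stub upstreams are all hypothetical names chosen for the example. It makes one attempt per hop, records an event per hop, and fails fast once the list is exhausted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Statuses treated as "this upstream is unhealthy, move on" (an assumed set).
FAILOVER_STATUSES = {500, 502, 503, 529}


@dataclass
class Hop:
    name: str                                   # label only, e.g. "anthropic.us-east"
    send: Callable[[dict], Tuple[int, str]]     # returns (status, body)


@dataclass
class FallbackGraph:
    hops: List[Hop]
    events: List[dict] = field(default_factory=list)

    def route(self, request: dict) -> Tuple[int, str]:
        for index, hop in enumerate(self.hops):
            status, body = hop.send(request)    # exactly one attempt per hop, no retry
            self.events.append({"hop": index, "target": hop.name, "status": status})
            if status not in FAILOVER_STATUSES:
                return status, body
        # Every hop failed: fail fast instead of looping back to the first upstream.
        self.events.append({"hop": len(self.hops), "target": "fail-fast", "status": 503})
        return 503, "all upstreams unavailable"


# Stub upstreams standing in for real provider calls.
def overloaded(_request: dict) -> Tuple[int, str]:
    return 529, "overloaded"


def healthy(_request: dict) -> Tuple[int, str]:
    return 200, "ok"


graph = FallbackGraph([
    Hop("anthropic.us-east", overloaded),
    Hop("anthropic.us-west", overloaded),
    Hop("bedrock.claude-sonnet-4-6", healthy),
])
status, body = graph.route({"session_id": "s1"})
```

Because each hop is tried once and in a fixed order, the number of upstream calls during an incident is bounded by the length of the graph, and `graph.events` answers which hops a given turn passed through.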
The default “best AI gateway for coding agents” listicle doesn’t score on any of this. For the rest of this post, “gateway” means an AI gateway between Claude Code clients and one or more model provider regions, configured via ANTHROPIC_BASE_URL.
The 7 routing axes we score on
| Axis | What it measures |
|---|---|
| 1. Deterministic policy expressiveness | Can the gateway encode routing as a deterministic function of request attributes (token count, model hint, region tag, repo, developer tier) without calling another LLM in the hot path? |
| 2. Per-region routing | Can the gateway route to a specific Anthropic region (us-east, us-west, eu-central, apac) and stick a session to a region for cache locality? |
| 3. Failover semantics | When Anthropic returns 5xx, does the gateway have a configurable fallback graph with deterministic ordering, or is it round-robin/random/best-effort? |
| 4. P99 latency overhead | What is the gateway’s own overhead at P99, measured under a Claude-Code-shaped workload (long context, streaming, tool calls)? |
| 5. Routing-decision observability | For any given turn, can you answer the question “why was this turn routed to claude-haiku-4-5 and not claude-opus-4-7” in the dashboard, without grepping logs? |
| 6. Cache-hit policy interaction | Does the routing policy preserve prompt-cache locality, or does anycast load-balancing destroy the hit ratio? |
| 7. Traffic-shape resilience | Under burst load (10x baseline RPS in under a minute), does the gateway shed gracefully or fall over? |
The Score line at the end of each pick tallies all seven.
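For axis 4, it helps to be explicit about what "gateway overhead" means. One reasonable definition, assumed here and not necessarily the exact method behind the numbers in this post, is per-request total latency minus the upstream's own latency, summarized with a nearest-rank percentile. The function names and sample values below are illustrative.

```python
import math
from typing import List, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def gateway_overhead_ms(total_ms: Sequence[float], upstream_ms: Sequence[float]) -> List[float]:
    """Per-request overhead: time measured at the client minus time spent in the upstream."""
    return [t - u for t, u in zip(total_ms, upstream_ms)]


# Made-up measurements for four requests, in milliseconds.
total = [812.0, 640.5, 1203.0, 955.0]
upstream = [800.0, 633.0, 1170.0, 948.0]
overhead = gateway_overhead_ms(total, upstream)
p50 = percentile(overhead, 50)
p99 = percentile(overhead, 99)
```

Subtracting upstream time per request matters for Claude Code workloads, where model latency varies by orders of magnitude between a lint turn and a long refactor and would otherwise swamp the gateway's own contribution.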
How we picked
We started from the universe of public AI gateways advertising Anthropic-compatible routing with documented region pinning or fallback chains. We dropped two whose “fallback” was a single retry against the same upstream, and one that used Claude itself to classify route decisions on the hot path. The remaining five each have a different theory of how routing should work; the comparison below is honest about which theory holds up at 200 RPS.
1. Future AGI Agent Command Center: Best for deterministic routing with a self-improving offline policy
Verdict: Future AGI is the only gateway here that separates the hot-path routing decision (deterministic, sub-millisecond, no LLM in the loop) from an offline learner that updates the policy. The other four are static routers or hosted black boxes. Agent Command Center treats routing as a versioned artifact: hot path stays deterministic, policy improves between deploys.
What it does for routing Claude Code in production:
- Deterministic policy expressiveness. Routes are a typed policy: predicate-to-pool mappings evaluated in order, with predicates as pure functions of request attributes (token count, model hint, region tag, developer tier, repo). A typical Claude Code policy routes ~62% of turns to claude-haiku-4-5 (under 8K input tokens), 31% to claude-sonnet-4-6 (8K-40K), and 7% to claude-opus-4-7 (over 40K or explicitly hinted). Every decision emits an audit span.
- Per-region routing. Pools support region pinning (`anthropic.us-east`, `anthropic.us-west`, `anthropic.eu-central`). Sessions are sticky via consistent-hash on session ID, so cache locality survives across turns.
- Failover semantics. Fallback graphs are explicit and per-pool. Default Claude Code shape: primary region → secondary region → Bedrock claude-sonnet-4-6 → fail-fast 503. Failover RTT P50 of 110ms (warm DNS), P99 of 320ms (cold TLS rehandshake). Every hop emits a failover span.
- P99 latency overhead. P50 of 8ms, P99 of 34ms on a Claude-Code-shaped workload (300 RPS, mixed turn sizes, SSE streaming). When enabled, the Future AGI Protect model family runs inline at ~67 ms p50 for text and ~109 ms p50 for image (arXiv 2510.13351). Protect is FAGI’s own set of fine-tuned Gemma 3n adapters covering content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text, image, and audio: a model family rather than a plugin chain.
- Routing-decision observability. Every turn produces a `fi.route` span: policy version, predicate fired, pool selected, failover hops. Group-by-pool and group-by-region are first-class slices.
- Cache-hit policy interaction. Session-sticky routing preserves prompt-cache locality. The dashboard exposes cache hit ratio per pool, so a policy change that destroys hits surfaces immediately.
- Traffic-shape resilience. Stateless Go data plane behind a token-bucket admission controller. Tested at 10x burst (50 RPS → 500 RPS over 45s); P99 climbed to 71ms during burst, recovered within 30s.
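To make "predicate-to-pool mappings evaluated in order" concrete, here is a minimal sketch using the token thresholds quoted above. It is not Future AGI's policy DSL; `Request`, `Rule`, `RULES`, `POLICY_VERSION`, and the rule names are hypothetical, and the returned dictionary only imitates the kind of fields an audit span might carry.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Request:
    input_tokens: int
    model_hint: Optional[str] = None    # e.g. "opus" when the client asks explicitly


@dataclass(frozen=True)
class Rule:
    name: str
    predicate: Callable[[Request], bool]    # pure function of request attributes
    pool: str


POLICY_VERSION = "v1-example"    # hypothetical version label

# Rules are evaluated top to bottom; the first match wins.
RULES: List[Rule] = [
    Rule("explicit-opus-hint", lambda r: r.model_hint == "opus", "claude-opus-4-7"),
    Rule("over-40k", lambda r: r.input_tokens > 40_000, "claude-opus-4-7"),
    Rule("8k-to-40k", lambda r: r.input_tokens >= 8_000, "claude-sonnet-4-6"),
    Rule("under-8k", lambda r: True, "claude-haiku-4-5"),    # catch-all
]


def route(request: Request) -> dict:
    """Return the routing decision plus the fields needed to explain it later."""
    for rule in RULES:
        if rule.predicate(request):
            return {
                "policy_version": POLICY_VERSION,
                "predicate": rule.name,
                "pool": rule.pool,
            }
    raise RuntimeError("policy has no catch-all rule")


decision = route(Request(input_tokens=12_500))
```

Since every predicate is a pure function and the list is short, evaluation is a handful of comparisons with no network call, which is what keeps a decision of this shape off the latency budget.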
The loop. Every routing decision feeds an offline learner that scores it against fi.evals rubrics (did the route produce a good output? did haiku fail a turn that should have escalated to sonnet?). traceAI (35+ framework integrations, OpenInference-native) emits spans; Error Feed (FAGI’s “Sentry for AI agents”) sits alongside as the zero-config error monitor: auto-clusters related misroutings into named issues (50 traces → 1 issue), auto-writes the root cause plus a quick fix plus a long-term recommendation per issue, and tracks rising/steady/falling trend per issue. fi.opt.optimizers proposes a policy update, typically a threshold adjustment, that you review, deploy, and version. The hot path never calls the learner.
Where it falls short:
- agent-opt is opt-in. For one-week routing pilots focused on static fallback graphs, start with traceAI + ai-evaluation and turn the optimizer on once eval baselines stabilize.
- The policy DSL is opinionated. Teams that want freeform Lua or JavaScript routing hooks (Kong-style) will find Future AGI’s predicate language more constrained, which is also why the hot path is fast.
- Bedrock and Vertex fallback support is solid for sonnet and opus; claude-haiku-4-5 on those clouds is still GA-pending in some regions.
- Free tier (100K traces / month) covers a developer’s daily workload but not production CI burst. Plan on Scale or Enterprise if production routing is the wedge.
Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, BAA, and BYOC. AWS Marketplace listing available.
Score: 7/7 axes.
2. Portkey: Best for hosted gateway with mature config-driven fallback chains
Verdict: Portkey has the most polished hosted routing product in this category. The config language for fallback graphs is genuinely good: readable, versioned, reviewable. Trade-off: if your security team won’t allow Claude Code traffic through a SaaS data plane, this is the wrong pick.
What it does for routing Claude Code in production:
- Deterministic policy expressiveness. “Strategy” config supports `fallback`, `loadbalance`, and `conditional` modes. The `conditional` strategy encodes “if `metadata.tokens_estimated > 40000` then opus, else haiku”, deterministic, no LLM in the hot path. One rung below Future AGI’s typed DSL.
- Per-region routing. Virtual keys pinned to a region by configuring the underlying Anthropic key with a regional endpoint. Session stickiness implicit through virtual key reuse.
- Failover semantics. Ordered targets with per-target retry counts and on-status conditions. Failover RTT P99 around 380ms in our test.
- P99 latency overhead. P50 of 14ms, P99 of 52ms from a us-east client. EU teams see higher P99 because the primary data plane is US-resident as of May 2026.
- Routing-decision observability. Dashboard names the strategy and target per request. Detail is one layer below a top-level slice, answerable but not first-class.
- Cache-hit policy interaction. Virtual-key stickiness preserves cache locality unless you load-balance across regions within a key; `loadbalance` will destroy hits if misconfigured.
- Traffic-shape resilience. Tested at 10x burst with P99 climbing to ~110ms, recovering within 60 seconds.
Where it falls short:
- No offline learner.
- Hosted-only by default. BYOC is mature but requires procurement and a Kubernetes operator install.
- `conditional` gets verbose for deeply nested metadata predicates.
- Pricing escalates above 5M requests/month faster than the lighter alternatives.
Pricing: Free tier with 10K requests/day. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II.
Score: 6/7 axes (missing: offline policy improvement).
3. Kong AI Gateway: Best for self-hosted enterprise routing with global anycast
Verdict: Kong AI Gateway is the pick when your platform team already runs Kong for REST APIs across multiple regions. Strengths: real per-region routing (because the data plane already runs in every region) and operator familiarity. Weaknesses: AI-specific primitives are newer, and routing observability is plugin-driven.
What it does for routing Claude Code in production:
- Deterministic policy expressiveness. Kong route rules plus the AI Proxy plugin. Match on headers, JSON body fields, and developer tier. Token-count predicate routing typically requires a pre-route Lua plugin, expressive but heavier to author.
- Per-region routing. Where Kong shines. The data plane already runs in every region your APIs run; routing Claude Code to the nearest Anthropic region is configuring the upstream pool with regional endpoints. Session stickiness through the consumer-affinity plugin.
- Failover semantics. Upstream pools with health checks and ordered targets, the pattern Kong has shipped for years. Failover RTT P99 measured at 290ms.
- P99 latency overhead. P50 of 11ms, P99 of 42ms (3 replicas, US-east). AI Proxy adds 3-6ms on the base proxy.
- Routing-decision observability. OTel plugin + your own sink. Default Kong dashboard names the upstream served; AI-specific detail is plugin-driven.
- Cache-hit policy interaction. Region-pinned upstreams preserve cache locality.
- Traffic-shape resilience. Tested at 10x burst with P99 climbing to ~95ms, recovering within 25 seconds.
Where it falls short:
- No offline learner.
- AI Proxy primitives are newer than the rest of Kong; `route_type` was still adding features through Q1 2026.
- Observability story requires wiring multiple plugins. Expect two weeks of platform-team time.
- Out-of-the-box, failover graph is one level deep. Multi-hop failovers (region A → region B → Bedrock) require chained upstream definitions and careful health-check tuning.
Pricing: Kong is open source. Kong Konnect (managed) starts free. Enterprise plans for SLA, plugins, and support start around $1.5K/month.
Score: 5/7 axes (missing: native AI routing observability, offline learner).
4. LiteLLM: Best for Python-native self-hosted proxy with explicit failover lists
Verdict: LiteLLM is the pick when Claude Code traffic can’t leave the VPC and the security team wants to read every line of code that touches a prompt. Source-available, Python-native. Routing config is explicit and failover semantics are correct out of the box. Trade-off: it’s a Python process, so you have to tune Gunicorn workers and watch GIL behavior under burst.
What it does for routing Claude Code in production:
- Deterministic policy expressiveness. YAML `model_list` entries plus routing strategies (`simple-shuffle`, `least-busy`, `usage-based-routing`, `latency-based-routing`). Token-count predicate routing typically goes through a Python pre-call hook.
- Per-region routing. Per-model-list region pinning. Session stickiness through `user_id` and `usage-based-routing`. Easy to misconfigure: `simple-shuffle` destroys cache hits, `usage-based-routing` preserves them.
- Failover semantics. Explicit fallback lists per model with `num_retries` and `allowed_fails`. Deterministic, well-documented. Failover RTT P99 measured at 340ms.
- P99 latency overhead. P50 of 22ms, P99 of 78ms (4 Gunicorn workers, 2 CPU each, US-east). Higher because of Python serialization. If you need sub-50ms P99, LiteLLM is the wrong pick.
- Routing-decision observability. Dashboard shows the model served and fallback hops. Deeper detail requires inspecting request metadata or wiring a traceAI sink behind LiteLLM.
- Cache-hit policy interaction. Preserved with `usage-based-routing` + sticky user_id; broken with the wrong strategy.
- Traffic-shape resilience. Python’s GIL is the limit. P99 climbed to ~145ms during a 10x burst, recovery 90 seconds.
Where it falls short:
- No offline learner.
- Python proxy means higher P99 and longer burst recovery than Go-based or edge-based alternatives.
- Dashboard is functional, not polished. Routing audit means SQL or an external sink.
- YAML config grows long fast; large fleets end up with hundreds of model-list entries managed via Helm.
Pricing: Open source under MIT. LiteLLM Enterprise tier with SLA + SSO + audit starts around $250/month for small teams.
Score: 5/7 axes (missing: offline learner, sub-50ms P99 ceiling).
5. Cloudflare AI Gateway: Best for edge-routed gateway with the lowest P99 overhead
Verdict: Cloudflare AI Gateway is the pick when latency is the dominant constraint and you accept a shallower routing policy in exchange for sub-15ms gateway overhead at the edge. Cheapest to run, fastest to deploy. For raw speed and lightweight observability, genuinely good. For a typed policy DSL or offline learner, look elsewhere.
What it does for routing Claude Code in production:
- Deterministic policy expressiveness. Rules in the Cloudflare dashboard plus Workers AI binding. Predicate routing on token count or developer tier requires writing a Worker in front; the gateway is closer to a smart proxy with caching and rate limits than a full policy engine. Sufficient for “primary plus one fallback”; not enough for a five-tier predicate ladder without a Worker.
- Per-region routing. Anycast routes to the nearest PoP automatically. Downside for cache locality: the same session may hit different PoPs across turns unless you pin the upstream Anthropic endpoint.
- Failover semantics. “Fallback providers” config, ordered list on configurable error codes. Failover RTT P99 of 220ms, the fastest of the five because the edge has the secondary’s connection state cached. One level deep without a Worker.
- P99 latency overhead. P50 of 4ms, P99 of 13ms at the edge against a US-east client. Lowest by a wide margin.
- Routing-decision observability. Lightweight. Dashboard shows request, upstream, fallback. Deeper detail requires Worker logging or an external sink.
- Cache-hit policy interaction. Cloudflare’s response cache is excellent for fully cacheable requests (rare for Claude Code). Anthropic prompt-cache locality requires explicit region pinning; default anycast destroys hits.
- Traffic-shape resilience. Tested at 10x burst with P99 climbing to 28ms, recovering within 10 seconds.
Where it falls short:
- No offline learner.
- Routing policy is shallow without a Worker on top.
- Region pinning required for cache locality; default anycast destroys hits.
- Claude Code integration documented mostly through community examples as of May 2026; first-party docs lean on OpenAI-compatible mode.
- Routing-decision observability is hand-rolled.
Pricing: Free tier with 100K requests / month. Paid tiers integrated into Workers pricing, typically $5/month minimum plus per-request fees that scale linearly.
Score: 4.5/7 axes (missing: deep policy expressiveness, offline learner, native routing observability).
Capability matrix
| Axis | Future AGI | Portkey | Kong AI Gateway | LiteLLM | Cloudflare AI Gateway |
|---|---|---|---|---|---|
| Deterministic policy DSL | Typed predicates | YAML conditional | Lua + plugin | YAML + Python hook | Worker-authored |
| Per-region routing | Native, sticky | Per virtual key | Native multi-region | Per model-list entry | Anycast (needs pinning) |
| Failover semantics | Multi-hop graph | Ordered list | Health-check pool | Explicit list | One-level + Worker |
| P99 latency overhead | 34ms | 52ms | 42ms | 78ms | 13ms |
| Routing-decision span | First-class | Detail view | OTel plugin | Metadata only | Worker logs |
| Cache-locality safe | Sticky default | Sticky per key | Region-pinned | Strategy-dependent | Needs pinning |
| Burst resilience | 71ms P99 at 10x | 110ms P99 at 10x | 95ms P99 at 10x | 145ms P99 at 10x | 28ms P99 at 10x |
| Offline policy learner | fi.opt | None | None | None | None |
Decision framework: Choose X if
Choose Future AGI if you want the routing policy to keep improving without an SRE babysitting it. Deterministic hot path, offline learner, policy updates landing in change management. Strongest when Claude Code is the dominant production workload.
Choose Portkey if you want a hosted gateway with a polished config-driven fallback graph and your security posture allows a SaaS data plane.
Choose Kong AI Gateway if you already operate Kong for REST APIs across regions and the right answer is to extend the existing stack.
Choose LiteLLM if Claude Code traffic can’t leave the VPC and source-availability beats P99 latency. Tune Gunicorn before you tune the policy.
Choose Cloudflare AI Gateway if latency is the dominant constraint and the policy is simple (primary + one fallback), or you’re happy to author it in a Worker.
Common mistakes when wiring Claude Code routing in production
| Mistake | What goes wrong | Fix |
|---|---|---|
| Using an LLM-judge on the hot path to classify route decisions | Adds 400-900ms to P99 latency and creates a circular failure mode when the same provider is rate-limited | Move classification offline; ship deterministic predicates derived from the offline analysis |
| One global pool with no region pinning | Anycast load-balancer destroys Anthropic prompt-cache hits and the bill triples | Pin sessions to a region; consistent-hash by session ID |
| Fallback configured as retry against the same upstream | A regional incident becomes an amplified incident as retries pile up | Configure ordered fallback graphs across regions; cap per-target retries at 1 |
| Tagging routing decisions with no policy version | “Why was this routed here last Tuesday” becomes unanswerable after the policy changes | Emit policy-version on every routing-decision span; treat the policy as a versioned artifact |
| Setting a circuit breaker without testing under burst | Burst from CI triggers an avalanche of opens and the developer-facing fleet stalls | Synthetic-test the gateway at 10x baseline RPS before production rollout |
| Buffering streaming through a custom Worker on the edge | Claude Code’s progress UI freezes mid-turn and the developer experience regresses | Pass SSE through unbuffered; test from the actual CLI, not from curl |
| Sharing one virtual key across many developers in routing | Per-developer routing rules become unenforceable and observability collapses | Issue per-developer virtual keys that fan out to one underlying account |
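
The "consistent-hash by session ID" fix in the table can be illustrated with a small hash ring. This is a generic textbook ring with virtual nodes, offered as a sketch under assumptions and not as any vendor's implementation; `RegionRing`, `_hash`, and the `vnodes` default are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # SHA-256 is stable across processes, unlike Python's built-in hash().
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class RegionRing:
    """Maps a session ID to a region; the same session always lands on the same region."""

    def __init__(self, regions: List[str], vnodes: int = 64) -> None:
        if not regions:
            raise ValueError("at least one region is required")
        # Each region owns several points on the ring to even out the distribution.
        points: List[Tuple[int, str]] = sorted(
            (_hash(f"{region}#{i}"), region)
            for region in regions
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in points]
        self._regions = [region for _, region in points]

    def region_for(self, session_id: str) -> str:
        # Walk clockwise to the first point after the session's hash, wrapping around.
        index = bisect.bisect_right(self._keys, _hash(session_id)) % len(self._keys)
        return self._regions[index]


ring = RegionRing(["anthropic.us-east", "anthropic.us-west", "anthropic.eu-central"])
pinned = ring.region_for("session-42")
```

The property that matters for prompt-cache locality is that removing one region from the ring only moves the sessions that were pinned to it; sessions on the surviving regions keep their assignment and their warm cache.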
How Future AGI closes the loop on routing Claude Code in production
The other four gateways treat routing as a static config: write the policy, deploy it, leave it. Future AGI treats routing as a versioned artifact whose hot-path execution is deterministic but whose policy gets updated by an offline learner between deploys. Six stages:
- Trace. Every Claude Code turn produces a span tree via `traceAI` (Apache 2.0). The routing decision is a first-class span: policy version, predicate matched, pool selected, region targeted, fallback hops, final upstream, prompt-cache hit ratio.
- Evaluate. `ai-evaluation` (Apache 2.0) scores every turn. FAGI ships a catalog of 50+ built-in rubrics (task completion, faithfulness, code-correctness, tool-use accuracy, structured-output, hallucination, groundedness, instruction-following, agentic surfaces). On top of that come unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code, self-improving evaluators that learn from live production traces, and FAGI’s proprietary classifier model family, which runs continuous high-volume per-route scoring at very low cost-per-token (Galileo Luna-2 cost economics, rubric-flexible). Scores join routing-decision data so the loop can answer “did the route produce a good output?” The catalog is the floor, not the ceiling.
- Cluster. Misroutings cluster by failure mode. Patterns surface: “turns over 12K input tokens routed to haiku regress on faithfulness,” or “turns with tool-call density above 4 regress on opus at us-west.”
- Optimize. `fi.opt.optimizers` (ProTeGi, Bayesian, GEPA) propose a policy update: a threshold adjustment or a new predicate. The proposal is a diff against the current policy, reviewable like any code change.
- Route. Hot path stays deterministic. The approved policy version becomes the next deploy; predicates evaluate in sub-millisecond time, no LLM in the loop.
- Re-deploy and observe. New policy is versioned, deployed, observable. If the score regresses, automatic rollback.
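The last stage implies some rule for deciding that a score has regressed. The post does not say how that decision is made, so the following is only one plausible gate, with hypothetical names (`PolicyVersion`, `choose_active`) and assumed defaults for `tolerance` and `min_samples`: keep the candidate unless its mean eval score falls below the baseline by more than a tolerance.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class PolicyVersion:
    version: str
    eval_scores: List[float]    # per-turn rubric scores, assumed to lie in [0, 1]


def choose_active(
    baseline: PolicyVersion,
    candidate: PolicyVersion,
    tolerance: float = 0.02,
    min_samples: int = 100,
) -> str:
    """Return the version that should stay live after comparing eval scores."""
    if len(candidate.eval_scores) < min_samples:
        return candidate.version    # not enough evidence yet; keep observing
    if mean(candidate.eval_scores) < mean(baseline.eval_scores) - tolerance:
        return baseline.version     # regression beyond tolerance: roll back
    return candidate.version


baseline = PolicyVersion("v12", [0.90] * 200)
regressed = PolicyVersion("v13", [0.80] * 200)
active = choose_active(baseline, regressed)
```

The comparison runs on recorded eval scores after the fact, so a gate like this stays off the hot path in the same way the learner does.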
Net effect: a team starting with a hand-written policy typically sees misroutings drop 35-55% within six weeks, with no change in P99 because the learner runs offline. The hot path stays deterministic; that’s what makes this safe to run on call.
The three building blocks (traceAI, ai-evaluation, agent-opt) are open source under Apache 2.0 at github.com/future-agi. The hosted Agent Command Center adds the failure-cluster view, live Protect guardrails (~67ms text-only, per arXiv 2510.13351), RBAC, SOC 2 Type II, and AWS Marketplace for procurement.
What we did not include
Three gateways that show up in other 2026 listicles but didn’t fit here:
- OpenRouter. Fantastic for model exploration but routing primitives are consumer-facing; per-region pinning and SRE-grade failover graphs aren’t first-class.
- Helicone. Strong observability but routing is basic (round-robin + simple failover). Right pick for monitoring-first workloads (see our token-monitoring post).
- TrueFoundry. Solid MLOps gateway but the routing-policy DSL was still maturing through Q1 2026; revisit in Q3.
Related reading
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- What Is an AI Gateway? The 2026 Definition
- Best LLM Gateways in 2026
- Best AI Gateways for Agentic AI in 2026
Sources
- Anthropic Claude Code documentation, claude.ai/docs/claude-code
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Portkey AI gateway, portkey.ai
- Kong AI Gateway, konghq.com/products/kong-ai-gateway
- LiteLLM proxy, github.com/BerriAI/litellm
- Cloudflare AI Gateway, developers.cloudflare.com/ai-gateway
Frequently asked questions
What is the lowest-latency AI gateway for Claude Code in production?
Can the gateway make routing decisions by calling another LLM in the hot path?
How do I keep Anthropic prompt-cache hits when routing across regions?
What is a reasonable failover RTT P99 budget?
How do I make the gateway observability story SRE-grade?
Should we route Claude Code turns to non-Claude models as a fallback?
How is Future AGI Agent Command Center different from Portkey for routing?