Best AI Gateway for Claude Code Cost Management in 2026
Five AI gateways scored on Claude Code cost management in 2026: budget caps with auto-pause, alert routing, route-by-budget, per-team auto-cutoff, and post-budget fallback to cheaper models.
It’s the 23rd of the month. Slack pings: “Why did the platform squad already burn 140% of their Claude Code budget?” You open the Anthropic console. Aggregate spend, no per-squad split, no way to pause one team without paging the whole org. You draft an email that begins, “Effective immediately, please be mindful of your Claude Code usage.” Everyone ignores it. Next month, same story.
Monitoring tells you where the spend went. Claude Code cost management is the layer that prevents the breach in the first place: budget caps that auto-pause at threshold, alerts routed to the manager who can act, requests automatically downgraded to a cheaper model when a team is over budget, and the policy that makes “soft brake at 80% / hard brake at 110%” enforceable without a human in the loop. Not every gateway on the market actually ships that.
This is the 2026 cohort, scored on the management-control axes a Director, EM, or platform lead needs when Claude Code spend has stopped being a curiosity and started being a board-deck line item.
TL;DR
Future AGI Agent Command Center is the strongest pick for an AI gateway for Claude Code cost management because it ships per-developer virtual keys with soft (80%) and hard (110%) budget thresholds, an automatic Opus → Sonnet → Haiku downgrade cascade on breach, dry-run policy testing against 30 days of traffic, and Anthropic and Bedrock both reachable behind one OpenAI-compatible base URL. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Three-stage budget thresholds, per-cost-center alert routing, and the downgrade cascade that keeps Claude Code working until the wallet is fully exhausted.
- Portkey — Best for the cleanest virtual-key budgets and Slack/Teams alerting out of the box. Mature managed dashboard (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- Helicone — Best for teams under 10 developers who want minimal infra. Lightweight per-key rate limiting (treat it as a planned migration after the March 3, 2026 Mintlify acquisition).
- LiteLLM — Best when Claude Code traffic cannot leave the VPC. Self-hosted Python-native proxy with team-level budgets and webhooks; pin commits after the March 24, 2026 PyPI compromise.
- Kong AI Gateway — Best when the platform team already runs Kong. The AI plugin extends existing governance to the LLM route.
Why Claude Code cost management is different from cost tracking
Tracking tells finance what happened. Management changes what happens next. Three properties make Claude Code hard for the management half of that pair:
- Cost concentrates in long sessions, not RPS. A single Opus-heavy pairing session at 180K-token context can run $30 to $80. A requests-per-minute rate limit won’t catch it; the session was three requests. The right cap is dollars-per-developer-per-day with a hard cutover when the wallet hits zero.
- Auto-pause mid-session breaks developer flow. A cap that kills Claude Code in the middle of a function rewrite is worse than overspending by 5%. Good cost management triggers a soft alert at 80% (the developer self-throttles), a downgrade at 95% (turns route to Sonnet or Haiku transparently), and a hard pause at 110% (the gateway returns a 429 with a “your manager has been alerted” body). Anything cruder destroys trust.
- The cheap-model substitute is a real lever. Roughly half of a typical Claude Code session is short, well-bounded turns: file renames, simple completions, lint fixes. Those run fine on `claude-haiku-4-5` at roughly 1/12th the per-token cost of `claude-opus-4-7`. A budget-aware router that downgrades easy turns when the team is over budget recovers spend without anyone noticing. A router that ignores budget keeps firing Opus at the wall.
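The three-stage brake above reduces to a small decision function. A minimal sketch in Python, assuming the gateway knows each developer’s spend so far today and their daily dollar budget; the `BudgetAction` names and the function are illustrative, not any gateway’s API:

```python
from enum import Enum


class BudgetAction(Enum):
    """What the gateway does with a developer's next Claude Code turn."""
    ALLOW = "allow"          # under the soft threshold: no intervention
    ALERT = "alert"          # soft brake: notify, keep serving the requested model
    DOWNGRADE = "downgrade"  # route the turn to a cheaper model transparently
    PAUSE = "pause"          # hard brake: reject with a 429 and an explanatory body


# Fractions of the daily dollar budget, matching the 80% / 95% / 110% pattern.
SOFT_ALERT = 0.80
DOWNGRADE_AT = 0.95
HARD_PAUSE = 1.10


def budget_action(spent_usd: float, daily_budget_usd: float) -> BudgetAction:
    """Map one developer's spend so far today to a three-stage action."""
    if daily_budget_usd <= 0:
        raise ValueError("daily budget must be positive")
    ratio = spent_usd / daily_budget_usd
    # Check the most severe stage first so each spend level maps to exactly one action.
    if ratio >= HARD_PAUSE:
        return BudgetAction.PAUSE
    if ratio >= DOWNGRADE_AT:
        return BudgetAction.DOWNGRADE
    if ratio >= SOFT_ALERT:
        return BudgetAction.ALERT
    return BudgetAction.ALLOW
```

With a $100 daily budget, $85 of spend maps to `ALERT`, $97 to `DOWNGRADE`, and $112 to `PAUSE`. Comparing dollar ratios rather than request counts is what lets the cap catch a three-request, $80 session that an RPM limit would wave through.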
Tracking is a `SELECT SUM(cost) FROM requests GROUP BY developer` query. Management is a feedback loop with policy actuators and an opinion about which model handles which turn under which budget condition.
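The tracking half really is that small. A sketch with Python’s standard-library `sqlite3`, assuming a hypothetical `requests` table with one row per Claude Code call and a precomputed dollar `cost` column (not any specific gateway’s schema):

```python
import sqlite3

# Hypothetical schema: one row per Claude Code request with its dollar cost.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (developer TEXT, model TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("alice", "claude-opus-4-7", 42.50),
        ("alice", "claude-haiku-4-5", 0.75),
        ("bob", "claude-opus-4-7", 12.00),
    ],
)

# Tracking: one aggregate query. Running it changes nothing about the next request.
rows = conn.execute(
    "SELECT developer, SUM(cost) FROM requests GROUP BY developer ORDER BY developer"
).fetchall()
for developer, total in rows:
    print(f"{developer}: ${total:.2f}")
```

Running it prints `alice: $43.25` and `bob: $12.00`. The report is accurate and entirely passive, which is the whole difference from management.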
For the rest of this post, “gateway” means an AI gateway that speaks the Anthropic API natively and can sit in front of Claude Code via `ANTHROPIC_BASE_URL`. All five picks meet that floor.
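On the client side, pointing Claude Code at a gateway is an environment change. A minimal Python sketch, assuming the gateway accepts a per-developer virtual key as a bearer token through Claude Code’s `ANTHROPIC_AUTH_TOKEN` variable (confirm against Claude Code’s settings documentation for your version); the URL and key values are placeholders:

```python
import os
from typing import Dict


def claude_code_env(gateway_url: str, virtual_key: str) -> Dict[str, str]:
    """Environment for running Claude Code through a gateway (placeholder values)."""
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = gateway_url    # send API traffic to the gateway
    env["ANTHROPIC_AUTH_TOKEN"] = virtual_key  # per-developer key the gateway budgets against
    # Drop any raw Anthropic key so the upstream credential stays on the gateway.
    env.pop("ANTHROPIC_API_KEY", None)
    return env
```

Launching with `subprocess.run(["claude"], env=claude_code_env(...))`, or exporting the same variables from a shell profile, sends every turn through the gateway, where the budget policy applies.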
The 7 axes we score on for Claude Code cost management
The generic “best AI gateway” axis set (provider breadth, routing, observability, security) is the wrong shape for this question. We scored each pick on seven axes specifically about management actions, not metrics.
| Axis | What it measures |
|---|---|
| 1. Budget caps with auto-pause | Can the gateway enforce a hard dollar cap per developer / team / day, and pause Claude Code when breached? |
| 2. Alert routing | When a budget is at 80% / 95% / 110%, can the gateway page the right person — the EM, not a #general firehose? |
| 3. Route-by-budget | Can the gateway downgrade easy turns to a cheaper model automatically when a team is over budget, without a human edit? |
| 4. Per-team auto-cutoff | Can one squad’s wallet be exhausted without affecting another squad? |
| 5. Post-budget fallback model | Is there a configured cheaper Claude variant the gateway falls back to (Haiku) instead of just hard-failing? |
| 6. Pre-budget policy testing | Can the policy be dry-run against last month’s traffic before it goes live, so the EM knows what would have paused? |
| 7. Self-improving budget policy | Does the gateway learn which turns can be safely routed cheap, or does the EM have to manually re-tune thresholds every month? |
The score line at the end of each pick rates all seven.
How we picked
We started from public AI gateways that advertise an Anthropic-compatible endpoint as of May 2026. Three cuts for this post: gateways that only do RPM rate limiting (no dollar-denominated caps), gateways without per-key budget enforcement (alert-only), and gateways whose auto-pause doesn’t return a structured error Claude Code can render gracefully. The five below survived.
1. Future AGI Agent Command Center: Best for per-developer Claude Code budget caps with auto-downgrade routing
Verdict: Future AGI ships three-stage budget thresholds (soft alert at 80%, automatic downgrade at 95%, hard pause at 110%), per-cost-center alert routing that pages the right EM rather than the #general firehose, an Opus → Sonnet → Haiku cascade that keeps Claude Code working until the wallet is fully exhausted, dry-run policy testing against 30 days of traffic before going live, and Anthropic and Bedrock both reachable behind one OpenAI-compatible base URL.
What it does for Claude Code cost management:
- Budget caps with auto-pause at three thresholds: a soft alert at 80%, an automatic downgrade at 95%, and a hard pause with a structured 429 at 110%. The thresholds are configurable; the three-stage default is what teams ship with.
- Alert routing through `fi.alerts` with per-cost-center rules. Platform squad’s overage pages the platform EM in DM; data squad’s overage pages the data EM. The #general firehose pattern is opt-in, not default.
- Route-by-budget through the budget-aware routing policy. When a team’s wallet drops below a configured threshold (e.g., $200 remaining for the day), turns under 10K input tokens auto-route to `claude-haiku-4-5` instead of `claude-opus-4-7`. Hard turns continue on Opus; easy turns get cheaper.
- Per-team auto-cutoff through the cost-center attribution layer. Each team has an independent wallet; one squad burning out doesn’t throttle another.
- Post-budget fallback model with a cascade: Opus → Sonnet → Haiku. Once a team blows the Opus budget, the gateway falls back to Sonnet until midnight; once Sonnet also blows, Haiku takes over. Claude Code never sees a hard “no” until the cascade exhausts.
- Pre-budget policy testing through dry-run mode. The EM runs the proposed cap against the previous 30 days of traffic and sees exactly which sessions would have paused. Adjustments happen before the policy goes live.
- Self-improving budget policy through `fi.opt.optimizers`. After a month of running, the optimizer reads the eval scores of downgraded turns and tightens or loosens the threshold per team. The manual re-tune meeting becomes a quarterly review.
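The route-by-budget rule and the Opus → Sonnet → Haiku cascade can be sketched together. This is not Future AGI’s API; it is a minimal Python illustration of the behavior described above, assuming the gateway tracks each team’s remaining daily dollars and which models’ budgets are already spent. The $200 and 10K-token thresholds come from the text; the Sonnet model ID and all names are assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Model IDs from the text, except the Sonnet ID, which is an assumption.
CASCADE = ["claude-opus-4-7", "claude-sonnet-4-5", "claude-haiku-4-5"]
CHEAP_MODEL = "claude-haiku-4-5"


@dataclass
class TeamWallet:
    """Hypothetical per-team state for the current day."""
    remaining_usd: float
    exhausted_models: Set[str] = field(default_factory=set)  # models whose budget is spent


def route(requested_model: str, input_tokens: int, wallet: TeamWallet,
          low_wallet_usd: float = 200.0, easy_turn_tokens: int = 10_000) -> Optional[str]:
    """Choose the model for one turn; None means the cascade is exhausted (hard 429)."""
    candidate = requested_model
    # Route-by-budget: once the wallet runs low, small turns go to the cheap model.
    if wallet.remaining_usd < low_wallet_usd and input_tokens < easy_turn_tokens:
        candidate = CHEAP_MODEL
    if candidate not in CASCADE:
        return candidate  # models outside the cascade pass through untouched
    # Post-budget fallback: walk down the cascade past any model whose budget is spent.
    for model in CASCADE[CASCADE.index(candidate):]:
        if model not in wallet.exhausted_models:
            return model
    return None
```

Returning `None` is the only point where Claude Code would see a hard 429; every other outcome is a (possibly cheaper) model name, which is why the cascade keeps sessions alive until the wallet is exhausted.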
The loop, applied to cost management. Every turn produces a span via traceAI (Apache 2.0) capturing model, tokens, cost, and eval score. When a budget breaches, the breach event is co-located with the trace history. fi.opt ingests the breach + trace + eval scores and outputs a routing-policy diff: “for turns under 10K input tokens in this cost center, route to claude-haiku-4-5, code-correctness regression is 0.3%, under the 2% tolerance.” The diff ships on the next deploy. Protect guardrails enforce policy at ~67ms text, 109ms image (per arXiv 2510.13351), so safety adds no perceptible latency tax.
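The acceptance test inside that loop is simple arithmetic: compare quality on the baseline model with quality on the downgraded model, and ship only if the drop is below tolerance. A hedged sketch, assuming each evaluated turn yields a pass/fail code-correctness result; this is not how `fi.opt` is implemented, just the check the quoted diff implies:

```python
from typing import Sequence, Tuple


def pass_rate(results: Sequence[bool]) -> float:
    """Fraction of evaluated turns that passed (e.g. code-correctness)."""
    if not results:
        raise ValueError("need at least one evaluated turn")
    return sum(results) / len(results)


def should_ship(baseline: Sequence[bool], downgraded: Sequence[bool],
                tolerance: float = 0.02) -> Tuple[bool, float]:
    """Accept a routing diff only if the pass-rate drop is below the tolerance."""
    drop = pass_rate(baseline) - pass_rate(downgraded)
    return drop < tolerance, drop
```

With 950/1000 baseline passes and 947/1000 after the downgrade, the drop is 0.3 percentage points and the diff ships; at 900/1000 the 5-point drop blocks it.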
Where it falls short:
- The three-stage threshold (80 / 95 / 110) is opinionated by design: fewer knobs to misconfigure and faster setup for the typical team. Teams that prefer a single hard cap with no soft brake can configure one; the three-stage pattern is the recommended setup, not a forced default.
- `agent-opt` is opt-in and learns from live traces. Start with `traceAI` + `ai-evaluation` on day one for the budget caps and per-team alerting; the “learn which turns to downgrade” loop comes online once a traffic baseline exists (typically after a month). The optimizer gets stronger as production traffic accumulates; that’s the design, not a setup tax.
Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom, with SOC 2 Type II, HIPAA, GDPR, and CCPA all certified, BAA available, and an AWS Marketplace listing for procurement-friendly commits.
Score: 7/7 axes.
2. Portkey: Best for hosted Claude Code cost management with Slack-native alerting
Verdict: Portkey is the most polished hosted-only product for the enforce-budgets-and-alert slice of Claude Code cost management. The virtual-key budgets are the cleanest in the cohort, Slack and Teams hooks are first-class, and the dashboard is the one EMs actually log into. It doesn’t close the loop on the routing policy; it enforces what you write and tells you when traffic hits a cap.
What it does for Claude Code cost management:
- Budget caps with auto-pause per virtual key. The hard cap disables the key when the limit is hit; soft alerts fire at configurable thresholds. The pause surfaces as a 429 that Claude Code renders as a budget-exceeded message.
- Alert routing through native Slack and Teams integrations. Per-key alerts target specific channels or DM users; the platform-EM-versus-#general pattern is first-class.
- Route-by-budget is partial. Portkey supports conditional routing on metadata, but “when this team is over budget, downgrade Opus to Sonnet” requires wiring custom conditions. Not turnkey.
- Per-team auto-cutoff through one virtual key per team. Each key has its own budget; one team’s exhaustion doesn’t throttle another.
- Post-budget fallback model through fallback policies. Configure Opus → Sonnet → Haiku; Portkey routes to the next model on rate-limit or quota errors. This pairs well with wiring the budget as a quota-error trigger.
- Pre-budget policy testing is limited. No built-in dry-run; teams clone a virtual key into staging and replay traffic.
- Self-improving budget policy isn’t present.
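Wiring the budget as a quota-error trigger means the fallback chain advances only on budget or quota errors. A generic Python sketch of that behavior, not Portkey’s configuration format; `QuotaExceeded` and the `send` callable are hypothetical stand-ins for a 429 and the upstream call:

```python
from typing import Callable, Optional, Sequence, Tuple


class QuotaExceeded(Exception):
    """Stand-in for the 429 / quota error a tripped budget produces."""


def call_with_fallback(models: Sequence[str],
                       send: Callable[[str], str]) -> Tuple[str, str]:
    """Try models in order, advancing only when a budget/quota error fires."""
    last_error: Optional[QuotaExceeded] = None
    for model in models:
        try:
            return model, send(model)
        except QuotaExceeded as exc:
            last_error = exc  # this tier's budget is spent; try the next, cheaper one
    raise last_error or QuotaExceeded("no models configured")
```

Other exceptions propagate on purpose: a 500 from the upstream should surface as an error, not silently downgrade the developer to Haiku.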
Where it falls short:
- No optimizer. This month’s breach pattern doesn’t influence next month’s policy unless an EM manually re-tunes.
- Route-by-budget is wireable but not turnkey. Expect a sprint of metadata-conditional rule work if you want budget-aware routing.
- Pricing escalates above 5M requests/month faster than the lighter alternatives. For a 100-developer team, that ceiling matters.
Pricing: Free tier with 10K requests/day. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II.
Score: 5.5/7 axes (missing: turnkey route-by-budget, self-improving policy).
3. Helicone: Best for lightweight Claude Code cost management on small teams
Verdict: Helicone is the right pick when the management story is “10 developers, $5K/month, I want a daily cap per developer and a Slack ping when it trips.” Drop in the proxy, set the rate-limit policies, and the management overhead matches the budget. Once the team grows past 20 developers, the cracks in the policy expressiveness show.
What it does for Claude Code cost management:
- Budget caps with auto-pause through rate-limit policies. Helicone is RPM-first; dollar caps are achievable through usage alerts plus a webhook that flips the key off. Works; less clean than Portkey’s virtual-key budgets.
- Alert routing through usage alerts. Slack hook is the standard destination; per-EM routing requires per-policy channel configuration.
- Route-by-budget isn’t present out of the box. Routing is round-robin / failover; budget-aware logic has to be coded upstream.
- Per-team auto-cutoff through per-key policies. Each key gets its own quota.
- Post-budget fallback model through failover routing. Configure a cheaper model as the failover target; quota errors trigger fallback.
- Pre-budget policy testing isn’t present. Authoring is edit-and-deploy.
- Self-improving budget policy isn’t present.
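The usage-alert-plus-webhook pattern for a dollar cap looks roughly like this. A minimal sketch, assuming a JSON alert body with `key_id`, `spend_usd`, and `limit_usd` fields (a hypothetical shape, not Helicone’s documented payload) and an in-memory dictionary standing in for whatever actually controls key status:

```python
import json
from typing import Dict

# Hypothetical key store: key_id -> enabled. Stands in for whatever admin API
# or database actually controls key status in your deployment.
KEY_ENABLED: Dict[str, bool] = {"vk-platform": True, "vk-data": True}


def handle_usage_alert(body: bytes) -> str:
    """Turn a usage-alert webhook into a hard cap by disabling the key.

    Assumed payload shape: {"key_id": str, "spend_usd": float, "limit_usd": float}.
    """
    alert = json.loads(body)
    key_id = alert["key_id"]
    if alert["spend_usd"] >= alert["limit_usd"]:
        KEY_ENABLED[key_id] = False  # requests checked against this store are now rejected
        return f"disabled {key_id}"
    return f"no action for {key_id}"
```

Serving this from an HTTP endpoint and re-enabling keys at the daily rollover completes the hand-wired hard cap; the soft 80% stage would be a second alert that only notifies.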
Where it falls short:
- No optimizer.
- Policy expressiveness is below Portkey and Future AGI. The three-stage pattern is hand-wired through webhooks, not first-class.
- Self-hosting beyond a few hundred RPS becomes an operational burden. Smaller teams are the sweet spot.
Pricing: Free tier with 10K requests/month. Pro tier starts at $25/month. Enterprise is custom.
Score: 4/7 axes (missing: route-by-budget, pre-budget testing, self-improving policy).
4. LiteLLM: Best for self-hosted Claude Code cost management inside the VPC
Verdict: LiteLLM is the pick when Claude Code traffic can’t leave the VPC and you need real per-team budgets enforced by a proxy whose source code you can read line-by-line. The budget primitives are real: team budgets, user budgets, virtual keys, and webhook alerts. The polish is below the hosted alternatives; the UI is functional, and the route-by-budget story is something you wire, not something you toggle.
What it does for Claude Code cost management:
- Budget caps with auto-pause through team and user budgets. The hard cap returns a 429 once the limit is hit. Configurable per team, user, or virtual key.
- Alert routing through webhooks. Wire your incident tool (PagerDuty, Opsgenie, Slack) yourself. LiteLLM gives the trigger; you bring the destination.
- Route-by-budget is wireable. `pre_call_check` hooks let you inject budget-aware logic before the upstream call. Teams write a 20-line Python hook that checks remaining budget and rewrites the model to Haiku if under threshold.
- Per-team auto-cutoff through `team_id`-scoped budgets.
- Post-budget fallback model through the fallback list. Rate-limit / quota errors trigger the next model.
- Pre-budget policy testing isn’t a feature. The pattern is a staging proxy with replayed traffic.
- Self-improving budget policy isn’t present.
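The budget-aware hook described above can be sketched as a plain function over the request body, so it runs without LiteLLM installed. In a real proxy you would call it from LiteLLM’s custom-callback pre-call hook (for example `async_pre_call_hook` on a `CustomLogger` subclass; check the LiteLLM docs for the exact signature in your version) and supply team spend and budget from wherever your proxy exposes them. The function name and the 95% threshold are assumptions for illustration:

```python
from typing import Any, Dict

CHEAP_MODEL = "claude-haiku-4-5"


def budget_aware_rewrite(data: Dict[str, Any], team_spend_usd: float,
                         team_budget_usd: float, threshold: float = 0.95) -> Dict[str, Any]:
    """Return the request body with the model swapped to Haiku once a team nears budget.

    Teams without a positive budget pass through unchanged.
    """
    if team_budget_usd <= 0 or team_spend_usd / team_budget_usd < threshold:
        return data  # under threshold: leave the request untouched
    metadata = {**data.get("metadata", {}), "budget_downgrade": True}  # tag for tracing
    # Build a new dict so the caller's request (and its metadata) is not mutated.
    return {**data, "model": CHEAP_MODEL, "metadata": metadata}
```

Tagging the request with `budget_downgrade` keeps the downgrade visible in traces, so a later quality regression can be tied back to the budget rule rather than the model.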
Where it falls short:
- No optimizer.
- Observability is thinner than Portkey or Future AGI. Most enterprises pair LiteLLM with traceAI or an OTel sink for budget-breach forensics.
- Day-one setup is heavier than hosted alternatives. Plan a platform-team sprint for LiteLLM + webhook destinations + the dashboard finance will look at.
Pricing: Open source under MIT. LiteLLM Enterprise starts around $250/month for small teams; custom above 100 seats.
Score: 5/7 axes (missing: pre-budget testing, self-improving policy).
5. Kong AI Gateway: Best for Claude Code cost management on top of an existing Kong estate
Verdict: Kong AI Gateway is the pick when the platform team already runs Kong for the company’s REST APIs and the path of least resistance is to extend that stack with AI policies. SLA and ops familiarity are the wins. The budget enforcement is real through rate-limiting and consumer policies; the dollar-denominated cost-management UX is plugin-driven, not native.
What it does for Claude Code cost management:
- Budget caps with auto-pause through Kong’s rate-limiting plugins and the AI Proxy spend-tracking plugin (3.6+). Dollar caps require wiring the spend plugin to a state store; out of the box Kong is RPM-first.
- Alert routing through Kong’s event hooks. PagerDuty, Slack, Teams integrations are mature on the REST-API side; the AI extension uses the same hooks.
- Route-by-budget is plugin-driven. AI Proxy supports model rewriting on response headers or upstream errors; budget-aware routing needs a custom plugin or conditional rule with a state store.
- Per-team auto-cutoff through Kong consumers and consumer-scoped policies. Each team is a consumer.
- Post-budget fallback model through the AI Proxy fallback list. Configure Opus → Sonnet → Haiku; quota errors trigger fallback.
- Pre-budget policy testing isn’t first-class. Pattern is a Kong staging environment with replayed traffic.
- Self-improving budget policy isn’t present.
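“Wiring the spend plugin to a state store” amounts to a per-consumer ledger. A minimal in-memory sketch, assuming one Kong consumer per team and a daily dollar budget per consumer; a production version would live in a shared store such as Redis, and none of this is a Kong API:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Tuple


class ConsumerWallets:
    """Per-consumer (per-team) daily spend ledger; an in-memory stand-in for a shared store."""

    def __init__(self, daily_budgets_usd: Dict[str, float]):
        self.budgets = daily_budgets_usd
        self.spent: Dict[Tuple[str, date], float] = defaultdict(float)

    def record(self, consumer: str, cost_usd: float, day: date) -> None:
        """Add the cost of a completed request to that consumer's wallet for the day."""
        self.spent[(consumer, day)] += cost_usd

    def allowed(self, consumer: str, day: date) -> bool:
        """True while the consumer is under the day's budget; unknown consumers get zero."""
        # Keying by day means each wallet starts fresh at the day boundary.
        return self.spent[(consumer, day)] < self.budgets.get(consumer, 0.0)
```

Keying spend by `(consumer, day)` gives each team an independent wallet that resets at the day boundary, and an unknown consumer gets a zero budget, so the ledger fails closed.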
Where it falls short:
- AI-specific cost management is plugin-driven. Default dashboard shows API-gateway metrics; the LLM-cost view is what your platform team builds on top of OTel + Grafana.
- No optimizer.
- Out-of-the-box management requires a platform-team sprint. If a 30-person team already runs Kong, the labor is effectively free; if Kong is a side project, the labor is hidden.
Pricing: Kong is open source. Kong Konnect (managed) starts free. Enterprise plans start around $1,500/month.
Score: 4.5/7 axes (missing: native dollar-budget UX, pre-budget testing, self-improving policy).
Capability matrix
| Axis | Future AGI | Portkey | Helicone | LiteLLM | Kong AI Gateway |
|---|---|---|---|---|---|
| Budget caps with auto-pause | Native 3-stage (80/95/110) | Per-key hard cap | RPM + webhook | Team + user budgets | RPM + spend plugin |
| Alert routing | Per-cost-center routing | Slack/Teams native | Webhook | Webhook (BYO destination) | Event hooks |
| Route-by-budget | Native budget-aware policy | Wireable conditional | Not present | Wireable via hooks | Plugin-driven |
| Per-team auto-cutoff | Native cost-center scope | One virtual key per team | Per-key | Team_id-scoped | Consumer-scoped |
| Post-budget fallback model | Opus → Sonnet → Haiku cascade | Fallback policies | Failover | Fallback list | AI Proxy fallback |
| Pre-budget policy testing | Dry-run vs. 30-day traffic | Manual staging | Manual staging | Manual staging | Manual staging |
| Self-improving budget policy | fi.opt auto-tunes thresholds | Not present | Not present | Not present | Not present |
Decision framework: Choose X if
Choose Future AGI Agent Command Center if budget breaches are a recurring monthly conversation and the goal is to make next month’s policy automatically tighter. The dry-run mode and the self-improving loop turn cost management from a reactive habit into a quarterly review.
Choose Portkey if you want hosted polish, the virtual-key budgets are enough for your story, and you can wire the route-by-budget rules yourself. Pick when “monthly EM time” is the constraint, not “yearly optimizer ROI.”
Choose Helicone if the team is under 10 developers on Claude Code, the story is “daily cap + Slack ping,” and policy gets re-tuned manually in standup.
Choose LiteLLM if compliance forbids Claude Code traffic leaving the VPC and the platform team will write the route-by-budget hook. Pick when source-availability is non-negotiable.
Choose Kong AI Gateway if the platform team already operates Kong for REST APIs and extending the stack is cheaper than introducing a new product. The cost-management view is yours to build; SLA and ops familiarity carry the value.
Common mistakes when wiring Claude Code cost management
| Mistake | What goes wrong | Fix |
|---|---|---|
| Setting a single hard cap at 100% | Claude Code pauses mid-conversation; developer flow shatters | Three-stage cap: alert at 80%, downgrade at 95%, hard pause at 110% |
| Routing alerts to a shared channel | Notifications get muted; the next breach goes unnoticed for two weeks | Route per-cost-center alerts to the EM responsible, with a fallback to a managed channel |
| Treating budget cap as a static rule | Engineers route around the cap (paste into Claude.ai web), spend moves off-ledger | Pair the cap with a route-by-budget downgrade so Claude Code keeps working at lower quality, removing the incentive to bypass |
| Not preserving the `anthropic-version` header through the gateway | Tool-use behavior silently differs between Claude Code’s expected version and the gateway’s default; budget alerts misclassify the model | Pin the version explicitly in the gateway forwarding rule |
| Tagging by developer email instead of cost center | Per-team auto-cutoff collapses to per-developer auto-cutoff; one heavy engineer pauses themselves while a quiet teammate has budget left over | Tag by cost center as primary; developer as secondary attribute |
| Skipping the dry-run before shipping a new cap | First production day pauses three sessions the EM did not predict; trust in the gateway drops | Run the proposed cap against the last 30 days of traffic before going live; iterate offline |
| Hardcoding the cheap-model substitute | The fallback model gets deprecated; the gateway starts returning errors at 95% threshold | Wire the cascade as a list (Opus → Sonnet → Haiku), and update the list quarterly |
| Treating the cap as final policy | Engineers complain; cap gets raised once, then again, then again — the policy is dead | Treat the cap as a learning loop: every breach is data; tune monthly (manually) or quarterly (with an optimizer) |
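The dry-run fix in the table can be sketched as a replay: walk the last 30 days of turns in time order, accumulate spend per cost center per day, and record which sessions would have hit the hard pause, without blocking anything. A minimal Python sketch, assuming turn records with day, cost center, session, and dollar cost (a hypothetical format) and the 110% hard pause from the three-stage pattern. It ignores the 95% downgrade stage, so it overstates spend and errs toward predicting more pauses:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Turn(NamedTuple):
    day: str          # e.g. "2026-04-01"
    cost_center: str
    session_id: str
    cost_usd: float


def dry_run(turns: Iterable[Turn], daily_cap_usd: Dict[str, float],
            hard_pause: float = 1.10) -> List[Tuple[str, str]]:
    """Replay historical turns (in time order) against a proposed cap without blocking.

    Returns the (day, session_id) pairs that would have hit the hard pause.
    """
    spent: Dict[Tuple[str, str], float] = defaultdict(float)
    paused: List[Tuple[str, str]] = []
    for t in turns:
        key = (t.day, t.cost_center)
        if spent[key] >= daily_cap_usd[t.cost_center] * hard_pause:
            if (t.day, t.session_id) not in paused:
                paused.append((t.day, t.session_id))
            continue  # a paused turn would not have been served, so it adds no spend
        spent[key] += t.cost_usd
    return paused
```

The session whose turn pushes a wallet over the line is not flagged; only turns arriving after the line is crossed would have been paused, which is exactly the list an EM wants to review before going live.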
How Future AGI closes the loop on Claude Code cost management
The other four gateways treat cost management as a static policy: write the cap, enforce the cap, re-tune the cap when reality drifts. Future AGI treats the cap as the input to a self-improving policy. Six stages, each producing evidence the EM and the optimizer can both consume:
- Trace. Every Claude Code turn produces a span via `traceAI` (Apache 2.0) capturing inputs, outputs, tool calls, model used, session ID, cost-center attribution, and budget state at request time. Spans are immutable.
- Evaluate. `ai-evaluation` (Apache 2.0) scores every turn. FAGI ships a catalog of 50+ built-in rubrics (code-correctness, faithfulness, tool-use accuracy, task completion, structured-output, hallucination, groundedness, instruction-following, agentic surfaces). Beyond the catalog come unlimited custom evaluators, authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code; self-improving evaluators that learn from live production traces; and FAGI’s proprietary classifier model family, which runs continuous high-volume per-turn scoring at very low cost per token (Galileo Luna-2 cost economics, but rubric-flexible). Because the score is co-located with the cost record, the gateway already knows which turns were “Opus called when Sonnet would have been enough.” The catalog is the floor, not the ceiling.
- Cluster. Low-scoring sessions cluster by failure mode. The cost-management cluster is “high-cost, low-difficulty turns”: the routine work that ate the budget.
- Optimize. `fi.opt.optimizers` (ProTeGi, Bayesian, GEPA) reads breach + trace + eval history and outputs a policy diff: “for cost center `platform-squad`, route turns under 10K input tokens to `claude-haiku-4-5` between 9am and 5pm; code-correctness regression is 0.3%, under the 2% tolerance. Estimated monthly saving: $4,200.” The math is shown.
- Route. The gateway applies the updated policy on the next request, logged with the policy version. The squad’s budget goes from “we breach by 40%” to “we run at 87% with headroom.”
- Re-deploy. The new rule is versioned and signed. Roll forward; if scores regress, rollback is automatic. Each deploy is logged as an audit event tied to the approver’s IdP claim.
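The Cluster step’s “high-cost, low-difficulty turns” can be approximated before any real clustering exists. A hedged sketch that buckets spend rather than clustering it: it treats a small input plus a high eval score on an Opus turn as a proxy for “low difficulty”, which is an assumption, not how Future AGI’s clustering works. The `Span` fields are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Span(NamedTuple):
    cost_center: str
    model: str
    input_tokens: int
    cost_usd: float
    eval_score: float  # 0..1 from whatever evaluator scored the turn


def routine_opus_spend(spans: Iterable[Span], max_input_tokens: int = 10_000,
                       min_score: float = 0.9) -> Dict[str, float]:
    """Dollars per cost center spent on small, high-scoring turns that still ran on Opus."""
    totals: Dict[str, float] = defaultdict(float)
    for s in spans:
        # Proxy for "low difficulty": short input and the turn scored well anyway.
        is_routine = s.input_tokens < max_input_tokens and s.eval_score >= min_score
        if s.model.startswith("claude-opus") and is_routine:
            totals[s.cost_center] += s.cost_usd
    return dict(totals)
```

The per-cost-center totals are an upper bound on what a Haiku routing rule could save on those turns, before the regression check decides whether it should.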
Net effect: a team starting at $40,000/month with monthly breaches settles into a steady state where the cap holds, the loop tunes downgrade thresholds quarterly, and the EM stops getting paged at 9pm on the 23rd. The cap stops being the constraint and becomes the input to a policy that tightens itself.
The three building blocks are open source:
- `traceAI`, github.com/future-agi/traceAI (Apache 2.0)
- `ai-evaluation`, github.com/future-agi/ai-evaluation (Apache 2.0)
- `agent-opt`, github.com/future-agi/agent-opt (Apache 2.0)
Agent Command Center adds the budget-policy UI, dry-run mode, per-cost-center alert routing, Protect guardrails (~67ms text, 109ms image, per arXiv 2510.13351), RBAC with EM-versus-finance role separation, SOC 2 Type II certification alongside HIPAA, GDPR, and CCPA, and an AWS Marketplace listing.
What we did not include
We deliberately left out three gateways that show up in adjacent 2026 listicles:
- OpenRouter. Excellent for model exploration, but the routing is consumer-facing. The Claude Code cost-management story (per-team caps, EM alerts, dry-run) isn’t the product’s shape.
- Cloudflare AI Gateway. Strong primitives but the budget-cap UX is rate-limit-first; dollar-denominated caps with auto-downgrade still require worker code as of May 2026.
- TrueFoundry. Solid ML-platform gateway with tenant-level cost centers, but the budget-actuator story (auto-pause, route-by-budget) is less mature than the chargeback story. Better as a tracking pick than a management pick.
All three are worth a second look in Q3 2026.
Related reading
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- Top 5 Enterprise AI Gateways to Track Claude Code Costs in 2026
- What Is an AI Gateway? The 2026 Definition
- Best LLM Gateways in 2026
- Best AI Gateways for Agentic AI in 2026
Sources
- Anthropic Claude Code documentation, claude.ai/docs/claude-code
- Anthropic Usage API, docs.anthropic.com/en/api/usage
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI traceAI, github.com/future-agi/traceAI
- Future AGI ai-evaluation, github.com/future-agi/ai-evaluation
- Future AGI agent-opt, github.com/future-agi/agent-opt
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Portkey AI gateway, portkey.ai
- Helicone proxy, helicone.ai
- LiteLLM proxy, github.com/BerriAI/litellm
- Kong AI Gateway, konghq.com/products/kong-ai-gateway