
Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026

Five AI gateways scored on Claude Code token monitoring in 2026: per-developer attribution, per-repo budgets, session traces, alert routing, and what each gateway misses.


A team of 30 engineers on Claude Code can burn through $40,000 of Anthropic tokens in a month and have nothing to show finance except the aggregate invoice. The CLI doesn’t natively split usage by developer, by repository, or by feature branch. By the time the bill arrives, the audit trail is gone.

An AI gateway in front of Claude Code fixes this. It intercepts the Anthropic API calls, attaches per-developer and per-repo metadata, and produces the chargeback table finance is asking for. The five gateways in this post all do that. They don’t all do it the same way, and only one of them turns the same trace into a feedback loop that reduces tokens over time.

This is the 2026 cohort, scored on the seven monitoring axes that matter when Claude Code is the workload.


TL;DR

Future AGI Agent Command Center is the strongest pick for monitoring Claude Code token usage. It ships per-developer virtual keys, per-repo span attributes, session-level cost aggregation rolled up across multi-turn Claude Code conversations, and OpenTelemetry-native finance exports, with Bedrock and Anthropic both reachable behind one OpenAI-compatible base URL. The other four picks below win on specific edges.

  1. Future AGI Agent Command Center — Best overall. Per-developer + per-repo attribution, session-level aggregation, OTel-native finance exports, and provider-mixed routing under one base URL.
  2. Portkey — Best for the hosted product with mature RBAC and the deepest prompt library. Fastest setup with mature controls (verify the Palo Alto Networks acquisition timeline before signing a multi-year contract).
  3. Helicone — Best for the simplest drop-in proxy when you don’t need routing intelligence. Lightweight per-request observability (after the March 3, 2026 Mintlify acquisition, treat adoption as something you may need to migrate off later).
  4. LiteLLM — Best when Claude Code traffic cannot leave your VPC. Self-hosted, source-available Python-native routing; pin commits after the March 24, 2026 PyPI compromise.
  5. Kong AI Gateway — Best when you already run Kong for REST. The AI extension fits the existing operational story.

Why Claude Code needs a gateway in front of it

Claude Code is an interactive coding agent. Each invocation opens a session that may span hundreds of turns, each one issuing one or more calls to the Anthropic API. Three properties make the workload hard to monitor without help:

  1. Sessions are long. A single bug-fix conversation can produce 30 to 50 turns, each turn with input context around 30K to 200K tokens (Claude Code aggressively packs project files into context). Per-call cost telemetry doesn’t tell you what a session cost.

  2. Cost is concentrated in a few power users. In our own usage data across 22 engineering teams in Q1 2026, the top 10% of Claude Code users accounted for 51% of token spend. Average is a useless number.

  3. The Anthropic dashboard groups by API key. If every developer shares one team key, every developer looks the same to the dashboard. If each developer has their own key, the team loses bulk-discount pricing and the procurement story gets worse.
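
The three properties above can be made concrete with a small sketch. It rolls hypothetical per-call records up into per-session and per-developer totals, then measures how much spend sits with the top slice of users. The record fields (`developer`, `session`, `cost_usd`) and the sample values are assumptions for illustration, not any gateway's export schema.

```python
from collections import defaultdict

# Hypothetical per-call records a gateway might log; field names are assumptions.
calls = [
    {"developer": "ana", "session": "s1", "cost_usd": 1.20},
    {"developer": "ana", "session": "s1", "cost_usd": 2.30},
    {"developer": "ben", "session": "s2", "cost_usd": 0.40},
    {"developer": "ana", "session": "s3", "cost_usd": 4.10},
    {"developer": "cy", "session": "s4", "cost_usd": 0.25},
]

def total_by(records, key):
    """Sum cost_usd grouped by the given record field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)

def top_share(totals, fraction):
    """Share of total spend held by the top `fraction` of keys (at least one key)."""
    ranked = sorted(totals.values(), reverse=True)
    n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:n]) / sum(ranked)

per_session = total_by(calls, "session")
per_developer = total_by(calls, "developer")
print(per_session)
print(f"top developer share: {top_share(per_developer, 0.10):.0%}")
```

Per-call rows alone answer neither question; both need a session ID and a developer ID on every record, which is what the gateway hop adds.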

A gateway sits between the Claude Code client and api.anthropic.com. It intercepts each call, applies a metadata header (developer ID, repo, feature flag), and forwards the request. The metadata makes the per-call cost attributable. The interception point makes caps, alerts, and routing possible.

For the rest of this post, “gateway” means an AI gateway that speaks the Anthropic API (or wraps it via OpenAI-compatible mode). All five picks support pointing Claude Code at them via the ANTHROPIC_BASE_URL environment variable.
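
As a rough sketch of that wiring, the helper below builds an environment for launching Claude Code against a gateway. `ANTHROPIC_BASE_URL` comes from the text above. The gateway URL and the `x-developer-id` / `x-repo` header names are hypothetical. Passing extra headers through an `ANTHROPIC_CUSTOM_HEADERS` variable as newline-separated `Name: Value` pairs is an assumption to check against the Claude Code settings documentation for your version.

```python
import os

def claude_code_env(gateway_url, extra_headers=None, base_env=None):
    """Return a copy of the environment with Claude Code pointed at a gateway."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = gateway_url
    if extra_headers:
        # Assumption: headers are passed as newline-separated "Name: Value" pairs.
        env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(
            f"{name}: {value}" for name, value in extra_headers.items()
        )
    return env

env = claude_code_env(
    "https://gateway.example.internal",  # hypothetical gateway URL
    {"x-developer-id": "ana@example.com", "x-repo": "payments-service"},
    base_env={},
)
print(env["ANTHROPIC_BASE_URL"])
```

The resulting dictionary can be handed to `subprocess.run(..., env=env)` from a wrapper script, or the same two variables can be exported from a shell profile so the terminal CLI and the IDE both pick them up.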


The 7 axes we score on

The default “best AI gateway” axes (provider breadth, routing, fallback, observability, cost, security, deployment) are too generic for Claude Code. We scored each pick on seven axes that specifically affect coding-agent monitoring.

| Axis | What it measures |
|---|---|
| 1. Per-session cost attribution | Can the gateway group cost by Claude Code session ID, not just by API key? |
| 2. Per-developer chargeback | Can it tag and aggregate by developer email or SSO claim? |
| 3. Per-repository tagging | Can it attribute usage to a repository or workspace? |
| 4. Tool-call passthrough | Does Claude Code’s tool-use (bash, file edits) survive the gateway hop intact? |
| 5. Streaming continuity | Does token streaming work without buffer-and-batch behaviour that breaks the CLI’s UX? |
| 6. Alert + budget hooks | Can you set a $X cap per developer per day and get paged when it trips? |
| 7. Self-host posture | Can the gateway run in your VPC so prompts and code never leave? |

The Score line at the end of each pick tallies all seven.


How we picked

We started from the universe of public AI gateways that advertise an Anthropic-compatible endpoint as of May 2026. We removed gateways that don’t preserve tool calls (which excluded two early proxies that batched streaming). We removed gateways without per-key metadata pass-through (which excluded one major API gateway whose AI extension is still in beta for Anthropic). The remaining five are the cohort below.


1. Future AGI Agent Command Center: Best for per-developer / per-repo token attribution

Verdict: Future AGI ships per-developer virtual keys, per-repo span attributes, session-level cost aggregation, and OpenTelemetry-native exports out of the box, with Bedrock and Anthropic both reachable behind one OpenAI-compatible base URL. The chargeback table finance is asking for is one group-by away in the dashboard; the rest of the cohort either reports per-call cost without session aggregation or asks the platform team to wire a separate observability sink to recover it.

What it does for Claude Code token monitoring:

  • Per-session traces with the session ID exposed as a top-level span attribute. Each turn of a Claude Code conversation becomes a child span under the session, so you can see exactly which turn ballooned context to 180K tokens.
  • Per-developer aggregation through the fi.attributes.user.id span attribute. Set it once in the gateway config; group-by-developer in the dashboard is built-in.
  • Per-repository tagging through arbitrary span attributes. Wire repo=<git remote url> into the gateway forwarding rule and the same dashboard groups by repo.
  • Tool-call passthrough preserved because the gateway parses Anthropic’s tool-use blocks rather than re-serialising them as text.
  • Streaming continuity maintained: the SSE pass-through doesn’t buffer.
  • Alert and budget hooks through fi.alerts on rolling-window cost thresholds.
  • Self-host posture through the BYOC deployment of Agent Command Center plus the Apache 2.0 traceAI library.
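
To illustrate what a per-session trace with one child span per turn buys you, here is a minimal sketch that scans the turns of one session and finds where the input context jumped the most. The `Turn` shape and the token counts are invented for illustration. This is not the traceAI span schema.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    index: int
    input_tokens: int
    output_tokens: int

def largest_context_jump(turns):
    """Return (turn, delta) for the biggest rise in input tokens between consecutive turns."""
    best_turn, best_delta = None, 0
    for prev, cur in zip(turns, turns[1:]):
        delta = cur.input_tokens - prev.input_tokens
        if delta > best_delta:
            best_turn, best_delta = cur, delta
    return best_turn, best_delta

# Hypothetical session: context balloons on the third turn.
session = [
    Turn(1, 30_000, 800),
    Turn(2, 42_000, 1_200),
    Turn(3, 180_000, 900),
    Turn(4, 185_000, 1_500),
]
turn, delta = largest_context_jump(session)
print(turn.index, delta)
```

With only per-call rows and no session grouping, the same question cannot be asked, because there is no ordering of turns to compare.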

The loop. Every captured trace gets scored by ai-evaluation (Apache 2.0). It ships a catalog of 50+ built-in rubrics covering task completion, faithfulness, code-correctness, tool-use accuracy, structured-output, hallucination, and groundedness. The catalog is the floor, not the ceiling: on top of it sit unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code, self-improving evaluators that learn from live production traces (the rubric sharpens as Claude Code traffic flows), and FAGI’s proprietary classifier model family, which runs continuous high-volume per-session scoring at very low cost-per-token (cost economics comparable to Galileo Luna-2, with flexible rubrics).

traceAI instruments 35+ frameworks OpenInference-natively. Error Feed (FAGI’s “Sentry for AI agents”) sits alongside as the zero-config error monitor. It auto-clusters related per-developer Claude Code failures into named issues (50 traces → 1 issue), auto-writes a root cause from span evidence along with a quick fix and a long-term recommendation per issue, and tracks a rising/steady/falling trend per issue, so emerging session-cost regressions surface like exceptions rather than staying buried in dashboards.

Low-scoring sessions become a failure dataset that agent-opt (ProTeGi, Bayesian, GEPA) uses to rewrite the system prompt or adjust the model-routing policy. On the next deploy, the gateway uses the updated route. This is the wedge no other gateway here implements.

Where it falls short:

  • agent-opt is opt-in. Start with traceAI + ai-evaluation for one-week pilots and turn the optimizer on once eval baselines stabilize. If you only want per-developer cost numbers, you don’t need the loop yet.

  • The prompt-library UI is less mature than Portkey’s. If your team relies on a shared prompt library, Portkey wins on that single feature.

Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, HIPAA, GDPR, and CCPA all certified. BAA available. AWS Marketplace listing for procurement.

Score: 7/7 axes.


2. Portkey: Best for hosted gateway with mature RBAC

Verdict: Portkey is the most polished hosted-only product in this category. If your team is using Claude Code through one shared workspace and wants per-developer keys + RBAC out of the box, Portkey is the fastest path. It doesn’t optimize prompts back; it observes and routes.

What it does for Claude Code token monitoring:

  • Per-session traces through Portkey’s trace_id request header. The Claude Code wrapper has to set it; if your team uses a single shared key, sessions blend together unless you wire the header.
  • Per-developer chargeback through Portkey’s virtual-key system. Each developer gets their own virtual key, all of which fan out to one underlying Anthropic key, so you keep bulk pricing.
  • Per-repository tagging through metadata headers. The same caveat applies: the client has to set them.
  • Tool-call passthrough confirmed working as of May 2026 with claude-opus-4-7 and claude-sonnet-4-6.
  • Streaming continuity works for SSE; gRPC pass-through is on the roadmap.
  • Alert and budget hooks through per-key budget caps and Slack alerts.
  • Self-host posture through Portkey’s BYOC option, which is good but not air-gapped.
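
The virtual-key fan-out described above can be sketched generically. The code below is not Portkey's API. The key names, the `VirtualKey` shape, and the placeholder upstream key are all hypothetical, and the sketch only shows the idea: many per-developer keys resolve to one shared upstream key plus attribution tags.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualKey:
    developer: str
    team: str

# Hypothetical key table; a real gateway would store this server-side.
VIRTUAL_KEYS = {
    "vk-ana-001": VirtualKey("ana@example.com", "payments"),
    "vk-ben-002": VirtualKey("ben@example.com", "search"),
}
UPSTREAM_KEY = "sk-ant-team-placeholder"  # the single shared team key

def resolve(virtual_key):
    """Map a virtual key to (upstream key, attribution tags); reject unknown keys."""
    owner = VIRTUAL_KEYS.get(virtual_key)
    if owner is None:
        raise PermissionError("unknown virtual key")
    return UPSTREAM_KEY, {"developer": owner.developer, "team": owner.team}

upstream, tags = resolve("vk-ana-001")
print(tags)
```

Because every request leaves the gateway under the same upstream key, the provider sees one account while the gateway's own records still carry a developer per call.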

Where it falls short:

  • No optimizer. The traces inform humans, not the gateway.
  • The metadata-header model requires Claude Code wrapper changes. Without that, you only get key-level aggregation.
  • Pricing escalates above 5M requests/month faster than the lighter-weight alternatives.

Pricing: Free tier with 10K requests/day. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II.

Score: 6/7 axes (missing: feedback loop / optimization).


3. Helicone: Best for lightweight observability

Verdict: Helicone is the right pick when you want per-request observability for Claude Code and nothing else. Drop the proxy URL in front of Anthropic, get a per-request cost table, move on with your week. If you also want budgets, routing, or guardrails, the other four on this list are deeper.

What it does for Claude Code token monitoring:

  • Per-session traces through Helicone’s Helicone-Session-Id header. Same client-side wiring caveat.
  • Per-developer chargeback through Helicone-User-Id. Aggregation in the dashboard is simple but the slicing is shallower than Portkey or Future AGI.
  • Per-repository tagging through custom properties.
  • Tool-call passthrough confirmed.
  • Streaming continuity works.
  • Alert and budget hooks through usage alerts and rate-limit policies. Less expressive than budget caps with auto-pause.
  • Self-host posture through Helicone’s open-source self-host. Good for low-volume teams; the team admits scale-out beyond a few hundred RPS gets operational.
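
The client-side wiring caveat amounts to building a small set of headers per request. A minimal sketch follows: `Helicone-Session-Id` and `Helicone-User-Id` are the header names given above, while the `Helicone-Property-Repo` name for the custom property is an assumption to confirm against Helicone's header documentation. Authentication headers are left out.

```python
import uuid

def helicone_attribution_headers(developer, repo, session_id=None):
    """Build per-request attribution headers; a new session ID is minted if none is given."""
    return {
        "Helicone-Session-Id": session_id or str(uuid.uuid4()),
        "Helicone-User-Id": developer,
        # Assumption: custom properties use a "Helicone-Property-<Name>" header.
        "Helicone-Property-Repo": repo,
    }

headers = helicone_attribution_headers("ana@example.com", "payments-service", "sess-1")
print(headers)
```

The session ID should be generated once per Claude Code session and reused for every turn; minting a new one per request would split a session into single-call fragments.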

Where it falls short:

  • No optimizer.
  • No prompt library.
  • Routing intelligence is basic (round-robin / failover). Claude-Code-specific routing, e.g., route easy turns to claude-haiku-4-5 and hard turns to claude-opus-4-7, has to be coded by your team upstream of the proxy.

Pricing: Free tier with 10K requests/month. Pro tier starts at $25/month. Enterprise is custom.

Score: 5/7 axes (missing: feedback loop, mature alerting).


4. LiteLLM: Best for self-hosted Python-native routing

Verdict: LiteLLM is the pick when Claude Code’s traffic can’t leave your VPC and the security team wants to read every line of code that touches a prompt. It’s source-available, Python-native, and runs as a proxy inside your infra. You get less observability than the hosted options out of the box, but the source code is yours.

What it does for Claude Code token monitoring:

  • Per-session traces through LiteLLM’s metadata pass-through. Wire metadata.session_id and metadata.trace_id in your proxy config.
  • Per-developer chargeback through team_id and user_id on virtual keys. The team_id maps to the SSO claim if you wire your IdP.
  • Per-repository tagging through custom metadata. Bring your own dashboard. LiteLLM’s UI lists transactions but slicing by repo requires exporting to your analytics warehouse.
  • Tool-call passthrough confirmed working for Anthropic tool-use blocks.
  • Streaming continuity works.
  • Alert and budget hooks through LiteLLM’s spend tracking and per-key budgets. Webhook-based alerting; you wire the page-on-call yourself.
  • Self-host posture is the strongest in this list. Source-available, runs on your nodes, no telemetry leaves the VPC.

Where it falls short:

  • No optimizer.
  • The UI is functional, not polished. Slicing by developer or repo means going to a SQL dashboard.
  • The “observability included” story is thinner than Portkey or Helicone. Plan to wire Future AGI traceAI or another OTel sink behind LiteLLM if you want depth.

Pricing: Open source under MIT. LiteLLM also sells an Enterprise tier with SLA + SSO + audit; starts around $250/month for small teams.

Score: 5.5/7 axes (missing: native polished dashboard, optimizer).


5. Kong AI Gateway: Best if you already run Kong

Verdict: Kong AI Gateway is the pick when the API platform team already runs Kong for the company’s REST APIs and the path of least resistance is to extend that stack with AI-specific policies. The strengths are SLA, plugin ecosystem, and ops familiarity. The weakness is AI-specific shallowness: most observability happens via plugins, not natively.

What it does for Claude Code token monitoring:

  • Per-session traces through OpenTelemetry plugins. Kong’s OTel plugin captures the request lifecycle; you wire span attributes through Lua or the new AI Proxy plugin.
  • Per-developer chargeback through consumer + tag patterns. The chargeback dashboard is third-party (typically Grafana on top of the OTel sink).
  • Per-repository tagging through tags on the Kong consumer.
  • Tool-call passthrough works through the AI Proxy plugin (Kong 3.6+).
  • Streaming continuity is supported.
  • Alert and budget hooks through Kong’s rate-limiting plugins. Spend-based caps require plugin work.
  • Self-host posture is the entire point of Kong; it runs anywhere.

Where it falls short:

  • AI-specific observability is plugin-driven, not native. The default dashboard is the API-gateway view, not the LLM-cost view.
  • No optimizer.
  • Out of the box, the spend-tracking story requires wiring multiple plugins. Plan two weeks of platform-team time to set up the chargeback view your finance team will accept.

Pricing: Kong is open source. Kong Konnect (managed) starts free. Enterprise plans for SLA, plugins, and support start around $1.5K/month.

Score: 5/7 axes (missing: native AI observability, optimizer, polished cost dashboard).


Capability matrix

| Axis | Future AGI | Portkey | Helicone | LiteLLM | Kong AI Gateway |
|---|---|---|---|---|---|
| Per-session attribution | ✅ Native | ✅ Header | ✅ Header | ✅ Metadata | ✅ Plugin |
| Per-developer chargeback | ✅ Native | ✅ Virtual key | ✅ Header | ✅ Team/user | ✅ Consumer |
| Per-repo tagging | ✅ Span attr | ✅ Metadata | ✅ Custom prop | ✅ Metadata | ✅ Tags |
| Tool-call passthrough | ✅ | ✅ | ✅ | ✅ | ✅ (3.6+) |
| Streaming continuity | ✅ | ✅ | ✅ | ✅ | ✅ |
| Alert + budget hooks | ✅ + auto-pause | ✅ Slack + cap | ⚠️ Rate-only | ✅ Webhook | ⚠️ Plugin |
| Self-host posture | ✅ BYOC | ✅ BYOC | ✅ OSS | ✅ OSS | ✅ OSS |
| Feedback loop / optimizer | ✅ fi.opt | ❌ | ❌ | ❌ | ❌ |

Decision framework: Choose X if

Choose Future AGI if you want the gateway to do more than monitor, with the traces also driving prompt and route optimization over time. agent-opt is opt-in; turn it on once Claude Code has eval baselines and live traces flowing, and the cost curve compounds downward from there.

Choose Portkey if you want a hosted gateway with mature RBAC, virtual keys, and a polished UI, and you don’t need the optimizer yet. Pick this when the procurement story matters and the team is happy with monitoring as a one-time setup.

Choose Helicone if you want the lightest possible drop-in for per-request observability and you don’t need routing or budgets. Pick this for teams under 10 developers on Claude Code where the simpler product is the right fit.

Choose LiteLLM if your security or compliance team requires Claude Code traffic to never leave the VPC. Pick this when source-availability and self-host control beat the polish of the hosted alternatives.

Choose Kong AI Gateway if you already operate Kong for REST APIs and the path of least resistance is to extend the existing stack. Pick this when the platform team’s familiarity with Kong outweighs the AI-specific shallowness of the AI Proxy plugin.


Common mistakes when wiring Claude Code through a gateway

| Mistake | What goes wrong | Fix |
|---|---|---|
| Pointing only the IDE plugin at the gateway | The terminal CLI usage still hits Anthropic directly; chargeback misses half the traffic | Set ANTHROPIC_BASE_URL in the shell profile, not just the IDE |
| Sharing one team key across developers | All sessions look identical to the dashboard | Issue virtual keys per developer (Portkey / Future AGI / LiteLLM) |
| Not preserving anthropic-version header | Tool-use behaviour silently differs between Claude Code’s expected version and the gateway’s default | Pin the version explicitly in the gateway forwarding rule |
| Buffering streaming responses | Claude Code’s progress UI freezes mid-turn | Confirm the gateway forwards SSE without buffer-and-batch |
| Tagging with only user_id, not session_id | Session-level cost attribution is impossible | Tag both; the session ID is what makes the cost story legible |
| Setting budget caps too low | Claude Code pauses mid-conversation, breaking the developer’s flow | Set caps with a soft-alert at 80% and a hard-pause at 110% so engineering knows before pause |
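
The soft-alert and hard-pause thresholds from the last row can be expressed as a small decision function. This is a sketch under stated assumptions: amounts are integer cents, the 80% and 110% defaults mirror the table, and the function name and return values are hypothetical rather than any gateway's budget API.

```python
def budget_action(spent_cents, cap_cents, soft_pct=80, hard_pct=110):
    """Return "ok", "alert", or "pause" for spend measured against a cap.

    Integer arithmetic avoids float rounding right at the thresholds.
    """
    if cap_cents <= 0:
        raise ValueError("cap must be positive")
    if spent_cents * 100 >= cap_cents * hard_pct:
        return "pause"
    if spent_cents * 100 >= cap_cents * soft_pct:
        return "alert"
    return "ok"

# A $100.00 daily cap: alert from $80.00, pause from $110.00.
for spent in (5_000, 8_000, 10_900, 11_000):
    print(spent, budget_action(spent, 10_000))
```

The gap between the two thresholds is the point: developers get a warning while they can still wrap up the conversation, and the pause only lands after the nominal cap has already been exceeded.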

How Future AGI closes the loop on Claude Code spend

The other four gateways treat token monitoring as an end state: capture the trace, show it in a dashboard, send an alert when spend trips a threshold. Future AGI treats it as the input to a feedback loop. The loop has six stages:

  1. Trace. Every Claude Code turn produces a span tree via traceAI (Apache 2.0). Spans capture inputs, outputs, tool calls, model used, and the session ID.

  2. Evaluate. fi.evals scores every turn against task-completion, faithfulness, and code-correctness rubrics. Scores live alongside the cost data.

  3. Cluster. Low-scoring sessions get clustered by failure mode in the Agent Command Center. A common pattern is “Claude Opus called when Sonnet would have been enough”; clustering makes that cost-quality mismatch visible.

  4. Optimize. fi.opt.optimizers (ProTeGi, Bayesian, GEPA) rewrites the system prompt or adjusts the routing policy against the clustered failures. For Claude Code specifically, the typical optimization is a routing rule: route turns under 10K input tokens to claude-haiku-4-5, everything else to claude-opus-4-7.

  5. Route. Agent Command Center’s gateway applies the updated routing policy on the next request.

  6. Re-deploy. The new prompt + route are versioned. Teams roll forward; if the score regresses, the change is rolled back automatically to the previous version.
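
The routing rule named in the Optimize stage is simple enough to sketch directly. The model identifiers and the 10K-token threshold come from the text above; the function itself is a hypothetical illustration, not the Agent Command Center routing policy format.

```python
HAIKU = "claude-haiku-4-5"
OPUS = "claude-opus-4-7"

def pick_model(input_tokens, threshold=10_000):
    """Route turns under the input-token threshold to the cheaper model."""
    return HAIKU if input_tokens < threshold else OPUS

# Hypothetical turn sizes from one session.
turn_sizes = [2_500, 9_999, 10_000, 64_000]
routes = [pick_model(n) for n in turn_sizes]
print(routes)
```

In the loop described above, the threshold is the part the optimizer would tune: eval scores on the cheaper model's turns indicate whether it can move up or has to come down.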

Net effect: a team that starts at $40,000/month on Claude Code typically sees costs trend down 15-30% within four weeks without changing any developer behaviour. The router gets better at choosing the cheaper model for the easy turns, the optimizer rewrites prompts that were over-prompting, and the eval data tells the loop where to focus next.
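
As a quick check on what that range means in dollars, the sketch below applies the quoted 15-30% reduction to the $40,000 starting point. It is plain arithmetic on the figures above, not a forecast.

```python
def projected_range(monthly_usd, low_pct=15, high_pct=30):
    """Return (best_case, worst_case) monthly spend after a low_pct..high_pct reduction."""
    best = monthly_usd * (100 - high_pct) // 100
    worst = monthly_usd * (100 - low_pct) // 100
    return best, worst

print(projected_range(40_000))  # (28000, 34000)
```

That is a saving of $6,000 to $12,000 per month at the quoted rates.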

The three building blocks are open source:

  • traceAI, github.com/future-agi/traceAI (Apache 2.0)
  • ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
  • agent-opt, github.com/future-agi/agent-opt (Apache 2.0)

The hosted Agent Command Center adds the failure-cluster view, live Protect guardrails, RBAC, SOC 2 Type II certification, and an AWS Marketplace listing for procurement. Protect is the Future AGI guardrail model family: Gemma 3n fine-tuned adapters across Content Moderation, Bias Detection, Security, and Data Privacy Compliance, covering multi-modal text, image, and audio.


What we did not include

We deliberately left out three gateways that show up in other 2026 listicles:

  • OpenRouter. Fantastic for model exploration but not the right shape for an enterprise-team chargeback story. The routing is consumer-facing.
  • Cloudflare AI Gateway. Strong primitives but the Claude Code integration story is thin as of May 2026; the worker-based observability doesn’t yet do per-developer slicing without custom code.
  • TrueFoundry. Solid MLOps gateway but the Claude Code specific integration wasn’t stable in our May 2026 testing.

If your situation is different, all three are worth a second look in Q3 2026.



Sources

  • Anthropic Claude Code documentation, claude.ai/docs/claude-code
  • Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
  • Portkey AI gateway, portkey.ai
  • Helicone proxy, helicone.ai
  • LiteLLM proxy, github.com/BerriAI/litellm
  • Kong AI Gateway, konghq.com/products/kong-ai-gateway
  • Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)

Frequently asked questions

What is the cheapest way to monitor Claude Code token usage?
Helicone's free tier or the LiteLLM open-source proxy. Both give you per-request cost. Per-developer or per-repo chargeback requires wiring custom headers from the Claude Code wrapper.
Does Claude Code support OpenAI-compatible endpoints?
Claude Code speaks the Anthropic API natively. The five gateways above all support pointing Claude Code at them via `ANTHROPIC_BASE_URL`. The OpenAI-compatible shim is not needed.
Can I route Claude Code through multiple model providers?
Yes, but with care. Claude Code is tuned for Claude models specifically; routing turns to non-Claude models often degrades the tool-use behaviour. The safe pattern is to route between `claude-haiku-4-5`, `claude-sonnet-4-6`, and `claude-opus-4-7` based on token budget, not to swap providers.
How do I track Claude Code cost per developer when everyone shares one API key?
Use a gateway with virtual keys (Future AGI, Portkey, LiteLLM). Each developer gets a virtual key that fans out to the team key, preserving bulk pricing.
What happens to tool calls when Claude Code runs through a gateway?
All five gateways pass tool calls through intact as of May 2026. Earlier in 2025, two proxies broke tool-use by re-serialising the content blocks. The five in this list have been tested with claude-opus-4-7 + tool-use; they preserve the blocks correctly.
Is it safe to send source code through an AI gateway?
For hosted gateways, the data flow is gateway → Anthropic; both endpoints already see the code. If your compliance regime forbids both, the only safe pick is self-hosted LiteLLM (or Future AGI's BYOC deployment) running inside your VPC, with Anthropic traffic egressing through your own network.
How is Future AGI Agent Command Center different from Portkey for Claude Code?
Portkey is a hosted observation and routing layer. Future AGI adds an optimization layer on top — the trace data feeds back into prompt rewrites and routing-policy updates, so the gateway gets better at its job over time. Portkey gives you a dashboard; Future AGI gives you a dashboard plus a loop.