Best 5 LLM Gateways for Monitoring Claude Code Token Spend in 2026

Five LLM gateways scored on Claude Code token-spend monitoring in 2026: session attribution, dev chargeback, dashboard slicing, alerts, BI exports.

January 15, 2026

19 min read

On the first business day of every month, somebody on the finance side opens the Anthropic billing PDF, sees the Claude Code line item, and emails engineering: who spent this. For most teams the honest answer is “we’ll get back to you.” That gap (between the invoice landing and someone reconstructing what happened) is what an LLM gateway in front of Claude Code is supposed to close.

Monitoring is the passive half of the cost story. It is not the half that pauses traffic or downgrades models mid-session. The job is to capture every Claude Code turn, attach the right metadata, store it long enough that finance and engineering can both query it, and surface the shape of the spend in dashboards and alerts before the invoice arrives. A good LLM gateway turns a $40,000 Anthropic line item into a queryable table. A great one keeps that table queryable a year later.

The category nomenclature matters. “AI gateway” is the broader marketing term bundling routing, governance, guardrails, and observability. “LLM gateway”, the framing used by Bifrost, LiteLLM, and increasingly Helicone, leans harder on the model-traffic-observability slice. The five picks below all market themselves as LLM gateways or have LLM-gateway-shaped product surfaces; each verdict is grounded in their monitoring story, not their action-layer ambitions.

TL;DR

Future AGI Agent Command Center is the strongest pick for an LLM gateway monitoring Claude Code token spend. It ships per-developer virtual keys, per-repo span attributes, session-level cost aggregation across multi-turn Claude Code conversations, and OpenTelemetry-native finance exports, and its tiered audit retention keeps the same span data queryable a year later. The other four picks below win on specific edges.

Future AGI Agent Command Center — Best overall. Deep per-session attribution, OTel-native exports, tiered audit retention, and the depth a SOC 2 audit needs.
Portkey — Best for the cleanest dashboard slicing and per-key analytics out of the box. Hosted LLM gateway with polished UI (verify the Palo Alto Networks acquisition timeline before signing multi-year).
Helicone — Best for drop-in observability with the lowest-friction setup for small teams. Simplest proxy and fastest first dashboard view (treat as planned migration after the March 3, 2026 Mintlify acquisition).
LiteLLM — Best when traffic cannot leave the VPC. Self-hosted LLM gateway with first-class spend tables that pair cleanly with BYO BI tools; pin commits after the March 24, 2026 PyPI compromise.
Maxim Bifrost — Best for Go-native performance and OTel-first export discipline. High-throughput proxy with structured OTel export to any BI sink.

Monitoring is not management

Conflating the two is the most common procurement failure in this category. They share telemetry plumbing but answer different questions.

Monitoring is read-only: who spent what, when, on which session, repo, model. Consumers are finance, engineering leadership, and the on-call EM. Primitives are spans, attributes, dashboards, alert rules. Success metric: “can you reconstruct any session from the last 90 days in under 30 seconds.”

Management is the write-side: pause traffic, downgrade models, block tool calls, rewrite prompts. Primitives are budget caps, route rewrites, guardrails.

This post is exclusively about monitoring. A monitoring-first LLM gateway in front of Claude Code does four things every turn: (1) intercepts the call to api.anthropic.com and records it as a span; (2) attaches metadata (developer, session, repo, cost center); (3) persists the span into a queryable backend with a retention policy your audit team can live with; (4) exposes dashboards, alerts, and a BI export. All five picks do the first step; the differences live in steps two through four.
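
As a rough illustration of steps two and three, the sketch below models one turn as a record and refuses to persist it when attribution metadata is missing. `TurnSpan`, its field names, and the in-memory `store` are hypothetical and do not reflect any vendor's schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class TurnSpan:
    """One Claude Code turn captured as a span (hypothetical schema)."""
    session_id: str
    developer: str
    repo: str
    cost_center: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Metadata every span needs for later slicing (names are illustrative).
REQUIRED_TAGS = ("session_id", "developer", "repo", "cost_center")

store: list[TurnSpan] = []  # stand-in for the queryable backend


def missing_tags(span: TurnSpan) -> list[str]:
    """Return the required metadata tags that are empty on this span."""
    record = asdict(span)
    return [tag for tag in REQUIRED_TAGS if not record[tag]]


def persist(span: TurnSpan) -> None:
    """Store the span, rejecting it if attribution metadata is incomplete."""
    gaps = missing_tags(span)
    if gaps:
        raise ValueError(f"span missing metadata: {gaps}")
    store.append(span)
```

Rejecting incomplete spans at write time is one possible design choice; a gateway could equally accept them and flag the gap in the dashboard.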

Why Claude Code monitoring is its own problem shape

Three properties make this materially harder than monitoring a RAG service or single-turn chatbot.

Sessions are long, asymmetric, and recursive. A bug-fix conversation runs 40 to 60 turns. The first turn pulls 80K-200K tokens of repo context; middle turns are short follow-ups; late turns re-read files Claude already edited. Per-call telemetry tells you almost nothing about session shape. The session is the unit of work, and per-session telemetry requires the gateway to either carry a session ID through every turn or reconstruct sessions from temporal proximity.
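
A minimal sketch of the roll-up this implies, assuming each turn already carries a session ID. The dictionary keys and the sample numbers are made up for illustration.

```python
from collections import defaultdict


def rollup_sessions(turns):
    """Aggregate per-turn records into per-session totals.

    Each turn is a dict with hypothetical keys:
    session_id, input_tokens, output_tokens.
    """
    sessions = defaultdict(lambda: {"turns": 0, "input_tokens": 0, "output_tokens": 0})
    for turn in turns:
        totals = sessions[turn["session_id"]]
        totals["turns"] += 1
        totals["input_tokens"] += turn["input_tokens"]
        totals["output_tokens"] += turn["output_tokens"]
    return dict(sessions)


turns = [
    {"session_id": "bugfix-1", "input_tokens": 120_000, "output_tokens": 900},    # heavy first turn
    {"session_id": "bugfix-1", "input_tokens": 2_500, "output_tokens": 400},      # short follow-up
    {"session_id": "bugfix-1", "input_tokens": 18_000, "output_tokens": 1_200},   # late re-read
    {"session_id": "docs-7", "input_tokens": 4_000, "output_tokens": 300},
]
summary = rollup_sessions(turns)
```

Without the `session_id` key, the same four rows would be indistinguishable per-call records, which is the failure mode described above.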

Cost is concentrated and bimodal. In Future AGI’s anonymized cohort across 22 engineering teams in Q1 2026, the top 10% of Claude Code users accounted for 51% of token spend; a single monorepo migration accounted for 23% of one team’s monthly spend on its own. Mean and median per-developer numbers are useless. You want P90 and P99 by developer, repo, hour-of-day, session length, which requires multi-dimensional slicing depth.
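
To see why averages mislead on a skewed distribution, here is a small sketch using a nearest-rank percentile and a top-decile share. The spend figures are invented for illustration and are not drawn from the cohort above; the function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def top_share(values, top_pct=10):
    """Fraction of total spend attributable to the top `top_pct` percent of spenders."""
    ordered = sorted(values, reverse=True)
    k = max(1, math.ceil(top_pct * len(ordered) / 100))
    return sum(ordered[:k]) / sum(ordered)


# Illustrative monthly spend per developer in USD (made-up numbers).
spend = [40, 55, 60, 70, 80, 90, 110, 150, 400, 2000]

p90 = percentile(spend, 90)
p99 = percentile(spend, 99)
share = top_share(spend)
```

On this sample the mean is 305.5 and the median is 85, while P90 is 400, P99 is 2000, and the single top spender accounts for roughly 65% of the total.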

Metadata has to be set somewhere. The Anthropic API has no concept of “developer email” or “repo name.” If you want those in your spans, the Claude Code wrapper or gateway forwarding rule has to inject them. Different gateways have different opinions (header, span attribute, virtual key) and the choice cascades into how clean the dashboard slicing feels later.

The 7 axes we score on

Axis	What it measures
1. Per-session token attribution	Native span hierarchy by Claude Code session ID, or only per-API-key?
2. Per-developer chargeback	Aggregate by SSO claim / developer email with one filter; produce a chargeback table finance accepts?
3. Per-repo tagging	First-class dimension, or reconstructed from prompt content?
4. Dashboard slicing depth	How many simultaneous filters? Can you ask “platform-squad, monorepo X, Opus, last 30 days, P95 tokens” without writing SQL?
5. Alert primitives	Absolute thresholds, rate-of-change, deviation-from-baseline, per-cost-center routing? What’s breach-to-page latency?
6. Export to BI tools	First-class Snowflake / BigQuery / S3 + Parquet / Looker sinks, or screenshot-and-CSV?
7. Retention policy	How long at full fidelity by default? At what cost? Hot-vs-cold tier?

Three axes (slicing depth, alert latency, retention) are where most differentiation in this cohort sits. The other four are table stakes some gateways still get wrong.

How we picked

We started with public LLM gateways advertising an Anthropic-compatible endpoint and a dashboard product as of May 2026. We removed gateways that re-serialise tool calls (which broke tool_use blocks for two early proxies), gateways whose dashboard is screenshot-only with no export hook, and gateways without first-class metadata pass-through on Anthropic streaming.

Numbers we cite for retention, dashboard load time, and alert latency come from a synthetic test bed: 30 engineers, 12,000 sessions per week, 380K turns, three repositories, eight cost centers, Opus + Sonnet + Haiku in the mix. Vendor pages cite different figures from their own benchmarks. Both sets are real and neither is wrong; they measure different things.

1. Future AGI Agent Command Center: Best for monitoring depth that becomes a feedback loop

Verdict: The only LLM gateway in this cohort where the monitoring layer is structurally wired to an evaluation layer and an optimizer. For pure passive monitoring on day one it’s overkill. For teams that want the monitoring data to become a self-improving routing policy without ripping out the gateway later, it’s the only entry that gets there without a re-platform.

What it does for Claude Code monitoring:

Per-session attribution is native. traceAI (Apache 2.0) emits a span tree per session, with session ID as a top-level attribute and each turn a child span. The dashboard renders the tree as the runtime produced it, no reconstruction.
Per-developer chargeback via fi.attributes.user.id from the SSO claim or virtual-key owner; group-by-developer built-in.
Per-repo tagging is a free-form span attribute; the forwarding rule injects repo=<git remote> and the dashboard treats it as a first-class dimension.
Dashboard slicing depth is the strongest in the cohort: six concurrent filters first-class, server-side attribute filters, default 30-day / 380K-turn load time around 1.8 seconds in our test bed.
Alert primitives include absolute, rate-of-change, deviation from a 7-day baseline, and per-cost-center routing. Breach-to-page latency around 22 seconds, push-based.
Export to BI tools is first-class: native Snowflake and BigQuery sinks, S3 + Parquet cold tier, Postgres replica. Stable export schema across versions.
Retention: 30 days at full fidelity on Free, 90 on Scale, configurable to 365 on Enterprise. S3 + Parquet cold tier for indefinite retention.
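
For the per-repo tag, one way a wrapper or forwarding rule could derive a value from the git remote is sketched below. The helper names and the "owner/name" normalization are assumptions for illustration, not part of any gateway's API.

```python
import re
import subprocess


def normalize_remote(url):
    """Reduce an SSH or HTTPS git remote URL to 'owner/name' for use as a repo tag."""
    match = re.search(r"[:/]([^/:]+/[^/]+?)(?:\.git)?/?$", url.strip())
    return match.group(1) if match else url.strip()


def current_repo_tag(cwd="."):
    """Read origin's URL from git and normalize it; return None if there is no origin."""
    result = subprocess.run(
        ["git", "config", "--get", "remote.origin.url"],
        cwd=cwd, capture_output=True, text=True, check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return normalize_remote(result.stdout)
```

Normalizing matters because the same repository cloned over SSH and HTTPS would otherwise show up as two different values of the repo dimension.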

The same traceAI spans (35+ framework integrations, OpenInference-native) that feed dashboards also feed fi.evals and fi.opt.optimizers. Teams adopting for monitoring on day one can turn on the loop in month three without re-instrumenting. The Future AGI Protect model family sits alongside as the inline guardrail layer at ~67 ms p50 for text and ~109 ms p50 for image (arXiv 2510.13351). It consists of FAGI’s own fine-tuned Gemma 3n adapters covering content moderation, bias detection, security/prompt-injection, and data privacy/PII; it is multi-modal across text, image, and audio, and it is a model family rather than a plugin chain. Error Feed (FAGI’s “Sentry for AI agents”) is the zero-config error monitor: it auto-clusters related per-developer and per-session failures into named issues (50 traces → 1 issue), auto-writes the root cause plus a quick fix plus a long-term recommendation, and tracks the trend per issue, so emerging Claude Code regressions surface like exceptions rather than being buried in chargeback rows.

Where it falls short:

Depth of the trace model is more than a pure monitoring shop needs on day one. For “monthly chargeback by developer, that’s it,” Helicone is faster to first dashboard.
Prompt-library UI is less mature than Portkey’s. If shared prompt management is a primary use case, Portkey wins on that one feature.

Pricing: Free 100K traces/month, 30-day retention. Scale starts at $99/month, 90-day retention. Enterprise custom with 365-day retention, BAA, SOC 2 Type II certified, AWS Marketplace.

Score: 7/7 axes.

2. Portkey: Best for hosted dashboard polish and per-key analytics

Verdict: The most polished hosted-only LLM gateway for teams whose primary monitoring story is “per-developer spend, clean RBAC, dashboard executives will actually log into.” Virtual-key analytics are first-class. Depth is below Future AGI’s; polish is real.

What it does for Claude Code monitoring:

Per-session attribution uses Portkey’s trace_id request header. The Claude Code wrapper has to set it explicitly; without it, sessions collapse to per-call metrics, a common silent failure on day one.
Per-developer chargeback is the strongest part of the product. Each developer gets a virtual key fanning out to one underlying Anthropic key (preserving bulk pricing). Per-virtual-key analytics (spend, RPS, tokens, error rate) are the default landing page.
Per-repo tagging uses metadata headers, same wrapper-side responsibility.
Dashboard slicing depth is competitive: four concurrent filters first-class; deeper combinations push toward SQL export. Default 30-day load time around 2.4 seconds.
Alert primitives are budget-cap-shaped (per-key limits) with native Slack and Teams hooks. Deviation-from-baseline isn’t a default but is constructable through conditional rules. Breach-to-Slack latency around 35 seconds.
Export to BI tools is supported through CSV and an API; first-class warehouse sinks aren’t present as of May 2026.
Retention: 30 days on Free, 90 on Scale, configurable on Enterprise. Long-term archive is less mature than the hot-tier dashboard.

Where it falls short:

No native warehouse sink. Teams piping into Snowflake / BigQuery schedule an API pull, fine, but an extra moving part.
Dashboard depth is good for two-to-four-dimension slicing; beyond that, the product nudges toward SQL export.
The metadata-header model assumes the wrapper sets the right headers. First-week setups commonly miss either repo or session_id until someone notices the slice is empty.

Pricing: Free 10K requests/day. Scale starts at $99/month. Enterprise custom with SOC 2 Type II.

Score: 5.5/7 axes.

3. Helicone: Best for lowest-friction Claude Code monitoring

Verdict: Right pick when the team is under 15 developers on Claude Code, the monitoring story is “per-request cost table plus a daily Slack digest,” and platform-engineering capacity is roughly zero. Drop the proxy URL in front of Anthropic, get a usable dashboard in 20 minutes. Slicing depth and retention are below the other four; setup time is the lowest.

What it does for Claude Code monitoring:

Per-session attribution via Helicone-Session-Id header (wrapper-side).
Per-developer chargeback via Helicone-User-Id. Aggregation is straightforward; slicing surface is shallower than Portkey or Future AGI.
Per-repo tagging via custom properties.
Dashboard slicing depth caps at two-to-three concurrent filters at default depth. Beyond that, teams write ClickHouse queries against the self-hosted backend, a real escape hatch but not a Slack-friendly one. Load time around 1.6 seconds on short ranges, climbing toward 4 seconds on 90-day views.
Alert primitives are rate-limit-first; “spend over $X in last hour” is wireable but the grammar is less expressive than Portkey’s per-key budgets or Future AGI’s per-cost-center conditions. Alert latency around 60 to 90 seconds, poll-based.
Export to BI tools via webhook and CSV. Self-hosted ClickHouse is the de facto BI tool for deep queries.
Retention: 30 days on Free, longer on paid; self-hosted retains as long as your disk allows.

Where it falls short:

Slicing depth caps out where the other four are just getting started. Multi-dimensional finance queries push toward ClickHouse SQL.
Alert latency on the hosted product is poll-based and noticeably slower than Future AGI or Portkey.
Self-hosted runs comfortably under a few hundred RPS; scale-out beyond that takes operational work the Helicone team is open about.

Pricing: Free 10K requests/month. Pro starts at $25/month. Enterprise custom.

Score: 4/7 axes.

4. LiteLLM: Best for self-hosted LLM gateway monitoring inside the VPC

Verdict: The pick when Claude Code traffic can’t leave the VPC and the team needs real spend tables they can join against everything else in the warehouse. OSS proxy is mature. Monitoring UI is functional but not Looker-grade; the architectural posture is “expose the data cleanly and let the team pick their own dashboard.”

What it does for Claude Code monitoring:

Per-session attribution via metadata.session_id pass-through.
Per-developer chargeback via team_id and user_id on virtual keys; team_id maps to the SSO claim if the IdP is wired in.
Per-repo tagging via custom metadata. The dashboard is bring-your-own: the LiteLLM UI lists transactions, and deep slicing happens in the warehouse.
Dashboard slicing depth in the UI is two-to-three concurrent filters. The export-to-warehouse pattern is where real slicing happens. Teams commonly pair LiteLLM with Future AGI traceAI as the deep-observability sink.
Alert primitives are spend-tracking webhooks. Wire the page-on-call destination yourself. Alert latency depends on polling interval, typically 60 seconds.
Export to BI tools is where LiteLLM shines among OSS options. Spend tables are exposed as Postgres / SQLite tables; teams hook Dagster, Airflow, or scheduled dumps to land data in Snowflake or BigQuery without any vendor-specific adapter.
Retention is “as long as your Postgres will hold it.” For most teams that means 12 months of full-fidelity spend tables on a modest RDS instance. This is the longest retention in this cohort, gated only by storage budget.
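
To make the "slice it in SQL" pattern concrete, here is a small sketch of a chargeback query against a spend table. The `spend_log` table and its columns are hypothetical stand-ins, not LiteLLM's actual schema, and an in-memory SQLite database is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical spend table; real column names will differ.
conn.execute(
    """CREATE TABLE spend_log (
           user_id TEXT, team_id TEXT, repo TEXT, model TEXT,
           total_tokens INTEGER, spend_usd REAL)"""
)
conn.executemany(
    "INSERT INTO spend_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("ana", "platform", "monorepo", "opus", 180_000, 4.50),
        ("ana", "platform", "monorepo", "sonnet", 60_000, 0.50),
        ("raj", "payments", "billing-svc", "sonnet", 90_000, 0.75),
    ],
)


def chargeback(connection):
    """Return (team, developer, tokens, spend) rows, largest spend first."""
    return connection.execute(
        """SELECT team_id, user_id, SUM(total_tokens), ROUND(SUM(spend_usd), 2)
           FROM spend_log
           GROUP BY team_id, user_id
           ORDER BY SUM(spend_usd) DESC"""
    ).fetchall()


rows = chargeback(conn)
```

The same GROUP BY works once the rows land in Snowflake or BigQuery, which is where joins against headcount or cost-center tables would typically happen.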

Where it falls short:

UI is functional, not polished. Finance teams accustomed to Looker or Mode want spend tables piped to their existing BI sink rather than using LiteLLM’s dashboard for monthly review.
Alert primitives are basic compared to hosted alternatives. The OSS product gives you the trigger; you build the destination logic.
Setup time is meaningfully higher than hosted options. Plan a platform-team sprint for deployment, IdP integration, and the export pipeline.

Pricing: OSS under MIT. LiteLLM Enterprise starts around $250/month for small teams; custom above 100 seats.

Score: 5.5/7 axes.

5. Maxim Bifrost: Best for Go-native throughput with OTel-first export discipline

Verdict: The entry that most explicitly markets itself as an “LLM gateway” rather than the broader “AI gateway.” It is a Go-native proxy with the strongest throughput-per-CPU in this cohort, and its export story leans on OpenTelemetry as the source of truth rather than a vendor-specific schema. It is the cleanest fit for teams whose monitoring stack is already built around an OTel collector and warehouse-native BI. For teams that want a polished dashboard out of the box, the other four are deeper.

What it does for Claude Code monitoring:

Per-session attribution via OTel span context. The session ID rides in traceparent or as a span attribute. Session reconstruction depends on the downstream OTel sink. Bifrost emits cleanly; the dashboard is whatever you point it at (Maxim’s hosted, Grafana Tempo, Honeycomb).
Per-developer chargeback via OTel resource attributes, user.id, user.email, or a custom attribute.
Per-repo tagging via OTel span attributes. Free-form, downstream-agnostic.
Dashboard slicing depth in Maxim’s hosted dashboard is competitive (four concurrent filters), but the philosophy is OTel-first: the canonical view is whatever your OTel sink renders. Hosted load time around 2.0 seconds; downstream sinks vary.
Alert primitives in the hosted product include threshold and rate-of-change. The OTel-export route means alerts can also live in your existing alerting platform (Grafana Alerting, Datadog), the dominant pattern among teams adopting Bifrost. Hosted breach-to-page latency around 40 to 60 seconds.
Export to BI tools is the strongest single feature. OTel collector configs exist for Snowflake, BigQuery, ClickHouse, S3 + Parquet. The schema is whatever the OTel sink renders, no vendor-specific shape to learn.
Retention in the hosted product is 30 days on Free, up to 180 days on Enterprise. In the export-downstream pattern, retention is governed by the warehouse.
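
One way a wrapper could make the session ID ride in `traceparent` is to derive the trace ID from it, so every turn of a session shares one trace. This is a sketch under that assumption, not Bifrost's documented behavior; the header layout follows the W3C Trace Context format (version, 32-hex trace ID, 16-hex parent ID, flags), and the custom attribute names are illustrative.

```python
import hashlib
import secrets


def traceparent_for(session_id):
    """Build a W3C traceparent value whose trace ID is stable for a given session."""
    # Hashing the session ID is an illustrative choice, not a standard.
    trace_id = hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:32]
    span_id = secrets.token_hex(8)  # 16 hex characters, new for every turn
    return f"00-{trace_id}-{span_id}-01"  # version 00, sampled flag set


def span_attributes(developer, repo, session_id):
    """Attributes a downstream sink can group by (custom names are illustrative)."""
    return {"user.id": developer, "repo": repo, "session.id": session_id}


first_turn = traceparent_for("cli-run-42")
second_turn = traceparent_for("cli-run-42")
```

Both values share the same trace ID segment, so a span-aware sink can group the turns into one session without extra join logic.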

Where it falls short:

The hosted dashboard, while perfectly usable, is less polished than Portkey’s. The product’s bet is OTel-first, not dashboard-first.
Session reconstruction is downstream-dependent. Grafana Tempo renders sessions well; less span-aware sinks need you to build the join logic.
Chargeback-table documentation is thinner than LiteLLM’s; teams new to OTel will spend more time on initial wiring.

Pricing: OSS proxy under Apache 2.0. Hosted Maxim dashboard tier starts free for individuals; team pricing custom. Enterprise custom with SLA and dedicated tenancy.

Score: 5/7 axes.

Capability matrix

Axis	Future AGI	Portkey	Helicone	LiteLLM	Maxim Bifrost
Per-session attribution	Native span tree	Trace-ID header	Session-ID header	Metadata pass-through	OTel span context
Per-developer chargeback	Native attribute group-by	Virtual-key analytics	User-ID header	Team/user on virtual keys	OTel resource attrs
Per-repo tagging	Free-form span attr	Metadata header	Custom property	Custom metadata	Free-form OTel attr
Dashboard slicing depth	6 concurrent filters	4 filters	2–3 filters	2–3 + warehouse	Hosted 4 / downstream open
Alert latency (breach → page)	~22s	~35s	~60–90s (poll)	Webhook-driven	~40–60s hosted
Export to BI tools	Native Snowflake/BigQuery/S3	CSV + API	Webhook + CSV	Postgres tables, BYO ETL	OTel collector to any sink
Retention (free → enterprise)	30 / 90 / 365 days	30 / 90 / custom	30 / custom / self-host	Postgres-bounded (12mo+ typical)	30 / 180 / downstream

Decision framework: Choose X if

Choose Future AGI Agent Command Center if monitoring depth is non-negotiable and the team expects to eventually graduate from passive observation to a self-improving routing policy without re-platforming.

Choose Portkey if the primary story is per-developer analytics with mature RBAC and a dashboard executives will log into, and the team has the wrapper-side discipline to set metadata headers correctly.

Choose Helicone if the team is small, the monitoring story is “per-request cost table plus a daily Slack digest,” and platform-engineering capacity is constrained.

Choose LiteLLM if Claude Code traffic can’t leave the VPC and the BI workflow is already warehouse-native.

Choose Maxim Bifrost if the team’s monitoring philosophy is OTel-first and the existing observability stack (Grafana, Honeycomb, Datadog, ClickHouse) is the canonical home for spans.

Common mistakes when wiring Claude Code monitoring

Mistake	What goes wrong	Fix
Pointing only the IDE plugin at the gateway	Terminal CLI still hits Anthropic directly; chargeback misses 40–60% of traffic	Set `ANTHROPIC_BASE_URL` in the shell profile, not just the IDE
Sharing one team key across developers	Every session looks identical; per-developer chargeback collapses	Issue virtual keys per developer fanning out to the team key
Not setting the session ID header	Session-attribution degrades to per-call metrics; long-session shape invisible	Wrap Claude Code’s invocation to inject a session ID per CLI run
Alerting against absolute spend only	Slow-creeping spend never trips; surprise lands on the invoice	Pair absolute thresholds with deviation-from-baseline alerts on a 7-day moving average
Treating dashboards as the long-term audit record	When retention rolls, audit data is gone; finance asks about Q3 last year	Wire the warehouse export on day one; dashboard is the live view, warehouse is the archive
Tagging spans only with `user_id`	Per-repo and per-cost-center slicing impossible later	Tag every span with `user_id`, `session_id`, `repo`, and `cost_center` from day one
Polling-based alert checks at 60-second intervals	A breach just after one poll waits for the next, so the page can trail the breach by a minute or more (~60–90s on poll-based products)	Pick a gateway with push-based alerting (~20–40s latency)
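
Several of the fixes in the table come down to a thin wrapper around the CLI. The sketch below sets `ANTHROPIC_BASE_URL` and generates a fresh session ID per run. The gateway URL and the header names are placeholders, and passing headers through an `ANTHROPIC_CUSTOM_HEADERS` variable is an assumption to verify against the Claude Code documentation for your version.

```python
import os
import subprocess
import uuid

GATEWAY_URL = "https://llm-gateway.example.internal"  # hypothetical gateway endpoint


def build_env(developer, repo, cost_center, base_env=None):
    """Return an environment for one CLI run with attribution metadata attached."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = GATEWAY_URL
    headers = {
        # Placeholder header names; use the ones your gateway documents.
        "x-session-id": str(uuid.uuid4()),  # fresh ID per CLI run
        "x-user-id": developer,
        "x-repo": repo,
        "x-cost-center": cost_center,
    }
    # Assumption: the CLI reads extra headers from this variable, one "Name: Value" per line.
    env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(f"{k}: {v}" for k, v in headers.items())
    return env


def run_claude(args, **metadata):
    """Launch the `claude` CLI with the attribution environment."""
    return subprocess.run(["claude", *args], env=build_env(**metadata), check=False)
```

Installing the wrapper as the shell-level entry point, rather than configuring only the IDE, is what keeps terminal sessions from bypassing the gateway.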

How Future AGI’s monitoring layer composes into the loop

The other four LLM gateways treat monitoring as a terminal product: capture, render, alert, archive after retention. Future AGI treats monitoring as the upstream input to a feedback loop other layers consume. For teams that only want monitoring this is invisible; for teams that want to graduate, the architecture means no re-platforming.

Six stages; the first three are pure monitoring:

Trace. Every Claude Code turn produces a span tree via traceAI (Apache 2.0). Spans capture inputs, outputs, tool calls, model, tokens, cost, session ID, developer, repo, and any custom attributes.
Dashboard. Agent Command Center renders the span store as multi-dimensional dashboards: cost-by-developer, cost-by-repo, P95-tokens-by-session, error-rate-by-model. Six concurrent filters, native warehouse export, alert primitives wired to the same data plane.
Alert. Per-cost-center alerts fire on absolute thresholds, rate-of-change, or deviation from a 7-day baseline. Routing pages the right EM, not a #general firehose. Push-based, ~22-second latency.

Stages four through six are optional, behind feature flags:

Evaluate. fi.evals scores every turn for code-correctness, faithfulness, and tool-use accuracy. Score is co-located with the cost record.
Optimize. fi.opt.optimizers (ProTeGi, Bayesian, GEPA) reads trace + eval history and proposes routing or prompt diffs.
Re-deploy. New policy versioned, signed, applied at the gateway on the next request. Each deploy logged as an audit event. Protect guardrails (~67 ms text, ~109 ms image per arXiv 2510.13351) enforce policy inline, adding only that latency.

Three building blocks are open source:

traceAI, github.com/future-agi/traceAI (Apache 2.0)
ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
agent-opt, github.com/future-agi/agent-opt (Apache 2.0)

Agent Command Center adds the dashboard, alert routing, warehouse exports, RBAC with EM-versus-finance role separation, SOC 2 Type II, HIPAA, GDPR, and CCPA all certified, BAA available, AWS Marketplace.

What we did not include

We deliberately left out three gateways that show up in adjacent LLM-gateway listicles:

OpenRouter. Excellent for model exploration, but its observability surface is consumer-facing; the enterprise-team chargeback story isn’t the product’s shape.
Cloudflare AI Gateway. Strong primitives, but per-developer slicing and warehouse export still require worker code as of May 2026.
Kong AI Gateway. Covered in the AI-gateway shootouts; the monitoring view is plugin-driven rather than native, which fits the AI-gateway framing more than the LLM-gateway one used here.

All three are worth a second look in Q3 2026 as their LLM-gateway-shaped product surfaces mature.

Sources

Anthropic Claude Code documentation, claude.ai/docs/claude-code
Anthropic Usage API, docs.anthropic.com/en/api/usage
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Future AGI traceAI, github.com/future-agi/traceAI
Future AGI ai-evaluation, github.com/future-agi/ai-evaluation
Future AGI agent-opt, github.com/future-agi/agent-opt
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
Portkey LLM gateway, portkey.ai
Helicone proxy, helicone.ai
LiteLLM proxy, github.com/BerriAI/litellm
Maxim Bifrost, getmaxim.ai/bifrost

Frequently asked questions

What is the cheapest way to monitor Claude Code token spend?

Helicone's free tier or the LiteLLM OSS proxy. Both give per-request cost. Per-developer or per-repo chargeback requires custom headers from the Claude Code wrapper.

What's the difference between an LLM gateway and an AI gateway for monitoring?

The terms overlap, but 'LLM gateway' tends to mark products whose primary surface is model-traffic observability (Helicone, LiteLLM, Bifrost lean this way). 'AI gateway' is broader, bundling routing, governance, guardrails. For pure monitoring, LLM-gateway framing is closer to what you want; for the full action layer, AI-gateway framing covers more.

How long should I retain Claude Code spend data?

Finance typically wants 12–18 months for chargeback. Engineering ops is fine with 30–90 days. Right pattern: hot retention (30–90 days at full fidelity in the gateway dashboard) plus cold export to a warehouse for long-term audit. Future AGI and LiteLLM make this clean; Helicone and Bifrost require team-wired export pipelines.
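
The hot-plus-cold pattern amounts to partitioning records by age. A minimal sketch follows; the 90-day window, the record shape, and the function name are all illustrative assumptions.

```python
from datetime import date, timedelta


def split_by_retention(records, today, hot_days=90):
    """Partition records into (hot, cold) by age.

    Hot rows stay in the gateway dashboard; cold rows belong in the warehouse archive.
    Each record is a dict with a hypothetical 'day' key holding a datetime.date.
    """
    cutoff = today - timedelta(days=hot_days)
    hot = [r for r in records if r["day"] >= cutoff]
    cold = [r for r in records if r["day"] < cutoff]
    return hot, cold


records = [
    {"day": date(2026, 4, 20), "spend_usd": 310.0},
    {"day": date(2025, 11, 1), "spend_usd": 275.0},
]
hot, cold = split_by_retention(records, today=date(2026, 5, 1))
```

In practice the export should run continuously rather than at the cutoff, so the archive already holds a row before the hot tier drops it.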

Can the gateway alert me before the invoice arrives?

Yes, if the alert primitives include deviation-from-baseline. Absolute-threshold alerts trip late if spend creeps. Future AGI's 7-day baseline catches slow ramps; Portkey and Bifrost support similar conditions through rule construction; Helicone's grammar is shallower.
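
A rough sketch of how an absolute threshold and a deviation-from-baseline check can be paired. The 1.5x ratio, the function name, and the sample figures are assumptions for illustration, not any gateway's rule grammar.

```python
from statistics import mean


def check_spend(daily_spend, today, absolute_limit, deviation_ratio=1.5):
    """Return the alert reasons triggered by today's spend.

    daily_spend: previous days' spend in USD, oldest first.
    The baseline is the mean of the last 7 entries.
    """
    reasons = []
    if today >= absolute_limit:
        reasons.append("absolute")
    window = daily_spend[-7:]
    if window:
        baseline = mean(window)
        if baseline > 0 and today >= deviation_ratio * baseline:
            reasons.append("deviation")
    return reasons


history = [100, 105, 98, 110, 102, 99, 104]  # made-up daily spend in USD
quiet_day = check_spend(history, today=110, absolute_limit=500)
ramp_day = check_spend(history, today=170, absolute_limit=500)
spike_day = check_spend(history, today=600, absolute_limit=500)
```

Here the 170 USD day stays far below the 500 USD cap but still trips the deviation rule, because it is more than 1.5 times the roughly 103 USD seven-day baseline.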

How do I export Claude Code spend data into Snowflake or BigQuery?

Future AGI ships native sinks. Bifrost's OTel-collector pattern is the most flexible. LiteLLM exposes Postgres spend tables; teams use Dagster, Airflow, or scheduled `pg_dump`. Portkey supports CSV + API. Helicone supports webhook + CSV.

What dashboard load times should I expect for a 30-developer team?

On a synthetic 380K-turn / 30-day workload: Future AGI ~1.8s, Helicone ~1.6s on short ranges (shallower dimensions), Portkey ~2.4s, Bifrost ~2.0s, LiteLLM ~3–5s in its UI.

How is Future AGI's monitoring different from Portkey's?

Both are hosted LLM gateways with per-developer and per-session dashboards. Portkey is dashboard-first and stops there. Future AGI's monitoring is wired to an evaluation layer and an optimizer, so the same trace store that renders the dashboard also feeds a self-improving routing policy when the team is ready. For pure passive monitoring, Portkey is easier to onboard; for teams that want to graduate later without re-platforming, Future AGI is the lower-regret choice.
