Best 5 AI Gateways for Scaling Claude Code in the Enterprise in 2026

Five AI gateways scored on scaling Claude Code from 50 to 5,000+ engineers: HA active-active, RBAC, 1M+ req/day audit, IdP federation.

February 17, 2026

A platform team rolls out Claude Code to a 50-engineer pilot in Q3 2025. The CLI works, the chargeback story holds, the security review is signed. Six months later the same team has 5,200 engineers, 312 GitHub teams, four IdP-bridged subsidiaries, and a finance organization that wants chargeback rolled up by business unit, sub-business unit, and cost center. The 50-engineer runbook doesn’t survive contact with the 5,000-engineer reality. The single-region deployment that was fine at 30 requests per minute now serves 1.2M requests per day, and the moment that gateway is unavailable, Claude Code is unavailable for the entire engineering organization.

Every enterprise we have spoken with that crossed 1,000 engineers on Claude Code in the past two quarters has hit the same five failure modes: single-region availability, painful upgrades, RBAC that doesn’t nest deeply enough, audit retention costs that surprised finance, and IdP federation that worked for one IdP and broke for the second. This post scores the five AI gateways an enterprise should consider for scaled Claude Code, on the seven axes that matter when the workload is the largest LLM line item in your budget.

TL;DR

Future AGI Agent Command Center is the strongest pick for scaling Claude Code in the enterprise because it ships BYOC multi-region active-active HA, a 4-level cost-center hierarchy (org > BU > sub-BU > cost-center, with repo and developer attribution beneath it) with delegated administration, tiered audit retention that holds at 1M+ requests per day, zero-downtime canary upgrades with eval-score regression gating, and Bedrock, Anthropic, and Vertex all behind one OpenAI-compatible base URL. The other four picks below win on specific edges.

Future AGI Agent Command Center — Best overall. BYOC multi-region active-active, 4-level cost hierarchy, tiered audit retention, and zero-downtime canary upgrades.
Portkey — Best for a polished hosted gateway with deep RBAC and four-tier budgets. Fastest scaled rollout if the security review allows a managed control plane (verify the Palo Alto Networks acquisition timeline before signing multi-year).
Kong AI Gateway — Best if the platform team already runs Kong for REST at scale. The AI extension inherits the operational discipline of an existing Kong fleet.
Cloudflare AI Gateway — Best if your enterprise already trusts Cloudflare and your SRE team accepts a Cloudflare-hosted data plane. Edge-deployed at low fixed cost with global Anycast HA included.
TrueFoundry — Best if procurement wants inference, gateway, and workspace under one MSA inside the VPC. Single-vendor MLOps platform with VPC-default deployment.

Why scaled Claude Code is a different problem

The 50-engineer pilot tolerates a single-region gateway, one API key, and a Saturday-night upgrade window. The 5,000-engineer deployment tolerates none of those. Five properties change everything once seat count crosses roughly 1,000 engineers.

Reliability becomes tier-zero. A 15-minute outage at 5,000 engineers is a company-wide IDE outage. The gateway inherits the availability bar of source control and CI. Anything less than multi-region active-active is a SPOF dressed up with failover language. Honest tradeoff: active-active HA is expensive. It costs cross-region transfer, hot-tier replicas, and dedicated SRE time.

RBAC has to nest the way your org chart nests. The 5,000-engineer deployment has 312 GitHub teams, four business units, fifteen sub-business units, sixty cost centers. A two-level role model represents none of it. A FinOps lead has to see her region’s spend without seeing payroll-system code; a BU owner caps his sub-units; the platform team rotates keys without seeing prompts. Few gateways nail three levels natively; fewer nail four with delegated administration.
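
One way to picture nested, delegated visibility is prefix matching on a hierarchy path. The sketch below is illustrative only: the `Grant` type, the role names, and the sample paths are assumptions made for this example, not any gateway's actual RBAC model.

```python
from dataclasses import dataclass

# Hypothetical hierarchy path: (org, business_unit, sub_unit, cost_center).
Path = tuple[str, ...]


@dataclass(frozen=True)
class Grant:
    role: str    # illustrative role name, e.g. "finops-viewer"
    scope: Path  # prefix of the hierarchy that the role applies to


def can_view(grant: Grant, resource: Path) -> bool:
    """A grant covers a resource when its scope is a prefix of the resource path."""
    return resource[: len(grant.scope)] == grant.scope


finops_emea = Grant("finops-viewer", ("acme", "emea"))
print(can_view(finops_emea, ("acme", "emea", "payments", "cc-104")))  # True
print(can_view(finops_emea, ("acme", "amer", "payroll", "cc-220")))   # False
```

A two-level admin/user model cannot express the second check, because it has no notion of scope at all.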

Audit retention costs scale faster than you budgeted. At 1M+ requests per day the gateway captures roughly 30M prompt-completion pairs per month. At the 1-to-7-year windows SOX and HIPAA require, storage runs into single-digit terabytes annually. We have seen audit storage cost a 4,000-engineer enterprise more than $80K annually before negotiating a tiered retention policy.
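
The arithmetic behind those figures can be checked in a few lines. The average pair size below is an assumption chosen for illustration (about 10 KiB per prompt-completion pair); real Claude Code turns with large diffs can be much bigger, so treat the result as an order-of-magnitude estimate of raw, uncompressed, unreplicated volume.

```python
REQUESTS_PER_DAY = 1_000_000
AVG_PAIR_BYTES = 10 * 1024  # assumption: ~10 KiB per prompt-completion pair


def annual_storage_tb(requests_per_day: int, avg_pair_bytes: int) -> float:
    """Raw bytes written per year, expressed in terabytes (10**12 bytes)."""
    return requests_per_day * 365 * avg_pair_bytes / 1e12


pairs_per_month = REQUESTS_PER_DAY * 30
print(f"{pairs_per_month:,} pairs/month")  # 30,000,000 pairs/month
print(f"{annual_storage_tb(REQUESTS_PER_DAY, AVG_PAIR_BYTES):.2f} TB/year")  # 3.74 TB/year
```

Multiply the annual figure by the retention window and the replication factor to see why a single hot tier becomes the expensive option.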

Backpressure is a queueing problem, not a throttling problem. When 500 engineers hit Anthropic’s TPM cap at 9:14 AM Monday, the right answer is a priority-lane queue with graceful degradation to a cheaper model, not a hard 429 to every developer simultaneously.
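
A minimal sketch of that idea, assuming two illustrative lanes and using the model names mentioned later in this post. The `LaneQueue` class and its methods are hypothetical, not a gateway API; recovery back to the primary model is left out for brevity.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Lane numbers are illustrative; a lower lane number means higher priority.
INTERACTIVE, BATCH = 0, 3
PRIMARY_MODEL, CHEAPER_MODEL = "claude-opus-4-7", "claude-haiku-4-5"


@dataclass(order=True)
class Pending:
    lane: int
    seq: int
    prompt: str = field(compare=False)


class LaneQueue:
    def __init__(self) -> None:
        self._heap: list[Pending] = []
        self._seq = itertools.count()  # FIFO tie-break within a lane
        self.degraded = False

    def submit(self, lane: int, prompt: str) -> None:
        heapq.heappush(self._heap, Pending(lane, next(self._seq), prompt))

    def on_upstream_429(self) -> None:
        self.degraded = True  # later dispatches fall back to the cheaper model

    def dispatch(self) -> tuple[str, str]:
        """Pop the highest-priority request and choose the model for it."""
        item = heapq.heappop(self._heap)
        model = CHEAPER_MODEL if self.degraded else PRIMARY_MODEL
        return item.prompt, model


q = LaneQueue()
q.submit(BATCH, "nightly refactor")
q.submit(INTERACTIVE, "autocomplete")
q.on_upstream_429()
print(q.dispatch())  # ('autocomplete', 'claude-haiku-4-5')
```

The interactive request jumps ahead of the batch job even though it arrived later, and nobody receives a raw 429.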

IdP federation, not SSO, is the integration story. A scaled deployment has acquired subsidiaries with three IdPs (Okta plus Azure AD plus Workday). The gateway accepts federated claims with a normalized user.id schema, or finance reconciles three audit logs forever.
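
As a rough sketch of what a normalized user.id could look like, the function below maps already-verified claims from each IdP into one namespaced identifier. The claim names are assumptions for illustration (`sub` and `oid` are common in Okta and Azure AD tokens; `employee_id` is a made-up stand-in for Workday), and signature verification is assumed to have happened upstream.

```python
# Claim names per IdP are assumptions; confirm against each IdP's actual tokens.
CLAIM_FOR_USER = {
    "okta": "sub",
    "azure_ad": "oid",
    "workday": "employee_id",  # hypothetical claim name
}


def normalize_user_id(idp: str, claims: dict[str, str]) -> str:
    """Map an already-verified claim set to a single namespaced user.id."""
    try:
        claim_name = CLAIM_FOR_USER[idp]
    except KeyError:
        raise ValueError(f"unknown IdP: {idp}") from None
    value = claims.get(claim_name)
    if not value:
        raise ValueError(f"{idp} token is missing claim {claim_name!r}")
    return f"{idp}:{value.strip().lower()}"


print(normalize_user_id("okta", {"sub": "00U1AbC"}))  # okta:00u1abc
```

Prefixing the identifier with the IdP keeps IDs from different subsidiaries from colliding in a single audit log.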

The 7 axes we score on

Axis	What it measures
1. Multi-region active-active HA	Does the gateway run active-active across two regions with automatic failover under 30 seconds?
2. Zero-downtime upgrades	Can control and data planes be upgraded without breaking long-running Claude Code sessions?
3. RBAC across 300+ teams with 3+ level cost-center nesting	Can a call be attributed to org > business-unit > sub-unit > cost-center, with role-based viewing at each level?
4. Audit log retention at 1M+ requests/day	Can the gateway store, index, and serve audit queries on 30M+ monthly traces without burning a separate hot-tier budget?
5. Backpressure on upstream rate limits	When Anthropic returns 429, does the gateway queue with priority lanes and graceful-degrade?
6. IdP federation across multiple identity providers	Can it accept Okta + Azure AD + Workday with a normalized schema?
7. Cost attribution at organizational hierarchy depth	Can finance export chargeback rolled up by 3+ levels with delegated read-only at each?
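
Axes 3 and 7 come down to the same operation: aggregating per-call cost by a prefix of the hierarchy path. A minimal sketch, with made-up sample records and a hypothetical `rollup` helper:

```python
from collections import defaultdict

# Each record: ((org, business_unit, sub_unit, cost_center), cost in USD).
# The sample data is invented for illustration.
RECORDS = [
    (("acme", "emea", "payments", "cc-104"), 12.50),
    (("acme", "emea", "payments", "cc-105"), 7.25),
    (("acme", "emea", "risk", "cc-210"), 3.00),
    (("acme", "amer", "platform", "cc-001"), 20.00),
]


def rollup(records, depth: int) -> dict[tuple[str, ...], float]:
    """Sum cost by the first `depth` levels of the hierarchy path."""
    totals: dict[tuple[str, ...], float] = defaultdict(float)
    for path, cost in records:
        totals[path[:depth]] += cost
    return dict(totals)


print(rollup(RECORDS, 2))
# {('acme', 'emea'): 22.75, ('acme', 'amer'): 20.0}
```

If the gateway tags every call with the full path, finance can choose the depth at export time instead of asking the platform team for a new report.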

How we picked

We started with public AI gateways that ship an Anthropic-compatible endpoint and have been deployed at one or more enterprises with at least 1,000 Claude Code seats. We removed gateways without a multi-region deployment story, without SOC 2 Type II in place or in progress, or with a material 2026 trust event that lacks a clean remediation path. That last criterion excludes Helicone (acquired by Mintlify, with a documentation-platform-first shift) and LiteLLM (March 24, 2026 PyPI supply-chain compromise per Datadog Security Labs).

1. Future AGI Agent Command Center: Best for BYOC active-active scale with deep cost-center nesting

Verdict: Future AGI ships BYOC multi-region active-active HA, a 4-level cost-center hierarchy (org > BU > sub-BU > cost-center, with repo and developer attribution beneath it) with delegated administration, tiered audit retention that holds at 1M+ requests per day, and Bedrock, Anthropic, and Vertex all behind one OpenAI-compatible base URL. The hot routing path stays deterministic, with sub-millisecond decisions; an offline learner working from eval scores proposes policy updates between deploys.

What it does for scaled Claude Code:

Multi-region active-active HA through BYOC. Control plane and data plane both run in the customer’s AWS or Azure account, with a reference architecture spanning two regions with cross-region replication. Failover under 30 seconds. Hosted tier is single-region today (US or EU); multi-region hosted is actively in development.
Zero-downtime upgrades through canary: 5% of traffic with eval-score regression checks, widen to 50% and 100%, automatic rollback on a measured regression.
RBAC with 4-level nesting and delegated administration. Native hierarchy: org > business-unit > sub-business-unit > cost-center, with repo and developer attribution beneath the cost center. A BU owner manages her own sub-unit roles without involving the platform team.
Audit retention at 1M+ requests/day through tiered storage: hot (Postgres + columnar) 30 days, warm (Parquet) 1 year, cold (Glacier) 7 years. Cost per terabyte-year on cold tier is roughly 10x lower than single-tier hot storage.
Backpressure through a four-lane priority queue. On a 429, lower-priority lanes queue and the next call graceful-degrades to a cheaper model.
IdP federation through an identity broker that accepts signed JWTs from Okta, Azure AD, Auth0, and Workday and normalizes into a single fi.attributes.user.id schema.
Cost attribution at hierarchy depth through the same primitive that powers RBAC. Finance exports at any level with delegated read-only.
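
The canary bullet above describes a staged rollout gated on eval scores. A minimal sketch of such a gate follows; the `max_drop` threshold and the score values are assumptions for illustration, and this is not Future AGI's actual implementation.

```python
STAGES = (5, 50, 100)  # percent of traffic at each canary stage


def run_canary(
    baseline_score: float,
    stage_scores: dict[int, float],
    max_drop: float = 0.02,  # assumed tolerance for an eval-score regression
) -> tuple[str, int]:
    """Walk the stages; roll back at the first stage whose score regresses."""
    for pct in STAGES:
        if baseline_score - stage_scores[pct] > max_drop:
            return "rollback", pct
    return "promoted", STAGES[-1]


print(run_canary(0.90, {5: 0.91, 50: 0.89, 100: 0.90}))  # ('promoted', 100)
print(run_canary(0.90, {5: 0.91, 50: 0.85, 100: 0.90}))  # ('rollback', 50)
```

The point of gating on an eval score rather than on error rate alone is that a release can return HTTP 200 on every call and still produce worse completions.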

The loop. Every trace gets scored by fi.evals. traceAI instruments 50+ AI surfaces across Python, TypeScript, Java, and C# (including the Spring Boot starter, Spring AI, LangChain4j, and Semantic Kernel) and is OpenInference-native. Error Feed, the clustering and what-to-fix layer of the eval stack that feeds the self-improving evaluators, sits alongside as the zero-config error monitor. It auto-clusters related per-BU and per-cost-center failures into named issues (50 traces → 1 issue), auto-writes a root cause, a quick fix, and a long-term recommendation per issue, and tracks a rising/steady/falling trend per issue, so emerging Claude Code regressions surface like exceptions rather than staying buried in scaled audit logs.

fi.opt.optimizers then rewrites the system prompt or adjusts routing. It ships six optimizers: RandomSearchOptimizer, BayesianSearchOptimizer (Optuna-backed, with teacher-inferred few-shot templates and resumable studies), MetaPromptOptimizer, ProTeGi, GEPAOptimizer, and PromptWizardOptimizer. All six share an EarlyStoppingConfig (patience + min_delta + threshold + max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics. Typical optimization at scale: route turns under 10K input tokens to claude-haiku-4-5, reserve claude-opus-4-7 for multi-file diff context. A 4,000-engineer deployment we observed in Q1 2026 trended down 24% in model spend over eight weeks without changing developer behavior. Acceptance rates held flat because the optimizer grades on accepted-completion outcomes.

The Future AGI Protect model family runs inline at ~65 ms p50 for text and ~107 ms p50 for image (arXiv 2510.13351). It consists of FAGI’s own fine-tuned Gemma 3n adapters covering content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text, image, and audio. It is a model family rather than a plugin chain.

Where it falls short:

BYOC active-active is more work than hosted single-region. Staff one SRE for two to three weeks during cutover; budget cross-region transfer plus a hot-tier replica. Under 1,000 engineers, hosted single-region is the right call.
The optimizer compounds; week one shows failure clusters, not a 24% cost cut. Teams needing an immediate cut can apply the static routing rules on day one.
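
The static routing rule mentioned in this section (turns under 10K input tokens go to claude-haiku-4-5, with claude-opus-4-7 reserved for multi-file diff context) could be sketched as below. The function name and the `multi_file_diff` flag are hypothetical, and sending every turn at or above 10K tokens to the larger model is an assumption, since the rule as stated does not cover that case.

```python
SMALL_TURN_TOKENS = 10_000


def pick_model(input_tokens: int, multi_file_diff: bool) -> str:
    """Static routing: small single-context turns go to the cheaper model."""
    # Assumption: anything large or multi-file stays on the larger model.
    if multi_file_diff or input_tokens >= SMALL_TURN_TOKENS:
        return "claude-opus-4-7"
    return "claude-haiku-4-5"


print(pick_model(2_500, multi_file_diff=False))  # claude-haiku-4-5
print(pick_model(2_500, multi_file_diff=True))   # claude-opus-4-7
print(pick_model(40_000, multi_file_diff=False)) # claude-opus-4-7
```

A rule like this can go live on day one; the optimizer's job afterward is to move the threshold based on measured acceptance rather than a guess.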

Pricing: Free tier with 100K traces/month. Scale starts at $99/month. Enterprise is custom and includes SOC 2 Type II certification, a BAA, and BYOC. Listed on AWS Marketplace.

Score: 7/7 axes.

2. Portkey: Best for hosted scaled deployment with deep RBAC

Verdict: Portkey is the most polished hosted-only product in the category. At scale it’s the fastest path when the security review allows a managed control plane. The four-tier budget hierarchy is what most FinOps leads want out of the box. It observes, attributes, and gates with polish; it doesn’t optimize.

What it does for scaled Claude Code:

Multi-region active-active on the hosted tier across US-East, US-West, EU, and APAC. BYOC runs the data plane in the customer account; the control plane stays in Portkey cloud unless you negotiate a private deployment.
Zero-downtime upgrades through blue-green.
RBAC with 4-tier hierarchy (org > workspace > project > virtual-key) is the deepest native hierarchy of the hosted picks. Delegated administration via SAML role claims. Edge case: five-plus level org charts where the fourth level has to double as project and cost center.
Audit retention up to 7 years on enterprise with S3 / Snowflake / Splunk export. Tier economics aren’t as transparent as Future AGI’s published model; negotiate the storage line item explicitly.
Backpressure through queueing and fallback. Priority lanes per workload class are configurable but require explicit policy authoring.
IdP federation through SAML SSO. Solid for two IdPs, workable for three; multi-IdP group mapping requires custom claim transformations.
Cost attribution through workspace + project + metadata. Chargeback export to a non-Portkey BI tool is a custom integration, not a default.

Where it falls short:

No optimizer. At scale, this is the difference between a flat cost curve and a downward one.
The Palo Alto Networks acquisition announced April 30, 2026 (close expected PANW fiscal Q4) adds a vendor-coupling axis. For enterprises inside the PANW stack this is upside; for those wanting gateway independence it’s a procurement question worth answering.
Four-level hierarchy is the deepest you get natively.

Pricing: Free tier with 10K requests/day. Pro starts at $99/month. Enterprise is custom with SOC 2 Type II and BAA.

Score: 6/7 axes (missing: feedback loop / optimization).

3. Kong AI Gateway: Best if your platform team already runs Kong at scale

Verdict: Kong AI Gateway is the right pick when your platform team has standardized on Kong for REST APIs and the path of least resistance is to extend the same plane with the AI Proxy plugin. Strengths: operational maturity at scale and an existing Kong MSA. Weakness: AI-specific shallowness, since LLM-aware behavior happens through plugins.

What it does for scaled Claude Code:

Multi-region active-active through Kong’s reference architecture, deployed at 100+ RPS-per-region for a decade. Operational maturity is the strongest in this list. Kong has been doing this longer than the AI gateway category has existed.
Zero-downtime upgrades through rolling restart with traffic draining.
RBAC with 3+ level nesting through consumer + workspace + tag taxonomy. More configuration-heavy than Portkey’s native model.
Audit retention through request-logging plugins exported to your SIEM (Splunk, ELK, Datadog) or S3.
Backpressure through rate-limiting plugins and the AI Spend plugin added Q4 2025. Priority queueing requires additional plugin work; expect to write Lua.
IdP federation through JWT and OIDC plugins. Mature for REST; inherits cleanly into AI workloads.
Cost attribution through tags and the AI Spend plugin. Dashboard is API-gateway-shaped, not LLM-cost-shaped; plan two to four weeks to wire a Claude Code-aware view finance will accept.

Where it falls short:

AI-specific observability is plugin-driven, not native.
No optimizer.
AI Spend plugin is newer than the rate-limiting plugin and is still maturing.
Plugin-stacking is operationally heavy. Small platform teams will feel it.

Pricing: Kong OSS is open source. Kong Konnect starts free. Enterprise plans with SLA and AI Proxy support start around $1.5K/month. At 5,000-engineer scale expect a six-figure annual contract.

Score: 5/7 axes (missing: native AI observability, optimizer).

4. Cloudflare AI Gateway: Best for edge-deployed scaled deployment

Verdict: Cloudflare AI Gateway is the pick when your enterprise has already trusted Cloudflare with mission-critical edge traffic and the bar is “global Anycast HA at low fixed cost” rather than “deep AI-native dashboard.” Strongest HA-out-of-the-box story; weakest deep-observability story.

What it does for scaled Claude Code:

Multi-region active-active through Cloudflare’s global Anycast, included by default across hundreds of edge locations. Tradeoff: prompt traffic touches Cloudflare’s infrastructure. For VPC-only requirements, wrong pick.
Zero-downtime upgrades are Cloudflare’s problem. Updates roll across the edge without customer-visible windows.
RBAC through Cloudflare Access plus Worker logic. You write the hierarchy in TypeScript inside a Worker, exactly the model you want, at platform-team cost.
Audit retention through Logpush to R2, S3, or a SIEM. R2 cold-storage economics are competitive.
Backpressure through Cloudflare rate-limiting plus custom Worker logic. Per-developer priority lanes are something your team writes.
IdP federation through Cloudflare Access. SAML, OIDC, and IdP-of-IdPs natively.
Cost attribution through Worker-level logic. Native dashboard tracks request count and basic cost; the per-developer dense view is downstream work with Logpush plus Snowflake.

Where it falls short:

AI-native dashboards are shallow.
No optimizer.
BYOC isn’t the deployment model. The data plane runs on Cloudflare’s infrastructure.
Worker model is JavaScript / TypeScript first.

Pricing: AI Gateway free at low volume. Workers Paid is $5/month plus per-invocation fees. Enterprise rolls AI Gateway into the broader Cloudflare bundle.

Score: 5/7 axes (missing: deep dashboards, optimizer; partial credit on VPC-only).

5. TrueFoundry: Best for one vendor across inference, gateway, and MLOps

Verdict: TrueFoundry is the pick when procurement wants a single vendor for the AI stack at scale: model serving, gateway, workspace, and MLOps under one MSA, deployed in the enterprise VPC. Gateway is competent but not deepest on every axis; the differentiator is the bundle plus VPC-default deployment.

What it does for scaled Claude Code:

Multi-region active-active through TrueFoundry’s reference architecture for VPC deployments across AWS, Azure, or GCP. Multi-region setup is your SRE team’s responsibility; TrueFoundry ships the architecture and automation, operation is yours.
Zero-downtime upgrades through canary and rolling restart inherited from the workspace platform.
RBAC through workspace + project + role hierarchy. Three levels native, deeper through custom metadata. MLOps positioning means the hierarchy is the right shape for model deployments and experiment tracking too, the actual reason enterprises pick TrueFoundry over a dedicated gateway.
Audit retention through the bundled audit log, configurable up to 7 years with S3 export. Single-vendor story means gateway log, model-serving log, and workspace log live in one platform.
Backpressure through queueing and fallback. Same-vendor inference opens a useful pattern: a Claude rate limit can overflow to an in-house model on TrueFoundry.
IdP federation through workspace identity supporting SAML and OIDC. Workable for multi-IdP.
Cost attribution through TrueFoundry’s cost-management module. Three levels native, deeper through metadata.

Where it falls short:

Integration is general-purpose, not Claude Code-aware. Per-session, per-developer dense view requires custom work.
The vendor bundle is a coupling. If you only want the gateway, the bundle is heavier than dedicated alternatives.
No optimizer.
Smaller community footprint than Portkey’s or Kong’s.

Pricing: Free trial. Production tier starts in the low four figures per month. Enterprise pricing is bundled; expect a six-figure annual contract at 5,000-engineer scale.

Score: 5/7 axes (missing: optimizer, dense Claude Code-aware dashboards).

Capability matrix

Axis	Future AGI	Portkey	Kong AI Gateway	Cloudflare AI Gateway	TrueFoundry
Multi-region active-active	BYOC	Hosted multi-region	Self-host	Global Anycast	VPC
Zero-downtime upgrades	Canary + eval gate	Blue-green	Rolling restart	Cloudflare-managed	Canary
RBAC + 3+ level nesting	4-level + delegated	4-tier hierarchy	Consumer + tag	Worker-based	Workspace + project
Audit retention at 1M+/day	Tiered hot/warm/cold	Configurable + S3	SIEM via plugin	Logpush to R2	Bundled audit log
Backpressure	Priority lanes + degrade	Queue + fallback	Plugin work	Worker-based	Queue + fallback
IdP federation (multi-IdP)	Identity broker	SAML SSO	JWT / OIDC plugins	Cloudflare Access	Workspace identity
Cost attribution at depth	4-level native	Workspace + project	Tags + SIEM rollup	Worker + Logpush	Workspace + project
Feedback loop / optimizer	fi.opt closed loop	Dashboard only	Static	Static	Static

Decision framework: Choose X if

Choose Future AGI if you want scaled governance to feed a compounding feedback loop, you can staff BYOC active-active, and trace, eval, optimization, and audit data all need to live in one immutable store. Pick this when Claude Code is the largest LLM line item and the question is “are we getting cheaper every quarter,” more than “are we governed.”

Choose Portkey if you want the most polished hosted gateway with mature RBAC and a four-tier budget hierarchy, and the security review will allow a vendor-hosted control plane. Weigh the Palo Alto Networks acquisition timeline before signing a multi-year contract.

Choose Kong AI Gateway if your platform team already operates Kong for REST at scale and the existing MSA saves a vendor onboarding cycle. Pick this when operational familiarity outweighs the AI-specific shallowness and you can wire the AI Proxy and AI Spend plugins.

Choose Cloudflare AI Gateway if your enterprise already trusts Cloudflare and your SRE team is comfortable with Workers and Logpush. Pick this when the threat model accepts Cloudflare’s data plane and the deep AI dashboard is downstream work you’re happy to do.

Choose TrueFoundry if procurement wants a single vendor for inference, gateway, and MLOps with VPC-default deployment. Pick this when the same team will also stand up internal models alongside Anthropic.

Common mistakes when scaling Claude Code through a gateway

Mistake	What goes wrong	Fix
Running the scaled deployment on a single-region gateway	A regional outage takes Claude Code offline for the entire engineering organization	Move to multi-region active-active before crossing 1,000 engineers
Sharing one team key across subsidiaries without federation	Audit logs cannot reconcile to the right business unit; SOC 2 fails on the attribution chain	Wire federated IdP at the gateway hop and normalize the user.id schema
Treating audit retention as “set and forget”	Storage costs grow unboundedly; legal asks why 6-month-old prompts are still hot-tier	Implement tiered hot/warm/cold retention with a documented policy per repo class
Letting upstream 429s propagate simultaneously	500 engineers see the same failure at 9:14 AM Monday	Queue with priority lanes and graceful-degrade to a cheaper model on the next call
Configuring RBAC with two levels (admin and user)	FinOps cannot see her business unit’s rollup without seeing everyone else’s	Use a four-level (or deeper) hierarchy with delegated administration
Treating gateway upgrade as a Saturday-night maintenance window	The window breaks long-running sessions; engineering loses a workday	Canary upgrades with eval-score gates; verify session preservation in pre-prod
Picking on the pilot benchmark, not the scaled benchmark	The 50-engineer test does not exercise priority-lane, RBAC-nesting, or audit-retention paths	Run a 1,000-engineer canary for two weeks against production traffic shape before the multi-year contract
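
The tiered-retention fix in the table above amounts to classifying each audit record by age. A minimal sketch, using the 30-day hot, 1-year warm, and 7-year cold windows quoted earlier in this post; the function name and the "expired" label are assumptions, and a real policy would vary by repo class.

```python
from datetime import date

HOT_DAYS, WARM_DAYS, COLD_DAYS = 30, 365, 7 * 365


def retention_tier(written: date, today: date) -> str:
    """Return the storage tier an audit record belongs in, based on its age."""
    age_days = (today - written).days
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= WARM_DAYS:
        return "warm"
    if age_days <= COLD_DAYS:
        return "cold"
    return "expired"


today = date(2026, 2, 17)
print(retention_tier(date(2026, 2, 1), today))   # hot
print(retention_tier(date(2025, 8, 17), today))  # warm
print(retention_tier(date(2023, 1, 1), today))   # cold
```

Under this policy a six-month-old prompt sits in the warm tier, which answers the question legal would otherwise ask.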

How Future AGI closes the loop on scaled Claude Code

The other four gateways treat scaled governance as a terminal state: capture, attribute, gate, log. The dashboard is the artifact. Model spend stays flat as seats grow.

Future AGI treats the trace as the input to a six-stage feedback loop. The same data feeds a learning system that gets cheaper every week instead of staying flat.

Trace. Every Claude Code turn produces a span tree via traceAI (Apache 2.0). Spans capture SSO claim, repo, cost center, prompt, completion, tool calls, model, latency, cost, and policy decision. The trace store is immutable and tiered.
Evaluate. ai-evaluation (Apache 2.0) scores every turn. FAGI ships 60+ EvalTemplate classes in the ai-evaluation SDK (task-completion, faithfulness, code-correctness, policy-compliance, tool-use, structured-output, hallucination, agentic surfaces, instruction-following, groundedness), plus unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code, self-improving evaluators on the Future AGI Platform that learn from live production traces, and FAGI’s proprietary classifier model family at very low cost-per-token (lower per-eval cost than Galileo Luna-2). The catalog is the floor, not the ceiling.
Cluster. Low-scoring sessions get clustered. The common pattern at scale: claude-opus-4-7 called when claude-haiku-4-5 would have produced the same accepted completion.
Optimize. fi.opt.optimizers rewrites the system prompt or adjusts routing against the clusters. It ships six optimizers: RandomSearchOptimizer, BayesianSearchOptimizer (Optuna-backed, with teacher-inferred few-shot templates and resumable studies), MetaPromptOptimizer, ProTeGi, GEPAOptimizer, and PromptWizardOptimizer. All six share an EarlyStoppingConfig (patience + min_delta + threshold + max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics.
Route. The gateway applies the updated routing on the next request.
Re-deploy. New prompts and routes are versioned. Automatic rollback on a 24-hour score regression.

Net effect: a 4,000-engineer deployment we observed in Q1 2026 trended down 24% in model spend over eight weeks without changing developer behavior. The same loop produces the audit-grade artifact SOC 2 requires: every policy change and prompt rewrite is versioned in the same trace store as the chargeback data.

Apache 2.0 building blocks: traceAI, ai-evaluation, agent-opt (github.com/future-agi). Hosted Agent Command Center adds the failure-cluster view, Protect guardrails with ~65 ms text latency per arXiv 2510.13351, 4-level RBAC with delegated administration, SOC 2 Type II certification, an AWS Marketplace listing, and BYOC deployment.

What we did not include

Helicone. Acquired by Mintlify March 3, 2026 with a documentation-platform-first roadmap shift. Treat as a planned migration window, not a continued procurement for a multi-year scaled regulated workload.
LiteLLM. Strong Python-native proxy, but the March 24, 2026 PyPI supply-chain compromise (versions 1.82.7 and 1.82.8, exfiltrating SSH keys and cloud credentials per Datadog Security Labs) raises the operational bar for a regulated scaled deployment.
OpenRouter. Excellent for routing experimentation, but its shape is consumer-facing rather than built around enterprise chargeback, deep RBAC, and federated identity.

Sources

Anthropic Claude Code documentation, claude.ai/docs/claude-code
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)
Portkey AI gateway, portkey.ai
Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026
Kong AI Gateway and AI Proxy plugin, konghq.com/products/kong-ai-gateway
Cloudflare AI Gateway, developers.cloudflare.com/ai-gateway
TrueFoundry AI Gateway, truefoundry.com/ai-gateway
Datadog Security Labs LiteLLM PyPI supply-chain writeup (March 24, 2026), securitylabs.datadoghq.com

Frequently asked questions

When does the gateway stop being a single-region service?

At roughly 1,000 engineers the math flips. Below that, single-region with a documented failover runbook is acceptable. Above that, the gateway is a tier-zero platform service and the failover budget is tens of seconds, not minutes.

How do we keep Claude Code responsive when Anthropic rate-limits us at peak?

Three patterns. Priority lanes (interactive autocomplete jumps the queue). Graceful degradation to a cheaper model. Predictive backpressure: watch the rolling 60-second rate against the upstream cap and shed lower-priority traffic before the 429 returns. Future AGI ships all three; Portkey and TrueFoundry ship the first two; Kong and Cloudflare require platform-team work.
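
The third pattern, predictive backpressure, can be sketched as a sliding-window counter compared against the upstream cap. Everything here is illustrative: the `RollingRate` class, the 90% shed threshold, and the convention that lane 0 is interactive are assumptions, not a vendor's implementation.

```python
from collections import deque


class RollingRate:
    """Token usage over a sliding window, compared with an upstream cap."""

    def __init__(self, cap_per_window: int, window_s: float = 60.0,
                 shed_at: float = 0.9) -> None:
        self.cap = cap_per_window
        self.window_s = window_s
        self.shed_at = shed_at  # assumed: start shedding at 90% of the cap
        self._events: deque[tuple[float, int]] = deque()
        self._total = 0

    def record(self, now: float, tokens: int) -> None:
        self._events.append((now, tokens))
        self._total += tokens

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._total -= tokens

    def should_shed(self, now: float, lane: int) -> bool:
        """Shed non-interactive lanes (lane > 0) once usage nears the cap."""
        self._evict(now)
        return lane > 0 and self._total >= self.shed_at * self.cap


rate = RollingRate(cap_per_window=1_000)
rate.record(now=0.0, tokens=950)
print(rate.should_shed(now=1.0, lane=3))   # True: batch lane is shed
print(rate.should_shed(now=1.0, lane=0))   # False: interactive lane continues
print(rate.should_shed(now=61.0, lane=3))  # False: the window has rolled over
```

Timestamps are passed in explicitly so the behavior is easy to test; a production version would read a monotonic clock.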

How do we federate identity across Okta, Azure AD, and Workday?

Future AGI's identity broker, Cloudflare Access (native multi-IdP), or Kong's JWT and OIDC plugin chain. Portkey and TrueFoundry support multi-IdP via SAML but require custom claim transformations. Budget two to four weeks for a three-IdP federation.

How does Future AGI's loop differ from Portkey's dashboard at scale?

Portkey's dashboard tells a human what is happening. Future AGI's loop tells the gateway what to do next. At 5,000-engineer scale the loop typically produces a 15-30% downward trend in model spend over six to eight weeks.

Is BYOC active-active worth the operational cost over hosted multi-region?

Below 1,000 engineers, usually not. Between 1,000 and 5,000, it depends on the security committee (BYOC if VPC-only is required) and SRE capacity. Above 5,000, BYOC active-active is almost always the right call: audit retention scale, cross-region storage cost, and deep RBAC requirements all pencil out in favor of running the control plane in your own account.
