Enterprise LLM Gateway for Claude Code in 2026: A Buyer's Roadmap
A staged 6-12 month roadmap for selecting and rolling out an enterprise LLM gateway for Claude Code: five picks scored on pilot fit, expansion readiness, standardization, procurement, vendor stability, exit, and TCO.
The mistake VP Engs make with Claude Code in 2026 is treating the LLM gateway decision as a one-time vendor selection. It isn’t. It’s a three-stage rollout that runs six to twelve months across a 5,000-engineer org, and the vendor that wins your pilot in month two isn’t necessarily the vendor that survives your standardization review in month nine. Stage 1 cares about how fast a single team can wire ANTHROPIC_BASE_URL and see per-developer spend. Stage 2 cares about whether five teams in three regions can co-exist on the same control plane. Stage 3 cares about whether the gateway plugs into your identity provider, your SIEM, your records-retention policy, and your AWS Enterprise Discount Program, and whether the SOC 2 Type II report is dated within twelve months of audit.
This is the buyer’s roadmap. Five 2026 LLM gateways scored on seven axes for a VP Eng or CTO planning a 6-12 month rollout, then the rollout itself: Stage 1 Pilot (0-3 months), Stage 2 Expansion (3-9 months), Stage 3 Standardization (9-12+ months). Picks: Future AGI Agent Command Center, Portkey, Kong AI Gateway, LiteLLM, Maxim Bifrost. Sibling posts use “AI gateway”; this post uses “LLM gateway” because that’s what 2026 procurement committees write in the SOW once language gets specific. Same product.
TL;DR: pick by stage
| Stage | What it optimizes for | Pick |
|---|---|---|
| Stage 1: Pilot (0-3 months) | One team, fast time-to-first-chargeback-table | Future AGI Agent Command Center for the loop; Maxim Bifrost if speed is gating |
| Stage 2: Expansion (3-9 months) | Five to fifteen teams, multi-region, per-BU RBAC | Future AGI or Portkey; Kong if Kong is already the REST gateway |
| Stage 3: Standardization (9-12+ months) | Org-wide rollout, IdP federation, SIEM, EDP burn-down | Future AGI (BYOC + Apache 2.0 data layer) or Portkey (mature attested catalog) |
Five-line read. Future AGI survives all three stages. Apache 2.0 data layer, BYOC the same in pilot and production, self-improving loop bends the cost curve down at scale. Portkey is the polished hosted alternative if the committee accepts the April 30, 2026 PANW acquisition variable. Kong is the answer when Kong already sits inside your authorization boundary. LiteLLM is VPC-only with the March 24, 2026 PyPI supply-chain incident as the procurement variable. Maxim Bifrost is the fastest single-binary install.
Why Claude Code is a roadmap, not a purchase
A 5,000-engineer enterprise adopting Claude Code in early 2026 sees three phases, and the wrong gateway choice in any phase blocks the next.
Phase one: one team. A platform team of 30 engineers turns on Claude Code, plugs it through a gateway, has a chargeback dashboard in six weeks. The bill is around $40,000/month. Procurement and finance are satisfied. CTO checks the box.
Phase two: five teams in two regions. Backend, frontend, mobile, ML, and a data team turn on Claude Code. Some in the US, some in Bangalore, one in London. The gateway now needs per-BU RBAC because the platform team can’t be in the operational path of every team’s daily key issuance, regional data residency because the London team can’t send prompts to a US control plane without a DPA conversation, and IdP integration so the SSO claim drives spend visibility. This is where Stage 1 picks fall over.
Phase three: the whole org. Twelve to eighteen months in, Claude Code is the default tool for 5,000 engineers. CFO is in because the line item is $5M/year. CISO is in because some repos carry SOX-scope MNPI. AWS account team needs the contract to flow through the existing $20M EDP. Legal needs Type II within twelve months, DPA aligned to current EU SCCs, BAA on request, and assignment-and-novation language. The wrong vendor here produces a migration project: six months of dual-running while the old gateway ramps down.
A roadmap-shaped decision in month two prevents a migration project in month fourteen.
The 7 axes: what to score for a 6-12 month rollout
| # | Axis | What it measures | Stage it gates |
|---|---|---|---|
| 1 | Pilot-friendly onboarding | Time to first chargeback table for one team | Stage 1 |
| 2 | Expansion-ready (multi-team) | Per-BU RBAC depth, regional data plane, workspace isolation | Stage 2 |
| 3 | Standardization-grade (RBAC + IdP) | 4+ level RBAC, SAML SSO, SCIM, delegated administration, audit log retention | Stage 3 |
| 4 | Procurement story | SOC 2 Type II, ISO 27001, DPA, BAA, FedRAMP, AWS Marketplace | Stage 3 |
| 5 | Vendor financial stability | Funding, runway, acquisition variables, change-in-control | Stage 3 (score at Stage 1) |
| 6 | Migration-out story | Trace, eval, RBAC, policy export; data format and source code portability | Stage 3 |
| 7 | TCO over 36 months | License, storage, network, platform-team time, token-spend impact | All stages |
Axis 5 has to be scored at Stage 1, not Stage 3: the dominant 2026 mistake is picking a vendor whose attestation is great today but whose acquisition closes mid-standardization. Axis 6 is the one nobody scores honestly; at Stage 3 it’s the only thing that matters if the vendor disappears.
How we filtered the cohort
LLM gateways with an Anthropic-compatible endpoint as of May 2026 and at least one publicly referenced 500+ Claude Code seat deployment. We removed gateways without per-developer attribution (Stage 1 fails immediately) and those with no roadmap to 4+ level RBAC (Stage 2 fails). Helicone is out of this roadmap because the March 2026 Mintlify acquisition (Mintlify itself acquired by Stripe in late 2025) makes the Stage 3 standardization review unpleasant.
1. Future AGI Agent Command Center: Survives all three stages
Verdict. The only entry whose data-collection layer is Apache 2.0 (traceAI, ai-evaluation, agent-opt), whose BYOC deployment is the same in pilot and production, and whose self-improving loop bends the cost curve down at standardization scale. Future AGI ships SOC 2 Type II + HIPAA + GDPR + CCPA certified per futureagi.com/trust; ISO/IEC 27001 is in active audit.
Stage fit. Stage 1: one env var plus a traceAI install; first chargeback table in under a week; free tier 100K traces/month, Scale at $99/month. Stage 2: native four-level RBAC (org > business-unit > sub-business-unit > cost-center), delegated administration, hosted plane in US-East, US-West, EU-West with BYOC in Singapore for APAC, workspace-scoped budget caps. Stage 3: SAML SSO across Okta, Azure AD, Google Workspace, Auth0; SCIM for Okta and Azure AD; tiered audit retention (hot 30 days, warm one year Parquet, cold seven years Glacier) configurable per-repo class; BYOC runs both planes in the customer’s account with air-gapped enclaves on the same stack; AWS Marketplace listing routes the contract through the existing EDP.
Vendor stability + migration-out. ~$1.9M raised (Powerhouse, Snow Leopard, Arka, Wellfound Quant Fund); earlier-stage than Portkey or Kong. Structural mitigations: the Apache 2.0 license means the customer can self-host indefinitely; AWS Marketplace converts the vendor contract into an AWS contract. Migration-out is strongest in the cohort: the trace store is a customer-controlled Parquet warehouse on customer S3, eval state is a customer dataset, optimizer state is versioned in customer Git, and the Apache 2.0 libraries work without the hosted plane.
The loop. Every turn traced via traceAI; scored by fi.evals on faithfulness, code-correctness, tool-use accuracy; low-scoring sessions clustered; fi.opt.optimizers (ProTeGi, Bayesian, GEPA) rewrite the system prompt or routing policy; gateway applies the updated route, versioned with automatic rollback. Typical Claude Code optimization: turns under 10K input tokens to claude-haiku-4-5, the rest to claude-opus-4-7. A team starting at $40,000/month typically sees costs trend down 15-30 percent within four weeks. Protect ships at ~67ms text latency per arXiv 2510.13351.
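The example routing rule above can be sketched in a few lines. This is a minimal illustration, not Future AGI's implementation: the function name `pick_model` and the constant `SMALL_TURN_TOKEN_LIMIT` are hypothetical, and only the 10K threshold and the two model identifiers come from the text.

```python
# Hypothetical sketch of a token-threshold routing rule; names are illustrative.
SMALL_TURN_TOKEN_LIMIT = 10_000  # threshold taken from the paragraph above


def pick_model(input_tokens: int) -> str:
    """Return the model a turn would be routed to under the example policy."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    # Turns under 10K input tokens go to the smaller model; the rest go to Opus.
    if input_tokens < SMALL_TURN_TOKEN_LIMIT:
        return "claude-haiku-4-5"
    return "claude-opus-4-7"


if __name__ == "__main__":
    for tokens in (800, 9_999, 10_000, 42_000):
        print(tokens, pick_model(tokens))
```

In a real deployment the threshold would be one of the values the optimizer rewrites, which is why it sits in a named constant rather than inline.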
Where it falls short.
- ISO/IEC 27001 is still in active audit and not yet on the certificate list; SOC 2 Type II, HIPAA, GDPR, and CCPA are certified today. Under BYOC, Apache 2.0 plus inherited cloud controls are the mitigation.
- BYOC active-active across regions takes two to three weeks of SRE time at cutover.
- Optimization layer is heavier than what a one-week pilot needs.
- Non-US/EU residency runs through BYOC (the same Apache 2.0 binary in the customer’s account, anywhere your cluster runs).
TCO. For a 1,000-engineer org at $1,500/engineer/month ($18M/year), gateway cost lands $200K-$400K/year. The case isn’t the license, it’s the 15-30 percent token-spend reduction the loop produces, $2.7M-$5.4M/year at this scale.
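The arithmetic behind these figures is easy to re-run with your own headcount and per-engineer spend. A minimal sketch, assuming whole-dollar inputs and whole-percent reductions so the results stay exact; the function names are illustrative and not part of any vendor tool.

```python
# Worked arithmetic for the TCO paragraph. Integer dollars and whole percents
# keep the results exact. Function names are illustrative.


def annual_token_spend(engineers: int, monthly_per_engineer: int) -> int:
    """Annual token spend in dollars."""
    return engineers * monthly_per_engineer * 12


def savings_band(annual_spend: int, low_pct: int = 15, high_pct: int = 30) -> tuple[int, int]:
    """Low and high annual savings in dollars for a percent reduction band."""
    return annual_spend * low_pct // 100, annual_spend * high_pct // 100


if __name__ == "__main__":
    spend = annual_token_spend(1_000, 1_500)
    low, high = savings_band(spend)
    print(f"spend=${spend:,} savings=${low:,} to ${high:,}")
```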
Score: 7/7 with partial credit on attestation timing.
Choose Future AGI when Claude Code is becoming a material line item (at any volume, since agent-opt compounds value as production traffic flows), the committee values BYOC plus a source-available data-collection layer, and the team wants the cost curve to bend over the 12-month horizon.
2. Portkey: The polished hosted alternative
Verdict. The most polished hosted-only LLM gateway in this cohort, deepest pre-built compliance catalog. Type II attested under NDA, ISO 27001 on the list, mature DPA. Dominant 2026 variable: the April 30, 2026 Palo Alto Networks acquisition (close expected PANW fiscal Q4). Inside PANW, upside; outside it, multi-year contracts at Stage 3 need assignment-and-novation with a termination-without-penalty trigger.
Stage fit. Stage 1: hosted-only, virtual key per developer, polished dashboard in hours, prompt-library UI most mature in cohort, free tier 10K req/day. Stage 2: native four-tier RBAC (org > workspace > project > virtual-key), delegated administration via SAML role claims, hosted multi-region across US-East, US-West, EU-West, Singapore pinned per workspace, virtual-key fan-out preserves bulk Anthropic pricing.
Vendor stability + migration-out. Series A, funding above $10M; post-close PANW backing puts the customer behind a $100B+ market-cap parent: upside on stability, with vendor coupling as the consideration. Migration-out: data export through S3, Snowflake, Splunk; the eval and routing-policy layer is Portkey-shaped, so migration means redoing policy.
Where it falls short.
- PANW acquisition is a procurement variable; add assignment-and-novation language.
- Four-tier RBAC is the deepest native; 5+ level org charts flatten one level into metadata.
- Air-gap is custom, not default.
- No self-improving loop; cost curve stays flat unless the team optimizes manually.
TCO. Free tier 10K req/day; Pro $99/month; Enterprise custom. At 1,000-engineer scale expect a six-figure annual contract; negotiate the storage tier line item explicitly.
Score: 6.5/7 with partial credit on Axis 5 and Axis 6.
Choose Portkey when the priority is a hosted, attested-today catalog and the committee can handle the PANW variable contractually.
3. Kong AI Gateway: Right when Kong is already the REST gateway
Verdict. The right pick when Kong already sits inside your authorization boundary as the REST gateway. Weakness: the AI Proxy plugin is newer than rate-limiting, and AI-native observability is plugin-driven, so the chargeback dashboard finance accepts takes two to four weeks of platform-team time.
Stage fit. Stage 1: slowest to first chargeback table. AI Proxy plugin installs in hours, but the dashboard is a Grafana view on the OTel sink (plan two weeks). Stage 2: consumer-and-workspace-shaped RBAC with tag-based scoping; three-plus levels configurable but heavier than Portkey’s native four-tier; region pinning is the customer’s choice; plugin stacking gives expressiveness and operational responsibility.
Vendor stability + migration-out. Series E, funding above $200M, strongest financial profile in this cohort. Migration is redirecting ANTHROPIC_BASE_URL and rewiring the OTel sink.
Where it falls short.
- AI-native observability is plugin-driven; default dashboard is REST-shaped. Chargeback takes two to four weeks of platform-team time.
- AI Proxy plugin is newer than rate-limiting and still maturing.
- Plugin stacking is operationally heavy; small platform teams feel it.
- No self-improving loop.
- Standing up Kong only for Claude Code is a heavier lift than alternatives.
TCO. Kong OSS free. Konnect starts free. Enterprise with SLA + AI Proxy support starts around $1.5K/month; at 5,000-engineer scale expect a six-figure annual contract plus ongoing plugin maintenance.
Score: 6/7 with partial credit on Axis 1 and Axis 7.
4. LiteLLM: VPC-only, with the supply-chain caveat
Verdict. The pick when Claude Code traffic can’t leave the VPC and the security team wants to read every line of code that touches a prompt. Source-available under MIT, Python-native, runs as a proxy inside the customer’s infrastructure. Dominant 2026 procurement variable: the March 24, 2026 PyPI supply-chain compromise, in which versions 1.82.7 and 1.82.8 exfiltrated SSH keys and cloud credentials per Datadog Security Labs. The vendor shipped a clean post-incident response; most Fortune 500 committees will want the audit before signing.
Stage fit. Stage 1: source-available proxy installs quickly, a 30-engineer team on LiteLLM in a week. UI is functional not polished, chargeback by repo or developer requires exporting to a SQL warehouse. Stage 2: team and user scoping native, deeper hierarchies via virtual-key tagging, SAML SSO in Enterprise, metadata-driven attribution requires platform-team owned conventions across BUs. Stage 3: strongest self-host story by design, runs on customer nodes, no telemetry leaves the VPC; no first-party SOC 2 on OSS, Enterprise tier carries attestation and BAA, audit retention is the customer’s responsibility. Supported pattern when the customer wants the loop and VPC-only: LiteLLM in front of Anthropic with Future AGI’s traceAI Apache 2.0 sink behind it, both inside the customer’s VPC.
Vendor stability + migration-out. YC-backed with Enterprise as the commercial entity. Smaller than Portkey or Kong; source-available license is the structural mitigation. Migration is straightforward: the proxy is the customer’s deployment, config is YAML, and trace data lives in whatever sink the customer wires.
Where it falls short.
- March 24, 2026 PyPI compromise is the dominant procurement variable. Insist on post-incident audit, package-signing chain, pinned-version policy, SBOM.
- No native polished dashboard. Plan a SQL or analytics-warehouse sink for chargeback.
- No self-improving loop; pair with traceAI plus fi.opt if the loop matters.
- Observability story is thinner than the hosted alternatives.
- Smaller community footprint than Kong’s; ecosystem is Python-centric.
TCO. OSS under MIT. Enterprise starts around $250/month for small teams. At 5,000-engineer scale expect a six-figure annual contract plus platform-team overhead.
Score: 5.5/7 with partial credit on Axis 1, Axis 4, and Axis 5.
Choose LiteLLM when VPC-only is gating, the security team is satisfied with the post-March-24 audit, and you can pair LiteLLM with traceAI plus a SQL sink.
5. Maxim Bifrost: Fastest pilot install in the cohort
Verdict. Maxim AI’s Go-native LLM gateway, designed as a single-binary drop-in proxy. Pitch: single Go binary, low memory, sub-millisecond proxy overhead, deepest pilot-onboarding ergonomics in this cohort. Trade-off: shallower compliance and RBAC catalog as the rollout enters Stages 2 and 3. Honest read: Bifrost is the right Stage 1 pick when speed is gating; many enterprises will replace it or pair it with deeper alternatives at Stage 2.
Stage fit. Stage 1: fastest install, single Go binary, near-zero dependencies, container in minutes; pilot team on Bifrost in hours; per-developer attribution via headers; chargeback dashboard is Maxim’s hosted plane or a Grafana view on the OTel sink. Stage 2: multi-team support is workspace-shaped rather than hierarchical; RBAC is functional for two to three BUs but lacks the four-level depth Future AGI or Portkey ship; region pinning works because the binary deploys anywhere. Stage 3: this is where Bifrost diverges most sharply from the rest of the cohort, with SCIM still on the roadmap and a younger procurement story.
Vendor stability + migration-out. Series A funding. Multi-year contracts at Stage 3 should include standard change-in-control language. Migration is moderate: the binary is source-available or open depending on tier, customer can run it independently, trace data in whatever OTel sink the customer wires.
Where it falls short.
- RBAC depth is workspace-shaped, not native four-level. Stage 2 expansion to 5+ BUs needs careful tag conventions.
- SCIM is on the roadmap, not shipped as of May 2026.
- No self-improving loop; pair with traceAI plus fi.opt if the loop matters.
- Wrong pick for federal procurement.
- Stage 3 procurement story is younger than Portkey’s or Kong’s attested catalogs.
TCO. OSS for the binary. Hosted observability starts around $99/month for small teams; Enterprise custom. At 5,000-engineer scale expect a six-figure annual contract.
Score: 5.5/7 with partial credit on Axis 2, Axis 3, and Axis 4.
Choose Maxim Bifrost when Stage 1 speed is gating and the rollout plan treats Stage 2 as a re-evaluation point against deeper alternatives.
The 3-stage rollout roadmap
Stage 1: Pilot (0-3 months)
Objective. Prove an LLM gateway produces a chargeback table finance accepts without breaking the developer experience.
Scope. One team, 25-50 engineers, one or two repos. Pick a team already heavy on Claude Code; a team that uses it two hours a week wastes the pilot.
Acceptance gates. Per-developer chargeback dashboard with high/median/low spenders visible; per-session cost histogram; tool-call and SSE streaming verified on claude-opus-4-7 and claude-sonnet-4-6; zero developer-experience regression; one soft alert and one hard cap tested without disrupting daily flow.
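The "high/median/low spenders" gate can be checked against raw per-developer totals before any dashboard exists. A minimal sketch, assuming spend has already been aggregated into a mapping of developer to dollars; the function name and the sample data are hypothetical.

```python
from statistics import median


def spend_summary(spend_by_developer: dict[str, float]) -> dict:
    """Return the lowest spender, the median spend, and the highest spender."""
    if not spend_by_developer:
        raise ValueError("no per-developer spend to summarize")
    # Sort (developer, spend) pairs by spend so the ends are the low and high spenders.
    ranked = sorted(spend_by_developer.items(), key=lambda pair: pair[1])
    return {
        "low": ranked[0],
        "median": median(spend_by_developer.values()),
        "high": ranked[-1],
    }


if __name__ == "__main__":
    # Illustrative monthly totals in dollars, not real data.
    sample = {"ana": 1_200, "bo": 300, "cy": 4_100}
    print(spend_summary(sample))
```

If the summary cannot be produced from the gateway's export in the first weeks of the pilot, the per-developer attribution gate is not yet met.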
Recommended picks. Future AGI for the loop, Maxim Bifrost if speed is gating, Portkey if prompt-library UI matters. Avoid Kong unless Kong is already the REST gateway.
Decision at end of Stage 1. “Do we expand on this vendor or re-evaluate?” Two questions: did the vendor’s RBAC depth survive light scrutiny, and did the BYOC or self-host story match the most-restrictive subsidiary? If either is no, re-evaluate before Stage 2.
Stage 2: Expansion (3-9 months)
Objective. Scale to five-to-fifteen teams across two-to-three regions, with per-BU RBAC, per-region data residency, and workspace isolation.
Scope. Five to fifteen teams, 500-2,000 engineers, multi-region. The platform team is no longer in the operational path of every team’s key issuance. BU leads are.
Acceptance gates. Native four-level RBAC end to end with delegated administration; multi-region data plane operational (US workspaces in US, EU in EU under DPA review); per-BU budget caps scoped to the BU’s workspace; audit retention per-repo class with SOX-scope inheriting the seven-year tier; SAML SSO across the IdP estate with SCIM operational; the first 1,000-engineer monthly token bill ($1M+ for a typical mid-large enterprise) fully attributable to developer, session, repo, BU.
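"Fully attributable" is testable: every usage record should carry all four tags, and any spend that does not is a gap to chase. A minimal sketch, assuming usage records are exported as dictionaries with a cost in cents; the tag keys and record shape are hypothetical and will differ per gateway.

```python
from collections import defaultdict

# Hypothetical tag keys; substitute whatever your gateway's export uses.
REQUIRED_TAGS = ("developer", "session", "repo", "bu")


def roll_up(records: list[dict], dimension: str) -> dict[str, int]:
    """Sum cost in cents per value of one tag; untagged spend is grouped together."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        key = record.get(dimension) or "UNATTRIBUTED"
        totals[key] += record["cost_cents"]
    return dict(totals)


def unattributed_cents(records: list[dict]) -> int:
    """Total cost of records missing at least one required tag."""
    return sum(
        record["cost_cents"]
        for record in records
        if any(not record.get(tag) for tag in REQUIRED_TAGS)
    )


if __name__ == "__main__":
    # Illustrative records, not real data.
    sample = [
        {"developer": "ana", "session": "s1", "repo": "api", "bu": "payments", "cost_cents": 500},
        {"developer": "bo", "session": "s2", "repo": "web", "bu": "growth", "cost_cents": 250},
        {"developer": "cy", "session": "s3", "repo": "api", "bu": None, "cost_cents": 100},
    ]
    print(roll_up(sample, "bu"), unattributed_cents(sample))
```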
Recommended picks. Future AGI for native four-level RBAC plus BYOC; Portkey for native four-tier RBAC plus hosted multi-region; Kong if Kong is the REST gateway. LiteLLM workable for VPC-only with the post-March-24 audit. Maxim Bifrost feels depth limits if RBAC needs more than three levels.
Decision. “Commit for Stage 3 or run a parallel evaluation before multi-year?” Two questions: did procurement (Type II, ISO, DPA, BAA, FedRAMP) hold up under deep review, and did migration-out satisfy legal’s tail-risk review?
Stage 3: Standardization (9-12+ months)
Objective. Standardize the gateway as the org-wide proxy for Claude Code, with IdP federation, SIEM integration, AWS Marketplace contract path, records-retention alignment, and change-in-control language that survives the next 2027 acquisition wave.
Scope. Org-wide. 3,000-10,000 engineers, all product lines, all subsidiaries. CFO is in because the line item is $5M-$20M/year. CISO is in because some repos are SOX-scope and one subsidiary is HIPAA-covered. AWS account team is in because the contract must flow through the EDP. Legal needs 2026-grade MSA, DPA, BAA clauses.
Acceptance gates. SOC 2 Type II within twelve months of audit; ISO 27001 on file; DPA aligned to current EU SCCs with documented sub-processor list and right-to-object language; BAA on request. SAML SSO across the full IdP estate; SCIM operational; delegated administration with BU-lead and cost-center-lead roles. Audit retention aligned to records-retention schedule (SOX seven years, HIPAA six years, default three years). SIEM operational. AWS Marketplace draws down the EDP. Change-in-control with termination-without-penalty trigger. Migration-out plan documented.
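The retention schedule in this gate reduces to a small lookup. A minimal sketch, assuming each repo carries a set of compliance classes and that a repo in more than one class keeps the longest applicable period; that tie-break rule, the class labels, and the function name are assumptions, not something the schedule above specifies.

```python
# Retention periods in years, taken from the acceptance gate above.
RETENTION_YEARS = {"sox": 7, "hipaa": 6}
DEFAULT_RETENTION_YEARS = 3


def retention_years(repo_classes: set[str]) -> int:
    """Longest retention among a repo's classes, or the default when none apply.

    Assumption: when several classes apply, the longest period wins.
    """
    applicable = [RETENTION_YEARS[c] for c in repo_classes if c in RETENTION_YEARS]
    return max(applicable, default=DEFAULT_RETENTION_YEARS)


if __name__ == "__main__":
    for classes in ({"sox"}, {"hipaa"}, {"sox", "hipaa"}, set()):
        print(sorted(classes), retention_years(classes))
```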
Recommended picks. Future AGI is the strongest answer (BYOC plus Apache 2.0 makes migration-out structural; AWS Marketplace; loop produces the 15-30 percent token-spend reduction). Portkey is the strongest hosted answer with PANW handled contractually.
Decision. Multi-year contract signed. Rollout in waves over six months. Year-2 and year-3 budgets locked in.
TCO across the 3 stages
A 5,000-engineer enterprise at $1,500/engineer/month in Anthropic token spend ($90M/year).
| Stage | Engineers | Anthropic spend | Gateway license + storage | Platform-team time | Token-spend impact |
|---|---|---|---|---|---|
| Stage 1 Pilot | 50 | $900K/year | $5K-$15K (or free tier) | 1-2 FTE-weeks | None expected; baseline data |
| Stage 2 Expansion | 1,500 | $27M/year | $50K-$150K | 2-4 FTE-months | 0-5% from manual tuning |
| Stage 3 Standardization | 5,000 | $90M/year | $200K-$500K | 1-2 FTE ongoing | 15-30% with self-improving loop; flat without |
The gateway license is rarely the dominant TCO line. At Stage 3, a 20 percent token-spend reduction at $90M/year is $18M/year, roughly 36 to 90 times the $200K-$500K license line. This is why the loop matters at standardization scale and why the Stage 1 vendor pick should anticipate the Stage 3 loop question.
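The comparison between token savings and license cost can be checked directly. A minimal sketch, assuming the $200K-$500K Stage 3 license band from the table above; the function name is hypothetical.

```python
def savings_to_license_multiple(annual_spend: int, reduction_pct: int, license_cost: int) -> float:
    """How many times larger the annual token savings are than the license line."""
    if license_cost <= 0:
        raise ValueError("license_cost must be positive")
    # Integer percent keeps the savings figure exact before the final division.
    savings = annual_spend * reduction_pct // 100
    return savings / license_cost


if __name__ == "__main__":
    stage3_spend = 90_000_000  # $90M/year from the table
    for license_cost in (200_000, 500_000):
        multiple = savings_to_license_multiple(stage3_spend, 20, license_cost)
        print(f"license=${license_cost:,} multiple={multiple:.0f}x")
```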
Capability matrix across the 7 axes
| Axis | Future AGI | Portkey | Kong | LiteLLM | Maxim Bifrost |
|---|---|---|---|---|---|
| Pilot onboarding | Fast | Fast | Slow (plugin wiring) | Medium | Fastest (Go binary) |
| Expansion-ready | 4-level RBAC + BYOC | 4-tier RBAC + hosted multi-region | Consumer + tag + self-hosted | Team + user + virtual key | Workspace-shaped; 5+ BU gaps |
| Standardization-grade | Native 4-level + SAML + SCIM | Native 4-tier + SAML + SCIM | SAML + SCIM (Konnect) | SAML in Enterprise | SAML; SCIM on roadmap |
| Procurement | Type II + HIPAA + GDPR + CCPA + BAA + AWS MP | Type II + ISO + BAA | Type II + ISO + BAA + FedRAMP-aligned | Enterprise attestation; OSS in customer audit scope | Type II on path |
| Vendor stability | ~$1.9M; Apache 2.0 mitigation | Series A + PANW | Series E | YC; post-March-24 audit | Series A |
| Migration-out | Strongest (Apache 2.0 + customer Parquet) | Moderate (policy is Portkey-shaped) | Strong (self-hosted) | Strong (MIT + customer YAML) | Moderate |
| TCO 36 mo | $200K-$500K + 15-30% token reduction | Pro $99 + flat token impact | $1.5K/mo Enterprise + platform time | OSS free + Enterprise from $250 + platform time | OSS free + Enterprise; flat token impact |
Decision framework: Choose X if
Future AGI if you want a vendor that survives all three stages on structural grounds. BYOC plus Apache 2.0 data layer plus self-improving loop plus SOC 2 Type II + HIPAA + GDPR + CCPA certified attestations plus AWS Marketplace. Best when Claude Code is becoming $1M-$20M/year.
Portkey if you want the polished hosted gateway with the mature attested catalog and can handle the PANW acquisition variable contractually.
Kong AI Gateway if Kong already sits inside your authorization boundary as the REST gateway and the platform team can absorb the plugin wiring.
LiteLLM if VPC-only is gating, the security team is satisfied with the post-March-24 audit, and you can pair LiteLLM with traceAI plus a SQL sink.
Maxim Bifrost if Stage 1 speed is gating and the rollout plan treats Stage 2 as a re-evaluation point against deeper alternatives.
Common rollout mistakes
| Mistake | Fix |
|---|---|
| Treating the gateway as a one-time vendor selection | Score on Stage 3 axes from Stage 1; the roadmap is the decision |
| Skipping the migration-out review at Stage 1 | Demand export format, source-code portability, change-in-control at Stage 1 |
| Pointing only the IDE plugin at the gateway | Set ANTHROPIC_BASE_URL in pilot teams’ shell profiles |
| Tagging only user_id | Tag user, session, repo, and BU from day one |
| Hard budget cap at soft-alert threshold | Soft-alert at 80 percent, hard-pause at 110 percent |
| Picking on dashboard polish at Stage 1 | Score all 7 axes at Stage 1 |
| Accepting default audit retention | Map repos to records-retention; tiered storage |
| Multi-year without change-in-control | Termination-without-penalty trigger if post-close DPA degrades |
| Not engaging the AWS account team at Stage 1 | Engage AWS at Stage 1 even if Stage 1 spend is below the AWS threshold |
| Regional data residency as a feature flag | Verify regional support at Stage 1 even if Stage 1 is single-region |
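The soft-alert and hard-pause thresholds from the table can be expressed as a single decision function. A minimal sketch, assuming budgets and spend are tracked in integer cents; the enum and function names are hypothetical, and real gateways expose this as configuration rather than code.

```python
from enum import Enum


class BudgetAction(Enum):
    OK = "ok"
    SOFT_ALERT = "soft_alert"
    HARD_PAUSE = "hard_pause"


def budget_action(spent_cents: int, budget_cents: int,
                  soft_pct: int = 80, hard_pct: int = 110) -> BudgetAction:
    """Decide what to do at the current spend level.

    Defaults follow the table: alert at 80 percent, pause at 110 percent.
    """
    if budget_cents <= 0:
        raise ValueError("budget_cents must be positive")
    # Integer comparison avoids float rounding exactly at the thresholds.
    if spent_cents * 100 >= budget_cents * hard_pct:
        return BudgetAction.HARD_PAUSE
    if spent_cents * 100 >= budget_cents * soft_pct:
        return BudgetAction.SOFT_ALERT
    return BudgetAction.OK


if __name__ == "__main__":
    budget = 100_00  # $100.00 in cents, illustrative
    for spent in (79_99, 80_00, 109_99, 110_00):
        print(spent, budget_action(spent, budget).value)
```

Keeping the two thresholds apart is the point of the fix in the table: developers get a warning with room to finish their work before anything is paused.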
How Future AGI closes the loop on the roadmap
The other four gateways are static policy enforcement points: policy is configured by humans, the dashboard tells humans what is happening, and the audit log records human-driven changes. For them, the cost curve at Stage 3 stays flat. Future AGI treats the captured trace as input to a closed loop: every turn traced via traceAI (Apache 2.0); scored by fi.evals; low-scoring sessions clustered; fi.opt.optimizers rewrite system prompt or routing policy; gateway applies the updated policy on the next request, versioned with automatic rollback.
Net effect at Stage 3 scale: a 5,000-engineer org starting at $90M/year typically sees costs trend down 15-30 percent within six months of standardization, $13.5M-$27M/year. Protect ships at ~67ms text latency per arXiv 2510.13351. Structural mitigations matter as much as the loop: Apache 2.0 building blocks (traceAI, ai-evaluation, agent-opt at github.com/future-agi) plus AWS Marketplace plus BYOC mean migration-out at Stage 3 is a non-event because the data layer is the customer’s. The worst-case audit question (“what if the vendor disappears?”) has a structural answer rather than a contractual one.
What we did not include
OpenRouter: consumer-facing routing. Cloudflare AI Gateway: strong for existing Cloudflare customers but doesn’t match BYOC-first or VPC-first constraints at Stage 3. TrueFoundry: right when consolidating inference plus gateway plus MLOps under one MSA, covered in the sibling enterprise post. Helicone: belongs in the lighter-stakes pilot conversation; the Mintlify → Stripe parentage makes Stage 3 standardization unpleasant.
Related reading
- Choosing an AI Gateway for Claude Code in 2026: A Complete Buyer’s Guide, the vendor-scoring lens on the same cohort
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026, the technical-monitoring lens
- Best Claude Code Gateway for Enterprises in 2026, the procurement-and-compliance lens
- What Is an AI Gateway? The 2026 Definition
- Best LLM Gateways in 2026
Sources
- Anthropic Claude Code documentation, claude.ai/docs/claude-code
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Portkey AI gateway, portkey.ai
- Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026
- Kong AI Gateway and AI Proxy plugin, konghq.com/products/kong-ai-gateway
- LiteLLM proxy, github.com/BerriAI/litellm
- Datadog Security Labs LiteLLM PyPI supply-chain writeup (March 24, 2026), securitylabs.datadoghq.com
- Maxim AI Bifrost, getmaxim.ai/bifrost
Frequently asked questions
Which vendor has SOC 2 Type II attested today?
Can one vendor cover all three stages?
How much does a Stage 3 rollout actually cost?
What should legal prioritize in the MSA?
What if one subsidiary is HIPAA-covered and the rest are not?
How is this post different from the sibling buyer's-guide?