
Choosing an AI Gateway for Claude Code in 2026: A Complete Buyer's Guide

The complete 2026 buyer's guide to choosing an AI gateway for Claude Code: five picks scored against eight buying criteria, a 12-week procurement timeline, an 18-question RFP template, and a 30-day pilot recipe.


The hardest part of buying an AI gateway for Claude Code isn’t picking the vendor. It’s convincing seven internal stakeholders that the vendor you picked is the right one: developer leads who care about tool-call passthrough, FinOps who cares about per-developer chargeback, security who cares about source-code egress, procurement who cares about SOC 2 and the DPA, the AWS account team who cares whether the contract draws down EDP, the legal team who cares about assignment-and-novation, and the CFO who cares whether the cost curve bends. Every stakeholder has a different question. Every question routes to a different feature in a different vendor.

This is the complete buyer’s guide to choosing an AI gateway for Claude Code. It scores the five 2026 picks against the eight criteria every internal stakeholder asks about, lays out the twelve-week procurement timeline, gives you an RFP template to send vendors today, and ends with the thirty-day pilot recipe that engineering can run while procurement is in legal review. The picks: Future AGI Agent Command Center at #1, then Portkey, Kong AI Gateway, LiteLLM, and Helicone.


TL;DR: the picks at a glance

| Rank | Gateway | Best for | Headline trade-off |
|---|---|---|---|
| 1 | Future AGI Agent Command Center | Closing the loop from trace to optimizer to route while keeping data in your VPC | SOC 2 Type II + HIPAA + GDPR + CCPA certified; ISO/IEC 27001 in active audit |
| 2 | Portkey | Hosted gateway with the deepest pre-built compliance catalog and a polished prompt library | The April 30, 2026 Palo Alto Networks acquisition announcement adds a vendor-coupling variable to multi-year contracts |
| 3 | Kong AI Gateway | Enterprises that already run Kong for REST APIs and want FedRAMP-aligned reference architecture | AI-native observability is plugin-driven; the chargeback dashboard is two to four weeks of platform-team work |
| 4 | LiteLLM | Source-available, Python-native self-host when Claude Code traffic cannot leave the VPC | The March 24, 2026 PyPI supply-chain compromise (versions 1.82.7 and 1.82.8) raised the operational bar; expect a clean-post-incident audit before procurement signs |
| 5 | Helicone | Lightweight per-request observability with minimal setup | Acquired by Mintlify in March 2026, which was acquired by Stripe in late 2025; the gateway roadmap inherits a documentation-platform-first parent |

Why Claude Code is its own buying problem

Claude Code doesn’t look like a generic LLM workload, and a gateway that scored well on a generic rubric can fail on five Claude-Code-specific dimensions.

Sessions are long and stateful. A single Claude Code bug-fix conversation can produce thirty to fifty turns, each shipping 30K to 200K input tokens because the CLI packs project files into the prompt. The economic unit is the session, not the call.

Cost is concentrated. Across twenty-two engineering teams we observed in Q1 2026, the top 10% of Claude Code users accounted for 51% of token spend. The right unit is per-developer with per-session granularity, then rolled up to repo and team.

Traffic is bursty. US engineering orgs see 3-5x peak-to-trough ratios within a day. The cost-per-engineer number, which we have seen range from $300/month for a junior engineer to over $4,200/month for a staff engineer running parallel sessions across three repos, only makes sense when the spikes are preserved.

Tool calls and streaming have to survive the gateway hop. Claude Code uses tool calls heavily; early-2025 proxies broke tool-use by re-serialising content blocks. Verification has to be part of the buying process, not assumed.

Source code is regulated. Some repos carry SOX-scope MNPI, HIPAA-scope PHI in test fixtures, or CUI. Audit retention, BAA availability, and data residency are gating, not optional.


The 8 buying criteria: what each stakeholder asks

Most buyer’s guides cluster around five axes. We score on eight because the procurement, security, FinOps, and platform-team conversations all have to converge on the same vendor, and each one cares about a different axis.

| # | Axis | Whose question this answers |
|---|---|---|
| 1 | Per-session, per-developer, per-repo cost attribution | FinOps; CFO |
| 2 | Tool-call and streaming passthrough on claude-opus-4-7 | Engineering leads |
| 3 | Self-improving loop: does the gateway optimize routing and prompts over time | CFO (cost curve); CTO (capability curve) |
| 4 | SOC 2 Type II, ISO 27001, BAA, FedRAMP posture | Security; procurement |
| 5 | RBAC depth, SSO + SCIM, delegated administration | IT; identity team |
| 6 | Self-host / BYOC / air-gap deployment options | Security; compliance |
| 7 | Vendor stability and assignment-and-novation language | Legal; procurement |
| 8 | Pricing, AWS Marketplace listing, EDP burn-down | Procurement; AWS account team |

Every pick below is scored on all eight. Partial credit is named explicitly.


How we filtered the cohort

Two filters reduced the universe of Anthropic-compatible gateways to five. First, Claude Code compatibility: pointing Claude Code at the gateway via ANTHROPIC_BASE_URL has to preserve tool-use blocks and SSE streaming. Second, enterprise readiness on at least one buying axis: a real SOC 2 attestation path, source-available code, or a parent company with long-standing enterprise relationships.

Notable exclusions: OpenRouter (consumer-facing); Cloudflare AI Gateway (right for existing Cloudflare customers but doesn’t match VPC-first or BYOC-first constraints, covered in the sibling enterprise post); TrueFoundry (right when consolidating inference plus gateway plus MLOps under one MSA, a different buying motion).


1. Future AGI Agent Command Center: Best overall for Claude Code in 2026

Verdict. Future AGI is the only entry in this cohort that closes the loop from trace to evaluation to optimizer to route, so the gateway gets better at its job every week instead of staying flat. traceAI, ai-evaluation, and agent-opt are Apache 2.0, so the customer’s data-collection layer runs on source-available infrastructure regardless of the vendor’s future. The hosted Agent Command Center adds failure-cluster views, live Protect guardrails (the Future AGI Protect model family: Gemma 3n fine-tuned adapters across Content Moderation, Bias Detection, Security, and Data Privacy Compliance; multi-modal text, image, and audio), four-level RBAC, SOC 2 Type II, HIPAA, GDPR, and CCPA all certified, BAA available, and an AWS Marketplace listing. ISO/IEC 27001 is in active audit alongside the existing SOC 2 Type II / HIPAA / GDPR / CCPA stack.

Eight-axis read. Per-session traces are native, with session ID as a top-level span attribute, per-developer aggregation via fi.attributes.user.id, and per-repo tagging via span attributes; group-by-developer and group-by-repo are dashboard-native, not export-to-warehouse. Tool-use blocks and SSE streaming both pass through intact. The self-improving loop is the wedge no other gateway in this list ships: every captured trace gets scored by fi.evals, low-scoring sessions get clustered, fi.opt.optimizers (ProTeGi, Bayesian, GEPA) rewrite the system prompt or routing policy, and the gateway applies the updated route on the next request. For Claude Code specifically the typical optimization is “route turns under 10K input tokens to claude-haiku-4-5, everything else to claude-opus-4-7.” A team that starts at $40,000/month typically sees costs trend down 15-30% within four weeks without changing developer behavior.
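To make the quoted routing rule and the savings range concrete, here is a minimal sketch. It is a static illustration of the kind of policy described, not Future AGI's optimizer or its API; the function names are hypothetical, and the 10K threshold, model names, and 15-30% range are taken from the paragraph above.

```python
HAIKU = "claude-haiku-4-5"
OPUS = "claude-opus-4-7"
SMALL_TURN_TOKENS = 10_000  # "under 10K input tokens", as quoted in the text


def pick_model(input_tokens: int) -> str:
    """Route small turns to the cheaper model and everything else to Opus."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return HAIKU if input_tokens < SMALL_TURN_TOKENS else OPUS


def projected_monthly_range(baseline_usd: int, low_pct: int = 15,
                            high_pct: int = 30) -> tuple[int, int]:
    """Monthly spend after a low_pct..high_pct reduction, in whole dollars.

    Integer arithmetic avoids floating-point rounding surprises.
    Returns (best case, worst case).
    """
    best = baseline_usd * (100 - high_pct) // 100
    worst = baseline_usd * (100 - low_pct) // 100
    return best, worst


print(pick_model(8_000))                 # small turn
print(pick_model(120_000))               # long-context turn
print(projected_monthly_range(40_000))   # range implied by the 15-30% figure
```

For the $40,000/month example, the 15-30% range works out to roughly $28,000 to $34,000 per month.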

On compliance, SOC 2 Type II + HIPAA + GDPR + CCPA are all certified per futureagi.com/trust; ISO/IEC 27001 is in active audit alongside the existing SOC 2 / HIPAA / GDPR / CCPA stack. A BAA is available on request via Future AGI sales. Retention is tiered (hot thirty days, warm one year on Parquet, cold seven years on Glacier) and configurable per repo class, so SOX-scope repos inherit the seven-year tier. Four-level native RBAC (org > business-unit > sub-business-unit > cost-center) comes with delegated administration, SAML SSO across major IdPs, and SCIM for Okta and Azure AD. BYOC is the flagship deployment, with both planes in the customer’s AWS or Azure account and air-gapped enclaves on the same stack. Future AGI has raised approximately $1.9M (Powerhouse, Snow Leopard, Arka, Wellfound Quant Fund), which makes it earlier-stage than Portkey or Kong, with the Apache 2.0 license on the data layer as the primary mitigation. Pricing: Free at 100K traces/month, Scale at $99/month, Enterprise custom; the AWS Marketplace listing routes contracts through the existing EDP commitment. The Future AGI Protect model family (Gemma 3n fine-tuned adapters across Content Moderation, Bias Detection, Security, and Data Privacy Compliance) ships at ~67ms text latency per arXiv 2510.13351, keeping inline policy enforcement from breaking Claude Code’s interactive UX.

Where it falls short.

  • ISO/IEC 27001 in active audit; procurement requiring the ISO attestation letter today should request the audit timeline. SOC 2 Type II + HIPAA + GDPR + CCPA are already certified.
  • Earlier-stage vendor than Portkey or Kong. Apache 2.0 plus AWS Marketplace mitigates the exit path.
  • BYOC active-active across regions requires SRE time, two to three weeks at cutover plus cross-region transfer and hot-tier replica cost.
  • The optimization layer is heavier than what a one-week pilot needs; for the first month of chargeback-only, the loop isn’t what to evaluate.

Score: 7/8. Partial credit on attestation timing.

Choose Future AGI when Claude Code is becoming a material line item (at any volume, since agent-opt compounds value as production traffic flows), the committee values BYOC plus a source-available data-collection layer, the AWS account team can route the contract through Marketplace, and the team wants the cost curve to bend down over time.


2. Portkey: Best for the polished hosted option

Verdict. Portkey is the most polished hosted Claude Code gateway in 2026, with the deepest pre-built compliance catalog among hosted-only products and a mature prompt-library UI. Type II attested under NDA, ISO 27001 on the list, mature DPA aligned to current EU SCCs. The 2026 variable is the April 30, 2026 Palo Alto Networks acquisition announcement (close expected PANW fiscal Q4). Inside the PANW stack the acquisition is upside; outside it, multi-year contracts need assignment-and-novation language with a termination-without-penalty trigger.

Eight-axis read. Per-session traces via the trace_id request header, per-developer chargeback via the virtual-key system (each developer gets their own key fanning out to one underlying Anthropic key, preserving bulk pricing), per-repo via metadata headers. The header model requires Claude Code wrapper changes. Tool-use and SSE confirmed working with claude-opus-4-7 and claude-sonnet-4-6. No optimizer; traces inform humans, humans configure policy.

Native four-tier RBAC (org > workspace > project > virtual-key) with delegated administration via SAML role claims, SAML SSO across major IdPs, SCIM supported. Hosted multi-region across US-East, US-West, EU-West, and APAC Singapore, pinned per workspace; BYOC available with data plane in the customer account, control plane in Portkey cloud unless private deployment is negotiated. Series A with disclosed funding above $10M; post-close PANW backing puts the customer behind a parent with a $100B+ market cap. Free tier at 10K requests/day, Pro at $99/month, Enterprise custom.

Where it falls short.

  • PANW acquisition is a procurement variable. Add assignment-and-novation with a termination-without-penalty trigger if post-close terms degrade.
  • Four-tier RBAC is the deepest native; deeper org charts flatten one level into metadata.
  • Air-gap is custom, not default, wrong fit for high-side workloads without a paid customization SOW.
  • No optimizer. Cost curve stays flat unless the team does the optimization work manually.

Score: 6.5/8.

Choose Portkey when the priority is a hosted, attested-today catalog with a polished prompt library, the committee is comfortable with the PANW variable handled by contractual language, and the team is satisfied with monitoring plus virtual keys as the end state.


3. Kong AI Gateway: Best for FedRAMP-shaped procurement

Verdict. Weakness: the AI Proxy plugin is newer than rate-limiting, the AI-native observability story is plugin-driven, and the chargeback dashboard your finance team will accept takes two to four weeks of platform-team time to wire.

Eight-axis read. Per-session traces through OpenTelemetry plugins (span attributes wired via Lua or the AI Proxy plugin), per-developer chargeback via consumer + tag patterns with a typically-Grafana dashboard on the OTel sink, per-repo via consumer tags. Tool-use and streaming confirmed working through AI Proxy plugin on Kong 3.6+. No optimizer.

Consumer-and-workspace-shaped RBAC with tag-based scoping: three-plus levels are configurable but heavier to set up than Portkey’s native four-tier. SAML SSO via OIDC and JWT plugins, SCIM via Konnect. Self-hosted Kong is the reference air-gap deployment in this cohort, deployed inside federal enclaves for years. Series E, funding above $200M, strongest financial profile in this cohort. Kong open source is free, Konnect starts free, and Enterprise plans with SLA and AI Proxy support start around $1.5K/month; at 5,000-engineer scale expect a six-figure annual contract.

Where it falls short.

  • AI observability is plugin-driven; default dashboard is REST-shaped. Chargeback view finance wants takes two to four weeks of platform-team time.
  • AI Spend plugin is newer than rate-limiting and still maturing.
  • Plugin stacking is operationally heavy; small platform teams feel the weight.
  • No optimizer.
  • Standing up Kong only for Claude Code is a heavier lift than the alternatives.

Score: 6/8. Partial credit on AI-native catalog depth and optimizer absence.


4. LiteLLM: Best for source-available VPC-only self-host

Verdict. LiteLLM is the pick when Claude Code traffic can’t leave the VPC and the security team wants to read every line of code that touches a prompt. Source-available under MIT, Python-native, runs as a proxy inside the customer’s infrastructure. The 2026 variable is the March 24, 2026 PyPI supply-chain compromise: versions 1.82.7 and 1.82.8 exfiltrated SSH keys and cloud credentials per Datadog Security Labs. The vendor has shipped a clean post-incident response, but most Fortune 500 buying committees will want to see the audit before signing.

Eight-axis read. Per-session traces via metadata pass-through (metadata.session_id and metadata.trace_id in the proxy config), per-developer chargeback via team_id and user_id on virtual keys with team_id mapped to the SSO claim, per-repo via custom metadata. The dashboard is functional but not polished, so slicing by repo means exporting to a SQL analytics warehouse. Tool-use blocks and streaming both preserved. No native optimizer; the supported pattern when the customer wants both the loop and the VPC-only constraint is LiteLLM in front of Anthropic with Future AGI’s traceAI Apache 2.0 sink behind it for the trace, eval, and optimizer layers, both inside the customer’s VPC.

No first-party SOC 2 on the OSS distribution; the LiteLLM Enterprise tier carries an attestation path and a BAA. Team and user scoping native, deeper hierarchies via virtual-key tagging, SAML SSO in the Enterprise tier. Strongest self-host story by design, source-available, runs on the customer’s nodes, no telemetry leaves the VPC. YC-backed, with the Enterprise tier as the commercial entity. Open source under MIT, Enterprise starts around $250/month for small teams and scales up.

Where it falls short.

  • March 24, 2026 PyPI supply-chain compromise is the dominant procurement variable. Insist on the post-incident audit, package-signing chain, pinned-version policy, and SBOM.
  • No native polished dashboard. Plan a SQL or analytics-warehouse sink for chargeback.
  • No optimizer; pair with traceAI plus fi.opt if the loop matters.
  • Observability story is thinner than the hosted alternatives. Plan to wire an OTel sink behind LiteLLM for depth.
  • Smaller community footprint than Kong’s; plugin ecosystem is Python-centric.
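A pinned-version policy can be backed by a simple blocklist check in CI. The sketch below is an assumption about how a platform team might wire this, not LiteLLM tooling: the function names are hypothetical, and the only facts taken from this guide are the two version numbers. It complements, and does not replace, the post-incident audit, signing chain, and SBOM review.

```python
from importlib import metadata

# Versions named in this guide as affected by the March 24, 2026 incident.
COMPROMISED_LITELLM = frozenset({"1.82.7", "1.82.8"})


def is_compromised(version: str) -> bool:
    """True when the given version string is on the blocklist."""
    return version.strip() in COMPROMISED_LITELLM


def check_installed(package: str = "litellm") -> str:
    """Report whether the locally installed package is on the blocklist."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    if is_compromised(installed):
        return f"{package} {installed}: BLOCK, version is on the incident blocklist"
    return f"{package} {installed}: not on the blocklist"


print(check_installed())
```

A CI job could fail the build whenever the status string contains BLOCK, alongside hash-pinned requirements.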

Score: 5.5/8. Partial credit on supply-chain risk, optimizer absence, and dashboard polish.

Choose LiteLLM when VPC-only is gating, the security team is satisfied with the post-March-24 audit, the platform team is comfortable with a Python-native proxy, and the customer is prepared to pair LiteLLM with traceAI plus a SQL sink.


5. Helicone: Best for lightweight pilots

Verdict. Helicone is the right pick when you want per-request observability for Claude Code and nothing else. Drop the proxy URL in front of Anthropic, get a per-request cost table, move on with your week. The 2026 variable is the parentage change: Helicone was acquired by Mintlify in March 2026, and Mintlify was acquired by Stripe in late 2025. The roadmap inherits a documentation-platform-first parent; for low-stakes pilots this is acceptable, for multi-year enterprise contracts most buying committees will treat this as planned-migration risk.

Eight-axis read. Per-session via Helicone-Session-Id, per-developer via Helicone-User-Id, per-repo via custom properties. Aggregation is simple, and slicing is shallower than Portkey or Future AGI. Tool-use and streaming both confirmed working. No optimizer; routing intelligence is basic (round-robin / failover); Claude-Code-specific routing has to be coded by the team upstream of the proxy.

SOC 2 Type II attested on the pre-acquisition entity, BAA on request. RBAC is lighter than Portkey or Future AGI: workspace-level access controls, SSO in higher tiers. Open-source self-host is available, but scale-out beyond a few hundred RPS gets operationally heavy, so it is good for low-volume teams, not for 5,000-engineer rollouts. Parentage change is the dominant stability variable. Free tier at 10K requests/month, Pro at $25/month, Enterprise custom: the lightest cost in the cohort.

Where it falls short.

  • Parentage change (Mintlify → Stripe; Helicone → Mintlify) is the dominant 2026 variable. Most committees treat the gateway as planned-migration risk for multi-year contracts.
  • No optimizer.
  • No prompt library worth comparing to Portkey’s.
  • Routing intelligence is basic.
  • Self-host works for low-volume; not the right fit at scale.

Score: 4.5/8. Partial credit on every axis touched by the parentage change.

Choose Helicone when the workload is a low-stakes pilot, the team values setup simplicity over RBAC depth, the committee is comfortable with the parentage-change risk, and the use case is monitoring rather than routing or optimization.


The 8-axis capability matrix

| Axis | Future AGI | Portkey | Kong AI Gateway | LiteLLM | Helicone |
|---|---|---|---|---|---|
| Cost attribution (session / dev / repo) | Native, four-level | Header + virtual key | OTel plugin + tag | Metadata + virtual key | Header + custom prop |
| Tool-call + streaming on claude-opus-4-7 | Yes | Yes | Yes (3.6+) | Yes | Yes |
| Self-improving loop | fi.opt + auto-route | No | No | Via traceAI pairing | No |
| SOC 2 Type II / ISO / BAA / FedRAMP | Type II + HIPAA + GDPR + CCPA certified; BAA; FedRAMP via BYOC | Type II attested; ISO 27001; BAA; FedRAMP via PANW | Type II (Konnect); ISO 27001; BAA; FedRAMP-aligned ref arch | Enterprise tier attestation; BAA via contract | Type II (pre-acquisition); BAA on request |
| RBAC + SSO + SCIM | 4-level + delegated; SSO + SCIM | 4-tier; SSO + SCIM | Consumer + workspace + tag; SSO + SCIM | Team + user + virtual key; SSO in Enterprise | Workspace; SSO in higher tiers |
| Self-host / BYOC / air-gap | BYOC default; air-gap reference | BYOC; air-gap custom | Self-hosted air-gap standard | OSS self-host (strongest) | OSS self-host (low scale) |
| Vendor stability + retention | ~$1.9M raised; Apache 2.0 mitigation; tiered retention | Series A; PANW post-close; mature retention | Series E; SIEM-exported retention | YC-backed; post-March-24 audit | Mintlify → Stripe parentage |
| Pricing + AWS Marketplace + EDP burn-down | $99 Scale; AWS Marketplace; Enterprise custom | $99 Pro; Enterprise custom | $1.5K/mo Enterprise; bring-your-own-SIEM | OSS free; Enterprise from $250/mo | $25 Pro; Enterprise custom |

Three patterns surface. Future AGI is the only entry checking the self-improving-loop box. Portkey, Kong, and Future AGI are the three entries whose compliance catalog scales to Fortune 500 procurement without footnotes. LiteLLM and Helicone fit specific lower-stakes shapes but carry meaningful 2026 variables: the supply-chain incident and the parentage change.


The buying-process timeline: week 1 through week 12

The most common procurement mistake is treating an AI gateway as a technical decision and discovering in week 8 that legal needs another four weeks to red-line the DPA. Here is the twelve-week timeline that keeps the launch date in the quarter you committed to.

Week 1. Discovery and buying committee. Identify every stakeholder who will block the decision: developer experience lead, two engineering leads from the largest pilot teams, FinOps, security architect, compliance/GRC, IT identity team, legal counsel, AWS or Azure account team, procurement, executive sponsor. Output: a stakeholder matrix mapping each name to the buying axes they care about.

Week 2. Requirements and RFP draft. Use the eight-axis rubric above to draft the RFP. Three to five vendor questions per axis in procurement language. Output: a 15-20 question RFP signed off by procurement, security, and legal.

Weeks 3-4. RFP issuance and short-listing. Issue the RFP to the five-vendor longlist with a ten-business-day response window. In parallel, engineering runs an early technical evaluation on each vendor’s free tier: point Claude Code at each gateway via ANTHROPIC_BASE_URL, verify tool-use and streaming, and capture preliminary per-session cost data. Output: a shortlist of three vendors.

Week 5. Security questionnaire and SOC 2 review. Send each shortlisted vendor your security questionnaire (50-150 questions). Request the Type II report, the latest pen-test summary, SBOM, sub-processor list, and the DPA. Output: a security score per vendor with required mitigations.

Week 6. Pricing negotiation and AWS Marketplace path. Each vendor presents pricing for the customer’s projected annual volume. Verify the AWS Marketplace listing and whether the contract draws down the existing EDP commitment. Negotiate the storage tier line item explicitly. Output: a TCO comparison across years 1, 2, and 3.

Week 7. Legal red-line on MSA, DPA, BAA. Required 2026 additions: assignment-and-novation with a termination-without-penalty trigger, explicit sub-processor list with right-to-object language, audit-log retention aligned per repo class, BAA-on-request even on non-HIPAA pilots. Output: a fully red-lined contract package per vendor.

Week 8. Pilot kick-off. Provisional vendor selected. Pilot kicks off with 25-50 engineers from one or two pilot teams (30-day recipe in the next section).

Weeks 9-10. Pilot execution. Engineering captures per-session cost data, tool-use success rate, streaming latency, developer satisfaction. FinOps verifies the chargeback dashboard against expectations. Security verifies log retention and audit trail.

Week 11. Pilot evaluation and steering committee decision. Results presented; decision is proceed to full rollout, extend, switch vendors, or escalate.

Week 12. Contract signature and rollout kick-off. Contract signed, production rollout planned in waves, AWS Marketplace contract executed, SCIM provisioning enabled.

Three calibrations: week 7 is the most common slippage source, so start it earlier if your legal team is small; AWS Marketplace contracting compresses weeks 7 and 12 by two to three weeks when the AWS account team is engaged early; serializing the steps is the right call for most enterprises rather than running pilot and procurement in parallel.


The RFP question template: 18 questions to send vendors

Copy this template, customize the bracketed sections, and send to each shortlisted vendor with a ten-business-day window.

Compliance and audit (questions 1-5)

  1. Provide your current SOC 2 Type II report. If Type II is in progress, provide the Type I report, the bridge letter, and the expected attestation date for Type II.
  2. Provide your ISO 27001 certificate or the documented path to certification with a timeline.
  3. Confirm BAA availability on request. Describe the scope of the BAA (which products, which sub-processors).
  4. Describe your audit log retention policy. Confirm the maximum retention period available on the Enterprise tier and the per-TB-year cold-tier economics.

Data residency and deployment (questions 6-9)

  6. List the regions in which the hosted control plane and data plane operate. Confirm region pinning per workspace, project, or tenant.
  7. Confirm BYOC deployment availability. Describe what runs in the customer’s account versus what runs in the vendor’s cloud.
  8. Confirm air-gap deployment support, including the egress posture, telemetry options, and any required customization SOW.
  9. Provide the sub-processor list with right-to-object language in the DPA. Confirm change-notification SLA for sub-processor additions or removals.

Identity, access, and observability (questions 10-13)

  10. Describe the RBAC hierarchy depth and whether delegated administration is native or via metadata flattening. Provide a diagram of the four levels above the user.
  11. Confirm SAML SSO support across Okta, Azure AD, Google Workspace, Auth0, OneLogin. Confirm SCIM support and the IdPs covered.
  12. Describe per-session, per-developer, and per-repository cost attribution. Confirm whether the dashboard supports group-by-developer and group-by-repo natively or requires export to an analytics warehouse.
  13. Confirm tool-call and SSE streaming passthrough on claude-opus-4-7 and claude-sonnet-4-6 as of the response date. Provide test artifacts if available.

Vendor stability and pricing (questions 14-18)

  14. Provide the corporate entity, funding history, and any pending or completed M&A transactions in the last twelve months. If acquired, provide the post-close MSA template and confirm assignment-and-novation language is acceptable.
  15. Provide the pricing model for projected annual volume of [X million requests / Y million tokens / Z developers]. Provide year-1, year-2, and year-3 TCO assuming 30% YoY growth.
  16. Confirm AWS Marketplace listing and whether the contract draws down an existing AWS EDP commitment. Provide the listing URL.
  17. Describe the optimizer or self-improving loop, if any. Confirm whether the gateway updates routing or prompt policies based on captured trace data, or whether policy is human-configured only.
  18. Provide three reference customers at similar scale (5,000+ engineer Claude Code deployment or equivalent token volume) willing to take a thirty-minute reference call.

Two notes. Question 17 separates the gateway market: vendors that answer “yes, with a self-improving loop” are in a different product category than vendors that answer “policy is human-configured.” Question 18 is the most predictive: vendors that can’t provide three references at the customer’s scale aren’t yet ready for the customer’s procurement.


The 30-day pilot recipe: the framework engineering runs in week 8

Most pilots fail because the acceptance criteria aren’t defined before the pilot starts. Here is the thirty-day recipe that turns a pilot into a procurement input rather than a science experiment.

Day 0. Pilot scope. Pick two teams totaling 25-50 engineers, one backend-heavy and one frontend-heavy to stress different workload shapes. Define three target metrics with acceptance thresholds: per-developer cost variance (high/median/low), tool-use success rate, and pilot-team NPS for the gateway experience.

Days 1-3. Provisioning. Provision the gateway. Issue virtual keys per developer (Future AGI, Portkey, LiteLLM) or wire metadata headers (Helicone). Configure SAML SSO and SCIM. Set ANTHROPIC_BASE_URL in the pilot teams’ shell profiles, not in the IDE plugin alone; otherwise terminal CLI usage misses chargeback.

Days 4-7. First-week data capture. Engineers use Claude Code normally. By end of day 7, you should have a per-session cost histogram, a per-developer leaderboard, and a per-repo aggregation. Verify the top 10% of pilot users account for roughly 50% of token spend; if your distribution differs significantly from this baseline, the dashboard is missing data.
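The day-7 concentration check can be run on an export from the chargeback dashboard. The sketch below is a minimal version under stated assumptions: the input is a hypothetical mapping of developer to spend, and the tolerance of 15 percentage points around the 50% baseline is our illustrative choice, not a figure from any vendor.

```python
def top_decile_share(spend_by_developer: dict[str, float]) -> float:
    """Fraction of total spend attributable to the top 10% of developers."""
    if not spend_by_developer:
        raise ValueError("no spend data")
    total = sum(spend_by_developer.values())
    if total <= 0:
        raise ValueError("total spend must be positive")
    ranked = sorted(spend_by_developer.values(), reverse=True)
    top_n = max(1, (len(ranked) + 9) // 10)  # ceil(n / 10) without floats
    return sum(ranked[:top_n]) / total


def looks_suspicious(share: float, baseline: float = 0.50,
                     tolerance: float = 0.15) -> bool:
    """Flag distributions far from the baseline (tolerance is an assumption)."""
    return abs(share - baseline) > tolerance


# Hypothetical ten-developer pilot export, in dollars.
pilot = {"dev01": 51, "dev02": 9, "dev03": 8, "dev04": 7, "dev05": 6,
         "dev06": 5, "dev07": 5, "dev08": 4, "dev09": 3, "dev10": 2}
share = top_decile_share(pilot)
print(f"top-decile share: {share:.0%}, suspicious: {looks_suspicious(share)}")
```

A share far below the baseline often means some sessions are bypassing the gateway, which is worth ruling out before blaming the dashboard.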

Days 8-14. Tool-use and routing stress test. Each engineer runs at least one multi-file refactor with claude-opus-4-7, one quick bug-fix with claude-haiku-4-5, and one long-context session over 100K tokens. Acceptance gate: zero tool-use degradation, no streaming UX regression, session-level cost data captured for 100% of sessions.

Days 15-21. Chargeback validation. FinOps reviews the chargeback dashboard with engineering leads. Verify the dashboard surfaces the top-10% power users by name and spend. Run a what-if budget cap scenario with a soft-alert at 80% and a hard-pause at 110%.
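The what-if budget scenario reduces to two thresholds. The sketch below shows the decision logic only, assuming the 80% soft-alert and 110% hard-pause values from the paragraph above; the function name and return labels are hypothetical, and each gateway exposes its own budget controls.

```python
def budget_action(spend_usd: float, budget_usd: float,
                  soft_alert: float = 0.80, hard_pause: float = 1.10) -> str:
    """Return 'ok', 'alert', or 'pause' for a developer's month-to-date spend."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    if not 0 < soft_alert < hard_pause:
        raise ValueError("thresholds must satisfy 0 < soft_alert < hard_pause")
    ratio = spend_usd / budget_usd
    if ratio >= hard_pause:
        return "pause"   # hard stop: block further requests
    if ratio >= soft_alert:
        return "alert"   # soft alert: notify developer and lead
    return "ok"


# Hypothetical what-if run against a $1,000 monthly cap.
for spend in (500, 850, 1_200):
    print(spend, budget_action(spend, 1_000))
```

Setting the hard pause above 100% leaves headroom so a developer mid-session is alerted well before being cut off.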

Days 22-28. Security and audit validation. Security architect reviews the audit log against the customer’s compliance matrix. For gateways with a self-improving loop (Future AGI), verify that policy changes appear in the audit trail with the evidence chain: eval scores, failure-cluster IDs, and version diff.

Days 29-30. Debrief and decision. Pilot teams give NPS feedback. Platform team presents the three target metrics against day-0 thresholds. Steering committee decides: proceed, extend, switch, or escalate.

Three calibrations: the most common pilot failure is a missed expectation about chargeback granularity, so verify dashboard slicing on day 7 not day 30; loop pilots (Future AGI) need the full thirty days to show the cost curve bending; SCIM provisioning needs day-0 start because IdP teams typically take three to five business days.


How Future AGI closes the loop on the buying question

The other four gateways treat token monitoring as an end state: capture trace, show dashboard, send alert, hope humans rewire prompts and routing. The cost curve stays roughly flat.

Future AGI Agent Command Center treats the captured trace as the input to a closed loop. Every Claude Code turn is traced via traceAI (Apache 2.0); scored by fi.evals on task-completion, faithfulness, and code-correctness; low-scoring sessions clustered by failure mode in the Agent Command Center; fi.opt.optimizers (ProTeGi, Bayesian, GEPA) rewrite the system prompt or adjust routing against the clustered failures; the gateway applies the updated policy on the next request; the new prompt plus route are versioned with automatic rollback if score regresses. For Claude Code specifically the typical optimization is “turns under 10K input tokens to claude-haiku-4-5, everything else to claude-opus-4-7.”

Net effect: a team that starts at $40,000/month typically sees costs trend down 15-30% within four weeks without changing developer behavior. Protect guardrail ships at ~67ms text latency per arXiv 2510.13351, so inline policy enforcement doesn’t break the interactive UX.

The three building blocks are open source: traceAI, ai-evaluation, and agent-opt (github.com/future-agi). The hosted Agent Command Center adds failure-cluster views, live Protect guardrails, four-level RBAC, SOC 2 Type II + HIPAA + GDPR + CCPA all certified with BAA available, AWS Marketplace listing for procurement burn-down, and BYOC deployment.


Common buying mistakes: and the fix for each

| Mistake | Fix |
| --- | --- |
| Treating SOC 2 Type II as binary | Score on observation period, scope, exceptions, and bridge-letter availability, not on the presence of the attestation alone |
| Missing the BAA on a non-HIPAA pilot | Negotiate BAA-on-request at MSA signature, not at HIPAA-team onboarding |
| Signing multi-year with no assignment-and-novation | Add assignment-and-novation with a termination-without-penalty trigger if post-close DPA, sub-processor list, or contract entity degrades |
| Picking hosted, then finding the air-gap subsidiary needs a different vendor | Buy the gateway whose BYOC fits the most-restrictive subsidiary; run hosted as default elsewhere |
| Accepting default audit retention | Map each repo class to a records-retention schedule; negotiate tier economics in the MSA |
| Not asking for the sub-processor list | Demand the full list with right-to-object language and change-notification SLA |
| Treating vendor stability as binary | Score on the exit path (Apache 2.0, AWS Marketplace, self-host), not on revenue thresholds alone |
| Skipping pilot acceptance gates | Define three target metrics with acceptance thresholds on day 0; debrief against those on day 30 |
| Pointing only the IDE plugin at the gateway | Set ANTHROPIC_BASE_URL in pilot teams’ shell profiles, not just the IDE |
| Tagging with only user_id, not session_id | Tag both; the session ID is what makes the “cost to fix one bug” question legible |
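
To illustrate the last row, here is a minimal sketch of why tagging both identifiers matters. It assumes the gateway can export per-request records as dictionaries with `user_id`, `session_id`, and `cost_usd` fields; those field names and the sample data are hypothetical, not any vendor's export schema.

```python
# Hypothetical sketch: roll per-request cost records up to (user, session).
# The record fields (user_id, session_id, cost_usd) are assumed for illustration.
from collections import defaultdict


def cost_by_session(records):
    """Sum cost_usd per (user_id, session_id) pair."""
    totals = defaultdict(float)
    for record in records:
        key = (record["user_id"], record["session_id"])
        totals[key] += record["cost_usd"]
    return dict(totals)


# Made-up sample data: one developer, two sessions, plus a second developer.
records = [
    {"user_id": "dev-a", "session_id": "s1", "cost_usd": 0.50},
    {"user_id": "dev-a", "session_id": "s1", "cost_usd": 0.25},
    {"user_id": "dev-a", "session_id": "s2", "cost_usd": 1.00},
    {"user_id": "dev-b", "session_id": "s3", "cost_usd": 0.75},
]

if __name__ == "__main__":
    for (user, session), total in sorted(cost_by_session(records).items()):
        print(f"{user} {session} ${total:.2f}")
```

With only `user_id`, the two sessions for `dev-a` would collapse into a single $1.75 figure, and the per-task cost would no longer be recoverable.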

What we did not include

Three gateways show up in adjacent 2026 listicles but didn’t match this guide’s buying motion. OpenRouter is consumer-facing routing, which is not the right shape for enterprise chargeback. Cloudflare AI Gateway is strong for existing Cloudflare customers but doesn’t match BYOC-first or VPC-first constraints. TrueFoundry is the right choice when consolidating inference, gateway, and MLOps under one MSA. The sibling enterprise post covers Cloudflare and TrueFoundry in detail.



Sources

  • Anthropic Claude Code documentation, claude.ai/docs/claude-code
  • Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
  • Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
  • Portkey AI gateway, portkey.ai
  • Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026
  • Kong AI Gateway and AI Proxy plugin, konghq.com/products/kong-ai-gateway
  • LiteLLM proxy, github.com/BerriAI/litellm
  • Datadog Security Labs LiteLLM PyPI supply-chain writeup (March 24, 2026), securitylabs.datadoghq.com
  • Helicone proxy, helicone.ai
  • Mintlify (Helicone parent), mintlify.com

Frequently asked questions

Which gateway has SOC 2 Type II attested today?
Future AGI, Portkey, Kong Konnect, and Helicone (pre-acquisition entity) all ship SOC 2 Type II. Future AGI's [trust page](https://futureagi.com/trust) lists Type II + HIPAA + GDPR + CCPA certified; ISO/IEC 27001 is in active audit. LiteLLM's Enterprise tier carries an attestation path; the OSS distribution is in the customer's audit scope.
Which gateway is the right answer for federal procurement?
FedRAMP authorization is on the partner roadmap; federal-facing Claude Code deployments today run via air-gapped self-host (BYOC). Future AGI's Apache 2.0 OSS data plane plus BAA covers the regulated VPC; Kong Konnect runs in-VPC by default; LiteLLM OSS is the baseline air-gapped option. The audit shape (per-developer attribution, immutable spans, OTel-native retention) is identical to the cloud-control-plane setup — only the data plane changes.
How much does it cost to put 1,000 engineers on Claude Code through a gateway?
For a 1,000-engineer org with a $1,500 average per-engineer monthly spend ($18M/year of Anthropic tokens), the gateway's incremental cost lands in the $150K-$400K/year range depending on tier and configuration. The cost case is not the gateway's fee — it is the 15-30% token-spend reduction the loop produces, which at this scale is $2.7M-$5.4M/year.
Can we route Claude Code through non-Claude models?
Yes, but with care. Claude Code is tuned for Claude models; routing to non-Claude models often degrades tool-use. The safe pattern is to route between `claude-haiku-4-5`, `claude-sonnet-4-6`, and `claude-opus-4-7` by token budget, not to swap providers.
What is the right audit retention policy for SOX-scope repos?
Seven years from creation, applied to the gateway audit log because the gateway is part of the chain of custody. Map each repo to its records-retention schedule and configure tiered storage so the seven-year requirement applies only to in-scope repos.
How does Future AGI compare to Portkey for Claude Code?
Portkey is a polished hosted gateway with a mature pre-built catalog: Type II attested, ISO 27001, four-tier RBAC, virtual keys, prompt library. Future AGI is the BYOC gateway with an Apache 2.0 data layer, AWS Marketplace contract path, SOC 2 Type II + HIPAA + GDPR + CCPA certification with a BAA available per [futureagi.com/trust](https://futureagi.com/trust), and a closed loop that turns trace data into prompt rewrites and routing-policy updates. Pick Portkey for the hosted catalog; pick Future AGI for BYOC plus an open-source data layer plus the self-improving loop.
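
The arithmetic behind the 1,000-engineer cost answer in this FAQ can be checked with a short sketch. The inputs are the figures quoted in that answer. The net-of-fee range is a derived illustration, not a number from the guide, and the variable and function names are hypothetical.

```python
# Worked arithmetic for the 1,000-engineer FAQ answer, using integer dollars.
# Inputs are the figures quoted in the FAQ; the net range is derived here.

engineers = 1_000
monthly_spend_per_engineer = 1_500  # USD
annual_token_spend = engineers * monthly_spend_per_engineer * 12


def savings_range(annual_spend: int, low_pct: int = 15, high_pct: int = 30):
    """Return (low, high) annual savings in dollars for a percentage range."""
    return annual_spend * low_pct // 100, annual_spend * high_pct // 100


gateway_fee_low, gateway_fee_high = 150_000, 400_000  # USD per year, from the FAQ

savings_low, savings_high = savings_range(annual_token_spend)

# Worst case pairs the lowest savings with the highest fee; best case the reverse.
net_range = (savings_low - gateway_fee_high, savings_high - gateway_fee_low)

if __name__ == "__main__":
    print(f"annual token spend: ${annual_token_spend:,}")
    print(f"savings range:      ${savings_low:,} to ${savings_high:,}")
    print(f"net of gateway fee: ${net_range[0]:,} to ${net_range[1]:,}")
```

Under these inputs the token spend is $18,000,000 per year and the savings range is $2,700,000 to $5,400,000, matching the FAQ. Even the worst pairing of lowest savings and highest fee stays positive.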