Guides

Top Enterprise AI Gateways to Use Non-Anthropic Models in Claude Code in 2026

Five enterprise AI gateways scored on running Claude Code against non-Anthropic models in 2026: model whitelist enforcement, BYO and on-prem inference, audit logs, SOC 2 / BAA procurement, translation SLA, and cost-aware routing.

·
20 min read
ai-gateway 2026 claude-code hr
Editorial cover image for Top Enterprise AI Gateways to Use Non-Anthropic Models in Claude Code in 2026
Table of Contents

The first conversation a platform team has about Claude Code rarely goes the way the developers expect. Engineering wants to ship the CLI to 300 seats by Friday. Procurement wants the SOC 2 Type II report, the Business Associate Agreement, and a vendor risk file. Security wants to know which upstream models developers are allowed to call. By the time the lawyers are done, “Claude Code” is no longer a CLI, it’s a procurement artefact with a controlled model list, an audit trail, and a deployment footprint security can name.

Claude Code, as shipped, wasn’t built for that conversation. The binary speaks one protocol, points at one vendor, and has no notion of an enterprise-side model whitelist. If procurement has standardised on GPT-5 through an Azure landing zone, on a self-hosted Llama 3 70B running on internal H100s, or on Gemini through an existing GCP relationship, Claude Code doesn’t know how to reach those upstreams. And when it does reach them through a translation layer, the question of which gateway makes that translation an enterprise-grade artefact rather than a side project is non-trivial.

This guide is about that gateway choice. The angle is enterprise procurement and governance, model whitelist enforcement, SOC 2 evidence, BAA availability, BYO model and on-prem inference, audit-grade cross-provider routing logs, not translation mechanics in isolation. A sibling guide covers translation fidelity in depth; this one assumes translation works and asks what else the gateway has to do to survive a security review.

This is the 2026 cohort, scored on the seven enterprise axes that decide whether the gateway clears the procurement gate.


TL;DR: pick by procurement constraint

Procurement constraintPickWhy
Enterprise-grade translation plus a self-improving optimization loop, with BYOC and on-premFuture AGI Agent Command CenterOnly entry that ships the translation, the SOC 2 evidence, the BAA, the BYOC deployment, and the eval-driven optimizer in one stack
Hosted multi-provider gateway with the deepest virtual-key and RBAC controls for non-Anthropic upstreamsPortkeyFastest path when procurement accepts hosted-with-BYOC and OpenAI is the primary non-Anthropic target
Coding-agent-tuned open-source runtime with explicit Claude-Code-with-any-provider supportMaxim BifrostWhen the team wants source-available translation packaged for the coding-agent workload specifically
API-gateway-grade SLA, plugin ecosystem, and existing platform-team familiarityKong AI GatewayWhen Kong is already the API gateway of record and the AI extension is the path of least operational resistance
Source-available, self-host-anywhere proxy with Enterprise SLA and on-prem inference supportLiteLLM (Enterprise)When Claude Code traffic must terminate inside the VPC and reach OpenAI, Gemini, Bedrock, and OSS models from one auditable Python codebase

Why “non-Anthropic in Claude Code” is an enterprise procurement question

A developer pointing their personal Claude Code install at GPT-5 through OpenRouter is running an experiment. The same setup across 300 enterprise developers is source code traversing a vendor with no MSA, on terms engineering accepted without legal review. Most security teams will block it the first time they see the network logs. Three concrete concerns drive the enterprise gateway choice when the upstream is non-Anthropic.

The model whitelist becomes a real artefact. When the only model is Claude Opus, the whitelist conversation is trivial. The moment GPT-5, Gemini 2.5 Pro, Bedrock-hosted Llama, and an internal Qwen fine-tune are in the mix, the whitelist has structure: each model approved at a specific version through a specific deployment path. The gateway must enforce that whitelist at request time, including rejecting new model IDs upstream vendors release between security reviews.

Vendor risk multiplies with provider count. A single-vendor stack means one SOC 2 report, one DPA, one subprocessor list. Four upstreams means four independent vendor-risk files, and the gateway becomes the consolidation point for the audit. A gateway with its own SOC 2 Type II, BAA when healthcare workloads are in scope, and a clear subprocessor list saves the procurement team months.

BYO model and on-prem inference are the highest-trust deployment. For defence, some healthcare workloads, and parts of financial services, sending source code to any hosted model is non-starter. The only path is a self-hosted model inside the network. The gateway must terminate Claude Code’s Anthropic-shaped request and forward it to an internal vLLM or Triton endpoint serving Llama 3 70B or a fine-tuned Qwen, end-to-end inside the VPC. Not every gateway can do this; the ones that can are the procurement-grade ones.

For the rest of this guide, “gateway” means an AI gateway that speaks the Anthropic protocol to Claude Code, translates to at least one non-Anthropic upstream, and is procurement-ready for an enterprise rollout.


The 7 enterprise axes we score on

The translation axes from the sibling Any-LLM guide aren’t the right scoring frame here. A gateway that translates well but can’t pass vendor risk doesn’t get deployed. We scored each pick on seven enterprise-procurement axes.

AxisWhat it measures
1. Enterprise-grade translation with SLAAnthropic-to-OpenAI / Gemini / Llama translation that vendor commits to under a stated SLA, not best-effort
2. Model whitelist enforcementThe gateway can enforce a controlled list of upstream models, with policy-side rejection of off-list calls and audit of attempts
3. BYO model and on-prem inferenceThe gateway can route Claude Code traffic to self-hosted upstreams (Llama, Qwen, internal fine-tunes) inside the customer’s VPC
4. Audit log for cross-provider routingEvery routing decision (which model fired, which policy matched, what the input-token estimate was) is queryable, retained, and exportable for compliance
5. Procurement-ready vendor postureSOC 2 Type II, ISO 27001, BAA availability, AWS Marketplace / GCP Marketplace listing, a clear subprocessor list, and a DPA you can actually sign
6. Translation reliability and monitoringThe gateway scores its own translation correctness over time so regressions on non-Anthropic upstreams are caught by the system, not by a developer reporting a freeze
7. Cost-aware routing within the whitelistThe gateway picks the cheapest whitelisted model that meets the quality bar for each turn-shape, not a single static fallback chain

The verdict line at the end of each pick scores all seven.


How we picked

We started from the universe of AI gateways advertising Anthropic-compatible inbound, non-Anthropic upstream translation, and an enterprise-tier offering with SOC 2 evidence as of May 2026. We removed gateways without an enterprise procurement story, consumer marketplaces, hobbyist proxies, and gateways still in beta on the Anthropic-inbound path. We removed two products whose model-whitelist enforcement was documented but not actually applied at request time (the docs described a roadmap item, not running behaviour). We removed one product whose BAA was only available at six-figure annual commits, a non-starter for mid-market teams. The remaining five are the cohort below.


1. Future AGI Agent Command Center: Best enterprise translation plus a closed loop

Verdict: Future AGI is the only gateway in this list shipping enterprise-grade translation, BYOC + on-prem deployment, SOC 2 evidence, BAA availability, AWS Marketplace procurement, and an eval-driven optimization loop in a single product. The other four are translation layers; Agent Command Center is the translation layer wired to a feedback loop.

What it does for enterprise non-Anthropic Claude Code: Translation built on an intermediate-representation step inside traceAI (Apache 2.0). Tool-use blocks survive intact including parallel calls; cache_control is honoured on Anthropic routes and remapped to OpenAI’s automatic cache or Gemini’s context-cache priming on those routes. Scale and Enterprise tiers carry an availability SLA with breach remedies named in the MSA. Allowed upstreams are declared in policy and enforced at request time, with off-list calls returning a structured error and logged for security review; new model IDs from upstream vendors don’t propagate until explicitly approved. BYOC runs inside the customer’s VPC, routing to OpenAI, Gemini, Bedrock, or to in-VPC vLLM / Triton endpoints serving Llama 3 70B, Qwen, or internal fine-tunes. Every routing decision is captured as span attributes, matched policy, selected upstream, model ID, input-token estimate, cost estimate, hot path or fallback, retained per customer policy and exportable in OpenTelemetry format to the SIEM. SOC 2 Type II certified (alongside HIPAA, GDPR, and CCPA) as of May 2026; ISO 27001 on roadmap; BAA available on Enterprise; AWS Marketplace listing with private-offer support; DPA, subprocessor list, and security questionnaire pack ship as standard. fi.evals scores every translated call on task-completion, tool-use correctness, and code-correctness, so regressions on a specific upstream show up as score drops before they become developer-reported bugs; the Protect guardrail adds 65 ms text median time-to-label text-evaluation latency (arXiv 2510.13351) to catch prompt-injection content travelling through cross-provider routes. Cost-aware policies key on input token count, tool-call complexity, eval-score history, and per-developer / per-repo metadata, a typical policy routes short turns under 8K to Gemini Flash, mid-range to GPT-5-mini or Claude Sonnet on Bedrock, and long-context or eval-flagged turns to Opus direct.

The loop. Every translated call is scored. Low-scoring sessions cluster by failure mode (“parallel tool calls collapsed on Provider X,” “cache hint dropped on Provider Y”). fi.opt.optimizers (six optimizers (RandomSearchOptimizer, BayesianSearchOptimizer with Optuna teacher-inferred few-shot and resumable runs, MetaPromptOptimizer, ProTeGi, GEPAOptimizer, PromptWizardOptimizer), all sharing EarlyStoppingConfig) reacts two ways: rewrite the per-provider system-prompt prefix, or adjust routing weight so the offending provider drops out until reliability recovers. Policies are versioned with automatic rollback. This is the wedge no other gateway in this list implements end-to-end.

Where it falls short:

  • The full optimization loop is heavier than a procurement team wants in week one. Start with the gateway alone; switch on the optimizer once trace volume justifies it.

  • Prompt-library UI is less mature than Portkey’s; teams that lean on a shared prompt library should weigh that feature.

Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, BAA, and AWS Marketplace for procurement.

Score: 7/7 axes.


2. Portkey: Best hosted multi-provider with mature enterprise controls

Verdict: Portkey is the most polished hosted product if procurement accepts hosted-with-BYOC and OpenAI is the primary non-Anthropic upstream. Virtual keys give every developer attributable usage on pooled provider credentials, RBAC covers the common enterprise patterns, and SOC 2 Type II is in place. The optimizer is absent.

What it does for enterprise non-Anthropic Claude Code: Anthropic-to-OpenAI translation works end-to-end for Claude Code’s standard tool-use patterns and Enterprise carries an availability SLA; the Gemini path lags and is documented as beta. Virtual keys scope to a subset of upstreams, with off-list rejection logged and administered through the Portkey console. BYOC deploys inside the customer’s VPC and routes to OpenAI-compatible endpoints including self-hosted vLLM; fully air-gapped deployments need extra coordination. Per-request logs are retained with metadata, but per-developer chargeback only works when the Claude Code wrapper sets the required headers, otherwise the audit trail collapses to one shared key. SOC 2 Type II in place; HIPAA / BAA available on Enterprise; AWS Marketplace listing exists; DPA and subprocessor list are standard. Translation correctness is observable in dashboards; acting on regressions is left to the operator. Conditional routing matches on metadata including token count and routes to the cheapest whitelisted model.

Where it falls short:

  • No optimizer. Translation traces inform humans, not the gateway.
  • Gemini translation parity with OpenAI lags. Confirm against your procurement-approved Gemini variant.
  • Per-developer attribution requires Claude Code wrapper changes; without those, virtual keys aggregate everyone under one identity.
  • Pricing escalates above 5M requests / month faster than open-source alternatives; BYOC is more constrained than LiteLLM’s source-available story.

Pricing: Free tier with 10K requests/day. Scale starts at $99/month. Enterprise is custom with SOC 2 Type II.

Score: 6/7 axes (missing: closed loop on translation correctness).


3. Maxim Bifrost: Best open-source runtime tuned for the coding-agent workload

Verdict: Maxim Bifrost is the pick when the team wants a source-available translation layer packaged for the Claude-Code-on-non-Anthropic workload, with the procurement story handled through Maxim AI’s hosted Enterprise offering. The translation matrix is tuned for coding-agent patterns, parallel tool calls, long diffs, multi-turn sessions.

What it does for enterprise non-Anthropic Claude Code: Anthropic-protocol inbound adapter maps to OpenAI, Bedrock, Vertex, and OSS upstreams, with the coding-agent test surface called out in docs; hosted Bifrost carries an SLA, the OSS runtime is as-is. Policy config supports a controlled model list with policy-layer rejection, less mature than Portkey’s UI-driven administration, but auditable. Runs anywhere including inside the VPC, with explicit support for Ollama and vLLM, making it a strong pick for fully self-hosted Claude Code + Llama 3 stacks. OpenTelemetry-native spans capture routing decisions and metadata for SIEM export. Hosted Bifrost from Maxim AI carries SOC 2 Type II; BAA on roadmap; AWS Marketplace listing not yet present as of May 2026. Bifrost publishes per-provider tool-use correctness numbers and updates them, more transparency than most peers, though treat as directional since vendor-reported. Policy supports token-count and tool-call-complexity rules; coding-agent patterns are first-class.

Where it falls short:

  • Younger project; long-tail bug surface still being shaken out at high RPS.
  • No closed loop; tool-use correctness is a metric, not a routing signal.
  • Enterprise controls (SSO, RBAC, audit retention) less mature than hosted alternatives.
  • Smaller community means an unsupported upstream is patch-it-yourself.

Pricing: OSS runtime under MIT. Hosted Bifrost is a separate commercial product; pricing on inquiry.

Score: 5/7 axes (missing: closed loop, AWS Marketplace, mature enterprise UI).


4. Kong AI Gateway: Best when Kong is already the API gateway of record

Verdict: Kong AI Gateway is the pick when the platform team already runs Kong for the company’s REST APIs and the SLA story comes from the existing Kong Enterprise contract. Strengths are deployment polish, plugin ecosystem, and SOC 2 / ISO 27001 maturity inherited from Kong’s core product. The weakness is that AI-specific concerns are plugin-driven rather than native.

What it does for enterprise non-Anthropic Claude Code: Kong’s AI Proxy plugin (3.6+) handles Anthropic-protocol inbound with translation to OpenAI, Azure, Bedrock, Vertex, and Ollama-style OSS upstreams; the plugin is younger than the core product and translation depth on Claude Code’s tool-use surface improves each release. Kong’s existing consumer + plugin model maps to whitelist enforcement naturally, with route-level rejection of off-list calls and audit through Kong’s existing audit-log plugin. Kong is self-host by design and AI Proxy supports self-hosted vLLM, Ollama, and Together, a clean pick for fully on-prem stacks where Kong is already deployed. Audit-log and OTel plugins combine to capture routing decisions; the chargeback dashboard is typically Grafana on top of the OTel sink rather than a native UI. Kong Inc. carries SOC 2 Type II and ISO 27001 on the core product; BAA available; AWS Marketplace listings for Kong Konnect. Translation correctness isn’t scored natively; the operator wires that into a downstream eval pipeline. Token-count-based routing requires Lua plugin work.

Where it falls short:

  • AI-specific observability is plugin-driven, not native. Default dashboard is the API-gateway view, not the LLM-cost view.
  • No optimizer.
  • Spend-tracking requires multiple plugins. Plan two weeks of platform-team time to deliver a finance-acceptable chargeback view.
  • Translation depth for the long tail of non-Anthropic providers lags the dedicated AI-gateway products in this list.

Pricing: Kong is open source. Konnect (managed) starts free. Enterprise plans for SLA, plugins, and support start around $1.5K/month.

Score: 5/7 axes (missing: native AI observability, closed loop, polished cost dashboard).


5. LiteLLM (Enterprise): Best source-available self-host with enterprise tier

Verdict: LiteLLM Enterprise is the pick when Claude Code traffic must terminate inside the VPC, security needs to read every line of the translator, and procurement needs an enterprise tier on top of the open-source codebase.

What it does for enterprise non-Anthropic Claude Code: Explicit translators for OpenAI, Azure, Gemini, Vertex, Bedrock, Cohere, Together, and a long tail of OSS endpoints. Source-readable in Python; corner cases patchable in-house; Enterprise adds SLA, SSO, and audit. The model_list config is the whitelist, with off-list rejection at request time and whitelist changes flowing through the same config-management process as any infrastructure change. BYO model and on-prem inference is LiteLLM’s strongest axis: self-host on customer infrastructure, route to OpenAI, Bedrock, Vertex, or any in-VPC OpenAI-compatible endpoint (vLLM, Triton, Ollama, TGI); air-gapped deployments supported. Spend tracking and per-key logs cover the basics, with Enterprise adding SSO, RBAC, and longer retention; slicing by repo or developer typically exports to the customer’s data warehouse. LiteLLM Enterprise from BerriAI provides SOC 2 Type II, SSO, contractual SLA, BAA availability, and AWS Marketplace listing. Spend and request metrics are first-class; tool-use correctness isn’t scored natively, plan to wire traceAI or another OTel sink behind LiteLLM for translation-behaviour depth.

Where it falls short:

  • No optimizer. Traces are observation only.
  • Native dashboard is functional, not polished. Slicing by tool-use success rate per provider means SQL.
  • High-RPS deployments need horizontal scaling and Python tuning.
  • The Enterprise tier is the procurement story; OSS alone doesn’t clear vendor risk for most regulated industries.

Pricing: OSS under MIT. Enterprise adds SLA, SSO, audit; starts around $250/month for small teams, custom at scale.

Score: 5.5/7 axes (missing: polished dashboard, closed loop on correctness).


Capability matrix

AxisFuture AGIPortkeyBifrostKong AI GatewayLiteLLM (Ent)
Enterprise translation + SLAYes, IR-basedYes, hostedYes, runtime-tunedYes, plugin SLAYes, source-readable
Model whitelist enforcementNative, auditedVirtual-key scopedPolicy configPlugin + consumermodel_list config
BYO model + on-prem inferenceBYOC + VPCBYOCOSS runtimeSelf-host nativeSelf-host native
Audit log for cross-provider routingNative + OTel exportPer-request logsOTel spansAudit + OTel pluginsSpend + Enterprise audit
Procurement posture (SOC 2 / BAA / Marketplace)SOC 2 IP, BAA, AWS MPSOC 2, BAA, AWS MPSOC 2 (hosted), BAA roadmapSOC 2, ISO, BAA, AWS MPSOC 2, BAA (Enterprise), AWS MP
Translation reliability monitoringEval-scored, optimizedObservablePer-provider metricOTel-basedSpend-focused
Cost-aware routing in whitelistPolicy + evalConditional routingPolicy configPlugin workmodel_list + hooks
Closed loop on translation correctnessfi.optNoMetric onlyNoNo

Decision framework: choose X if

Choose Future AGI if procurement wants the translation, SOC 2 evidence, BAA, BYOC, and AWS Marketplace listing in one stack, and engineering wants the gateway to learn which upstream is reliable for which turn-shape over time. Pick this when Claude Code on non-Anthropic models is becoming a meaningful spend item and the cost curve should bend down without continuous engineering attention.

Choose Portkey if procurement accepts hosted-with-BYOC, OpenAI is the primary non-Anthropic upstream, and the team values a polished virtual-key + RBAC console more than a closed loop.

Choose Maxim Bifrost if the team is building around the coding-agent + multi-provider workload, prefers a source-available runtime, and is willing to handle procurement through Maxim AI’s hosted offering. Accept a younger ecosystem in exchange for coding-agent focus.

Choose Kong AI Gateway if Kong is already the API gateway of record and the procurement posture inherited from Kong Enterprise is sufficient. Pick this when the company-wide gateway strategy is “everything is Kong.”

Choose LiteLLM Enterprise if Claude Code traffic must terminate inside the VPC, security needs to read the translator’s source, and procurement needs an enterprise tier on top of the open-source codebase.


Common procurement mistakes when wiring Claude Code to non-Anthropic models

MistakeWhat goes wrongFix
Treating the gateway like a developer tool, not a vendorProcurement finds out after rollout that the gateway has no DPA and no SOC 2Pre-clear the vendor like any other software purchase; the gateway is part of the supply chain
Letting developers point Claude Code at any non-Anthropic upstreamThe whitelist exists on paper but not in the runtime; security finds out from network logsEnforce the whitelist at the gateway layer; reject off-list calls server-side
Single shared key across developersPer-developer chargeback collapses; cross-provider audit becomes one rowIssue virtual keys per developer that fan out to pooled upstream credentials
Skipping the BAA because “we’re not handling PHI yet”A Claude Code use case touches PHI six months in; renegotiation is painfulProcure the BAA upfront whenever the company touches healthcare adjacencies
Hosted gateway in a region the data-residency policy forbidsLegal flags that the hosted gateway processes EU data in US infrastructureConfirm the processing region matches your data-residency policy; pick BYOC if not
Treating audit logs as a developer convenienceCompliance asks for six months of cross-provider routing data; retention was 30 daysSet retention to match the compliance regime, not the default
Skipping the model-version pin on non-Anthropic upstreamsOpenAI swaps a default GPT-5 variant; Claude Code’s tool use degrades Monday with no code changePin the model ID, not the alias; treat upstream-default changes as a configuration event
Ignoring translation regressions until a developer reports a freezeThe trace shows a Gemini update three weeks ago flipped a tool-call field orderUse a gateway that scores translation correctness over time; treat score drops as P1

How Future AGI closes the loop on enterprise non-Anthropic Claude Code

The other four picks treat enterprise non-Anthropic Claude Code as a deployment problem: ship the translation, document the procurement artefacts, monitor the network, fix bugs as they come in. Future AGI treats translation correctness, cost, and policy adherence as the inputs to a feedback loop.

traceAI (Apache 2.0) captures each turn’s span tree, inbound Anthropic request, matched whitelist policy, chosen upstream, translated request, upstream response, and the Anthropic-shaped response rebuilt for the CLI. fi.evals scores each turn on task-completion, tool-use correctness, and code-correctness rubrics. A translation regression on a specific non-Anthropic upstream (say, a Gemini point release that returns functionCall parts in a slightly different order) shows up as a tool-use-correctness drop scoped to that provider, not as a developer-reported freeze three days later.

Low-scoring sessions cluster by failure mode. fi.opt.optimizers (six optimizers (RandomSearchOptimizer, BayesianSearchOptimizer with Optuna teacher-inferred few-shot and resumable runs, MetaPromptOptimizer, ProTeGi, GEPAOptimizer, PromptWizardOptimizer), all sharing EarlyStoppingConfig) reacts two ways: rewrite the per-provider system-prompt prefix, or adjust routing weight so the offending provider drops out until reliability recovers. Whitelist constraints stay enforced, the optimizer never selects an off-list model. Policies are versioned with automatic rollback on eval-score regression. Every policy change, every off-list rejection, every routing decision is in the audit log for compliance review. Protect runs alongside, adding 65 ms text median time-to-label text-evaluation latency (arXiv 2510.13351), to catch prompt-injection content travelling through cross-provider routes.

Net effect: a team that starts with a static “cheap upstream for short turns, expensive for long” rule typically ends after four weeks with a policy capturing three to five turn-shapes per upstream, picking the cheapest whitelisted model that scored above the quality bar, and shifting traffic when a provider regresses.

The three building blocks are open source under Apache 2.0: traceAI, ai-evaluation, and agent-opt. Hosted Agent Command Center adds the failure-cluster view, live Protect, RBAC, SOC 2 Type II certified, BAA on Enterprise, and AWS Marketplace for procurement, with BYOC available when hosted is the wrong shape for the customer’s data-residency policy.


What we did not include

Three gateways show up in other 2026 enterprise listicles that we deliberately left out:

  • Helicone. Strong native-Anthropic observability and procurement story is improving, but multi-provider translation depth for Claude Code on non-Anthropic upstreams is thinner than the picks above.
  • OpenRouter. The widest model catalogue in the category, but the consumer-facing shape of the product makes enterprise governance (RBAC, per-developer chargeback, SOC 2, BAA) a custom-work conversation.
  • Cloudflare AI Gateway. Strong primitives and fast edge, but the Anthropic-protocol-inbound with non-Anthropic-upstream story is still developing as of May 2026.

All three are worth a re-look later in 2026.



Sources

  • Anthropic Messages API protocol, docs.anthropic.com/en/api/messages
  • Anthropic prompt caching, docs.anthropic.com/en/docs/build-with-claude/prompt-caching
  • Claude Code documentation, claude.ai/docs/claude-code
  • Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
  • Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text / 107 ms image median time-to-label)
  • Portkey AI gateway, portkey.ai
  • Maxim Bifrost, github.com/maximhq/bifrost
  • Kong AI Gateway, konghq.com/products/kong-ai-gateway
  • LiteLLM proxy, github.com/BerriAI/litellm
  • LiteLLM Enterprise, litellm.ai/enterprise

Frequently asked questions

Why does my enterprise need a gateway to use non-Anthropic models in Claude Code instead of running OpenAI's coding-agent SDK directly?
Claude Code is the UX engineering wants to keep — keyboard shortcuts, tool primitives, and session model are already in workflows. Translating Anthropic to OpenAI keeps that UX while changing the model underneath. Procurement for one CLI plus one gateway is also cleaner than two coding agents with different governance footprints.
Which non-Anthropic models clear enterprise procurement most cleanly?
Bedrock-hosted Claude variants (protocol identical to native Anthropic, procurement on AWS terms), followed by Azure OpenAI for GPT-5 and GPT-5-mini through the existing Azure landing zone. Gemini through Vertex is a strong third when GCP is already approved. Direct OpenAI or direct Google AI Studio each add a separate vendor risk file.
How does the gateway enforce a model whitelist?
The approved upstream list lives in the gateway's policy config. Every Claude Code request is matched against the policy; off-list calls return a structured error and are logged. New model IDs from upstream vendors do not propagate without explicit approval, preventing upstreams from quietly enabling a model security has not reviewed.
Can Claude Code run against self-hosted Llama 3 or an internal fine-tune?
Yes, through a gateway supporting OpenAI-compatible self-hosted endpoints. Future AGI BYOC, Kong AI Gateway, LiteLLM Enterprise, and Maxim Bifrost all route Claude Code's Anthropic-shaped requests to internal vLLM, Triton, or Ollama endpoints. The translation path is the same one used for hosted OpenAI; the upstream URL points inside the VPC.
What does a procurement-ready vendor posture actually require?
At minimum: SOC 2 Type II (or credible in-progress timeline), a signable DPA, a subprocessor list, a BAA for healthcare adjacencies, contractual SLA, and ideally an AWS or GCP Marketplace listing. ISO 27001 is increasingly expected. Standard questionnaires (CAIQ, SIG Lite) should be available without a custom NDA round.
Is sending source code through a translation gateway to a non-Anthropic provider safe?
For hosted gateways the data path is gateway to upstream; both already see the code. The trust decision is on the upstream, not the gateway. If compliance forbids the code reaching a specific upstream, restrict the gateway's whitelist. If compliance forbids the code leaving the network, use BYOC or self-hosted and route only to in-VPC inference.
How is Future AGI Agent Command Center different from Portkey for this workload?
Portkey is a hosted translation and observation layer with mature virtual keys, RBAC, and SOC 2 evidence. Future AGI ships the same procurement artefacts plus an optimization layer — translation traces feed back into routing-policy updates so the gateway gets better at picking the right whitelisted upstream over time.
Related Articles
View all
Top 5 Tools for Claude Code Cost Management in 2026
Guides

Five tools for Claude Code cost management in 2026 — four gateways plus the native Anthropic dashboard and a FinOps platform — scored on attribution, chargeback, caps, routing, cache observability, FinOps integration, and audit trail.

NVJK Kartik
NVJK Kartik ·
18 min