Guides

Best 5 AI Gateways to Run Claude Code with Any LLM Provider in 2026

Five AI gateways scored on running Claude Code against non-Anthropic models in 2026: translation fidelity, tool-use survival, streaming, latency, routing.

March 12, 2026

20 min read

ai-gateway 2026 claude-code

Table of Contents

Claude Code is the best coding-agent UX shipped in 2025, and it’s Anthropic-locked by default. Point the CLI at anything other than api.anthropic.com and you get an authentication error in the first turn. The product was designed around Anthropic’s tool-use protocol, caching headers, and streaming format.

For most teams that lock-in is fine. Opus and Sonnet are the right defaults for hard refactors. But the same team that uses Opus for architecture also has 800 boilerplate edits per week that a $0.10 Gemini-Flash call would close out, plus a handful of high-stakes turns where GPT-5 is the better tool than Claude. Mixing providers per task is rational. The bottleneck is that Claude Code doesn’t let you do it natively.

An AI gateway that translates the Anthropic API into other providers’ APIs unlocks the workload. The gateway speaks Anthropic to Claude Code, speaks OpenAI or Gemini or Bedrock or OSS to the upstream, and translates in both directions including streaming, tool-use, and cache headers. The five gateways below all attempt this. They don’t all do it well. Preserving Claude Code’s tool-use semantics through a non-Anthropic provider is where most translations quietly break.

This is the 2026 cohort, scored on the axes that decide whether the translation survives a real coding session.

TL;DR

Future AGI Agent Command Center is the strongest pick for running Claude Code against any LLM provider because it ships a base_url swap that preserves Anthropic protocol semantics — parallel tool_use and tool_result blocks, cache_control, streaming event types — across Anthropic, OpenAI, Gemini, Bedrock, Vertex, Together, Groq, Fireworks, and Ollama upstreams, with per-developer virtual keys, cross-developer semantic cache, and tool-use accuracy captured per call. The other four picks below win on specific edges.

Future AGI Agent Command Center — Best overall. Multi-provider translation that survives parallel tool calls, per-developer attribution, cross-developer cache, and 6-9 ms p50 translation overhead.
Portkey — Best for hosted multi-provider with mature virtual-key controls. Fastest hosted setup with Anthropic-to-OpenAI translation plus RBAC (verify the Palo Alto Networks acquisition timeline before signing multi-year).
LiteLLM — Best when Claude Code traffic cannot leave your VPC but must reach OpenAI/Gemini/OSS. Self-hosted, source-available multi-provider proxy; pin commits after the March 24, 2026 PyPI compromise.
OpenRouter — Best when the priority is access to 300+ models from a single endpoint. Consumer-facing model marketplace with the widest provider catalogue.
Maxim Bifrost — Best for an explicit Claude-Code adapter with non-Anthropic support baked in. Open-source coding-agent-specific multi-provider runtime.

Why Claude Code is harder to translate than a chat completion

A general chat-completion proxy is straightforward. Most gateways close the OpenAI-to-Anthropic gap in a hundred lines of code. Claude Code is harder for three reasons.

1. Tool-use blocks are protocol-specific. Claude Code uses Anthropic’s tool_use and tool_result content blocks aggressively. Every file edit, bash invocation, and grep is a tool call. The Anthropic schema represents these as structured JSON blocks inside the message content array. OpenAI represents tool calls in a separate tool_calls field. Gemini uses functionCall parts inside Content. A translator must map every direction, including the case where a single assistant turn issues five parallel tool calls, the exact pattern Claude Code uses to edit five files at once. Many gateways get the single-tool case right and break on parallel.

2. Cache control isn’t portable. Claude Code relies on Anthropic’s cache_control blocks to keep multi-turn sessions affordable. A 30-turn Opus session without prompt caching is roughly 4-5x more expensive than the same session with caching. OpenAI implements prompt caching automatically with no header to set. Gemini exposes context caching as a separate endpoint that has to be primed in advance. When the gateway translates an Anthropic request with cache_control to OpenAI, the cache hint is meaningless. The translation must know which provider supports which caching model and silently drop or remap headers that don’t apply.

3. Streaming events differ. Anthropic streams content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop events. OpenAI streams delta chunks with finish_reason. Gemini streams candidate updates. Claude Code’s progress UI parses Anthropic’s event types directly. A gateway that translates outbound but not the streaming response back into Anthropic’s format will freeze the CLI mid-turn or drop tool calls that arrive after the first text chunk.

The combination of tool-use blocks, cache headers, and streaming events separates a working translation from a broken one. Most “multi-provider” gateways handle non-streaming chat completions correctly and break somewhere in this list once the workload becomes a real coding agent.

A gateway sits between Claude Code and the model provider. The client points at the gateway via ANTHROPIC_BASE_URL. The gateway accepts the Anthropic protocol, decides which upstream to hit, translates the request, forwards it, then translates the response stream back into Anthropic’s event types. Done well, Claude Code doesn’t know the difference. Done poorly, the terminal locks up halfway through a refactor.

For the rest of this post, “gateway” means a multi-provider AI gateway that speaks the Anthropic API to Claude Code and translates to at least one non-Anthropic upstream.

The 7 axes we score on

The generic “best AI gateway” axes are too coarse for this workload. We scored each pick on seven axes that specifically decide whether Claude Code’s behaviour survives translation to a non-Anthropic provider.

Axis	What it measures
1. Translation fidelity	Does the gateway preserve Anthropic protocol semantics — tool-use blocks, cache hints, streaming events — when forwarding to OpenAI / Gemini / OSS?
2. Tool-use survival	Specifically, does Claude Code’s bash and file-edit tooling work end-to-end through the translation, including parallel tool calls?
3. Provider breadth	How many upstream providers does the gateway support with first-class translation? Token count is not the metric; depth of support is.
4. Streaming continuity	Does SSE pass through with the correct Anthropic event types, or does the gateway buffer-and-batch and break the CLI’s progress UI?
5. Translation latency overhead	How many milliseconds does the translation step add per turn? At 30 turns per session this compounds fast.
6. Cost-aware routing	Can the gateway route per-turn to the cheapest model that still passes a quality bar — e.g., Gemini-Flash for easy edits, GPT-5 for hard refactors, Claude Opus for architecture?
7. Loop on translation correctness	Does the gateway feed back the tool-use accuracy of each translated call into route selection, or does every team learn the failure modes by hand?

Verdict line at the end of each pick scores all seven.

How we picked

We started from the universe of AI gateways that advertise Anthropic-compatible inbound plus at least one non-Anthropic upstream as of May 2026. We removed gateways that broke parallel tool calls in our test session (two failed on five-parallel-file-edit, including one that had shipped the feature only weeks earlier). We removed gateways whose Anthropic-to-OpenAI translation dropped the cache_control block without telling the caller; the resulting OpenAI bill in a 25-turn refactor was 3.4x what it should have been, and finance noticed. The remaining five are the cohort below.

1. Future AGI Agent Command Center: Best for translation reliability across providers

Verdict: Future AGI’s translation layer preserves Anthropic protocol semantics intact — parallel tool-use blocks, cache_control, streaming event types — across Anthropic, OpenAI, Gemini, Bedrock, Together, Groq, Fireworks, and Ollama upstreams. Per-developer virtual keys, cross-developer semantic cache, and OpenTelemetry-native cost telemetry sit on top. Tool-use accuracy is captured per call and surfaced in Error Feed so emerging translation regressions surface as named issues, not as a developer’s Slack message.

What it does for running Claude Code on non-Anthropic models:

Translation fidelity is built around traceAI instrumentation. Every Anthropic-shaped inbound is parsed into a structured intermediate representation, then re-serialised to the upstream’s schema. Tool-use blocks survive intact including parallel calls. cache_control is honoured on Anthropic upstreams and substituted with OpenAI’s automatic caching or Gemini’s context-cache priming on those routes.
Tool-use survival confirmed for Claude Code’s full tool set (bash, file edit, glob, grep, web fetch) against OpenAI GPT-5, Gemini 2.5 Pro, and Bedrock-hosted Claude Sonnet. The IR step lets parallel tool calls round-trip without re-serialisation artefacts.
Provider breadth covers Anthropic, OpenAI, Azure, Gemini, Bedrock, Together, Groq, Fireworks, and Ollama. Adding a provider is a config file.
Streaming continuity maintained by translating upstream events back into Anthropic event types in flight.
Translation latency overhead is roughly 6-9 ms p50 non-streaming and 4 ms per chunk streaming. The Future AGI Protect model family runs inline at ~65 ms p50 text and ~107 ms p50 image (arXiv 2510.13351). FAGI’s own fine-tuned Gemma 3n adapters across content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text/image/audio, a model family rather than a plugin chain.
Cost-aware routing through policies keyed on input token count, tool-call complexity, and eval-score history. Typical policy: under 8K with single tool calls to Gemini Flash; 8K-60K with parallel tool calls to GPT-5-mini or Claude Sonnet; above 60K or high failure history to Opus.
Loop on translation correctness is the wedge. traceAI (50+ AI surfaces across Python, TypeScript, Java, and C# (including Spring Boot starter, Spring AI, LangChain4j, Semantic Kernel), OpenInference-native) emits spans; fi.evals scores every call against task-completion and tool-use-correctness rubrics. Error Feed (the part of the eval stack, the clustering and what-to-fix layer that feeds the self-improving evaluators) sits alongside as the zero-config error monitor: auto-clusters related malformed-tool-call failures into named issues (50 traces → 1 issue), auto-writes the root cause plus a quick fix plus a long-term recommendation per issue, and tracks rising/steady/falling trend per issue so emerging translation regressions surface like exceptions. fi.opt.optimizers rewrites the per-provider prompt prefix or adjusts routing weight. Next deploy uses the updated route.

Where it falls short:

The full loop is heavier than what a team wants in week one. Start with the gateway alone and switch on the optimizer once trace volume justifies it.
Gemini context caching is stable but newer than the OpenAI and Anthropic paths. Verify your Gemini variant.

Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, plus a BAA. AWS Marketplace listing for procurement.

Score: 7/7 axes.

2. Portkey: Best for hosted multi-provider with mature controls

Verdict: Portkey is the most polished hosted option in this category if the priority is provider breadth plus enterprise controls. Anthropic-to-OpenAI translation works for Claude Code’s standard tool use, and the virtual-key system gives every developer attributable usage even when upstreams are pooled. The optimizer is absent; you get translation and observation, not a loop.

What it does for running Claude Code on non-Anthropic models:

Translation fidelity is solid for the common case. Tool-use blocks translate end-to-end against OpenAI and Azure. Parallel tool calls work for OpenAI; Gemini parallel-tool-call translation was rough in May 2026 builds and required pinning to specific model versions.
Tool-use survival confirmed for GPT-5 and GPT-4o against Claude Code’s bash and file-edit tooling. Gemini support is documented as beta.
Provider breadth is wide. 200+ models across OpenAI, Anthropic, Azure, Gemini, Bedrock, Vertex, Cohere, Mistral, Together, Groq, Fireworks, and others. Not all at first-class translation depth.
Streaming continuity works for SSE. Event-type translation back to Anthropic’s format is correct for the providers we tested.
Translation latency overhead is around 8-12 ms p50 per Portkey’s published numbers.
Cost-aware routing through conditional routing policies: match on metadata including token count and route to the cheapest matching model.
Loop on translation correctness isn’t part of the product.

Where it falls short:

No optimizer. The translation data informs humans, not the gateway.
The Gemini path lags the OpenAI path on parallel tool calls. Confirm against your specific Gemini variant before committing.
Pricing escalates above 5M requests/month faster than open-source alternatives, and the BYOC tier is more constrained than LiteLLM’s source-available story.

Pricing: Free tier with 10K requests/day. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II.

Score: 6/7 axes (missing: feedback loop / optimization).

3. LiteLLM: Best for self-hosted multi-provider translation

Verdict: LiteLLM is the pick when Claude Code traffic can’t leave your VPC and you still need to reach OpenAI / Gemini / Bedrock / OSS upstreams. The source is yours; the polish is less than the hosted options.

What it does for running Claude Code on non-Anthropic models:

Translation fidelity is broad and source-readable. Explicit translators for OpenAI, Azure, Gemini, Vertex, Bedrock, Cohere, Together, and a long tail of OSS endpoints. Corner cases are patchable in source.
Tool-use survival is good for OpenAI and Bedrock-hosted Claude. Gemini tool-use translation is where regressions land first; test your variant in CI.
Provider breadth is the broadest in this list: 20+ providers via six native adapters (OpenAI, Anthropic, Gemini, Bedrock, Cohere, Azure) plus OpenAI-compatible presets and self-hosted backends and counting.
Streaming continuity works for SSE on OpenAI and Bedrock paths; verify newer providers.
Translation latency overhead is typically 10-18 ms p50 in our tests. Python is the bottleneck under load.
Cost-aware routing through model_list with routing strategy, fallbacks, and weighted load balancing. Token-count routing via pre_call_hooks; coding-agent-specific rules require Python.
Loop on translation correctness isn’t in the product.

Where it falls short:

No optimizer.
The dashboard is functional, not polished. Slicing by tool-use success rate per provider means a SQL warehouse.
High-RPS deployments need horizontal scaling.

Pricing: Open source under MIT. LiteLLM Enterprise tier adds SLA + SSO + audit; starts around $250/month for small teams.

Score: 5.5/7 axes (missing: native polished dashboard, optimizer).

4. OpenRouter: Best for breadth-of-providers, not enterprise governance

Verdict: OpenRouter is the right pick when the goal is “every model from one endpoint” and the team will accept consumer-facing controls. The catalogue is the widest in the category: 300+ models across roughly 60 providers. The Anthropic-inbound mode for Claude Code works for the common upstreams. The trade-off is that OpenRouter isn’t built around enterprise chargeback, audit, or self-improvement.

What it does for running Claude Code on non-Anthropic models:

Translation fidelity is correct for the standard Claude Code tool set on the major upstreams (OpenAI, Anthropic, Google, xAI). The long tail of community providers is thin pass-through.
Tool-use survival gotchas are model-specific rather than gateway-specific. OpenRouter’s docs flag which models support parallel tool calls reliably.
Provider breadth is the largest in this list by a wide margin. If you need a model from a smaller host (DeepSeek, Mistral variants, specialised fine-tunes), OpenRouter is often the only managed way to get it.
Streaming continuity works for SSE on the major upstreams.
Translation latency overhead is in the 5-10 ms range; end-to-end latency is dominated by the upstream choice.
Cost-aware routing through OpenRouter’s models fallback chain. Token-count routing is client-side.
Loop on translation correctness isn’t in the product.

Where it falls short:

Enterprise governance is light. RBAC, audit logs, and per-developer chargeback aren’t the strong suit. SOC 2 evidence means custom work.
No optimizer.
Pricing is per-request markup on top of upstream cost. For high-volume teams this can add up; verify against direct-upstream pricing.

Pricing: Pay-as-you-go markup on upstream model cost. No free tier for sustained workloads. Enterprise contracts negotiated separately.

Score: 5/7 axes (missing: feedback loop, deep enterprise governance).

5. Maxim Bifrost: Best for explicit Claude-Code-with-any-provider runtime

Verdict: Maxim Bifrost is one of the few gateways that ships an explicit Claude Code adapter with first-class non-Anthropic support, packaged as an open-source runtime. Bifrost is built around coding-agent workloads specifically, with the translation matrix tuned for Claude Code’s protocol use rather than chat completion in the abstract. That focus is both the strength and the limitation.

What it does for running Claude Code on non-Anthropic models:

Translation fidelity is the explicit product goal. Bifrost ships an Anthropic-protocol inbound adapter that maps to OpenAI, Bedrock, Vertex, and OSS upstreams, with the coding-agent test surface (parallel tool calls, long file diffs, multi-turn sessions) called out in the docs.
Tool-use survival is what Bifrost is benchmarked on. The team publishes per-provider tool-use correctness numbers and updates them. Treat them as directional since they’re vendor-reported.
Provider breadth is deliberate, not exhaustive: OpenAI, Anthropic, Bedrock, Vertex, Azure, Together, Groq, Fireworks, Ollama. Smaller than LiteLLM and OpenRouter; deeper per provider.
Streaming continuity is implemented for supported providers. Anthropic event-type rebuild is part of the integration test suite.
Translation latency overhead is published in the project’s benchmarks. Bifrost is newer and the perf story is still moving.
Cost-aware routing through Bifrost’s policy config. Coding-agent patterns (route by token count, route by tool-call complexity) are first-class.
Loop on translation correctness is partial. Bifrost surfaces tool-use correctness as a metric but doesn’t yet rewrite routes from it.

Where it falls short:

Younger than the other four. The long-tail bug surface is still being shaken out at high RPS.
Smaller community. Unsupported provider means patch-it-yourself.
Enterprise controls (SSO, RBAC, audit) are less mature than Portkey or Future AGI’s hosted offerings.

Pricing: Open source. Maxim AI’s hosted offering for Bifrost is a separate commercial product; pricing on inquiry.

Score: 5/7 axes (missing: closed loop on tool-use correctness, mature enterprise controls).

Capability matrix

Axis	Future AGI	Portkey	LiteLLM	OpenRouter	Bifrost
Translation fidelity (tool-use intact)	✅ IR-based	✅	✅ Source-readable	✅ Major upstreams	✅ Coding-agent tuned
Tool-use survival (parallel calls)	✅	✅ OpenAI; ⚠️ Gemini	✅ OpenAI; ⚠️ Gemini	✅ Major upstreams	✅
Provider breadth	10+ first-class	200+ catalogue	20+ providers via six native adapters (OpenAI, Anthropic, Gemini, Bedrock, Cohere, Azure) plus OpenAI-compatible presets and self-hosted backends	300+ models	10+ first-class
Streaming continuity (Anthropic event types)	✅	✅	✅	✅	✅
Translation latency p50	6-9 ms	8-12 ms	10-18 ms	5-10 ms (varies)	varies
Cost-aware routing	✅ Policy + eval	✅ Policy	✅ Config	⚠️ Fallback chain	✅ Policy
Loop on translation correctness	✅ `fi.opt`	❌	❌	❌	⚠️ Metric only
Self-host posture	✅ BYOC	✅ BYOC	✅ OSS	❌ Hosted-only	✅ OSS

Decision framework: Choose X if

Choose Future AGI if you want translation correctness to improve over time without manual intervention. Pick this when Claude Code is becoming a meaningful spend item and the team needs the gateway to learn which upstream is reliable for which turn-shape, not translate blindly and let the operator chase regressions alone.

Choose Portkey if you want a hosted multi-provider gateway with mature virtual keys, RBAC, and a polished UI, and you don’t need the optimizer yet. Pick this when procurement matters and OpenAI is the primary non-Anthropic upstream.

Choose LiteLLM if Claude Code traffic must stay inside your VPC and the security team needs to read every line of code that touches a prompt. Pick this when source-availability beats hosted polish and you have a Python team to maintain the proxy at scale.

Choose OpenRouter if the constraint is access to a long-tail model (a DeepSeek variant, a community fine-tune, a specialised host) and enterprise governance is secondary. Pick this for individual developers and small teams.

Choose Maxim Bifrost if the team is explicitly building around the coding-agent + multi-provider workload and wants an open-source runtime tuned for it. Accept a younger ecosystem in exchange for that focus.

Common mistakes when wiring Claude Code through a translation gateway

Mistake	What goes wrong	Fix
Assuming all “OpenAI-compatible” upstreams support parallel tool calls	Claude Code’s five-file-edit pattern fails silently on providers that flatten parallel tool_calls into sequential	Test parallel tool calls in CI; pin specific model versions known to support them
Letting `cache_control` blocks drop on non-Anthropic routes	OpenAI bill comes in 3-4x higher than expected on long sessions	Use a gateway that explicitly remaps cache hints per provider; verify in trace data
Translating outbound but not stream events back	Claude Code’s progress UI freezes mid-turn	Confirm the gateway rebuilds Anthropic `content_block_*` events from the upstream stream
Single fallback chain instead of token-count-based routing	Cheap turns burn Opus pricing; hard turns get sent to a model that cannot handle them	Route by input token count and tool-call complexity, not just by provider preference
Pointing only the IDE plugin at the gateway	Terminal-CLI usage still hits Anthropic directly; the multi-provider experiment misses half the traffic	Set `ANTHROPIC_BASE_URL` in the shell profile, not only in the IDE config
Ignoring per-provider tool-use accuracy over time	Routes that worked in week one degrade silently as providers update model behaviour	Run a translation-correctness eval on a recurring schedule, or use a gateway that does this itself

How Future AGI closes the loop on translation correctness

The other four gateways treat translation as a one-shot engineering problem: ship the adapter, fix bugs as they come in, hope the upstream protocols stay stable. Future AGI treats translation correctness as the input to a feedback loop. The loop has six stages:

Trace. Every Claude Code turn produces a span tree via traceAI (Apache 2.0). Spans capture the inbound Anthropic request, the chosen upstream, the translated request, the upstream response, and the translated response back to Claude Code.
Evaluate. ai-evaluation (Apache 2.0) scores every turn. FAGI ships a 60+ EvalTemplate classes in the ai-evaluation SDK with self-improving evaluators on the Future AGI Platform (task completion, tool-use correctness, did every requested tool actually fire and return a usable result, faithfulness, code-correctness, structured-output, hallucination, agentic surfaces, instruction-following, groundedness), plus unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code, plus self-improving evaluators that learn from live production traces, plus FAGI’s proprietary classifier model family at very low cost-per-token (lower per-eval cost than Galileo Luna-2). A translation regression (say, a Gemini update that returns functionCall parts in a slightly different order) shows up as a sudden drop in tool-use correctness for one provider. Catalog is the floor, not the ceiling.
Cluster. Low-scoring sessions get clustered by failure mode: “parallel tool calls collapsed on provider X,” “cache hint dropped on provider Y,” “streaming events mangled on provider Z.” Clusters reveal failure shape without reading thousands of individual traces.
Optimize. fi.opt.optimizers (six optimizers (RandomSearchOptimizer, BayesianSearchOptimizer Optuna-backed with teacher-inferred few-shot templates and resumable studies, MetaPromptOptimizer, ProTeGi, GEPAOptimizer, PromptWizardOptimizer), all sharing an EarlyStoppingConfig (patience + min_delta + threshold + max_evaluations) and the same unified Evaluator over 60+ FAGI rubrics) reacts two ways. For prompt-level issues, it rewrites the per-provider system prompt prefix. For routing-level issues, it adjusts the routing weight so the offending provider drops out of the candidate set until reliability recovers. Temporary regressions auto-recover once eval scores rebound.
Route. Agent Command Center applies the updated policy on the next request. The CLI sees no protocol change; the upstream silently shifts to whichever model is scoring best for that turn-shape.
Re-deploy. Policies are versioned. Teams roll forward; if a policy regresses, automatic rollback. Protect runs alongside, adding ~65 ms per arXiv 2510.13351, to prevent prompt-injection content from leaking through translated routes.

Net effect: a team that starts with a static “cheap upstream for short turns, expensive for long” rule typically ends after four weeks with a policy capturing three to five turn-shapes per provider, picking the cheapest model that scored above the quality bar, and shifting when a provider regresses. Translation correctness goes up while average token cost goes down.

The three building blocks are open source:

traceAI, github.com/future-agi/traceAI (Apache 2.0)
ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
agent-opt, github.com/future-agi/agent-opt (Apache 2.0)

The hosted Agent Command Center adds the failure-cluster view, live Protect, RBAC, SOC 2 Type II certified, and AWS Marketplace for procurement.

What we did not include

We deliberately left out three gateways that show up in other 2026 listicles:

Helicone. Strong observability, but multi-provider translation for Claude Code is thin. Right pick for native-Anthropic monitoring, not cross-provider routing.
Kong AI Gateway. Solid API-gateway SLA and the AI Proxy plugin handles tool calls, but Anthropic-to-non-Anthropic translation depth lags the picks above.
Cloudflare AI Gateway. Strong primitives, fast edge, but the Anthropic-protocol-inbound-with-non-Anthropic-upstream story is still developing as of May 2026.

All three are worth a re-look later in 2026.

Sources

Anthropic Messages API protocol, docs.anthropic.com/en/api/messages
Anthropic prompt caching, docs.anthropic.com/en/docs/build-with-claude/prompt-caching
Claude Code documentation, claude.ai/docs/claude-code
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text, 107 ms image)
Portkey AI gateway, portkey.ai
LiteLLM proxy, github.com/BerriAI/litellm
OpenRouter, openrouter.ai
Maxim Bifrost, github.com/maximhq/bifrost

Frequently asked questions

Can Claude Code actually run on non-Anthropic models?

Not natively. Through a translation gateway that speaks Anthropic on the inbound and translates to OpenAI / Gemini / OSS on the outbound, you can route Claude Code per-turn to non-Anthropic upstreams. The fidelity of that translation is the entire question this post is about.

Which non-Anthropic models work best with Claude Code through a gateway?

As of May 2026, OpenAI GPT-5 and GPT-5-mini translate cleanly through most gateways for the common Claude Code tool-use patterns. Gemini 2.5 Pro works for single tool calls; parallel tool call translation is where we see the most regressions. Bedrock-hosted Claude variants are the safest non-direct-Anthropic route because the protocol is identical to native Anthropic.

Does using a gateway add noticeable latency to Claude Code?

The translation step is in the 5-18 ms range across the gateways here. Per-turn, that is negligible against the 2-10 s typical LLM response time. Route on cost and reliability; latency rarely decides.

Why not just use OpenAI's coding-agent SDK instead of running Claude Code on OpenAI?

The UX is the answer. Teams that have standardised on Claude Code want the keyboard shortcuts, tool primitives, and session model they have already built workflows around. Translating Anthropic to OpenAI keeps the UX in place while changing the model underneath.

Can I mix Anthropic native and non-Anthropic upstreams in the same Claude Code session?

Yes, and this is the most common pattern in production. A typical policy: route long-context architecture turns to Claude Opus directly, route boilerplate edits to Gemini Flash, route hard refactors to GPT-5. The session stays continuous from the CLI's perspective; the gateway swaps upstreams per turn.

Is it safe to send source code through a translation gateway to a different provider?

For hosted gateways, the data flow is gateway to upstream; both already see the code. If your compliance regime forbids the code reaching specific providers, restrict the gateway's routing rules. If the code must not leave your network at all, use self-hosted LiteLLM or Future AGI BYOC and restrict upstreams to in-VPC or in-region-only models.

How is Future AGI Agent Command Center different from Portkey for this workload?

Portkey is a hosted translation and observation layer. Future AGI translates and observes too, then adds an optimization layer. Translation traces feed back into routing-policy updates, so the gateway gets better at picking the right upstream for each turn-shape over time. Portkey gives you the translation; Future AGI gives you the translation plus a loop on which translation is currently the most reliable.

View all

Guides

Best 5 AI Gateways to Cache Claude Code Calls in 2026

Five AI gateways scored on caching Claude Code calls in 2026: cross-developer cache scope, semantic-match thresholds, hit-rate, TTL, what each misses.

Rishav Hada · May 16, 2026

17 min

Guides

Top 5 Tools for Claude Code Cost Management in 2026

Five tools for Claude Code cost management in 2026: four gateways, the native Anthropic dashboard, and a FinOps platform, scored on chargeback, caps.

NVJK Kartik · May 14, 2026

18 min

Guides

Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026

Five AI gateways scored on Claude Code token monitoring in 2026: per-dev attribution, per-repo budgets, session traces, alerts, where each falls short.

Rishav Hada · May 8, 2026

17 min

TL;DR

Why Claude Code is harder to translate than a chat completion

The 7 axes we score on

How we picked

1. Future AGI Agent Command Center: Best for translation reliability across providers

2. Portkey: Best for hosted multi-provider with mature controls

3. LiteLLM: Best for self-hosted multi-provider translation

4. OpenRouter: Best for breadth-of-providers, not enterprise governance

5. Maxim Bifrost: Best for explicit Claude-Code-with-any-provider runtime

Capability matrix

Decision framework: Choose X if

Common mistakes when wiring Claude Code through a translation gateway

How Future AGI closes the loop on translation correctness

What we did not include

Related reading

Sources

Frequently asked questions