Best 5 AI Gateways to Run Claude Code with Any LLM Provider in 2026
Five AI gateways scored on running Claude Code against non-Anthropic models in 2026: translation fidelity, tool-use survival, streaming, latency overhead, and per-task routing across providers.
Table of Contents
Claude Code is the best coding-agent UX shipped in 2025, and it’s Anthropic-locked by default. Point the CLI at anything other than api.anthropic.com and you get an authentication error in the first turn. The product was designed around Anthropic’s tool-use protocol, caching headers, and streaming format.
For most teams that lock-in is fine. Opus and Sonnet are the right defaults for hard refactors. But the same team that uses Opus for architecture also has 800 boilerplate edits per week that a $0.10 Gemini-Flash call would close out, plus a handful of high-stakes turns where GPT-5 is the better tool than Claude. Mixing providers per task is rational. The bottleneck is that Claude Code doesn’t let you do it natively.
An AI gateway that translates the Anthropic API into other providers’ APIs unlocks the workload. The gateway speaks Anthropic to Claude Code, speaks OpenAI or Gemini or Bedrock or OSS to the upstream, and translates in both directions including streaming, tool-use, and cache headers. The five gateways below all attempt this. They don’t all do it well. Preserving Claude Code’s tool-use semantics through a non-Anthropic provider is where most translations quietly break.
This is the 2026 cohort, scored on the axes that decide whether the translation survives a real coding session.
TL;DR
Future AGI Agent Command Center is the strongest pick for running Claude Code against any LLM provider because it ships a base_url swap that preserves Anthropic protocol semantics — parallel tool_use and tool_result blocks, cache_control, streaming event types — across Anthropic, OpenAI, Gemini, Bedrock, Vertex, Together, Groq, Fireworks, and Ollama upstreams, with per-developer virtual keys, cross-developer semantic cache, and tool-use accuracy captured per call. The other four picks below win on specific edges.
- Future AGI Agent Command Center — Best overall. Multi-provider translation that survives parallel tool calls, per-developer attribution, cross-developer cache, and 6-9 ms p50 translation overhead.
- Portkey — Best for hosted multi-provider with mature virtual-key controls. Fastest hosted setup with Anthropic-to-OpenAI translation plus RBAC (verify the Palo Alto Networks acquisition timeline before signing multi-year).
- LiteLLM — Best when Claude Code traffic cannot leave your VPC but must reach OpenAI/Gemini/OSS. Self-hosted, source-available multi-provider proxy; pin commits after the March 24, 2026 PyPI compromise.
- OpenRouter — Best when the priority is access to 300+ models from a single endpoint. Consumer-facing model marketplace with the widest provider catalogue.
- Maxim Bifrost — Best for an explicit Claude-Code adapter with non-Anthropic support baked in. Open-source coding-agent-specific multi-provider runtime.
Why Claude Code is harder to translate than a chat completion
A general chat-completion proxy is straightforward. Most gateways close the OpenAI-to-Anthropic gap in a hundred lines of code. Claude Code is harder for three reasons.
1. Tool-use blocks are protocol-specific. Claude Code uses Anthropic’s tool_use and tool_result content blocks aggressively. Every file edit, bash invocation, and grep is a tool call. The Anthropic schema represents these as structured JSON blocks inside the message content array. OpenAI represents tool calls in a separate tool_calls field. Gemini uses functionCall parts inside Content. A translator must map every direction, including the case where a single assistant turn issues five parallel tool calls, the exact pattern Claude Code uses to edit five files at once. Many gateways get the single-tool case right and break on parallel.
2. Cache control isn’t portable. Claude Code relies on Anthropic’s cache_control blocks to keep multi-turn sessions affordable. A 30-turn Opus session without prompt caching is roughly 4-5x more expensive than the same session with caching. OpenAI implements prompt caching automatically with no header to set. Gemini exposes context caching as a separate endpoint that has to be primed in advance. When the gateway translates an Anthropic request with cache_control to OpenAI, the cache hint is meaningless. The translation must know which provider supports which caching model and silently drop or remap headers that don’t apply.
3. Streaming events differ. Anthropic streams content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop events. OpenAI streams delta chunks with finish_reason. Gemini streams candidate updates. Claude Code’s progress UI parses Anthropic’s event types directly. A gateway that translates outbound but not the streaming response back into Anthropic’s format will freeze the CLI mid-turn or drop tool calls that arrive after the first text chunk.
The combination of tool-use blocks, cache headers, and streaming events separates a working translation from a broken one. Most “multi-provider” gateways handle non-streaming chat completions correctly and break somewhere in this list once the workload becomes a real coding agent.
A gateway sits between Claude Code and the model provider. The client points at the gateway via ANTHROPIC_BASE_URL. The gateway accepts the Anthropic protocol, decides which upstream to hit, translates the request, forwards it, then translates the response stream back into Anthropic’s event types. Done well, Claude Code doesn’t know the difference. Done poorly, the terminal locks up halfway through a refactor.
For the rest of this post, “gateway” means a multi-provider AI gateway that speaks the Anthropic API to Claude Code and translates to at least one non-Anthropic upstream.
The 7 axes we score on
The generic “best AI gateway” axes are too coarse for this workload. We scored each pick on seven axes that specifically decide whether Claude Code’s behaviour survives translation to a non-Anthropic provider.
| Axis | What it measures |
|---|---|
| 1. Translation fidelity | Does the gateway preserve Anthropic protocol semantics — tool-use blocks, cache hints, streaming events — when forwarding to OpenAI / Gemini / OSS? |
| 2. Tool-use survival | Specifically, does Claude Code’s bash and file-edit tooling work end-to-end through the translation, including parallel tool calls? |
| 3. Provider breadth | How many upstream providers does the gateway support with first-class translation? Token count is not the metric; depth of support is. |
| 4. Streaming continuity | Does SSE pass through with the correct Anthropic event types, or does the gateway buffer-and-batch and break the CLI’s progress UI? |
| 5. Translation latency overhead | How many milliseconds does the translation step add per turn? At 30 turns per session this compounds fast. |
| 6. Cost-aware routing | Can the gateway route per-turn to the cheapest model that still passes a quality bar — e.g., Gemini-Flash for easy edits, GPT-5 for hard refactors, Claude Opus for architecture? |
| 7. Loop on translation correctness | Does the gateway feed back the tool-use accuracy of each translated call into route selection, or does every team learn the failure modes by hand? |
Verdict line at the end of each pick scores all seven.
How we picked
We started from the universe of AI gateways that advertise Anthropic-compatible inbound plus at least one non-Anthropic upstream as of May 2026. We removed gateways that broke parallel tool calls in our test session (two failed on five-parallel-file-edit, including one that had shipped the feature only weeks earlier). We removed gateways whose Anthropic-to-OpenAI translation dropped the cache_control block without telling the caller; the resulting OpenAI bill in a 25-turn refactor was 3.4x what it should have been, and finance noticed. The remaining five are the cohort below.
1. Future AGI Agent Command Center: Best for translation reliability across providers
Verdict: Future AGI’s translation layer preserves Anthropic protocol semantics intact — parallel tool-use blocks, cache_control, streaming event types — across Anthropic, OpenAI, Gemini, Bedrock, Together, Groq, Fireworks, and Ollama upstreams. Per-developer virtual keys, cross-developer semantic cache, and OpenTelemetry-native cost telemetry sit on top. Tool-use accuracy is captured per call and surfaced in Error Feed so emerging translation regressions surface as named issues, not as a developer’s Slack message.
What it does for running Claude Code on non-Anthropic models:
- Translation fidelity is built around
traceAIinstrumentation. Every Anthropic-shaped inbound is parsed into a structured intermediate representation, then re-serialised to the upstream’s schema. Tool-use blocks survive intact including parallel calls.cache_controlis honoured on Anthropic upstreams and substituted with OpenAI’s automatic caching or Gemini’s context-cache priming on those routes. - Tool-use survival confirmed for Claude Code’s full tool set (bash, file edit, glob, grep, web fetch) against OpenAI GPT-5, Gemini 2.5 Pro, and Bedrock-hosted Claude Sonnet. The IR step lets parallel tool calls round-trip without re-serialisation artefacts.
- Provider breadth covers Anthropic, OpenAI, Azure, Gemini, Bedrock, Together, Groq, Fireworks, and Ollama. Adding a provider is a config file.
- Streaming continuity maintained by translating upstream events back into Anthropic event types in flight.
- Translation latency overhead is roughly 6-9 ms p50 non-streaming and 4 ms per chunk streaming. The Future AGI Protect model family runs inline at ~67 ms p50 text and ~109 ms p50 image (arXiv 2510.13351). FAGI’s own fine-tuned Gemma 3n adapters across content moderation, bias detection, security/prompt-injection, and data privacy/PII, multi-modal across text/image/audio, a model family rather than a plugin chain.
- Cost-aware routing through policies keyed on input token count, tool-call complexity, and eval-score history. Typical policy: under 8K with single tool calls to Gemini Flash; 8K-60K with parallel tool calls to GPT-5-mini or Claude Sonnet; above 60K or high failure history to Opus.
- Loop on translation correctness is the wedge.
traceAI(35+ framework integrations, OpenInference-native) emits spans;fi.evalsscores every call against task-completion and tool-use-correctness rubrics. Error Feed (FAGI’s “Sentry for AI agents”) sits alongside as the zero-config error monitor: auto-clusters related malformed-tool-call failures into named issues (50 traces → 1 issue), auto-writes the root cause plus a quick fix plus a long-term recommendation per issue, and tracks rising/steady/falling trend per issue so emerging translation regressions surface like exceptions.fi.opt.optimizersrewrites the per-provider prompt prefix or adjusts routing weight. Next deploy uses the updated route.
Where it falls short:
- The full loop is heavier than what a team wants in week one. Start with the gateway alone and switch on the optimizer once trace volume justifies it.
- Gemini context caching is stable but newer than the OpenAI and Anthropic paths. Verify your Gemini variant.
Pricing: Free tier with 100K traces / month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, plus a BAA. AWS Marketplace listing for procurement.
Score: 7/7 axes.
2. Portkey: Best for hosted multi-provider with mature controls
Verdict: Portkey is the most polished hosted option in this category if the priority is provider breadth plus enterprise controls. Anthropic-to-OpenAI translation works for Claude Code’s standard tool use, and the virtual-key system gives every developer attributable usage even when upstreams are pooled. The optimizer is absent; you get translation and observation, not a loop.
What it does for running Claude Code on non-Anthropic models:
- Translation fidelity is solid for the common case. Tool-use blocks translate end-to-end against OpenAI and Azure. Parallel tool calls work for OpenAI; Gemini parallel-tool-call translation was rough in May 2026 builds and required pinning to specific model versions.
- Tool-use survival confirmed for GPT-5 and GPT-4o against Claude Code’s bash and file-edit tooling. Gemini support is documented as beta.
- Provider breadth is wide. 200+ models across OpenAI, Anthropic, Azure, Gemini, Bedrock, Vertex, Cohere, Mistral, Together, Groq, Fireworks, and others. Not all at first-class translation depth.
- Streaming continuity works for SSE. Event-type translation back to Anthropic’s format is correct for the providers we tested.
- Translation latency overhead is around 8-12 ms p50 per Portkey’s published numbers.
- Cost-aware routing through
conditional routingpolicies: match on metadata including token count and route to the cheapest matching model. - Loop on translation correctness isn’t part of the product.
Where it falls short:
- No optimizer. The translation data informs humans, not the gateway.
- The Gemini path lags the OpenAI path on parallel tool calls. Confirm against your specific Gemini variant before committing.
- Pricing escalates above 5M requests/month faster than open-source alternatives, and the BYOC tier is more constrained than LiteLLM’s source-available story.
Pricing: Free tier with 10K requests/day. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II.
Score: 6/7 axes (missing: feedback loop / optimization).
3. LiteLLM: Best for self-hosted multi-provider translation
Verdict: LiteLLM is the pick when Claude Code traffic can’t leave your VPC and you still need to reach OpenAI / Gemini / Bedrock / OSS upstreams. The source is yours; the polish is less than the hosted options.
What it does for running Claude Code on non-Anthropic models:
- Translation fidelity is broad and source-readable. Explicit translators for OpenAI, Azure, Gemini, Vertex, Bedrock, Cohere, Together, and a long tail of OSS endpoints. Corner cases are patchable in source.
- Tool-use survival is good for OpenAI and Bedrock-hosted Claude. Gemini tool-use translation is where regressions land first; test your variant in CI.
- Provider breadth is the broadest in this list: 100+ providers and counting.
- Streaming continuity works for SSE on OpenAI and Bedrock paths; verify newer providers.
- Translation latency overhead is typically 10-18 ms p50 in our tests. Python is the bottleneck under load.
- Cost-aware routing through
model_listwith routing strategy, fallbacks, and weighted load balancing. Token-count routing viapre_call_hooks; coding-agent-specific rules require Python. - Loop on translation correctness isn’t in the product.
Where it falls short:
- No optimizer.
- The dashboard is functional, not polished. Slicing by tool-use success rate per provider means a SQL warehouse.
- High-RPS deployments need horizontal scaling.
Pricing: Open source under MIT. LiteLLM Enterprise tier adds SLA + SSO + audit; starts around $250/month for small teams.
Score: 5.5/7 axes (missing: native polished dashboard, optimizer).
4. OpenRouter: Best for breadth-of-providers, not enterprise governance
Verdict: OpenRouter is the right pick when the goal is “every model from one endpoint” and the team will accept consumer-facing controls. The catalogue is the widest in the category: 300+ models across roughly 60 providers. The Anthropic-inbound mode for Claude Code works for the common upstreams. The trade-off is that OpenRouter isn’t built around enterprise chargeback, audit, or self-improvement.
What it does for running Claude Code on non-Anthropic models:
- Translation fidelity is correct for the standard Claude Code tool set on the major upstreams (OpenAI, Anthropic, Google, xAI). The long tail of community providers is thin pass-through.
- Tool-use survival gotchas are model-specific rather than gateway-specific. OpenRouter’s docs flag which models support parallel tool calls reliably.
- Provider breadth is the largest in this list by a wide margin. If you need a model from a smaller host (DeepSeek, Mistral variants, specialised fine-tunes), OpenRouter is often the only managed way to get it.
- Streaming continuity works for SSE on the major upstreams.
- Translation latency overhead is in the 5-10 ms range; end-to-end latency is dominated by the upstream choice.
- Cost-aware routing through OpenRouter’s
modelsfallback chain. Token-count routing is client-side. - Loop on translation correctness isn’t in the product.
Where it falls short:
- Enterprise governance is light. RBAC, audit logs, and per-developer chargeback aren’t the strong suit. SOC 2 evidence means custom work.
- No optimizer.
- Pricing is per-request markup on top of upstream cost. For high-volume teams this can add up; verify against direct-upstream pricing.
Pricing: Pay-as-you-go markup on upstream model cost. No free tier for sustained workloads. Enterprise contracts negotiated separately.
Score: 5/7 axes (missing: feedback loop, deep enterprise governance).
5. Maxim Bifrost: Best for explicit Claude-Code-with-any-provider runtime
Verdict: Maxim Bifrost is one of the few gateways that ships an explicit Claude Code adapter with first-class non-Anthropic support, packaged as an open-source runtime. Bifrost is built around coding-agent workloads specifically, with the translation matrix tuned for Claude Code’s protocol use rather than chat completion in the abstract. That focus is both the strength and the limitation.
What it does for running Claude Code on non-Anthropic models:
- Translation fidelity is the explicit product goal. Bifrost ships an Anthropic-protocol inbound adapter that maps to OpenAI, Bedrock, Vertex, and OSS upstreams, with the coding-agent test surface (parallel tool calls, long file diffs, multi-turn sessions) called out in the docs.
- Tool-use survival is what Bifrost is benchmarked on. The team publishes per-provider tool-use correctness numbers and updates them. Treat them as directional since they’re vendor-reported.
- Provider breadth is deliberate, not exhaustive: OpenAI, Anthropic, Bedrock, Vertex, Azure, Together, Groq, Fireworks, Ollama. Smaller than LiteLLM and OpenRouter; deeper per provider.
- Streaming continuity is implemented for supported providers. Anthropic event-type rebuild is part of the integration test suite.
- Translation latency overhead is published in the project’s benchmarks. Bifrost is newer and the perf story is still moving.
- Cost-aware routing through Bifrost’s policy config. Coding-agent patterns (route by token count, route by tool-call complexity) are first-class.
- Loop on translation correctness is partial. Bifrost surfaces tool-use correctness as a metric but doesn’t yet rewrite routes from it.
Where it falls short:
- Younger than the other four. The long-tail bug surface is still being shaken out at high RPS.
- Smaller community. Unsupported provider means patch-it-yourself.
- Enterprise controls (SSO, RBAC, audit) are less mature than Portkey or Future AGI’s hosted offerings.
Pricing: Open source. Maxim AI’s hosted offering for Bifrost is a separate commercial product; pricing on inquiry.
Score: 5/7 axes (missing: closed loop on tool-use correctness, mature enterprise controls).
Capability matrix
| Axis | Future AGI | Portkey | LiteLLM | OpenRouter | Bifrost |
|---|---|---|---|---|---|
| Translation fidelity (tool-use intact) | ✅ IR-based | ✅ | ✅ Source-readable | ✅ Major upstreams | ✅ Coding-agent tuned |
| Tool-use survival (parallel calls) | ✅ | ✅ OpenAI; ⚠️ Gemini | ✅ OpenAI; ⚠️ Gemini | ✅ Major upstreams | ✅ |
| Provider breadth | 10+ first-class | 200+ catalogue | 100+ providers | 300+ models | 10+ first-class |
| Streaming continuity (Anthropic event types) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Translation latency p50 | 6-9 ms | 8-12 ms | 10-18 ms | 5-10 ms (varies) | varies |
| Cost-aware routing | ✅ Policy + eval | ✅ Policy | ✅ Config | ⚠️ Fallback chain | ✅ Policy |
| Loop on translation correctness | ✅ fi.opt | ❌ | ❌ | ❌ | ⚠️ Metric only |
| Self-host posture | ✅ BYOC | ✅ BYOC | ✅ OSS | ❌ Hosted-only | ✅ OSS |
Decision framework: Choose X if
Choose Future AGI if you want translation correctness to improve over time without manual intervention. Pick this when Claude Code is becoming a meaningful spend item and the team needs the gateway to learn which upstream is reliable for which turn-shape, not translate blindly and let the operator chase regressions alone.
Choose Portkey if you want a hosted multi-provider gateway with mature virtual keys, RBAC, and a polished UI, and you don’t need the optimizer yet. Pick this when procurement matters and OpenAI is the primary non-Anthropic upstream.
Choose LiteLLM if Claude Code traffic must stay inside your VPC and the security team needs to read every line of code that touches a prompt. Pick this when source-availability beats hosted polish and you have a Python team to maintain the proxy at scale.
Choose OpenRouter if the constraint is access to a long-tail model (a DeepSeek variant, a community fine-tune, a specialised host) and enterprise governance is secondary. Pick this for individual developers and small teams.
Choose Maxim Bifrost if the team is explicitly building around the coding-agent + multi-provider workload and wants an open-source runtime tuned for it. Accept a younger ecosystem in exchange for that focus.
Common mistakes when wiring Claude Code through a translation gateway
| Mistake | What goes wrong | Fix |
|---|---|---|
| Assuming all “OpenAI-compatible” upstreams support parallel tool calls | Claude Code’s five-file-edit pattern fails silently on providers that flatten parallel tool_calls into sequential | Test parallel tool calls in CI; pin specific model versions known to support them |
Letting cache_control blocks drop on non-Anthropic routes | OpenAI bill comes in 3-4x higher than expected on long sessions | Use a gateway that explicitly remaps cache hints per provider; verify in trace data |
| Translating outbound but not stream events back | Claude Code’s progress UI freezes mid-turn | Confirm the gateway rebuilds Anthropic content_block_* events from the upstream stream |
| Single fallback chain instead of token-count-based routing | Cheap turns burn Opus pricing; hard turns get sent to a model that cannot handle them | Route by input token count and tool-call complexity, not just by provider preference |
| Pointing only the IDE plugin at the gateway | Terminal-CLI usage still hits Anthropic directly; the multi-provider experiment misses half the traffic | Set ANTHROPIC_BASE_URL in the shell profile, not only in the IDE config |
| Ignoring per-provider tool-use accuracy over time | Routes that worked in week one degrade silently as providers update model behaviour | Run a translation-correctness eval on a recurring schedule, or use a gateway that does this itself |
How Future AGI closes the loop on translation correctness
The other four gateways treat translation as a one-shot engineering problem: ship the adapter, fix bugs as they come in, hope the upstream protocols stay stable. Future AGI treats translation correctness as the input to a feedback loop. The loop has six stages:
-
Trace. Every Claude Code turn produces a span tree via
traceAI(Apache 2.0). Spans capture the inbound Anthropic request, the chosen upstream, the translated request, the upstream response, and the translated response back to Claude Code. -
Evaluate.
ai-evaluation(Apache 2.0) scores every turn. FAGI ships a 50+ built-in rubric catalog (task completion, tool-use correctness, did every requested tool actually fire and return a usable result, faithfulness, code-correctness, structured-output, hallucination, agentic surfaces, instruction-following, groundedness), plus unlimited custom evaluators authored end-to-end by an in-product eval-authoring agent that uses tool calling on your code, plus self-improving evaluators that learn from live production traces, plus FAGI’s proprietary classifier model family at very low cost-per-token (Galileo Luna-2 cost economics, rubric-flexible). A translation regression (say, a Gemini update that returnsfunctionCallparts in a slightly different order) shows up as a sudden drop in tool-use correctness for one provider. Catalog is the floor, not the ceiling. -
Cluster. Low-scoring sessions get clustered by failure mode: “parallel tool calls collapsed on provider X,” “cache hint dropped on provider Y,” “streaming events mangled on provider Z.” Clusters reveal failure shape without reading thousands of individual traces.
-
Optimize.
fi.opt.optimizers(ProTeGi, Bayesian, GEPA) reacts two ways. For prompt-level issues, it rewrites the per-provider system prompt prefix. For routing-level issues, it adjusts the routing weight so the offending provider drops out of the candidate set until reliability recovers. Temporary regressions auto-recover once eval scores rebound. -
Route. Agent Command Center applies the updated policy on the next request. The CLI sees no protocol change; the upstream silently shifts to whichever model is scoring best for that turn-shape.
-
Re-deploy. Policies are versioned. Teams roll forward; if a policy regresses, automatic rollback. Protect runs alongside, adding ~67ms per arXiv 2510.13351, to prevent prompt-injection content from leaking through translated routes.
Net effect: a team that starts with a static “cheap upstream for short turns, expensive for long” rule typically ends after four weeks with a policy capturing three to five turn-shapes per provider, picking the cheapest model that scored above the quality bar, and shifting when a provider regresses. Translation correctness goes up while average token cost goes down.
The three building blocks are open source:
traceAI, github.com/future-agi/traceAI (Apache 2.0)ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
The hosted Agent Command Center adds the failure-cluster view, live Protect, RBAC, SOC 2 Type II certified, and AWS Marketplace for procurement.
What we did not include
We deliberately left out three gateways that show up in other 2026 listicles:
- Helicone. Strong observability, but multi-provider translation for Claude Code is thin. Right pick for native-Anthropic monitoring, not cross-provider routing.
- Kong AI Gateway. Solid API-gateway SLA and the AI Proxy plugin handles tool calls, but Anthropic-to-non-Anthropic translation depth lags the picks above.
- Cloudflare AI Gateway. Strong primitives, fast edge, but the Anthropic-protocol-inbound-with-non-Anthropic-upstream story is still developing as of May 2026.
All three are worth a re-look later in 2026.
Related reading
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- What Is an AI Gateway? The 2026 Definition
- Best LLM Gateways in 2026
- Best AI Gateways for Agentic AI in 2026
Sources
- Anthropic Messages API protocol, docs.anthropic.com/en/api/messages
- Anthropic prompt caching, docs.anthropic.com/en/docs/build-with-claude/prompt-caching
- Claude Code documentation, claude.ai/docs/claude-code
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Portkey AI gateway, portkey.ai
- LiteLLM proxy, github.com/BerriAI/litellm
- OpenRouter, openrouter.ai
- Maxim Bifrost, github.com/maximhq/bifrost
Frequently asked questions
Can Claude Code actually run on non-Anthropic models?
Which non-Anthropic models work best with Claude Code through a gateway?
Does using a gateway add noticeable latency to Claude Code?
Why not just use OpenAI's coding-agent SDK instead of running Claude Code on OpenAI?
Can I mix Anthropic native and non-Anthropic upstreams in the same Claude Code session?
Is it safe to send source code through a translation gateway to a different provider?
How is Future AGI Agent Command Center different from Portkey for this workload?
Five AI gateways scored on caching Claude Code calls in 2026: cross-developer cache scope, semantic-match thresholds, hit-rate observability, TTL controls, and what each one misses.
Five tools for Claude Code cost management in 2026 — four gateways plus the native Anthropic dashboard and a FinOps platform — scored on attribution, chargeback, caps, routing, cache observability, FinOps integration, and audit trail.
Five AI gateways scored on Claude Code token monitoring in 2026: per-developer attribution, per-repo budgets, session traces, alert routing, and what each gateway misses.