Running Claude Code with OpenAI Models in 2026: A Gateway Setup Guide
How to run Claude Code against OpenAI GPT-5 and GPT-4 via a translation gateway in 2026. Setup walkthrough, ENV vars, config snippets, then five gateways scored on translation fidelity.
Table of Contents
Claude Code is the best coding-agent UX shipped to date, and it speaks Anthropic. Point the CLI at api.openai.com and you get an authentication error on turn one, the binary issues POST /v1/messages with x-api-key headers and expects Anthropic-shaped streaming events back. OpenAI’s API answers POST /v1/chat/completions and streams a different event schema. The two protocols overlap in shape but disagree on every detail that matters: tool calls live in different fields, system prompts go in different places, cache control means different things, streaming event types don’t match.
That mismatch is the gap an AI gateway closes. The gateway accepts Claude Code’s Anthropic-shaped request, translates to OpenAI’s chat-completion format, forwards to GPT-5 or GPT-4o, then translates the streaming response back into Anthropic’s content_block_* events so Claude Code’s progress UI keeps moving. Done well, the CLI doesn’t know. Done poorly, the terminal freezes mid-refactor or silently drops every parallel tool call after the first.
This guide is in two parts. First: the implementation walkthrough, prereqs, translation-layer mechanics, ENV vars, and a gateway config that routes the four common Claude Code patterns to GPT-5 variants. Second: a scored shortlist of five gateways that ship this translation in production, named honestly with what each breaks on.
Why anyone runs Claude Code on OpenAI models
Three reasons keep coming up in the field.
Cost arbitrage on the easy turns. A typical Claude Code workload is bimodal: roughly 60-70% of turns are boilerplate edits and small refactors under 8K input tokens, the other 30-40% are architecture and multi-file work needing 60K-200K context. Sending the easy turns to GPT-5-mini instead of Claude Sonnet saves 60-75% on those calls without measurable quality loss. Hard turns stay on Opus.
Capability mix. GPT-5 is genuinely better than Claude Opus on some workloads, structured JSON extraction with strict schemas, certain numerical reasoning chains, SQL-with-aggregations generation. The point isn’t “OpenAI is better”; it’s “different models win on different turn-shapes, and you want both.”
Rate-limit and outage hedging. Anthropic occasionally rate-limits or has regional incidents. A gateway that routes to OpenAI as deterministic fallback keeps the team coding through the outage.
Shared tradeoff across all three: Claude Code was designed around Claude’s tool-use protocol and prompt caching. Running it on OpenAI through a gateway works, but the gateway has to do real translation work on every turn.
Prereqs
Before wiring anything, confirm:
- Claude Code CLI installed. The binary honors
ANTHROPIC_BASE_URLsince 1.6; older versions ignore it. - OpenAI API key with GPT-5 or GPT-4o access (needs
responses:writescope). - A gateway choice. This guide uses Future AGI Agent Command Center as the worked example because it ships the translation layer end-to-end. Section two covers four alternatives.
- Shell with persistent env vars. Set them in
.zshrcor.bash_profileso both your IDE terminal and direct CLI usage hit the gateway. The biggest setup mistake we see is wiring the IDE only, leaking half the traffic toapi.anthropic.comdirect. - A test repo with five to ten files. Translation regressions show up first on real-shaped sessions, not on
hello world.
How the translation layer actually works
Four conversions matter between Claude Code’s outbound request and OpenAI’s inbound endpoint.
1. Endpoint mapping. Claude Code calls POST /v1/messages. OpenAI exposes POST /v1/chat/completions (and the newer POST /v1/responses). The gateway listens on /v1/messages, parses the Anthropic body, and reissues to the OpenAI endpoint. The reverse mapping, converting OpenAI’s response back into an Anthropic Message, happens before the stream returns to the CLI.
2. System prompt placement. Anthropic accepts system as a top-level string. OpenAI expects a messages array where the first element has role: "system" (or role: "developer" on newer endpoints). The translator must lift Claude Code’s system block to the first message. If the gateway forgets, GPT-5 answers but ignores tool-use instructions, and parallel tool calls collapse into sequential ones.
3. Tool-use block conversion. This is where most translations break. Claude Code uses Anthropic’s tool_use and tool_result content blocks aggressively, every bash invocation, every file edit, every grep is a structured JSON block inside the content array. OpenAI splits this across tool_calls on the assistant message and a tool role for results. Five parallel tool calls in Claude’s protocol become five entries inside a single tool_calls array. A naive translator that maps one-block-to-one-message flattens parallel calls into sequential, and Claude Code’s five-file-edit pattern breaks silently. The gateway must round-trip parallel arrays in both directions, matching tool_use_id to tool_call_id.
4. Streaming SSE bridging. Anthropic streams typed events (content_block_start, content_block_delta, content_block_stop, message_delta, message_stop). OpenAI streams loose delta chunks with finish_reason and tool_calls deltas threaded in. The gateway re-emits Anthropic-typed events in real time. Buffering the OpenAI stream and replaying at the end makes Claude Code’s progress UI hang for the full turn, functionally frozen. The right pattern is event-by-event translation with no buffering beyond the SSE chunk boundary.
Cache control deserves a brief mention. Claude Code sets cache_control on long system prompts. OpenAI caches automatically with no header. The translator should drop cache_control on OpenAI-bound requests; passing it through is a no-op but makes the trace misleading.
Step 1: Set the environment variables
Open your shell profile and add the following:
# ~/.zshrc or ~/.bash_profile
# Point Claude Code at the gateway instead of api.anthropic.com
export ANTHROPIC_BASE_URL="https://gateway.futureagi.com/v1"
# The API key Claude Code sends. The gateway maps this to your OpenAI key
# server-side; the CLI never sees the OpenAI credential.
export ANTHROPIC_API_KEY="fi_live_xxxxxxxxxxxxxxxx"
# Pin the protocol version so tool-use behavior is deterministic
export ANTHROPIC_VERSION="2025-09-01"
# Optional: set the default model alias the gateway routes from.
# This is what Claude Code's --model flag will resolve to if you do not
# override it on the command line.
export ANTHROPIC_MODEL="gpt-5-via-gateway"
# Reload
source ~/.zshrc
Two checks before moving on. First, echo $ANTHROPIC_BASE_URL in a fresh terminal should return the gateway URL. Second, the CLI honors the override: run claude with a trivial prompt and watch the gateway’s request log to confirm the call landed there and not at Anthropic.
Step 2: Configure the model aliases in the gateway
The gateway needs to know that when Claude Code asks for gpt-5-via-gateway, it should translate to OpenAI’s GPT-5 endpoint. Most production gateways express this as a config file or dashboard rule. Below is a representative shape, the exact YAML differs across vendors but the model is the same.
# gateway-config.yaml
routes:
- name: gpt-5-via-gateway
inbound_protocol: anthropic
upstream:
provider: openai
model: gpt-5
endpoint: https://api.openai.com/v1/chat/completions
api_key_ref: openai_prod
translation:
system_placement: first_message
tool_use: openai_tool_calls
streaming: anthropic_events
drop_headers: [cache_control]
- name: gpt-5-mini-via-gateway
inbound_protocol: anthropic
upstream:
provider: openai
model: gpt-5-mini
endpoint: https://api.openai.com/v1/chat/completions
api_key_ref: openai_prod
translation:
system_placement: first_message
tool_use: openai_tool_calls
streaming: anthropic_events
drop_headers: [cache_control]
- name: claude-opus-fallback
inbound_protocol: anthropic
upstream:
provider: anthropic
model: claude-opus-4-7
endpoint: https://api.anthropic.com/v1/messages
api_key_ref: anthropic_prod
translation:
passthrough: true
policies:
- name: cost_aware_routing
rule: |
if input_tokens < 8000 and tool_call_count <= 2:
route gpt-5-mini-via-gateway
elif input_tokens < 60000:
route gpt-5-via-gateway
else:
route claude-opus-fallback
The policies block is what makes the multi-provider story useful. Without a routing policy, every Claude Code call goes to one upstream and you haven’t improved over a single-provider setup. The example routes short turns with simple tool use to GPT-5-mini, mid-range turns to full GPT-5, and reserves Opus for long-context architecture work where Claude’s tool-use training still beats GPT-5.
After saving the config, restart the gateway, then verify by issuing one turn against each alias and inspecting the trace to confirm the upstream model matches the policy choice.
Step 3: Run a real session and watch the trace
The verification step that catches everything is a single multi-turn session against a real repo. Ask Claude Code for a change that requires four to six parallel file edits (something like “rename getUserById to findUserById across the codebase”) and confirm:
- The trace shows the request hitting
gpt-5-via-gateway(or whichever alias the policy resolved). - The CLI’s progress UI streams tokens in real time, not freeze-then-dump.
- The tool-call count in the trace matches the number of edits performed. If you asked for six edits and the trace shows two, parallel tool-call translation is broken.
- The final response lands in the CLI with the diff displayed normally.
If all four hold, the wiring is correct. If streaming freezes, the gateway is buffering instead of bridging event-by-event. If parallel tool calls collapsed, the translation is mapping content blocks one-to-one. Both are gateway-side fixes.
Step 4: Production checklist
Before declaring victory, walk through the operational concerns that bite once a team uses this daily.
| Concern | What to check |
|---|---|
| Latency overhead | Measure p50 and p95 of the gateway hop. Translation alone should add 5-15ms; anything higher suggests buffering or a JSON re-parse that is not necessary. |
| Failure isolation | If the gateway is down, does Claude Code surface a clean error or hang? Wire a deterministic fallback to Anthropic-direct as a degraded mode. |
| Cost attribution | Tag every request with developer ID and repo. Without this, the gateway has saved cost but lost the chargeback story finance needs. |
| Audit log | Every gateway decision (which model, which policy fired, what the input-token estimate was) should be queryable. This becomes the trail you need when a session looks anomalous. |
| Cold-start | First request after a config push should not take 5+ seconds. If it does, the gateway is recompiling the route map per request. |
| Rollback | You should be able to disable the gateway hop in under a minute by unsetting ANTHROPIC_BASE_URL in the team’s shell template, and have everyone fall back to direct Anthropic without code changes. |
The walkthrough above gets Claude Code talking to OpenAI through a single gateway. The next question is which gateway. Translation correctness isn’t uniform, and the failure modes differ in ways that show up only in production. Below are five gateways that ship Anthropic-to-OpenAI translation today, scored on the axes that matter for Claude Code.
The 5 axes we score on
| Axis | What it measures |
|---|---|
| 1. Translation fidelity | Does the gateway correctly map system prompts, tool-use blocks, and cache headers in both directions? |
| 2. Parallel tool-call survival | Does Claude Code’s five-file-edit pattern round-trip without flattening into sequential calls? |
| 3. Streaming event bridging | Does SSE arrive at the CLI with Anthropic event types intact, or does the gateway buffer-and-batch? |
| 4. Translation latency overhead | How many milliseconds does the translation step add per turn? |
| 5. Loop on correctness | Does the gateway score translation correctness and adjust routes when an upstream regresses, or does the operator chase issues by hand? |
1. Future AGI Agent Command Center: Best for closing the loop on translation
Verdict: Future AGI is the only gateway here that captures tool-use correctness per translated call and feeds it back into routing. The other four are static translation layers that depend on the operator to notice when GPT-5 misbehaves after a model update.
What it does for Claude Code on OpenAI:
- Translation fidelity uses an intermediate-representation step. Inbound Anthropic requests parse to a typed IR, then re-serialize to OpenAI’s
chat/completionsorresponsesshape. System-prompt placement, tool-use conversion, and cache-header handling are explicit decisions in code. - Parallel tool-call survival verified for Claude Code’s bash, file edit, glob, and grep tools against GPT-5 and GPT-4o, including six-file-edit patterns.
- Streaming event bridging rebuilds Anthropic
content_block_*events from OpenAI’s delta chunks in flight. No buffering. - Translation latency overhead runs 6-9 ms p50 non-streaming and 4 ms per chunk streaming. Optional Protect guardrail adds ~67ms per arXiv 2510.13351.
- Loop on correctness is the wedge.
fi.evalsscores every translated call; failures cluster by shape;fi.opt.optimizersadjusts per-route system-prompt prefix or shifts traffic until the regression clears.
The honest tradeoff: GPT-5’s caching is automatic, so cache-control hints from Claude Code are dropped on this route. The trace records this. Mixing upstreams in one session is the common pattern.
Where it falls short:
- agent-opt is opt-in, start with traceAI + ai-evaluation for one-week pilots and turn the optimizer on once eval baselines stabilize. The loop compounds value over weeks rather than at day one.
- Prompt library is opinionated, fewer review-and-collaboration knobs than Portkey’s prompt hub, which keeps the daily workflow tight; teams running large multi-author prompt libraries should preview the workflow before standardizing.
Pricing: Free tier with 100K traces/month. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II, HIPAA, GDPR, and CCPA certifications, a BAA, and AWS Marketplace listing.
Score: 5/5 axes.
2. Portkey: Best for hosted gateway with mature RBAC
Verdict: Portkey is the most polished hosted-only product if the priority is virtual-key controls and RBAC on top of working Anthropic-to-OpenAI translation. Routes don’t get optimized back.
What it does for Claude Code on OpenAI:
- Translation fidelity is solid for standard tool-use patterns. System-prompt placement and tool-call conversion work end-to-end for GPT-5 and GPT-4o as of May 2026.
- Parallel tool-call survival confirmed for OpenAI upstreams. OpenAI is the more reliable non-Anthropic path through Portkey.
- Streaming event bridging works. SSE pass-through with correct Anthropic event-type rebuild.
- Translation latency overhead runs around 8-12 ms p50 per Portkey’s published numbers.
- Loop on correctness isn’t part of the product.
The honest tradeoff: Portkey’s metadata-header model for per-developer attribution needs the Claude Code wrapper to set x-portkey-trace-id and similar headers. Without that wiring, the gateway sees one shared key and developer aggregation is impossible.
Where it falls short:
- No optimizer.
- Metadata-header model needs client-side wiring; otherwise developer-level attribution collapses.
- Pricing escalates above 5M requests/month faster than open-source alternatives.
Pricing: Free tier with 10K requests/day. Scale tier starts at $99/month. Enterprise is custom with SOC 2 Type II.
Score: 4/5 axes (missing: feedback loop on correctness).
3. LiteLLM: Best for self-hosted multi-provider translation
Verdict: LiteLLM is the pick when Claude Code traffic can’t leave your VPC and the security team needs to read every line of the translator. Source-available, Python-native, proxies on your infrastructure.
What it does for Claude Code on OpenAI:
- Translation fidelity is broad and source-readable. The
anthropictoopenaiadapter handles system-prompt placement and tool-use conversion in code you can audit. Corner cases are patchable. - Parallel tool-call survival is good on OpenAI. Occasional regressions on edge cases like nested JSON in tool arguments; the community typically lands a fix within days.
- Streaming event bridging works on OpenAI.
- Translation latency overhead is typically 10-18 ms p50 in our tests. Python is the bottleneck at high RPS.
- Loop on correctness isn’t in the product.
The honest tradeoff: Observability is thinner than the hosted offerings. Plan to wire fi.evals or another sink behind LiteLLM. Slicing per-provider tool-use success rate means SQL.
Where it falls short:
- No optimizer.
- High-RPS deployments need explicit horizontal scaling.
- Dashboard is functional, not polished.
Pricing: Open source under MIT. LiteLLM Enterprise tier (SLA + SSO + audit) starts around $250/month for small teams.
Score: 3.5/5 axes (missing: feedback loop, polished native dashboard).
4. OpenRouter: Best for breadth of upstream catalog
Verdict: OpenRouter is the pick when the goal is “every OpenAI variant plus 300 other models from one endpoint” and enterprise governance is secondary.
What it does for Claude Code on OpenAI:
- Translation fidelity is correct for the standard Claude Code tool set against GPT-5, GPT-4o, and OpenAI variants. The long tail of community providers is thinner pass-through.
- Parallel tool-call survival is upstream-dependent. OpenRouter’s docs flag which OpenAI variants support parallel calls reliably.
- Streaming event bridging works for SSE on the major upstreams.
- Translation latency overhead sits in the 5-10 ms range.
- Loop on correctness isn’t in the product.
The honest tradeoff: OpenRouter is consumer-facing in shape. Chargeback for a 30-developer team is light; SOC 2 evidence and team-scoped audit logs mean custom work.
Where it falls short:
- Enterprise governance is light.
- No optimizer.
- Per-request markup on upstream cost; verify against direct-OpenAI pricing.
Pricing: Pay-as-you-go markup. No free tier for sustained workloads.
Score: 3/5 axes (missing: enterprise governance, feedback loop).
5. Maxim Bifrost: Best for explicit Claude-Code-with-any-provider runtime
Verdict: Maxim Bifrost ships an explicit Claude Code adapter with first-class non-Anthropic support as an open-source runtime tuned for coding-agent workloads, both the strength and the limitation.
What it does for Claude Code on OpenAI:
- Translation fidelity is the explicit product goal. Anthropic-protocol inbound maps to OpenAI (plus Bedrock, Vertex, OSS) with parallel tool calls, long file diffs, and multi-turn sessions called out in docs.
- Parallel tool-call survival is what Bifrost is benchmarked on. The team publishes per-provider correctness numbers; treat them as directional since they’re vendor-reported.
- Streaming event bridging is implemented for OpenAI; event-type rebuild is part of the test suite.
- Translation latency overhead is published in the project’s benchmarks; Bifrost is newer and the perf story is still moving.
- Loop on correctness is partial, tool-use correctness shows up as a metric but doesn’t yet rewrite routes.
The honest tradeoff: Younger project, smaller community, smaller bug-surface coverage at high RPS. Enterprise controls lag the hosted alternatives.
Where it falls short:
- Younger ecosystem.
- Enterprise controls less mature than hosted alternatives.
- Loop is metric-only, not closed.
Pricing: Open source. Maxim AI’s hosted Bifrost is a separate commercial product; pricing on inquiry.
Score: 3/5 axes (missing: closed loop on correctness, mature enterprise controls).
Capability matrix
| Axis | Future AGI | Portkey | LiteLLM | OpenRouter | Bifrost |
|---|---|---|---|---|---|
| Translation fidelity (system + tools) | IR-based | Solid | Source-readable | Major upstreams | Coding-agent tuned |
| Parallel tool-call survival (OpenAI) | Yes | Yes | Yes | Yes (upstream-dependent) | Yes |
| Streaming event bridging | Yes | Yes | Yes | Yes | Yes |
| Translation latency p50 | 6-9 ms | 8-12 ms | 10-18 ms | 5-10 ms | varies |
| Loop on translation correctness | fi.opt | No | No | No | Metric only |
| Self-host posture | BYOC | BYOC | OSS | Hosted-only | OSS |
Decision framework: Choose X if
Choose Future AGI if you want the gateway to learn which upstream is reliable for which turn-shape over time. Pick this when Claude Code on OpenAI is becoming a meaningful line item ($10K+/month).
Choose Portkey if you want a hosted gateway with mature RBAC and virtual keys, and you don’t need the optimizer yet. Pick this when procurement matters and OpenAI is the primary non-Anthropic upstream.
Choose LiteLLM if Claude Code traffic must stay inside your VPC and the security team needs to read every line of the translator. Pick this when source-availability beats hosted polish.
Choose OpenRouter if the constraint is access to a long tail of OpenAI variants and community providers, and enterprise governance is secondary. Pick this for individual developers and small teams.
Choose Maxim Bifrost if the team is explicitly building around coding-agent + multi-provider workloads and wants an open-source runtime tuned for it.
How Future AGI closes the loop on translation correctness
The four other picks treat Anthropic-to-OpenAI translation as a one-shot engineering problem: ship the adapter, fix bugs as they come in. Future AGI treats translation correctness as the input to a feedback loop.
traceAI (Apache 2.0) captures each turn’s span tree, inbound Anthropic request, chosen OpenAI model, translated body, upstream stream, and Anthropic-shaped stream rebuilt back to the CLI. fi.evals scores each turn on task-completion and tool-use correctness; a regression like GPT-5 returning tool-call JSON in a different order shows up as a sudden score drop. Low-scoring sessions cluster by failure shape, and fi.opt.optimizers reacts two ways: rewrite the per-route system-prompt prefix so OpenAI receives a Claude-Code-aware framing, or adjust routing weight so the offending model drops out of the candidate set until reliability recovers. Policies are versioned with automatic rollback. Protect runs alongside, adding ~67ms per arXiv 2510.13351, to catch prompt-injection content.
The three building blocks are open source under Apache 2.0: traceAI, ai-evaluation, agent-opt. The hosted Agent Command Center adds the failure-cluster view, live Protect, RBAC, SOC 2 Type II certified, and AWS Marketplace.
What we did not include
Three gateways show up in other 2026 listicles that we deliberately left out:
- Helicone. Strong native-Anthropic observability, but Anthropic-to-OpenAI translation depth is thinner than the picks above.
- Kong AI Gateway. Solid API-gateway SLA, but Anthropic-inbound-with-OpenAI-upstream translation lags.
- Cloudflare AI Gateway. Strong primitives and edge latency, but the Anthropic-protocol-inbound story is still developing as of May 2026.
All three are worth a re-look later in 2026.
Related reading
- Best 5 AI Gateways to Run Claude Code with Any LLM Provider in 2026
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- What Is an AI Gateway? The 2026 Definition
- Best LLM Gateways in 2026
Sources
- Anthropic Messages API protocol, docs.anthropic.com/en/api/messages
- OpenAI chat completions and responses APIs, platform.openai.com/docs/api-reference
- Anthropic prompt caching, docs.anthropic.com/en/docs/build-with-claude/prompt-caching
- Claude Code documentation, claude.ai/docs/claude-code
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Portkey AI gateway, portkey.ai
- LiteLLM proxy, github.com/BerriAI/litellm
- OpenRouter, openrouter.ai
- Maxim Bifrost, github.com/maximhq/bifrost
Frequently asked questions
Can Claude Code actually run on OpenAI GPT-5?
Which OpenAI model best replaces Claude Sonnet in Claude Code?
Will prompt caching still work when Claude Code runs on OpenAI?
Is it safe to send source code through a translation gateway to OpenAI?
How is Future AGI different from Portkey for this workload?
A practitioner's guide to cutting Claude Code token spend with five stackable levers — native cache_control, MCP-tool compilation, semantic caching, model right-sizing, and context pruning — with worked math and an honest read on where the 90 percent claim holds.
A practical 2026 how-to for cutting MCP token spend on Claude Code at fleet scale: five levers, the mcp.json + gateway config that wires them, the metrics that prove the cut held.
Step-by-step walkthrough for wiring Claude Code to an MCP gateway in 2026: mcp.json config, routing rules, per-server auth scoping, and verification. With production checklist and gateway picks.