Using OpenAI Codex CLI with Multiple Model Providers in 2026: A Gateway Setup Guide
Step-by-step walkthrough for pointing OpenAI Codex CLI at Anthropic, Gemini, Mistral, and OSS models through an AI gateway in 2026, with five gateway picks scored on five axes.
OpenAI Codex CLI ships with one assumption hard-coded into it: every model on the other side of OPENAI_API_KEY is an OpenAI model speaking OpenAI’s Responses API. Point it directly at api.anthropic.com or generativelanguage.googleapis.com and the very first bash tool call returns a 401 or a malformed function-call block. The CLI loops, you stare at a frozen progress indicator, the logs say “tool_calls field missing.”
The fix is to put an AI gateway in front of Codex CLI. The gateway accepts the OpenAI-shaped request, translates the body and the tool-call JSON for whichever provider you want to land on, and streams a response Codex CLI can render without modification. This guide walks through the setup end-to-end (environment variables, routing config, model aliases, verification curl) and names five gateways that ship the translation layer in production today.
This is the implementation-side companion to the picker post on Codex CLI routing. If you already know which gateway you want and just need the wiring, you’re in the right place.
The problem in one paragraph
Codex CLI reads OPENAI_API_KEY and posts to api.openai.com/v1/responses by default. It sends tool_calls in OpenAI’s function-call shape, expects OpenAI’s SSE delta format on the way back, and uses OpenAI’s specific response_format semantics for structured output. Three things go wrong the moment you change providers without a gateway:
- API surface drift. Anthropic’s Messages API is a different endpoint and payload shape; Gemini’s `generateContent` is different again. Codex CLI doesn’t know how to speak either.
- Tool-call shape drift. Anthropic returns `tool_use` content blocks; Gemini returns `functionCall` objects. Codex CLI expects `tool_calls`. A naive proxy that flattens these to text silently breaks the agent: every tool turn returns a string, the CLI sees no structured call, and the loop hangs.
- Streaming shape drift. OpenAI streams `delta.content` and `delta.tool_calls.function.arguments` chunks. Anthropic streams `content_block_delta` with a different chunk schema. The CLI’s progress UI is wired to OpenAI’s chunk format; the wrong shape means a frozen terminal.
A gateway built for multi-provider routing handles all three translations inline. The rest of this guide shows the exact configuration.
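To make the tool-call shape drift concrete, here is a minimal Python sketch of the rewrite a gateway performs on an Anthropic response. It assumes the `tool_calls` shape described above (an `id`, a `type` of `function`, and arguments carried as a JSON string); the function name and the sample block are illustrative, not taken from any gateway's source.

```python
import json


def anthropic_to_openai_tool_calls(content_blocks):
    """Convert Anthropic `tool_use` content blocks into OpenAI-style `tool_calls`."""
    tool_calls = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # text blocks are rendered separately, not as tool calls
        tool_calls.append({
            "id": block["id"],
            "type": "function",
            "function": {
                "name": block["name"],
                # OpenAI carries arguments as a JSON string, Anthropic as an object
                "arguments": json.dumps(block["input"]),
            },
        })
    return tool_calls


# Illustrative Anthropic response content: one text block, one tool_use block
example = [
    {"type": "text", "text": "Listing files."},
    {"type": "tool_use", "id": "toolu_01", "name": "bash", "input": {"command": "ls"}},
]
print(anthropic_to_openai_tool_calls(example))
```

A proxy that skips this step and forwards only the text block is the "naive proxy" case: the CLI receives a string and never sees the `bash` call.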
Prereqs
Before starting, confirm the following versions and accounts:
| Component | Minimum version (May 2026) | Notes |
|---|---|---|
| Codex CLI | 0.18.x or later | Earlier builds read OPENAI_API_BASE; newer ones prefer OPENAI_BASE_URL. Both work. |
| Node.js | 20.x LTS | Codex CLI runtime. |
| Gateway endpoint | A live URL | Hosted (e.g. gateway.futureagi.com/v1) or self-hosted (e.g. http://litellm.internal:4000). |
| Provider API keys | Anthropic, Google AI Studio, Mistral, etc. | One per non-OpenAI provider you want to route to. |
| Shell | bash or zsh | Examples below assume zsh. |
The four environment variables that matter for Codex CLI in this configuration:
# Replace OpenAI's default endpoint with the gateway
export OPENAI_BASE_URL="https://gateway.futureagi.com/v1"
# Older Codex CLI builds (<= 0.16) used this alias instead. Set both for safety.
export OPENAI_API_BASE="$OPENAI_BASE_URL"
# Authenticate to the gateway, not to OpenAI directly
export OPENAI_API_KEY="fagi_sk_live_..."
# Optional: pin a default model alias that the gateway will route on
export CODEX_MODEL="claude-opus-4-7-via-gateway"
Set these in ~/.zshrc (or ~/.bashrc), reload, and you’re ready for the gateway-side configuration.
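If you want a quick sanity check before moving on, a small Python sketch like the one below can flag the most common misconfigurations of these variables. The function name and the messages are hypothetical; only the variable names come from the list above.

```python
from urllib.parse import urlparse


def check_gateway_env(env):
    """Return a list of problems found in a Codex CLI gateway environment."""
    problems = []
    base_url = env.get("OPENAI_BASE_URL", "")
    if not base_url:
        problems.append("OPENAI_BASE_URL is not set; requests go to api.openai.com")
    elif urlparse(base_url).hostname == "api.openai.com":
        problems.append("OPENAI_BASE_URL still points at api.openai.com")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    legacy = env.get("OPENAI_API_BASE")
    if legacy and base_url and legacy != base_url:
        problems.append("OPENAI_API_BASE and OPENAI_BASE_URL disagree")
    return problems


# Illustrative environment; in a real shell, pass dict(os.environ) instead
sample_env = {
    "OPENAI_BASE_URL": "https://gateway.futureagi.com/v1",
    "OPENAI_API_BASE": "https://gateway.futureagi.com/v1",
    "OPENAI_API_KEY": "fagi_sk_live_placeholder",
}
print(check_gateway_env(sample_env))
```

An empty list means the two base-URL variables agree, a key is present, and nothing points at OpenAI directly.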
Setup walkthrough
Five steps, each with the exact code you need. We use Future AGI Agent Command Center for the first walkthrough because the routing config is declarative; the same shapes work for Portkey and LiteLLM with minor key-name differences (called out in the provider notes section below).
Step 1: Override OPENAI_BASE_URL
Codex CLI honors OPENAI_BASE_URL as the canonical override. Set it once in your shell profile and every codex invocation inherits it.
# ~/.zshrc
export OPENAI_BASE_URL="https://gateway.futureagi.com/v1"
export OPENAI_API_KEY="fagi_sk_live_xxxxxxxxxxxxxxxxxxxx"
# Reload
source ~/.zshrc
# Confirm the override is exported and the CLI is on your PATH
echo "$OPENAI_BASE_URL"
codex --help 2>&1 | head -3
If you’re wiring a CI environment or a remote workstation, set the same two variables in the runner’s environment. For the endpoint and the key, the env vars are the source of truth; the only thing this guide keeps in a config file is the model alias (Step 3).
Step 2: Configure gateway routing
The routing config tells the gateway which model alias maps to which underlying provider model, and which provider key to use. This is declarative YAML on the Future AGI gateway and on Portkey; on LiteLLM it’s a config.yaml plus optional Python hooks. Future AGI’s shape:
# /etc/fagi-gateway/routes.yaml
routes:
  - alias: "gpt-5.1"
    provider: "openai"
    model: "gpt-5.1-2026-04-15"
    api_key_ref: "openai_team_key"
  - alias: "claude-opus-4-7-via-gateway"
    provider: "anthropic"
    model: "claude-opus-4-7-20260420"
    api_key_ref: "anthropic_team_key"
    translation: "openai_responses_v1"
  - alias: "gemini-2.5-pro-via-gateway"
    provider: "google"
    model: "gemini-2.5-pro"
    api_key_ref: "google_ai_studio_key"
    translation: "openai_responses_v1"
  - alias: "mistral-large-via-gateway"
    provider: "mistral"
    model: "mistral-large-2-2026"
    api_key_ref: "mistral_team_key"
    translation: "openai_responses_v1"
  - alias: "llama-4-405b-via-gateway"
    provider: "openai_compatible"
    base_url: "http://vllm-internal:8000/v1"
    model: "meta-llama/Llama-4-405B-Instruct"
    translation: "passthrough"
routing_policy:
  default: "gpt-5.1"
  rules:
    - if: "input_tokens < 8000"
      route_to: "gemini-2.5-pro-via-gateway"
    - if: "tools_include('apply_patch') and input_tokens > 30000"
      route_to: "claude-opus-4-7-via-gateway"
attributes:
  fi.attributes.user.id: "${headers.x-developer-email}"
  fi.attributes.repo: "${headers.x-repo}"
The translation: "openai_responses_v1" key is doing the heavy lifting. It tells the gateway: accept an OpenAI Responses-API request, translate the body to the target provider’s native format, dispatch, and translate the response back, including the tool-call blocks. The attributes block tags each request with developer and repo metadata so the Agent Command Center dashboard can slice cost by both.
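To see how the `routing_policy` block resolves a request, here is a rough Python model of the two rules and the default. It assumes rules are evaluated top to bottom and the first match wins, which the config does not state explicitly; the `Request` class and `route` function are illustrative stand-ins, not the gateway's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    input_tokens: int
    tools: list = field(default_factory=list)


# Ordered (predicate, alias) pairs mirroring the two rules in routes.yaml.
# Assumption: first matching rule wins.
RULES = [
    (lambda r: r.input_tokens < 8000, "gemini-2.5-pro-via-gateway"),
    (lambda r: "apply_patch" in r.tools and r.input_tokens > 30000,
     "claude-opus-4-7-via-gateway"),
]
DEFAULT_ALIAS = "gpt-5.1"


def route(request):
    """Return the alias of the first matching rule, else the default alias."""
    for predicate, alias in RULES:
        if predicate(request):
            return alias
    return DEFAULT_ALIAS


print(route(Request(input_tokens=2000)))
print(route(Request(input_tokens=40000, tools=["bash", "apply_patch"])))
print(route(Request(input_tokens=15000, tools=["apply_patch"])))
```

Note the gap between 8,000 and 30,000 input tokens: turns in that band match neither rule and fall through to the `gpt-5.1` default.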
Step 3: Map model aliases at the Codex CLI side
Codex CLI takes the model name from a few places. In rough precedence order: the --model flag on the command line, the model field in ~/.codex/config.toml, the CODEX_MODEL environment variable, and finally its built-in default of gpt-5.1.
Set the alias to match a route in the gateway config:
# ~/.codex/config.toml
[default]
model = "claude-opus-4-7-via-gateway"
max_tokens = 8192
temperature = 0.2
[profiles.frontend]
model = "gemini-2.5-pro-via-gateway"
[profiles.refactor]
model = "claude-opus-4-7-via-gateway"
[profiles.oss]
model = "llama-4-405b-via-gateway"
Now codex chat defaults to the Anthropic route; codex --profile frontend chat flips to Gemini; codex --profile oss chat lands on the self-hosted Llama-4 served by vLLM. Codex CLI doesn’t know any of this; it just sends model: "claude-opus-4-7-via-gateway" in the JSON body, and the gateway’s routing table resolves it.
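The precedence order described in this step can be modelled in a few lines of Python. This is only a sketch of the lookup order as stated here (flag, then config file, then `CODEX_MODEL`, then the built-in default); `resolve_model` is a hypothetical helper, not a function from the Codex CLI codebase.

```python
DEFAULT_MODEL = "gpt-5.1"


def resolve_model(cli_flag=None, config_model=None, env=None):
    """Pick the model alias using the precedence order described in Step 3."""
    env = env or {}
    # Highest priority first: --model flag, config.toml value, CODEX_MODEL
    for candidate in (cli_flag, config_model, env.get("CODEX_MODEL")):
        if candidate:
            return candidate
    return DEFAULT_MODEL


print(resolve_model(config_model="claude-opus-4-7-via-gateway",
                    env={"CODEX_MODEL": "gemini-2.5-pro-via-gateway"}))
```

The practical consequence: if a profile in config.toml sets a model, exporting `CODEX_MODEL` in your shell will not override it, so debug alias mismatches from the top of the list down.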
Step 4: Verify with a curl
Before running a real Codex CLI session, confirm the gateway is translating correctly. Two curls (one OpenAI passthrough, one Anthropic translation) should both return OpenAI-shaped responses:
# OpenAI passthrough: should hit gpt-5.1 directly
curl -sS "$OPENAI_BASE_URL/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1",
    "input": "Say hello in three words.",
    "max_output_tokens": 32
  }' | jq '.output[0].content[0].text'
# Example output (a JSON string; exact wording varies): "Hi there now."

# Anthropic translation: should hit claude-opus-4-7 but return OpenAI-shaped JSON
curl -sS "$OPENAI_BASE_URL/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7-via-gateway",
    "input": "Say hello in three words.",
    "max_output_tokens": 32
  }' | jq '.output[0].content[0].text'
# Example output (a JSON string; exact wording varies): "Hello there friend."
If the second curl returns the same shape as the first (a responses payload with output[0].content[0].text populated), the translation is working. If it returns Anthropic’s native shape (content[0].text at the top level), the gateway isn’t translating; recheck the translation key in the route config.
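If you would rather script this check than eyeball `jq` output, the shape test can be sketched in Python as below. It only inspects the two paths named above (`output[0].content[0].text` versus a top-level `content[0].text`); the function name, labels, and sample payloads are illustrative.

```python
def classify_payload(payload):
    """Label a decoded JSON response by the shape of its text field."""
    output = payload.get("output")
    if isinstance(output, list) and output and isinstance(output[0], dict):
        parts = output[0].get("content") or []
        if parts and isinstance(parts[0], dict) and "text" in parts[0]:
            return "openai_responses"  # translation is working
    content = payload.get("content")
    if (isinstance(content, list) and content
            and isinstance(content[0], dict) and "text" in content[0]):
        return "anthropic_native"  # gateway passed the provider shape through
    return "unknown"


translated = {"output": [{"content": [{"type": "output_text", "text": "Hello there friend."}]}]}
untranslated = {"content": [{"type": "text", "text": "Hello there friend."}]}
print(classify_payload(translated), classify_payload(untranslated))
```

Feed it the result of `json.loads` on each curl's raw body; anything other than `openai_responses` on the second curl points back at the `translation` key.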
Step 5: Run Codex CLI through the gateway
With the env vars set, the gateway running, and the curl verified, the actual Codex CLI invocation is unchanged from a normal OpenAI run:
codex chat "Refactor the auth handler in src/api/auth.ts to use the new SessionManager"
Watch the gateway logs (or the Agent Command Center traces tab). You should see a span with provider=anthropic, model=claude-opus-4-7-20260420, and a tool_calls block carrying the bash and apply_patch invocations Codex CLI fires during the refactor. The CLI sees standard OpenAI shapes; the gateway dispatches against Anthropic; both sides are happy.
Provider-specific notes
Each provider has one or two gotchas the gateway has to handle. If you’re evaluating a gateway, ask explicitly whether each is covered.
Anthropic Claude
- Tool-use translation. Anthropic returns tool calls as `tool_use` content blocks; Codex CLI expects OpenAI’s flat `tool_calls` array. The gateway has to rewrite the block on every response.
- System-prompt placement. OpenAI accepts `system` as a role inside `input`; Anthropic accepts it as a top-level `system` field outside the `messages` array. The gateway has to move it.
- Streaming chunks. Anthropic streams `content_block_delta` events; OpenAI streams `delta.content` and `delta.tool_calls.function.arguments`. The gateway has to re-emit SSE in OpenAI’s shape or Codex CLI’s renderer breaks.
- `anthropic-version` header. Pin it explicitly. Tool-use behaviour silently differs between `2023-06-01` and `2026-04-15`.
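The streaming rewrite is the least obvious of these, so here is a minimal Python sketch of it. It maps Anthropic `content_block_delta` events (with `text_delta` or `input_json_delta` payloads) onto the `delta.content` and `delta.tool_calls` chunk shape described above. The function names are hypothetical, and a production gateway would also handle block start/stop events and remap block indexes to tool-call indexes.

```python
import json


def translate_anthropic_event(event):
    """Map one Anthropic stream event to an OpenAI-style delta, or None to skip it."""
    if event.get("type") != "content_block_delta":
        return None
    delta = event["delta"]
    if delta["type"] == "text_delta":
        return {"content": delta["text"]}
    if delta["type"] == "input_json_delta":
        # Simplification: reuses the Anthropic block index as the tool-call index
        return {"tool_calls": [{
            "index": event["index"],
            "function": {"arguments": delta["partial_json"]},
        }]}
    return None


def to_sse_line(delta):
    """Wrap a delta in an OpenAI-style chunk and serialize it as one SSE frame."""
    chunk = {"object": "chat.completion.chunk",
             "choices": [{"index": 0, "delta": delta}]}
    return "data: " + json.dumps(chunk) + "\n\n"


event = {"type": "content_block_delta", "index": 0,
         "delta": {"type": "text_delta", "text": "Hi"}}
print(to_sse_line(translate_anthropic_event(event)), end="")
```

Each frame has to be flushed as soon as it is built; collecting translated frames and sending them at the end reproduces the frozen-terminal symptom even though every individual chunk is well formed.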
Google Gemini
- Function-call shape. Gemini returns `functionCall` objects with `name` and `args`. Codex CLI expects `tool_calls[].function.name` and `tool_calls[].function.arguments` (arguments stringified as JSON). The gateway re-keys and stringifies.
- Safety filters. Gemini’s default safety filters block code completions that mention auth, crypto, or network patterns. Set `safety_settings` to permissive at the gateway or you will see empty responses on normal refactor turns.
- Vertex AI vs. AI Studio. Vertex needs Google service-account auth; AI Studio uses a simple API key. Pick one in the gateway config.
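The re-key-and-stringify step for Gemini looks roughly like this in Python. The sketch assumes the response parts carry `functionCall` objects with `name` and `args`, as described above. Because those objects are shown here without a call id, the example synthesizes one; that id scheme is purely hypothetical.

```python
import json
import uuid


def gemini_parts_to_tool_calls(parts):
    """Re-key Gemini `functionCall` parts into OpenAI-style `tool_calls` entries."""
    tool_calls = []
    for part in parts:
        call = part.get("functionCall")
        if call is None:
            continue  # plain text part, not a function call
        tool_calls.append({
            "id": "call_" + uuid.uuid4().hex[:24],  # hypothetical synthesized id
            "type": "function",
            "function": {
                "name": call["name"],
                # args arrive as an object and must leave as a JSON string
                "arguments": json.dumps(call.get("args", {})),
            },
        })
    return tool_calls


parts = [
    {"text": "Applying the patch."},
    {"functionCall": {"name": "apply_patch", "args": {"path": "src/api/auth.ts"}}},
]
print(gemini_parts_to_tool_calls(parts))
```

Whatever id the gateway synthesizes, it has to remember it: the tool result Codex CLI sends back on the next turn references that id, and the gateway must map it to the right function response for Gemini.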
Mistral
- OpenAI-compatible endpoint. Mistral’s API is closer to OpenAI’s shape than Anthropic’s or Gemini’s, so the translation is lighter; most gateways use a `passthrough` mode.
- Tool calling. Matches OpenAI’s exactly for `mistral-large-2-2026` and newer. Pin the new model.
- EU residency. Point at `api.mistral.ai/eu/v1` and confirm the gateway preserves the regional endpoint.
OSS models via vLLM
- OpenAI-compatible by design. vLLM ships an OpenAI-compatible server; the gateway just routes (`translation: "passthrough"`).
- Tool calling. Llama-4-405B-Instruct and Qwen-3-235B-Code support it; older Llama-3.x finetunes often don’t. Test with a `tool_choice: required` curl first.
- Context window. If you route a 100K-token Codex CLI turn to a 32K-context OSS model, the gateway should reject the request. Confirm rejection happens before the CLI hangs.
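The context-window rejection can be pictured as a pre-dispatch admission check. The Python sketch below is one way to express it, assuming the gateway knows each alias's context limit and has a token estimate for the request. The alias, the 32,000-token limit, and the function name are all hypothetical values chosen to match the 100K-into-32K example.

```python
# Hypothetical per-alias context limits, in tokens
CONTEXT_WINDOWS = {"oss-32k-model": 32_000}


def admit(alias, input_tokens, max_output_tokens):
    """Decide before dispatch whether a request fits the target model's context."""
    limit = CONTEXT_WINDOWS.get(alias)
    if limit is None:
        return True, "no limit configured for this alias"
    needed = input_tokens + max_output_tokens
    if needed > limit:
        return False, f"needs {needed} tokens, {alias} allows {limit}"
    return True, "fits"


print(admit("oss-32k-model", 100_000, 8192))
print(admit("oss-32k-model", 12_000, 8192))
```

The point of checking up front is that the CLI gets an immediate, explicit error instead of waiting on a backend that will fail or truncate partway through the turn.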
Five gateways that ship the translation layer
The walkthrough above used Future AGI as the reference because the routing config is declarative and the trace data feeds back into the optimizer. The other four picks all ship the OpenAI-to-other-provider translation in production today. Scored on five axes weighted toward implementation friction: OpenAI-compatible passthrough, multi-provider translation depth, tool-call fidelity, declarative routing config, and self-host posture.
1. Future AGI Agent Command Center
Endpoint: https://gateway.futureagi.com/v1
Walkthrough fit. The YAML in Step 2 is taken verbatim from the Future AGI gateway. Codex CLI points at the gateway with no SDK changes; the translation key per route handles OpenAI-Responses-to-Anthropic-Messages (or Gemini, or Mistral) rewrites including tool calls. Coverage: OpenAI, Anthropic, Gemini, Mistral, Bedrock, Azure, Cohere, Groq, Together, Fireworks, plus any OpenAI-compatible OSS server (Ollama, vLLM, LM Studio).
The loop. Every Codex CLI turn becomes a span tree via traceAI (Apache 2.0). fi.evals scores tool-use accuracy, code correctness, and task completion. Low-scoring turns cluster by failure mode in the Agent Command Center, so a pattern like “Opus called on a turn with <8K input where Sonnet would have done it” surfaces automatically. fi.opt.optimizers (ProTeGi, BayesianSearchOptimizer, GEPAOptimizer) rewrites the routing policy against the clusters; the next deploy uses the updated route. Teams typically see Codex CLI spend drop 22-34% in four weeks without changing developer behaviour. Three OSS building blocks (traceAI, ai-evaluation, agent-opt) are all Apache 2.0.
Protect (prompt-injection and PII guardrail) runs inline at ~67ms text overhead per arXiv 2510.13351, fast enough to leave on by default for Codex CLI traffic carrying web-scraped tokens.
Pricing. Free tier with 100K traces/month. Scale from $99/month. Enterprise pricing is custom and adds SOC 2 Type II certification, a BAA, and AWS Marketplace availability.
Score: Passthrough, yes (base_url swap). Multi-provider, 11+. Tool-call fidelity, confirmed on gpt-5.1, claude-opus-4-7, gemini-2.5-pro. Declarative routing, yes (YAML). Self-host, yes (Apache 2.0, BYOC, air-gapped). 5/5.
2. Portkey
Endpoint: https://api.portkey.ai/v1
Walkthrough fit. Drop-in alternative for the base-URL swap. Requires an x-portkey-api-key header alongside OPENAI_API_KEY. Codex CLI has no generic “extra-headers” config, so a small wrapper script injects it. 250+ adapters, the broadest library here. YAML routing with conditions on token count, model, and metadata.
Caveat. Palo Alto Networks announced intent to acquire Portkey on April 30, 2026; the deal closes in PANW’s fiscal Q4 2026, with the gateway becoming the AI Gateway for Prisma AIRS. Verify standalone-product continuity before signing multi-year. No optimizer.
Score: Passthrough, yes (with header). Multi-provider, 250+. Tool-call fidelity, confirmed. Declarative routing, yes. Self-host, partial (MIT core + closed control plane, BYOC supported). 4.5/5.
3. LiteLLM
Endpoint: http://<your-litellm-proxy>:4000/v1
Walkthrough fit. Source-available Python proxy you run inside your VPC. 100+ providers behind an OpenAI-compatible surface. Routing config is config.yaml plus optional pre-call hooks for token-count-aware rules. Tool-call passthrough works cleanly for Anthropic and Gemini in the May 2026 release line.
Caveat. March 24, 2026 PyPI supply-chain compromise on 1.82.7 and 1.82.8 (Datadog Security Labs TeamPCP writeup); remediated past 1.83.7. Pin commit hashes or version-lock past 1.83.7 and rotate credentials touched by affected installs. Python runtime ~35ms P95 same-provider vs ~18ms for Go binaries; under high concurrency the gap widens.
Score: Passthrough, yes. Multi-provider, 100+. Tool-call fidelity, confirmed. Declarative routing, partial (YAML + Python hook). Self-host, yes (MIT, full self-host). 4/5.
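If you self-host LiteLLM, the version guidance in the caveat is easy to encode as a check in CI. This Python sketch classifies a version string against the two compromised releases and the "past 1.83.7" floor quoted above. It assumes plain `X.Y.Z` version strings and the helper names are illustrative.

```python
COMPROMISED = {(1, 82, 7), (1, 82, 8)}  # releases named in the caveat
REMEDIATED_AFTER = (1, 83, 7)           # guidance: version-lock past this release


def parse_version(text):
    """Parse a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def litellm_version_status(text):
    """Return 'compromised', 'upgrade', or 'ok' for a LiteLLM version string."""
    version = parse_version(text)
    if version in COMPROMISED:
        return "compromised"  # also rotate any credentials this install touched
    if version > REMEDIATED_AFTER:
        return "ok"
    return "upgrade"


for candidate in ("1.82.7", "1.83.7", "1.83.8"):
    print(candidate, litellm_version_status(candidate))
```

In a real pipeline you could feed it the installed version from `importlib.metadata.version("litellm")` and fail the build on anything other than `ok`.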
4. Maxim Bifrost
Endpoint: https://bifrost.<your-region>.maxim.ai/v1
Walkthrough fit. Go-binary gateway tuned for throughput; the vendor cites ~11µs mean overhead at 5,000 RPS on t3.xlarge. Translates OpenAI Responses to Anthropic, Gemini, Mistral, Bedrock, Azure. Declarative routing config. Bifrost’s Code Mode pitch is more directly aimed at Claude Code than Codex CLI, but the OpenAI-compatible surface works either way.
Score: Passthrough, yes. Multi-provider, ~15 providers. Tool-call fidelity, confirmed. Declarative routing, yes. Self-host, yes (Go binary). 4/5.
5. OpenRouter
Endpoint: https://openrouter.ai/api/v1
Walkthrough fit. Lowest-friction option for solo developers or 3-5 person teams. One API key, one base URL, 200+ models. Address any model by its OpenRouter slug (anthropic/claude-opus-4-7, google/gemini-2.5-pro, meta-llama/llama-4-maverick-405b).
Caveat. Cost-aware routing is caller-side. To route easy turns to a cheaper model you need a wrapper around Codex CLI. OpenRouter doesn’t have a declarative “if input < 8K → route here” config. No semantic cache, no per-virtual-key budgets, no self-host. Closed source.
Score: Passthrough, yes. Multi-provider, 200+. Tool-call fidelity, confirmed. Declarative routing, no. Self-host, no. 3.5/5.
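Since cost-aware routing on OpenRouter has to live on the caller side, the wrapper mentioned in the caveat could look something like this Python sketch. It picks a slug from a crude token estimate and builds the command line, following the `--model` flag and `codex chat` invocation used earlier in this guide. The four-characters-per-token heuristic, the 8,000-token cutoff, and the choice of which slug counts as cheap are assumptions for illustration.

```python
CHEAP_SLUG = "google/gemini-2.5-pro"
FLAGSHIP_SLUG = "anthropic/claude-opus-4-7"
EASY_TURN_TOKENS = 8000  # assumed cutoff, mirroring the gateway rule in Step 2


def estimate_tokens(prompt):
    """Crude estimate (~4 characters per token); not a real tokenizer."""
    return len(prompt) // 4


def build_codex_argv(prompt):
    """Choose a model slug by prompt size and build the Codex CLI command line."""
    if estimate_tokens(prompt) < EASY_TURN_TOKENS:
        slug = CHEAP_SLUG
    else:
        slug = FLAGSHIP_SLUG
    return ["codex", "--model", slug, "chat", prompt]


print(build_codex_argv("Rename the helper in utils.ts")[:4])
```

Passing the returned list to `subprocess.run(..., check=True)` would launch the session. The limitation is that the estimate only sees the prompt you typed, not the repository context the CLI adds later, which is why gateway-side rules on actual `input_tokens` are more accurate.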
Common mistakes
| Mistake | What goes wrong | Fix |
|---|---|---|
| Setting OPENAI_API_KEY but forgetting OPENAI_BASE_URL | Codex CLI keeps hitting api.openai.com directly with the gateway key, returns 401 | Set both env vars; verify with `env \| grep OPENAI_` |
| Pointing the gateway at Anthropic without the tool_use → tool_calls translation | Codex CLI sees Anthropic’s native shape, fires no tool calls, hangs | Confirm the gateway’s translation field is set (Future AGI), or that the adapter version handles tool-call rewriting (Portkey, LiteLLM, OpenRouter all do as of May 2026) |
| Forgetting to pin model versions in the gateway config | The gateway routes to a model that updated between your eval run and prod, behaviour drifts | Pin explicit versions: gpt-5.1-2026-04-15, claude-opus-4-7-20260420, gemini-2.5-pro |
| Buffering streaming responses through the gateway | Codex CLI’s progress UI freezes mid-turn, developer thinks the agent hung | Confirm SSE pass-through, not buffer-and-batch: rerun a Step 4 curl with `"stream": true` in the body and `curl -N`, and check that tokens arrive incrementally rather than all at once |
| Routing every turn to the flagship model | Burns 2.5-4x more tokens than necessary on the 60%+ of easy turns | Add a token-count routing rule: under 8-10K input → cheaper model; over → flagship |
| Setting hard budget caps without a soft alert at 80% | Codex CLI pauses mid-conversation, breaking the developer’s flow | Soft-alert at 80% (Slack), hard-pause at 110% (HTTP 429) |
| Skipping the verification curl in Step 4 | First real Codex CLI session fails silently, hours of debugging | Always run the two-curl sanity check before pointing the CLI at the gateway |
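The budget row in the table boils down to two thresholds. As a rough Python sketch of that policy, using the 80% and 110% figures from the Fix column (the function name and return labels are made up for illustration):

```python
SOFT_ALERT_RATIO = 0.80  # notify the team (e.g. Slack) but keep serving
HARD_PAUSE_RATIO = 1.10  # stop serving; the gateway answers HTTP 429


def budget_action(spent_usd, budget_usd):
    """Return 'ok', 'soft_alert', or 'hard_pause' for the current spend."""
    ratio = spent_usd / budget_usd
    if ratio >= HARD_PAUSE_RATIO:
        return "hard_pause"
    if ratio >= SOFT_ALERT_RATIO:
        return "soft_alert"
    return "ok"


for spent in (50, 85, 120):
    print(spent, budget_action(spent, 100))
```

The 30-point band between the two thresholds is what protects the developer's flow: the team hears about the overrun well before any in-progress conversation gets cut off.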
Where this fits in the Future AGI loop
The setup above implements multi-provider routing as a one-time configuration. To make it self-improving, wire fi.evals to score every turn (tool-use accuracy, code correctness, task completion) and feed low-score traces into fi.opt.optimizers. The optimizer rewrites the routing policy against clustered failures; the next request uses the updated route. That’s the closed loop Future AGI ships end-to-end: three OSS components (traceAI, ai-evaluation, agent-opt), all Apache 2.0. The hosted Agent Command Center adds the failure-cluster view, RBAC, and procurement.
The other gateways are observation and translation layers. Codex CLI gets multi-provider routing, but the policy is static. Future AGI’s version is the same translation layer with the loop wired in, so the policy gets better at choosing the cheaper model for easy turns and the stronger model for hard turns every week instead of staying flat.
Related reading
- Best 5 AI Gateways to Route Codex CLI to Any Model in 2026
- Best 5 AI Gateways to Monitor Claude Code Token Usage in 2026
- What Is an AI Gateway? The 2026 Definition
- Best AI Gateways for Agentic AI in 2026
Sources
- OpenAI Codex CLI repository and configuration docs, github.com/openai/codex
- Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
- Portkey AI gateway, portkey.ai
- LiteLLM proxy, github.com/BerriAI/litellm
- Maxim Bifrost, getmaxim.ai/bifrost
- OpenRouter models directory, openrouter.ai/models
- Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey-to-secure-the-rise-of-ai-agents
- Datadog Security Labs writeup on LiteLLM PyPI compromise (TeamPCP campaign, March 24, 2026), securitylabs.datadoghq.com/articles/litellm-compromised-pypi-teampcp-supply-chain-campaign
- Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
- Anthropic Messages API reference, docs.anthropic.com/en/api/messages
- Google Gemini API reference, ai.google.dev/api
- Mistral API reference, docs.mistral.ai/api
- vLLM OpenAI-compatible server, docs.vllm.ai/en/latest/serving/openai_compatible_server.html