Guides

Using OpenAI Codex CLI with Multiple Model Providers in 2026: A Gateway Setup Guide

Step-by-step walkthrough for pointing OpenAI Codex CLI at Anthropic, Gemini, Mistral, and OSS models through an AI gateway in 2026, with five gateway picks scored on five axes.

·
12 min read
ai-gateway 2026 codex-cli
Editorial cover image for Using OpenAI Codex CLI with Multiple Model Providers in 2026: A Gateway Setup Guide
Table of Contents

OpenAI Codex CLI ships with one assumption hard-coded into it: every model on the other side of OPENAI_API_KEY is an OpenAI model speaking OpenAI’s Responses API. Point it directly at api.anthropic.com or generativelanguage.googleapis.com and the very first bash tool call returns a 401 or a malformed function-call block. The CLI loops, you stare at a frozen progress indicator, the logs say “tool_calls field missing.”

The fix is to put an AI gateway in front of Codex CLI. The gateway accepts the OpenAI-shaped request, translates the body and the tool-call JSON for whichever provider you want to land on, and streams a response Codex CLI can render without modification. This guide walks through the setup end-to-end (environment variables, routing config, model aliases, verification curl) and names five gateways that ship the translation layer in production today.

This is the implementation-side companion to the picker post on Codex CLI routing. If you already know which gateway you want and just need the wiring, you’re in the right place.


The problem in one paragraph

Codex CLI reads OPENAI_API_KEY and posts to api.openai.com/v1/responses by default. It sends tool_calls in OpenAI’s function-call shape, expects OpenAI’s SSE delta format on the way back, and uses OpenAI’s specific response_format semantics for structured output. Three things go wrong the moment you change providers without a gateway:

  1. API surface drift. Anthropic’s Messages API is a different endpoint and payload shape; Gemini’s generateContent is different again. Codex CLI doesn’t know how to speak either.
  2. Tool-call shape drift. Anthropic returns tool_use content blocks; Gemini returns functionCall objects. Codex CLI expects tool_calls. A naive proxy that flattens these to text silently breaks the agent, every tool turn returns a string, the CLI sees no structured call, and the loop hangs.
  3. Streaming shape drift. OpenAI streams delta.content and delta.tool_calls.function.arguments chunks. Anthropic streams content_block_delta with a different chunk schema. The CLI’s progress UI is wired to OpenAI’s chunk format; the wrong shape means a frozen terminal.

A gateway built for multi-provider routing handles all three translations inline. The rest of this guide shows the exact configuration.


Prereqs

Before starting, confirm the following versions and accounts:

ComponentMinimum version (May 2026)Notes
Codex CLI0.18.x or laterEarlier builds read OPENAI_API_BASE; newer ones prefer OPENAI_BASE_URL. Both work.
Node.js20.x LTSCodex CLI runtime.
Gateway endpointA live URLHosted (e.g. gateway.futureagi.com/v1) or self-hosted (e.g. http://litellm.internal:4000).
Provider API keysAnthropic, Google AI Studio, Mistral, etc.One per non-OpenAI provider you want to route to.
Shellbash or zshExamples below assume zsh.

The four environment variables that matter for Codex CLI in this configuration:

# Replace OpenAI's default endpoint with the gateway
export OPENAI_BASE_URL="https://gateway.futureagi.com/v1"

# Older Codex CLI builds (<= 0.16) used this alias instead. Set both for safety.
export OPENAI_API_BASE="$OPENAI_BASE_URL"

# Authenticate to the gateway, not to OpenAI directly
export OPENAI_API_KEY="fagi_sk_live_..."

# Optional: pin a default model alias that the gateway will route on
export CODEX_MODEL="claude-opus-4-7-via-gateway"

Set these in ~/.zshrc (or ~/.bashrc), reload, and you’re ready for the gateway-side configuration.


Setup walkthrough

Five steps, each with the exact code you need. We use Future AGI Agent Command Center for the first walkthrough because the routing config is declarative; the same shapes work for Portkey and LiteLLM with minor key-name differences (called out in the provider notes section below).

Step 1: Override OPENAI_BASE_URL

Codex CLI honors OPENAI_BASE_URL as the canonical override. Set it once in your shell profile and every codex invocation inherits it.

# ~/.zshrc
export OPENAI_BASE_URL="https://gateway.futureagi.com/v1"
export OPENAI_API_KEY="fagi_sk_live_xxxxxxxxxxxxxxxxxxxx"

# Reload
source ~/.zshrc

# Confirm
codex --help 2>&1 | head -3

If you’re wiring a CI environment or a remote workstation, set the same two variables in the runner’s environment. Codex CLI doesn’t read a config file by default; the env vars are the source of truth.

Step 2: Configure gateway routing

The routing config tells the gateway which model alias maps to which underlying provider model, and which provider key to use. This is declarative YAML on the Future AGI gateway and on Portkey; it’s Python on LiteLLM. Future AGI’s shape:

# /etc/fagi-gateway/routes.yaml
routes:
  - alias: "gpt-5.1"
    provider: "openai"
    model: "gpt-5.1-2026-04-15"
    api_key_ref: "openai_team_key"

  - alias: "claude-opus-4-7-via-gateway"
    provider: "anthropic"
    model: "claude-opus-4-7-20260420"
    api_key_ref: "anthropic_team_key"
    translation: "openai_responses_v1"

  - alias: "gemini-2.5-pro-via-gateway"
    provider: "google"
    model: "gemini-2.5-pro"
    api_key_ref: "google_ai_studio_key"
    translation: "openai_responses_v1"

  - alias: "mistral-large-via-gateway"
    provider: "mistral"
    model: "mistral-large-2-2026"
    api_key_ref: "mistral_team_key"
    translation: "openai_responses_v1"

  - alias: "llama-4-405b-via-gateway"
    provider: "openai_compatible"
    base_url: "http://vllm-internal:8000/v1"
    model: "meta-llama/Llama-4-405B-Instruct"
    translation: "passthrough"

routing_policy:
  default: "gpt-5.1"
  rules:
    - if: "input_tokens < 8000"
      route_to: "gemini-2.5-pro-via-gateway"
    - if: "tools_include('apply_patch') and input_tokens > 30000"
      route_to: "claude-opus-4-7-via-gateway"

attributes:
  fi.attributes.user.id: "${headers.x-developer-email}"
  fi.attributes.repo: "${headers.x-repo}"

The translation: "openai_responses_v1" key is doing the heavy lifting. It tells the gateway: accept an OpenAI Responses-API request, translate the body to the target provider’s native format, dispatch, and translate the response back, including the tool-call blocks. The attributes block tags each request with developer and repo metadata so the Agent Command Center dashboard can slice cost by both.

Step 3: Map model aliases at the Codex CLI side

Codex CLI takes the model name from a few places. In rough precedence order: the --model flag on the command line, the model field in ~/.codex/config.toml, the CODEX_MODEL environment variable, and finally its built-in default of gpt-5.1.

Set the alias to match a route in the gateway config:

# ~/.codex/config.toml
[default]
model = "claude-opus-4-7-via-gateway"
max_tokens = 8192
temperature = 0.2

[profiles.frontend]
model = "gemini-2.5-pro-via-gateway"

[profiles.refactor]
model = "claude-opus-4-7-via-gateway"

[profiles.oss]
model = "llama-4-405b-via-gateway"

Now codex chat defaults to the Anthropic route; codex --profile frontend chat flips to Gemini; codex --profile oss chat lands on the self-hosted Llama-4 served by vLLM. Codex CLI doesn’t know any of this, it just sends model: "claude-opus-4-7-via-gateway" in the JSON body, and the gateway’s routing table resolves it.

Step 4: Verify with a curl

Before running a real Codex CLI session, confirm the gateway is translating correctly. Two curls (one OpenAI passthrough, one Anthropic translation) should both return OpenAI-shaped responses:

# OpenAI passthrough — should hit gpt-5.1 directly
curl -sS "$OPENAI_BASE_URL/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1",
    "input": "Say hello in three words.",
    "max_output_tokens": 32
  }' | jq '.output[0].content[0].text'

# Expected output (string): "Hi there now."

# Anthropic translation — should hit claude-opus-4-7 but return OpenAI-shaped JSON
curl -sS "$OPENAI_BASE_URL/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7-via-gateway",
    "input": "Say hello in three words.",
    "max_output_tokens": 32
  }' | jq '.output[0].content[0].text'

# Expected output (string): "Hello there friend."

If the second curl returns the same shape as the first (a responses payload with output[0].content[0].text populated), the translation is working. If it returns Anthropic’s native shape (content[0].text at the top level), the gateway isn’t translating, recheck the translation key in the route config.

Step 5: Run Codex CLI through the gateway

With the env vars set, the gateway running, and the curl verified, the actual Codex CLI invocation is unchanged from a normal OpenAI run:

codex chat "Refactor the auth handler in src/api/auth.ts to use the new SessionManager"

Watch the gateway logs (or the Agent Command Center traces tab), you should see a span with provider=anthropic, model=claude-opus-4-7-20260420, and a tool_calls block carrying the bash and apply_patch invocations Codex CLI fires during the refactor. The CLI sees standard OpenAI shapes; the gateway dispatches against Anthropic; both sides are happy.


Provider-specific notes

Each provider has one or two gotchas the gateway has to handle. If you’re evaluating a gateway, ask explicitly whether each is covered.

Anthropic Claude

  • Tool-use translation. Anthropic returns tool calls as tool_use content blocks; Codex CLI expects OpenAI’s flat tool_calls array. The gateway has to rewrite the block on every response.
  • System-prompt placement. OpenAI accepts system as a role inside input; Anthropic accepts it as a top-level system field outside the messages array. The gateway has to move it.
  • Streaming chunks. Anthropic streams content_block_delta events; OpenAI streams delta.content and delta.tool_calls.function.arguments. The gateway has to re-emit SSE in OpenAI’s shape or Codex CLI’s renderer breaks.
  • anthropic-version header. Pin it explicitly. Tool-use behaviour silently differs between 2023-06-01 and 2026-04-15.

Google Gemini

  • Function-call shape. Gemini returns functionCall objects with name and args. Codex CLI expects tool_calls[].function.name and tool_calls[].function.arguments (arguments stringified as JSON). The gateway re-keys and stringifies.
  • Safety filters. Gemini’s default safety filters block code completions that mention auth, crypto, or network patterns. Set safety_settings to permissive at the gateway or you will see empty responses on normal refactor turns.
  • Vertex AI vs. AI Studio. Vertex needs Google service-account auth; AI Studio uses a simple API key. Pick one in the gateway config.

Mistral

  • OpenAI-compatible endpoint. Mistral’s API is closer to OpenAI’s shape than Anthropic’s or Gemini’s, so the translation is lighter, most gateways use a passthrough mode.
  • Tool calling. Matches OpenAI’s exactly for mistral-large-2-2026 and newer. Pin the new model.
  • EU residency. Point at api.mistral.ai/eu/v1 and confirm the gateway preserves the regional endpoint.

OSS models via vLLM

  • OpenAI-compatible by design. vLLM ships an OpenAI-compatible server; the gateway just routes (translation: "passthrough").
  • Tool calling. Llama-4-405B-Instruct and Qwen-3-235B-Code support it; older Llama-3.x finetunes often don’t. Test with a tool_choice: required curl first.
  • Context window. If you route a 100K-token Codex CLI turn to a 32K-context OSS model, the gateway should reject the request, confirm rejection happens before the CLI hangs.

Five gateways that ship the translation layer

The walkthrough above used Future AGI as the reference because the routing config is declarative and the trace data feeds back into the optimizer. The other four picks all ship the OpenAI-to-other-provider translation in production today. Scored on five axes weighted toward implementation friction: OpenAI-compatible passthrough, multi-provider translation depth, tool-call fidelity, declarative routing config, and self-host posture.

1. Future AGI Agent Command Center

Endpoint: https://gateway.futureagi.com/v1

Walkthrough fit. The YAML in Step 2 is taken verbatim from the Future AGI gateway. Codex CLI points at the gateway with no SDK changes; the translation key per route handles OpenAI-Responses-to-Anthropic-Messages (or Gemini, or Mistral) rewrites including tool calls. Coverage: OpenAI, Anthropic, Gemini, Mistral, Bedrock, Azure, Cohere, Groq, Together, Fireworks, plus any OpenAI-compatible OSS server (Ollama, vLLM, LM Studio).

The loop. Every Codex CLI turn becomes a span tree via traceAI (Apache 2.0). fi.evals scores tool-use accuracy, code correctness, and task completion. Low-scoring turns cluster by failure mode in the Agent Command Center, “Opus called on a turn with <8K input where Sonnet would have done it” surfaces automatically. fi.opt.optimizers (ProTeGi, BayesianSearchOptimizer, GEPAOptimizer) rewrites the routing policy against the clusters; the next deploy uses the updated route. Teams typically see Codex CLI spend drop 22-34% in four weeks without changing developer behaviour. Three OSS building blocks (traceAI, ai-evaluation, agent-opt) are all Apache 2.0.

Protect (prompt-injection and PII guardrail) runs inline at ~67ms text overhead per arXiv 2510.13351, fast enough to leave on by default for Codex CLI traffic carrying web-scraped tokens.

Pricing. Free tier with 100K traces/month. Scale from $99/month. Enterprise custom with SOC 2 Type II certified, BAA, AWS Marketplace.

Score: Passthrough, yes (base_url swap). Multi-provider, 11+. Tool-call fidelity, confirmed on gpt-5.1, claude-opus-4-7, gemini-2.5-pro. Declarative routing, yes (YAML). Self-host. Apache 2.0, BYOC, air-gapped. 5/5.

2. Portkey

Endpoint: https://api.portkey.ai/v1

Walkthrough fit. Drop-in alternative for the base-URL swap. Requires an x-portkey-api-key header alongside OPENAI_API_KEY. Codex CLI has no generic “extra-headers” config, so a small wrapper script injects it. 250+ adapters, the broadest library here. YAML routing with conditions on token count, model, and metadata.

Caveat. Palo Alto Networks announced intent to acquire Portkey on April 30, 2026; the deal closes in PANW’s fiscal Q4 2026, with the gateway becoming the AI Gateway for Prisma AIRS. Verify standalone-product continuity before signing multi-year. No optimizer.

Score: Passthrough, yes (with header). Multi-provider, 250+. Tool-call fidelity, confirmed. Declarative routing, yes. Self-host. MIT core + closed control plane, BYOC supported. 4.5/5.

3. LiteLLM

Endpoint: http://<your-litellm-proxy>:4000/v1

Walkthrough fit. Source-available Python proxy you run inside your VPC. 100+ providers behind an OpenAI-compatible surface. Routing config is config.yaml plus optional pre-call hooks for token-count-aware rules. Tool-call passthrough works cleanly for Anthropic and Gemini in the May 2026 release line.

Caveat. March 24, 2026 PyPI supply-chain compromise on 1.82.7 and 1.82.8 (Datadog Security Labs TeamPCP writeup); remediated past 1.83.7. Pin commit hashes or version-lock past 1.83.7 and rotate credentials touched by affected installs. Python runtime ~35ms P95 same-provider vs ~18ms for Go binaries; under high concurrency the gap widens.

Score: Passthrough, yes. Multi-provider, 100+. Tool-call fidelity, confirmed. Declarative routing, partial (YAML + Python hook). Self-host. MIT, full self-host. 4/5.

4. Maxim Bifrost

Endpoint: https://bifrost.<your-region>.maxim.ai/v1

Walkthrough fit. Go-binary gateway tuned for throughput, vendor cites ~11µs mean overhead at 5,000 RPS on t3.xlarge. Translates OpenAI Responses to Anthropic, Gemini, Mistral, Bedrock, Azure. Declarative routing config. Bifrost’s Code Mode pitch is more directly aimed at Claude Code than Codex CLI, but the OpenAI-compatible surface works either way.

Score: Passthrough, yes. Multi-provider, ~15 providers. Tool-call fidelity, confirmed. Declarative routing, yes. Self-host, yes (Go binary). 4/5.

5. OpenRouter

Endpoint: https://openrouter.ai/api/v1

Walkthrough fit. Lowest-friction option for solo developers or 3-5 person teams. One API key, one base URL, 200+ models. Address any model by its OpenRouter slug (anthropic/claude-opus-4-7, google/gemini-2.5-pro, meta-llama/llama-4-maverick-405b).

Caveat. Cost-aware routing is caller-side. To route easy turns to a cheaper model you need a wrapper around Codex CLI. OpenRouter doesn’t have a declarative “if input < 8K → route here” config. No semantic cache, no per-virtual-key budgets, no self-host. Closed source.

Score: Passthrough, yes. Multi-provider, 200+. Tool-call fidelity, confirmed. Declarative routing, no. Self-host, no. 3.5/5.


Common mistakes

MistakeWhat goes wrongFix
Setting OPENAI_API_KEY but forgetting OPENAI_BASE_URLCodex CLI keeps hitting api.openai.com directly with the gateway key, returns 401Set both env vars; verify with env | grep OPENAI_
Pointing the gateway at Anthropic without the tool_usetool_calls translationCodex CLI sees Anthropic’s native shape, fires no tool calls, hangsConfirm the gateway’s translation field is set (Future AGI), or that the adapter version handles tool-call rewriting (Portkey, LiteLLM, OpenRouter all do as of May 2026)
Forgetting to pin model versions in the gateway configThe gateway routes to a model that updated between your eval run and prod, behaviour driftsPin explicit versions: gpt-5.1-2026-04-15, claude-opus-4-7-20260420, gemini-2.5-pro
Buffering streaming responses through the gatewayCodex CLI’s progress UI freezes mid-turn, developer thinks the agent hungConfirm SSE pass-through, not buffer-and-batch — the curl in Step 4 should stream tokens, not return all at once
Routing every turn to the flagship modelBurns 2.5-4x more tokens than necessary on the 60%+ of easy turnsAdd a token-count routing rule: under 8-10K input → cheaper model; over → flagship
Setting hard budget caps without a soft alert at 80%Codex CLI pauses mid-conversation, breaking the developer’s flowSoft-alert at 80% (Slack), hard-pause at 110% (HTTP 429)
Skipping the verification curl in Step 4First real Codex CLI session fails silently, hours of debuggingAlways run the two-curl sanity check before pointing the CLI at the gateway

Where this fits in the Future AGI loop

The setup above implements multi-provider routing as a one-time configuration. To make it self-improving, wire fi.evals to score every turn (tool-use accuracy, code correctness, task completion) and feed low-score traces into fi.opt.optimizers. The optimizer rewrites the routing policy against clustered failures; the next request uses the updated route. That’s the closed loop Future AGI ships end-to-end, three OSS components (traceAI, ai-evaluation, agent-opt), all Apache 2.0; the hosted Agent Command Center adds the failure-cluster view, RBAC, and procurement.

The other gateways are observation and translation layers. Codex CLI gets multi-provider routing, but the policy is static. Future AGI’s version is the same translation layer with the loop wired in, so the policy gets better at choosing the cheaper model for easy turns and the stronger model for hard turns every week instead of staying flat.



Sources

  • OpenAI Codex CLI repository and configuration docs, github.com/openai/codex
  • Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
  • Portkey AI gateway, portkey.ai
  • LiteLLM proxy, github.com/BerriAI/litellm
  • Maxim Bifrost, getmaxim.ai/bifrost
  • OpenRouter models directory, openrouter.ai/models
  • Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey-to-secure-the-rise-of-ai-agents
  • Datadog Security Labs writeup on LiteLLM PyPI compromise (TeamPCP campaign, March 24, 2026), securitylabs.datadoghq.com/articles/litellm-compromised-pypi-teampcp-supply-chain-campaign
  • Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (67ms text, 109ms image)
  • Anthropic Messages API reference, docs.anthropic.com/en/api/messages
  • Google Gemini API reference, ai.google.dev/api
  • Mistral API reference, docs.mistral.ai/api
  • vLLM OpenAI-compatible server, docs.vllm.ai/en/latest/serving/openai_compatible_server.html

Frequently asked questions

Does Codex CLI support `OPENAI_BASE_URL` or do I need `OPENAI_API_BASE`?
Both work. Codex CLI `0.18+` prefers `OPENAI_BASE_URL`; earlier builds read `OPENAI_API_BASE`. Set both and you are covered across versions.
Can I route Codex CLI to multiple providers in the same session?
Yes, with a routing rule keyed on input-token count or tool-call presence. Future AGI, Portkey, LiteLLM, and Maxim Bifrost support this declaratively. OpenRouter requires a caller-side wrapper.
Will tool calls (`bash`, `apply_patch`) work when routed to Claude or Gemini?
Yes, if the gateway translates `tool_use` (Anthropic) or `functionCall` (Gemini) back into OpenAI's `tool_calls` shape. All five gateways above do this as of May 2026. Older proxies flattened tool calls into text — confirm the test matrix before adopting.
How much latency does the gateway add per Codex CLI turn?
Future AGI averages ~18ms P95 same-provider and ~42ms cross-provider. Maxim Bifrost cites ~11µs mean at 5,000 RPS. Portkey ~25ms / ~55ms. LiteLLM ~35ms / ~70ms (Python runtime). OpenRouter ~22ms. Cross-provider hops are slower because the translation pass costs real work.
Is it safe to send source code from Codex CLI through a hosted gateway?
For hosted gateways, the path is gateway → provider; both endpoints already see the code. If compliance forbids the hosted hop, pick self-hosted LiteLLM or Future AGI's BYOC, with provider traffic egressing through your own network. OpenRouter is cloud-only.
Related Articles
View all
Stay updated on AI observability

Get weekly insights on building reliable AI systems. No spam.