Guides

Using OpenAI Codex CLI with Multiple Model Providers in 2026: A Gateway Setup Guide

Q: Does Codex CLI support `OPENAI_BASE_URL` or do I need `OPENAI_API_BASE`?

Both work. Codex CLI `0.18+` prefers `OPENAI_BASE_URL`; earlier builds read `OPENAI_API_BASE`. Set both and you are covered across versions.

Q: Can I route Codex CLI to multiple providers in the same session?

Yes, with a routing rule keyed on input-token count or tool-call presence. Future AGI, Portkey, LiteLLM, and Maxim Bifrost support this declaratively. OpenRouter requires a caller-side wrapper.

Q: Will tool calls (`bash`, `apply_patch`) work when routed to Claude or Gemini?

Yes, if the gateway translates `tool_use` (Anthropic) or `functionCall` (Gemini) back into OpenAI's `tool_calls` shape. All five gateways above do this as of May 2026. Older proxies flattened tool calls into text — confirm the test matrix before adopting.

Q: How much latency does the gateway add per Codex CLI turn?

Future AGI averages ~18ms P95 same-provider and ~42ms cross-provider. Maxim Bifrost cites ~11µs mean at 5,000 RPS. Portkey ~25ms / ~55ms. LiteLLM ~35ms / ~70ms (Python runtime). OpenRouter ~22ms. Cross-provider hops are slower because the translation pass costs real work.

Q: Is it safe to send source code from Codex CLI through a hosted gateway?

For hosted gateways, the path is gateway → provider; both endpoints already see the code. If compliance forbids the hosted hop, pick self-hosted LiteLLM or Future AGI's BYOC, with provider traffic egressing through your own network. OpenRouter is cloud-only.

Walkthrough for pointing OpenAI Codex CLI at Anthropic, Gemini, Mistral, OSS models through an AI gateway in 2026, with 5 gateway picks scored.

April 4, 2026

12 min read

ai-gateway 2026 codex-cli

OpenAI Codex CLI ships with one assumption hard-coded into it: every model on the other side of OPENAI_API_KEY is an OpenAI model speaking OpenAI’s Responses API. Point it directly at api.anthropic.com or generativelanguage.googleapis.com and the very first bash tool call returns a 401 or a malformed function-call block. The CLI loops, you stare at a frozen progress indicator, the logs say “tool_calls field missing.”

The fix is to put an AI gateway in front of Codex CLI. The gateway accepts the OpenAI-shaped request, translates the body and the tool-call JSON for whichever provider you want to land on, and streams a response Codex CLI can render without modification. This guide walks through the setup end-to-end (environment variables, routing config, model aliases, verification curl) and names five gateways that ship the translation layer in production today.

This is the implementation-side companion to the picker post on Codex CLI routing. If you already know which gateway you want and just need the wiring, you’re in the right place.

The problem in one paragraph

Codex CLI reads OPENAI_API_KEY and posts to api.openai.com/v1/responses by default. It sends tool_calls in OpenAI’s function-call shape, expects OpenAI’s SSE delta format on the way back, and uses OpenAI’s specific response_format semantics for structured output. Three things go wrong the moment you change providers without a gateway:

API surface drift. Anthropic’s Messages API is a different endpoint and payload shape; Gemini’s generateContent is different again. Codex CLI doesn’t know how to speak either.
Tool-call shape drift. Anthropic returns tool_use content blocks; Gemini returns functionCall objects. Codex CLI expects tool_calls. A naive proxy that flattens these to text silently breaks the agent, every tool turn returns a string, the CLI sees no structured call, and the loop hangs.
Streaming shape drift. OpenAI streams delta.content and delta.tool_calls.function.arguments chunks. Anthropic streams content_block_delta with a different chunk schema. The CLI’s progress UI is wired to OpenAI’s chunk format; the wrong shape means a frozen terminal.

A gateway built for multi-provider routing handles all three translations inline. The rest of this guide shows the exact configuration.

Prereqs

Before starting, confirm the following versions and accounts:

Component	Minimum version (May 2026)	Notes
Codex CLI	`0.18.x` or later	Earlier builds read `OPENAI_API_BASE`; newer ones prefer `OPENAI_BASE_URL`. Both work.
Node.js	`20.x` LTS	Codex CLI runtime.
Gateway endpoint	A live URL	Hosted (e.g. `gateway.futureagi.com/v1`) or self-hosted (e.g. `http://litellm.internal:4000`).
Provider API keys	Anthropic, Google AI Studio, Mistral, etc.	One per non-OpenAI provider you want to route to.
Shell	bash or zsh	Examples below assume zsh.

The four environment variables that matter for Codex CLI in this configuration:

# Replace OpenAI's default endpoint with the gateway
export OPENAI_BASE_URL="https://gateway.futureagi.com/v1"

# Older Codex CLI builds (<= 0.16) used this alias instead. Set both for safety.
export OPENAI_API_BASE="$OPENAI_BASE_URL"

# Authenticate to the gateway, not to OpenAI directly
export OPENAI_API_KEY="fagi_sk_live_..."

# Optional: pin a default model alias that the gateway will route on
export CODEX_MODEL="claude-opus-4-7-via-gateway"

Set these in ~/.zshrc (or ~/.bashrc), reload, and you’re ready for the gateway-side configuration.

Setup walkthrough

Five steps, each with the exact code you need. We use Future AGI Agent Command Center for the first walkthrough because the routing config is declarative; the same shapes work for Portkey and LiteLLM with minor key-name differences (called out in the provider notes section below).

Step 1: Override `OPENAI_BASE_URL`

Codex CLI honors OPENAI_BASE_URL as the canonical override. Set it once in your shell profile and every codex invocation inherits it.

# ~/.zshrc
export OPENAI_BASE_URL="https://gateway.futureagi.com/v1"
export OPENAI_API_KEY="fagi_sk_live_xxxxxxxxxxxxxxxxxxxx"

# Reload
source ~/.zshrc

# Confirm
codex --help 2>&1 | head -3

If you’re wiring a CI environment or a remote workstation, set the same two variables in the runner’s environment. Codex CLI doesn’t read a config file by default; the env vars are the source of truth.

Step 2: Configure gateway routing

The routing config tells the gateway which model alias maps to which underlying provider model, and which provider key to use. This is declarative YAML on the Future AGI gateway and on Portkey; it’s Python on LiteLLM. Future AGI’s shape:

# /etc/fagi-gateway/routes.yaml
routes:
  - alias: "gpt-5.1"
    provider: "openai"
    model: "gpt-5.1-2026-04-15"
    api_key_ref: "openai_team_key"

  - alias: "claude-opus-4-7-via-gateway"
    provider: "anthropic"
    model: "claude-opus-4-7-20260420"
    api_key_ref: "anthropic_team_key"
    translation: "openai_responses_v1"

  - alias: "gemini-2.5-pro-via-gateway"
    provider: "google"
    model: "gemini-2.5-pro"
    api_key_ref: "google_ai_studio_key"
    translation: "openai_responses_v1"

  - alias: "mistral-large-via-gateway"
    provider: "mistral"
    model: "mistral-large-2-2026"
    api_key_ref: "mistral_team_key"
    translation: "openai_responses_v1"

  - alias: "llama-4-405b-via-gateway"
    provider: "openai_compatible"
    base_url: "http://vllm-internal:8000/v1"
    model: "meta-llama/Llama-4-405B-Instruct"
    translation: "passthrough"

routing_policy:
  default: "gpt-5.1"
  rules:
    - if: "input_tokens < 8000"
      route_to: "gemini-2.5-pro-via-gateway"
    - if: "tools_include('apply_patch') and input_tokens > 30000"
      route_to: "claude-opus-4-7-via-gateway"

attributes:
  fi.attributes.user.id: "${headers.x-developer-email}"
  fi.attributes.repo: "${headers.x-repo}"

The translation: "openai_responses_v1" key is doing the heavy lifting. It tells the gateway: accept an OpenAI Responses-API request, translate the body to the target provider’s native format, dispatch, and translate the response back, including the tool-call blocks. The attributes block tags each request with developer and repo metadata so the Agent Command Center dashboard can slice cost by both.

Step 3: Map model aliases at the Codex CLI side

Codex CLI takes the model name from a few places. In rough precedence order: the --model flag on the command line, the model field in ~/.codex/config.toml, the CODEX_MODEL environment variable, and finally its built-in default of gpt-5.1.

Set the alias to match a route in the gateway config:

# ~/.codex/config.toml
[default]
model = "claude-opus-4-7-via-gateway"
max_tokens = 8192
temperature = 0.2

[profiles.frontend]
model = "gemini-2.5-pro-via-gateway"

[profiles.refactor]
model = "claude-opus-4-7-via-gateway"

[profiles.oss]
model = "llama-4-405b-via-gateway"

Now codex chat defaults to the Anthropic route; codex --profile frontend chat flips to Gemini; codex --profile oss chat lands on the self-hosted Llama-4 served by vLLM. Codex CLI doesn’t know any of this, it just sends model: "claude-opus-4-7-via-gateway" in the JSON body, and the gateway’s routing table resolves it.

Step 4: Verify with a curl

Before running a real Codex CLI session, confirm the gateway is translating correctly. Two curls (one OpenAI passthrough, one Anthropic translation) should both return OpenAI-shaped responses:

# OpenAI passthrough — should hit gpt-5.1 directly
curl -sS "$OPENAI_BASE_URL/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.1",
    "input": "Say hello in three words.",
    "max_output_tokens": 32
  }' | jq '.output[0].content[0].text'

# Expected output (string): "Hi there now."

# Anthropic translation — should hit claude-opus-4-7 but return OpenAI-shaped JSON
curl -sS "$OPENAI_BASE_URL/responses" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-7-via-gateway",
    "input": "Say hello in three words.",
    "max_output_tokens": 32
  }' | jq '.output[0].content[0].text'

# Expected output (string): "Hello there friend."

If the second curl returns the same shape as the first (a responses payload with output[0].content[0].text populated), the translation is working. If it returns Anthropic’s native shape (content[0].text at the top level), the gateway isn’t translating, recheck the translation key in the route config.

Step 5: Run Codex CLI through the gateway

With the env vars set, the gateway running, and the curl verified, the actual Codex CLI invocation is unchanged from a normal OpenAI run:

codex chat "Refactor the auth handler in src/api/auth.ts to use the new SessionManager"

Watch the gateway logs (or the Agent Command Center traces tab), you should see a span with provider=anthropic, model=claude-opus-4-7-20260420, and a tool_calls block carrying the bash and apply_patch invocations Codex CLI fires during the refactor. The CLI sees standard OpenAI shapes; the gateway dispatches against Anthropic; both sides are happy.

Provider-specific notes

Each provider has one or two gotchas the gateway has to handle. If you’re evaluating a gateway, ask explicitly whether each is covered.

Anthropic Claude

Tool-use translation. Anthropic returns tool calls as tool_use content blocks; Codex CLI expects OpenAI’s flat tool_calls array. The gateway has to rewrite the block on every response.
System-prompt placement. OpenAI accepts system as a role inside input; Anthropic accepts it as a top-level system field outside the messages array. The gateway has to move it.
Streaming chunks. Anthropic streams content_block_delta events; OpenAI streams delta.content and delta.tool_calls.function.arguments. The gateway has to re-emit SSE in OpenAI’s shape or Codex CLI’s renderer breaks.
anthropic-version header. Pin it explicitly. Tool-use behaviour silently differs between 2023-06-01 and 2026-04-15.

Google Gemini

Function-call shape. Gemini returns functionCall objects with name and args. Codex CLI expects tool_calls[].function.name and tool_calls[].function.arguments (arguments stringified as JSON). The gateway re-keys and stringifies.
Safety filters. Gemini’s default safety filters block code completions that mention auth, crypto, or network patterns. Set safety_settings to permissive at the gateway or you will see empty responses on normal refactor turns.
Vertex AI vs. AI Studio. Vertex needs Google service-account auth; AI Studio uses a simple API key. Pick one in the gateway config.

Mistral

OpenAI-compatible endpoint. Mistral’s API is closer to OpenAI’s shape than Anthropic’s or Gemini’s, so the translation is lighter, most gateways use a passthrough mode.
Tool calling. Matches OpenAI’s exactly for mistral-large-2-2026 and newer. Pin the new model.
EU residency. Point at api.mistral.ai/eu/v1 and confirm the gateway preserves the regional endpoint.

OSS models via vLLM

OpenAI-compatible by design. vLLM ships an OpenAI-compatible server; the gateway just routes (translation: "passthrough").
Tool calling. Llama-4-405B-Instruct and Qwen-3-235B-Code support it; older Llama-3.x finetunes often don’t. Test with a tool_choice: required curl first.
Context window. If you route a 100K-token Codex CLI turn to a 32K-context OSS model, the gateway should reject the request, confirm rejection happens before the CLI hangs.

Five gateways that ship the translation layer

The walkthrough above used Future AGI as the reference because the routing config is declarative and the trace data feeds back into the optimizer. The other four picks all ship the OpenAI-to-other-provider translation in production today. Scored on five axes weighted toward implementation friction: OpenAI-compatible passthrough, multi-provider translation depth, tool-call fidelity, declarative routing config, and self-host posture.

1. Future AGI Agent Command Center

Endpoint: https://gateway.futureagi.com/v1

Walkthrough fit. The YAML in Step 2 is taken verbatim from the Future AGI gateway. Codex CLI points at the gateway with no SDK changes; the translation key per route handles OpenAI-Responses-to-Anthropic-Messages (or Gemini, or Mistral) rewrites including tool calls. Coverage: OpenAI, Anthropic, Gemini, Mistral, Bedrock, Azure, Cohere, Groq, Together, Fireworks, plus any OpenAI-compatible OSS server (Ollama, vLLM, LM Studio).

The loop. Every Codex CLI turn becomes a span tree via traceAI (Apache 2.0). fi.evals scores tool-use accuracy, code correctness, and task completion. Low-scoring turns cluster by failure mode in the Agent Command Center, “Opus called on a turn with <8K input where Sonnet would have done it” surfaces automatically. fi.opt.optimizers (ProTeGi, BayesianSearchOptimizer, GEPAOptimizer) rewrites the routing policy against the clusters; the next deploy uses the updated route. Teams typically see Codex CLI spend drop 22-34% in four weeks without changing developer behaviour. Three OSS building blocks (traceAI, ai-evaluation, agent-opt) are all Apache 2.0.

Protect (prompt-injection and PII guardrail) runs inline at 65 ms text median time-to-label overhead per arXiv 2510.13351, fast enough to leave on by default for Codex CLI traffic carrying web-scraped tokens.

Pricing. Free tier with 100K traces/month. Scale from $99/month. Enterprise custom with SOC 2 Type II certified, BAA, AWS Marketplace.

Score: Passthrough, yes (base_url swap). Multi-provider, 11+. Tool-call fidelity, confirmed on gpt-5.1, claude-opus-4-7, gemini-2.5-pro. Declarative routing, yes (YAML). Self-host. Apache 2.0, BYOC, air-gapped. 5/5.

2. Portkey

Endpoint: https://api.portkey.ai/v1

Walkthrough fit. Drop-in alternative for the base-URL swap. Requires an x-portkey-api-key header alongside OPENAI_API_KEY. Codex CLI has no generic “extra-headers” config, so a small wrapper script injects it. 250+ adapters, the broadest library here. YAML routing with conditions on token count, model, and metadata.

Caveat. Palo Alto Networks announced intent to acquire Portkey on April 30, 2026; the deal closes in PANW’s fiscal Q4 2026, with the gateway becoming the AI Gateway for Prisma AIRS. Verify standalone-product continuity before signing multi-year. No optimizer.

Score: Passthrough, yes (with header). Multi-provider, 250+. Tool-call fidelity, confirmed. Declarative routing, yes. Self-host. MIT core + closed control plane, BYOC supported. 4.5/5.

3. LiteLLM

Endpoint: http://<your-litellm-proxy>:4000/v1

Walkthrough fit. Source-available Python proxy you run inside your VPC. 20+ providers via six native adapters (OpenAI, Anthropic, Gemini, Bedrock, Cohere, Azure) plus OpenAI-compatible presets and self-hosted backends behind an OpenAI-compatible surface. Routing config is config.yaml plus optional pre-call hooks for token-count-aware rules. Tool-call passthrough works cleanly for Anthropic and Gemini in the May 2026 release line.

Caveat. March 24, 2026 PyPI supply-chain compromise on 1.82.7 and 1.82.8 (Datadog Security Labs TeamPCP writeup); remediated past 1.83.7. Pin commit hashes or version-lock past 1.83.7 and rotate credentials touched by affected installs. Python runtime ~35ms P95 same-provider vs ~18ms for Go binaries; under high concurrency the gap widens.

Score: Passthrough, yes. Multi-provider, 100+. Tool-call fidelity, confirmed. Declarative routing, partial (YAML + Python hook). Self-host. MIT, full self-host. 4/5.

4. Maxim Bifrost

Endpoint: https://bifrost.<your-region>.maxim.ai/v1

Walkthrough fit. Go-binary gateway tuned for throughput, vendor cites ~11µs mean overhead at 5,000 RPS on t3.xlarge. Translates OpenAI Responses to Anthropic, Gemini, Mistral, Bedrock, Azure. Declarative routing config. Bifrost’s Code Mode pitch is more directly aimed at Claude Code than Codex CLI, but the OpenAI-compatible surface works either way.

Score: Passthrough, yes. Multi-provider, ~15 providers. Tool-call fidelity, confirmed. Declarative routing, yes. Self-host, yes (Go binary). 4/5.

5. OpenRouter

Endpoint: https://openrouter.ai/api/v1

Walkthrough fit. Lowest-friction option for solo developers or 3-5 person teams. One API key, one base URL, 200+ models. Address any model by its OpenRouter slug (anthropic/claude-opus-4-7, google/gemini-2.5-pro, meta-llama/llama-4-maverick-405b).

Caveat. Cost-aware routing is caller-side. To route easy turns to a cheaper model you need a wrapper around Codex CLI. OpenRouter doesn’t have a declarative “if input < 8K → route here” config. No semantic cache, no per-virtual-key budgets, no self-host. Closed source.

Score: Passthrough, yes. Multi-provider, 200+. Tool-call fidelity, confirmed. Declarative routing, no. Self-host, no. 3.5/5.

Common mistakes

Mistake	What goes wrong	Fix
Setting `OPENAI_API_KEY` but forgetting `OPENAI_BASE_URL`	Codex CLI keeps hitting `api.openai.com` directly with the gateway key, returns 401	Set both env vars; verify with `env \| grep OPENAI_`
Pointing the gateway at Anthropic without the `tool_use` → `tool_calls` translation	Codex CLI sees Anthropic’s native shape, fires no tool calls, hangs	Confirm the gateway’s `translation` field is set (Future AGI), or that the adapter version handles tool-call rewriting (Portkey, LiteLLM, OpenRouter all do as of May 2026)
Forgetting to pin model versions in the gateway config	The gateway routes to a model that updated between your eval run and prod, behaviour drifts	Pin explicit versions: `gpt-5.1-2026-04-15`, `claude-opus-4-7-20260420`, `gemini-2.5-pro`
Buffering streaming responses through the gateway	Codex CLI’s progress UI freezes mid-turn, developer thinks the agent hung	Confirm SSE pass-through, not buffer-and-batch — the curl in Step 4 should stream tokens, not return all at once
Routing every turn to the flagship model	Burns 2.5-4x more tokens than necessary on the 60%+ of easy turns	Add a token-count routing rule: under 8-10K input → cheaper model; over → flagship
Setting hard budget caps without a soft alert at 80%	Codex CLI pauses mid-conversation, breaking the developer’s flow	Soft-alert at 80% (Slack), hard-pause at 110% (HTTP 429)
Skipping the verification curl in Step 4	First real Codex CLI session fails silently, hours of debugging	Always run the two-curl sanity check before pointing the CLI at the gateway

Where this fits in the Future AGI loop

The setup above implements multi-provider routing as a one-time configuration. To make it self-improving, wire fi.evals to score every turn (tool-use accuracy, code correctness, task completion) and feed low-score traces into fi.opt.optimizers. The optimizer rewrites the routing policy against clustered failures; the next request uses the updated route. That’s the closed loop Future AGI ships end-to-end, three OSS components (traceAI, ai-evaluation, agent-opt), all Apache 2.0; the hosted Agent Command Center adds the failure-cluster view, RBAC, and procurement.

The other gateways are observation and translation layers. Codex CLI gets multi-provider routing, but the policy is static. Future AGI’s version is the same translation layer with the loop wired in, so the policy gets better at choosing the cheaper model for easy turns and the stronger model for hard turns every week instead of staying flat.

Sources

OpenAI Codex CLI repository and configuration docs, github.com/openai/codex
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Portkey AI gateway, portkey.ai
LiteLLM proxy, github.com/BerriAI/litellm
Maxim Bifrost, getmaxim.ai/bifrost
OpenRouter models directory, openrouter.ai/models
Palo Alto Networks press release on Portkey acquisition (April 30, 2026), paloaltonetworks.com/company/press/2026/palo-alto-networks-to-acquire-portkey-to-secure-the-rise-of-ai-agents
Datadog Security Labs writeup on LiteLLM PyPI compromise (TeamPCP campaign, March 24, 2026), securitylabs.datadoghq.com/articles/litellm-compromised-pypi-teampcp-supply-chain-campaign
Future AGI Protect latency benchmarks, arxiv.org/abs/2510.13351 (65 ms text / 107 ms image median time-to-label)
Anthropic Messages API reference, docs.anthropic.com/en/api/messages
Google Gemini API reference, ai.google.dev/api
Mistral API reference, docs.mistral.ai/api
vLLM OpenAI-compatible server, docs.vllm.ai/en/latest/serving/openai_compatible_server.html

Frequently asked questions

Does Codex CLI support `OPENAI_BASE_URL` or do I need `OPENAI_API_BASE`?

Both work. Codex CLI `0.18+` prefers `OPENAI_BASE_URL`; earlier builds read `OPENAI_API_BASE`. Set both and you are covered across versions.

Can I route Codex CLI to multiple providers in the same session?

Yes, with a routing rule keyed on input-token count or tool-call presence. Future AGI, Portkey, LiteLLM, and Maxim Bifrost support this declaratively. OpenRouter requires a caller-side wrapper.

Will tool calls (`bash`, `apply_patch`) work when routed to Claude or Gemini?

Yes, if the gateway translates `tool_use` (Anthropic) or `functionCall` (Gemini) back into OpenAI's `tool_calls` shape. All five gateways above do this as of May 2026. Older proxies flattened tool calls into text — confirm the test matrix before adopting.

How much latency does the gateway add per Codex CLI turn?

Future AGI averages ~18ms P95 same-provider and ~42ms cross-provider. Maxim Bifrost cites ~11µs mean at 5,000 RPS. Portkey ~25ms / ~55ms. LiteLLM ~35ms / ~70ms (Python runtime). OpenRouter ~22ms. Cross-provider hops are slower because the translation pass costs real work.

Is it safe to send source code from Codex CLI through a hosted gateway?

For hosted gateways, the path is gateway → provider; both endpoints already see the code. If compliance forbids the hosted hop, pick self-hosted LiteLLM or Future AGI's BYOC, with provider traffic egressing through your own network. OpenRouter is cloud-only.

View all

Guides

AI Gateway for Codex CLI in 2026: The Playbook

Wrap OpenAI Codex CLI in an AI gateway for per-developer budgets, per-call audit trail, and provider flexibility, without changing the CLI command.

Nikhil Pareek · May 15, 2026

11 min

Guides

Best 5 AI Gateways for MCP Tool-Level Observability with Codex CLI in 2026

Five AI gateways scored for MCP tool-level observability with Codex CLI: per-tool latency, success rate, argument validation, MCP auth.

Vrinda Damani · Apr 22, 2026

17 min

Guides

How an MCP Gateway Cuts Token Costs in Claude Code and Codex CLI in 2026

A 2026 architecture essay on why MCP blows up coding-agent token bills in Claude Code and Codex CLI, and five mechanisms that compress cost.

Nikhil Pareek · Apr 13, 2026

14 min

The problem in one paragraph

Prereqs

Setup walkthrough

Step 1: Override OPENAI_BASE_URL

Step 2: Configure gateway routing

Step 3: Map model aliases at the Codex CLI side

Step 4: Verify with a curl

Step 5: Run Codex CLI through the gateway

Provider-specific notes

Anthropic Claude

Google Gemini

Mistral

OSS models via vLLM

Five gateways that ship the translation layer

1. Future AGI Agent Command Center

2. Portkey

3. LiteLLM

4. Maxim Bifrost

5. OpenRouter

Common mistakes

Where this fits in the Future AGI loop

Related reading

Sources

Frequently asked questions

Step 1: Override `OPENAI_BASE_URL`