Best 5 LiteLLM Alternatives in 2026

Five LiteLLM alternatives on supply-chain posture, self-host burden, UI polish, optimization. What each actually fixes after the March 2026 PyPI incident.

March 2, 2026

16 min read

On March 24, 2026 two malicious LiteLLM releases, 1.82.7 and 1.82.8, landed on PyPI before the maintainers yanked them. The window was under twelve hours, but the package had pulled more than 9 million downloads in the prior 90 days, and several Fortune-500 platform teams discovered that a base image rebuilt nightly specified litellm>=1.82, a version floor rather than a pin, so a rebuild inside the window could resolve to the malicious releases. The incident didn’t change LiteLLM’s fundamentals: it is still MIT-licensed, OpenAI-compatible, and the most popular self-hosted AI proxy on GitHub. But it accelerated a conversation already underway: at what scale does running your own Python proxy stop being worth it, and what do you do when supply-chain risk on a critical-path dependency lands on the CISO’s desk?

This guide ranks five gateways worth migrating to, names what each fixes, and walks through the migration that always trips teams up: keeping the OpenAI-compatible API contract intact so that swapping BASE_URL is the only client-side change.

TL;DR: pick by exit reason

Why you are leaving LiteLLM	Pick	Why
You want trace data to feed back into routing and prompts	Future AGI Agent Command Center	Closes the loop from trace through eval to optimizer to route
You want a hosted developer-experience layer with prompt registry	Portkey	Polished UI, virtual keys, prompt studio (note Palo Alto acquisition pending integration)
You need enterprise SLA and plugin ecosystem	Kong AI Gateway	Extends an existing Kong stack with AI-specific policies
You need raw throughput for high-concurrency workloads	Maxim Bifrost	Go-based gateway tuned for low-latency, high-RPS routing
You want lightweight hosted observability with less surface area	Helicone	Drop-in proxy with per-request cost and session traces

Why people are leaving LiteLLM in 2026

Five exit drivers show up in HN threads on the March compromise, /r/LLMDevs discussions, and the LiteLLM GitHub issue tracker.

1. The March 24, 2026 PyPI compromise (1.82.7 / 1.82.8)

The two compromised releases shipped a credential-harvesting payload that scanned the host environment for OPENAI_API_KEY, ANTHROPIC_API_KEY, AWS credentials, and database URLs and exfiltrated them via DNS tunneling. BerriAI maintainers detected the anomaly within the day, yanked both versions, rotated signing keys, and published a postmortem on March 26. Pinning to 1.82.6 is one line of code; explaining to a CISO why a credential-harvesting proxy ran in production for any window at all is a multi-week project. For regulated industries the incident didn’t break LiteLLM, but it broke its risk profile.
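As a rough illustration of the pinning point, the sketch below classifies requirements.txt lines for litellm. It treats only an exact `==` pin to a version other than 1.82.7 or 1.82.8 as safe; the function name and the three labels are assumptions made for this example, and environment markers or multi-clause specifiers are deliberately out of scope.

```python
import re

# Versions this article names as compromised.
COMPROMISED = {"1.82.7", "1.82.8"}

# Matches simple lines such as "litellm==1.82.6" or "litellm[proxy]>=1.82".
_REQ = re.compile(
    r"^litellm(?:\[[^\]]*\])?\s*"
    r"(?P<op>==|>=|<=|~=|!=|>|<)?\s*"
    r"(?P<ver>[0-9][0-9A-Za-z.*]*)?$",
    re.IGNORECASE,
)


def audit_requirement(line: str) -> str:
    """Classify one requirements line as 'ignored', 'safe', or 'at-risk'."""
    stripped = line.split("#", 1)[0].strip()  # drop trailing comments
    match = _REQ.match(stripped)
    if not stripped or match is None:
        return "ignored"  # not a litellm line this sketch understands
    op, ver = match.group("op"), match.group("ver")
    exact_pin = op == "==" and ver is not None and "*" not in ver
    if exact_pin and ver not in COMPROMISED:
        return "safe"
    # Ranges, floors, wildcards, and unpinned names can all resolve upward.
    return "at-risk"
```

Running it over every requirements file that feeds a base image gives a quick inventory of floors and unpinned references before the next rebuild.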

2. Operational burden of self-host at scale

LiteLLM is a Python proxy you deploy and operate yourself. At low scale this is a feature: one container, no vendor bill, full source-audit posture. At higher scale the operational tax compounds. A typical cluster at 10 to 20M req/mo runs Postgres for keys and budgets, Redis for rate-limiting, an OTel stack for observability, Grafana dashboards, and a CI pipeline that rebuilds against upstream on every release. That is a small platform someone has to keep running 24x7. Several teams report their footprint at 20M req/mo is comparable to a hosted bill plus engineering time, and the engineering time is the part that doesn’t scale.

3. Limited polished UI

LiteLLM’s admin UI covers virtual keys, budgets, team and user views, and model lists. It doesn’t ship a session-level trace viewer engineers use to debug agent behavior day-to-day, and it doesn’t ship a polished prompt registry. The bundled UI is the weakest in this cohort by community consensus. Teams that want a developer-experience layer typically pair LiteLLM with Langfuse, FAGI, or Helicone for traces and an in-repo Jinja2 store for templates, a three- or four-product stack some alternatives ship as a single surface.

4. No built-in optimization loop

LiteLLM’s traces inform humans, not the gateway itself. There’s no eval suite scoring every captured trace, no failure-clusterer, no optimizer that rewrites a prompt or shifts a routing rule based on accumulated quality and cost data. Teams that want the loop wire LiteLLM to a separate eval product and a separate optimizer, then write the glue. For workloads where prompt quality and routing policy are the long-term cost driver, improvement velocity depends on engineering time rather than the system improving itself between deploys.

5. Supply-chain risk for regulated industries

The March incident was the trigger but not the cause. Open-source Python proxies in critical-path positions carry irreducible supply-chain risk: every release pulls a transitive dependency graph no single team audits in full, and regulated industries increasingly require an SBOM and a documented vulnerability-response process. Hosted gateways with a single vendor on the hook for SOC 2 Type II, ISO 27001, and a published incident-response policy are easier to defend to an external auditor than a self-hosted proxy the platform team owns end-to-end.

What to look for in a LiteLLM replacement

Score replacements on the seven axes that map to the surfaces you’re actually migrating off:

Axis	What it measures
1. Supply-chain posture	Is there a single vendor on the hook for SBOM, audit, and incident response?
2. Operational burden	How much platform-engineering time does the gateway consume at 10–20M req/mo?
3. UI polish	Is the session-level trace viewer and prompt-registry UX production-grade out of the box?
4. Eval + optimizer loop	Does the gateway use its own trace data to improve routing and prompts?
5. OpenAI-compatible contract	Can you swap `BASE_URL` without rewriting client code?
6. Virtual-key parity	Can you issue per-identity keys that fan out to one provider key?
7. Migration tooling	Are there published scripts or importers for LiteLLM specifically?
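One way to keep the scoring honest is to record each axis as an explicit yes or no per candidate. The sketch below is a minimal scorecard under that assumption; the class and field names are invented for illustration, and the pass/fail judgment for each axis remains yours to make.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Scorecard:
    """One boolean per axis from the table above."""

    supply_chain_posture: bool
    operational_burden: bool
    ui_polish: bool
    eval_optimizer_loop: bool
    openai_compatible_contract: bool
    virtual_key_parity: bool
    migration_tooling: bool

    def score(self) -> int:
        """Number of axes the candidate satisfies (0 to 7)."""
        return sum(bool(getattr(self, f.name)) for f in fields(self))

    def missing(self) -> List[str]:
        """Axes the candidate does not satisfy, in table order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

Listing the missing axes alongside the count makes it easier to check that a "5 of 7" verdict and its stated gaps actually agree.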

1. Future AGI Agent Command Center: Best for closing the loop

Verdict: Future AGI is the only gateway in this list that fixes LiteLLM’s two structural gaps in one product (the missing optimization loop and the missing polished session-level UI) without giving up the OpenAI-compatible contract. Agent Command Center captures the trace, scores it, clusters failures, runs the optimizer, and pushes the updated route or prompt back into the gateway on the next request. The other four are observation or routing layers; FAGI is an observation layer wired to an optimizer, plus the session UI LiteLLM never built.

What it fixes versus LiteLLM:

Supply-chain posture. FAGI’s hosted Command Center sits behind a single vendor with SOC 2 Type II, AWS Marketplace procurement, and a published incident-response policy. OSS instrumentation (traceAI, ai-evaluation, agent-opt, all Apache 2.0) ships from separate repos with signed releases.
Operational burden goes away. Postgres, Redis, OTel, Grafana, all hosted. The platform team owns identity federation and network policy, not 24x7 proxy ops.
Native session-level UI. Every trace is a session timeline with cost, latency, tool calls, eval scores, and the input/output for each LLM call inline.
The optimization loop. Trace -> eval -> failure cluster -> optimizer -> updated prompt or route. Eval rubrics (task-completion, faithfulness, tool-use) score every trace by default. The optimizer (agent-opt) rewrites prompts via six optimizers — ProTeGi, GEPA, Bayesian, MetaPrompt, RandomSearch, PromptWizard.
OpenAI-compatible contract preserved. Migration from LiteLLM is BASE_URL plus a key swap for the common case.
Protect guardrails inline. ~67 ms median text-mode latency (arXiv 2510.13351), fast enough to ship on every request without breaking a sub-second SLO.

Migration from LiteLLM: Provider keys, per-identity virtual keys, routing rules, and metadata headers map directly. The OpenAI-compatible contract is preserved end-to-end. FAGI imports the LiteLLM team_id / user_id inventory and generates the equivalent key set. In-repo Jinja2 prompts are accepted by FAGI’s prompt registry without rewriting. Timeline: five to eight engineering days for under 50 virtual keys and a sub-200-prompt registry, including shadow traffic.

Where it falls short:

agent-opt is opt-in. Start with traceAI + ai-evaluation in week one and turn the optimizer on once eval baselines stabilize. The loop compounds value over weeks rather than at day one.
Self-host of the full Command Center is BYOC only, not single-binary. Teams who want strict on-prem keep using the OSS libraries (traceAI, ai-evaluation, agent-opt) standalone.

Pricing: Free tier with 100K traces/month. Scale tier from $99/month with linear per-trace scaling above 5M (no add-on multipliers). Enterprise with SOC 2 Type II and AWS Marketplace.

Score: 7 of 7 axes.

2. Portkey: Best for hosted developer experience

Verdict: Portkey is the pick when the LiteLLM exit reason is “we want a polished hosted UI with a prompt studio and virtual keys, and we don’t want to operate a Python proxy anymore.” Portkey ships the developer-experience surface LiteLLM doesn’t, and its OpenAI-compatible contract makes migration a BASE_URL change for the common case. One caveat: Palo Alto Networks announced the acquisition of Portkey on April 30, 2026; the public position is “no change for 12 months,” but the standalone SMB SKU’s roadmap is now tied to the Prisma AIRS integration timeline.

What it fixes versus LiteLLM:

Polished hosted UI. Session-level trace viewer, prompt studio with version diffs, virtual-key dashboard, per-key and per-route cost breakdown.
Prompt registry. Prompt Studio stores prompts as versioned objects, references them by prompt_id, and renders them server-side at request time.
Virtual keys with bulk-pricing fanout. Per-developer or per-service Portkey-issued keys fan out to one underlying provider key. LiteLLM has the concept (team_id / user_id) but the UX is thinner.
Operational burden goes away. Hosted by default. No Postgres, Redis, or OTel collector to run.

Migration from LiteLLM: OpenAI-compatible endpoint, provider keys, virtual-key concept, and routing rules all map cleanly. In-repo Jinja2 prompts need rewriting to Portkey’s {{handlebars}}-shaped dialect if you adopt Prompt Studio; teams that skip the rewrite keep prompts in-repo and use Portkey only for routing and observability. Timeline: five to seven engineering days for the proxy cutover, plus another sprint if you adopt Prompt Studio.
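Before committing a sprint to the prompt rewrite, it can help to sort in-repo Jinja2 templates into those that only use plain placeholders and those that use control blocks, filters, or attribute access. The sketch below is a conservative triage under that assumption; it does not convert anything and makes no claim about the destination dialect beyond what is stated above.

```python
import re

_BLOCK = re.compile(r"\{%.*?%\}", re.DOTALL)        # {% if %}, {% for %}, ...
_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)      # {# Jinja2 comment #}
_EXPR = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)     # {{ expression }}
_PLAIN_VAR = re.compile(r"^\s*[A-Za-z_][A-Za-z0-9_]*\s*$")


def triage_template(source: str) -> str:
    """Return 'mechanical' for plain {{ name }} templates, else 'manual'."""
    if _BLOCK.search(source) or _COMMENT.search(source):
        return "manual"
    for expr in _EXPR.findall(source):
        # Filters, attribute access, and calls all need a human decision.
        if not _PLAIN_VAR.match(expr):
            return "manual"
    return "mechanical"
```

The share of templates that come back as manual is a reasonable first estimate of how much of the extra sprint is real rewriting.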

Where it falls short:

The Palo Alto acquisition creates roadmap uncertainty for the standalone SMB SKU.
Pricing escalates above 5M requests/month: a workload billed at $499/month at 4M requests can grow to $2,400–$3,200/month at 12M with add-ons.
No optimizer. Traces inform humans, not the gateway.
Prompt Studio’s proprietary template dialect creates the same lock-in vector LiteLLM teams sometimes leave to escape.
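To make the pricing escalation concrete, the arithmetic below converts the figures quoted above into cost per million requests. It uses only the numbers in this section; the helper name is an assumption for the example.

```python
def cost_per_million(monthly_bill_usd: float, requests_millions: float) -> float:
    """Effective USD cost per one million requests."""
    return monthly_bill_usd / requests_millions


baseline = cost_per_million(499, 4)      # bill at 4M requests/month
low_end = cost_per_million(2400, 12)     # lower bound at 12M requests/month
high_end = cost_per_million(3200, 12)    # upper bound at 12M requests/month

# Growth in unit cost relative to the 4M baseline.
low_ratio = low_end / baseline
high_ratio = high_end / baseline
```

On these figures the unit cost rises from about $125 per million requests to between $200 and roughly $267, so traffic triples while the per-request price grows by roughly 1.6x to 2.1x.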

Pricing: Free tier with limits. Scale from $99/month. Enterprise custom.

Score: 5 of 7 axes (missing: optimizer, vendor-independence post-acquisition).

3. Kong AI Gateway: Best for enterprise platform teams

Verdict: Kong AI Gateway is the pick when your platform team already runs Kong for the company’s REST APIs and the path of least resistance is to extend the existing stack with AI-specific policies. Strengths: SLA, plugin ecosystem, ops familiarity, supply-chain posture. Weakness: AI-specific surfaces (prompt registry, eval, optimizer) live in plugins or upstream tools, not the product.

What it fixes versus LiteLLM:

Enterprise SLA and procurement. Kong has been a Tier-1 API-gateway vendor for a decade. SOC 2, ISO 27001, HIPAA-eligible compliance clears the bar for regulated industries that flagged the LiteLLM supply-chain incident.
Plugin ecosystem. Existing Kong customers reuse rate-limiting, auth, request-transformation, and OTel plugins. The AI Proxy plugin (Kong 3.6+) handles Anthropic and OpenAI passthrough, including tool calls.
Operational footprint platform teams already own. If your team runs Kong for the REST tier, adding the AI Proxy plugin is one more plugin on a stack the on-call rotation already understands.
OpenAI-compatible contract preserved. The AI Proxy plugin terminates an OpenAI-compatible endpoint; the client-side migration is a BASE_URL change.

Migration from LiteLLM: OpenAI-compatible endpoint via the AI Proxy plugin, consumer + tag pattern as a virtual-key analog, OTel plugin for observability, all map directly. Kong has no first-party prompt registry; teams pair it with Langfuse, Future AGI, or an in-house Git-backed store. LiteLLM’s bundled budget-and-quota UI has no direct equivalent. Kong’s plugin equivalents need wiring. Timeline: ten to fifteen engineering days because work spans platform and application teams.

Where it falls short:

AI-specific observability is plugin-driven, not native. Default dashboard is the API-gateway view, not the LLM-cost view.
No optimizer, no prompt registry, no eval library.
Two-week-plus setup means migration ROI shows up later than lighter alternatives. If the LiteLLM exit is urgent, Kong is slower than a hosted swap.

Pricing: Kong AI Gateway is open source. Konnect (managed) starts free. Enterprise plans from ~$1.5K/month.

Score: 5 of 7 axes (missing: native prompt registry, optimizer, native AI cost dashboard).

4. Maxim Bifrost: Best for raw throughput

Verdict: Bifrost is the pick when the workload is high-concurrency and the gateway’s own latency budget matters more than UI polish. Written in Go, designed for low-latency routing, it benchmarks above Python proxies on RPS per node. For teams whose LiteLLM exit reason is “the Python proxy’s overhead shows up in our p99 SLO,” Bifrost is the direct upgrade.

What it fixes versus LiteLLM:

Throughput per node. Go runtime plus connection-pooling gives Bifrost higher RPS per node than Python-based proxies on the same hardware. Maxim’s published benchmarks claim sub-millisecond overhead at p50.
Operational footprint is a Go binary. Container, helm chart, or static binary on a VM. The Python-dependency-graph audit problem the March incident highlighted is structurally smaller for a Go binary.
Tight integration with Maxim’s eval stack. If your team also evaluates agents with Maxim, the gateway and the eval pipeline share data models. That is closer to the loop FAGI ships, but it stops short of an automated optimizer.

Migration from LiteLLM: OpenAI-compatible endpoint, provider keys, and basic routing rules map directly. Bifrost’s API-key concept is leaner than LiteLLM’s team_id / user_id model, so per-developer fanout needs more wiring upstream. Timeline: five to eight engineering days plus prompt-registry replacement if applicable.

Where it falls short:

No optimizer.
Younger than LiteLLM, Portkey, or Kong; the ecosystem (Terraform providers, off-the-shelf dashboards) is thinner.
Throughput is the headline; teams that left LiteLLM for UI polish or supply-chain posture rather than latency won’t feel the upside.

Pricing: Bifrost is open source. Maxim’s hosted gateway pricing is custom, typically anchored to the eval product’s usage.

Score: 4 of 7 axes (missing: optimizer, native prompt registry, mature ecosystem).

5. Helicone: Best for lightweight hosted observability

Verdict: Helicone is the right pick if your reason for leaving LiteLLM is operational burden and you don’t need routing depth or a prompt registry. It is a drop-in proxy with per-request cost telemetry, session traces, and a clean dashboard: the hosted observation layer LiteLLM doesn’t ship. One wrinkle: Helicone acquired Mintlify in March 2026, and parts of the docs surface have folded into Mintlify’s stack.

What it fixes versus LiteLLM:

Polished hosted dashboard. Per-request cost breakdown, session traces, user-level aggregation.
Operational burden goes away. Hosted by default. Self-host (Apache 2.0) is available but runs on Postgres + ClickHouse, comparable to LiteLLM’s footprint rather than smaller.
OpenAI-compatible contract preserved. Drop-in proxy. Helicone-User-Id header replaces LiteLLM’s per-key attribution. Migration is a BASE_URL change for the common case.
Simpler surface area. If you used LiteLLM only for traces and cost, Helicone covers the same ground with one-third of the configuration.

Migration from LiteLLM: OpenAI-compatible endpoint maps directly. Provider keys move over. Custom properties replace LiteLLM tags. Helicone’s Prompts product is less feature-rich than a full registry, so teams that used LiteLLM with in-repo Jinja2 keep that pattern post-migration. Timeline: three to five engineering days for the cutover.
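The attribution mapping described above can be sketched as a small header builder. Helicone-User-Id comes from this article; the Helicone-Auth and Helicone-Property-<Name> header names are assumptions based on Helicone's documented conventions and should be checked against the current docs, as should the choice to model LiteLLM tags as key-value pairs.

```python
from typing import Dict, Mapping


def helicone_headers(
    helicone_api_key: str,
    user_id: str,
    tags: Mapping[str, str],
) -> Dict[str, str]:
    """Build per-request headers that carry user and tag attribution."""
    headers = {
        "Helicone-Auth": f"Bearer {helicone_api_key}",  # assumed header name
        "Helicone-User-Id": user_id,                    # named in the article
    }
    for name, value in tags.items():
        # Assumed convention: one custom-property header per former tag.
        headers[f"Helicone-Property-{name}"] = value
    return headers
```

With the OpenAI Python SDK, a dictionary like this would typically be supplied as default headers on the client, so application call sites stay unchanged.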

Where it falls short:

No optimizer.
Routing intelligence is basic (round-robin and failover); cost-aware model routing requires upstream code.
Self-host operations get harder above a few hundred RPS, so the operational-burden upside is only realized on the hosted tier.
The Mintlify acquisition is recent enough that some surfaces are still in flux.
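The "upstream code" that cost-aware routing requires can be quite small. The sketch below picks the cheapest model that meets a minimum quality tier and fits the prompt; the class, the tier scheme, and any prices you feed it are illustrative assumptions, not vendor data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str                      # placeholder model identifier
    usd_per_million_tokens: float  # illustrative price, not a real quote
    quality_tier: int              # higher means more capable
    max_context_tokens: int


def pick_model(
    options: Sequence[ModelOption],
    prompt_tokens: int,
    min_tier: int,
) -> Optional[ModelOption]:
    """Return the cheapest eligible option, or None if nothing qualifies."""
    eligible = [
        o for o in options
        if o.quality_tier >= min_tier and o.max_context_tokens >= prompt_tokens
    ]
    return min(eligible, key=lambda o: o.usd_per_million_tokens, default=None)
```

The application calls this before each request and passes the chosen model name through the proxy, leaving the gateway's round-robin and failover to handle availability.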

Pricing: Free tier with 10K requests/month. Pro from $25/month. Enterprise custom.

Score: 5 of 7 axes (missing: optimizer, deep routing, mature prompt registry).

Capability matrix

Axis	Future AGI	Portkey	Kong AI Gateway	Maxim Bifrost	Helicone
Supply-chain posture	SOC 2 + signed OSS	SOC 2 (Palo Alto pending)	SOC 2 + ISO 27001	Go binary, tighter deps	SOC 2
Operational burden	Hosted, no platform tax	Hosted	Plugin on existing Kong	Single Go binary	Hosted
UI polish	Native sessions + RBAC	Polished, Prompt Studio	Plugin UI + Grafana	Lighter admin UI	Polished dashboard
Eval + optimizer loop	Yes (`ai-evaluation` + `agent-opt`)	No	No	Tied to Maxim eval	No
OpenAI-compatible contract	Yes, native	Yes, native	Yes, via AI Proxy plugin	Yes, native	Yes, native
Virtual-key parity	Native	Native	Consumer + tag pattern	Lean key model	Proxy keys (lighter)
LiteLLM migration tooling	Key + prompt importer	Header mapping docs	Manual setup	Manual setup	Header mapping docs

Migration notes: what breaks when leaving LiteLLM

Preserving the OpenAI-compatible contract

The reason LiteLLM is sticky despite the operational tax is that it speaks the OpenAI Chat Completions API. Every alternative in this list does the same. The migration tip is to keep that contract intact end-to-end and make the cutover a BASE_URL change rather than a client rewrite.

Services hard-code the base URL in three places: SDK initialization (OpenAI(base_url=...)), runtime config (often an env var like OPENAI_BASE_URL), and the deployment manifest (env-var injection in the Helm chart). The migration checklist needs all three. Done right, the client diff is a single environment variable per service and the cutover is a feature-flag flip.
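A minimal sketch of the feature-flag flip, assuming the service reads its gateway URL from the environment. OPENAI_BASE_URL is the variable mentioned above; USE_NEW_GATEWAY and NEW_GATEWAY_BASE_URL are hypothetical names chosen for this example.

```python
import os
from typing import Mapping, Optional


def resolve_base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Choose the gateway base URL from environment-style settings."""
    env = os.environ if env is None else env
    legacy = env.get("OPENAI_BASE_URL", "")            # current LiteLLM proxy
    candidate = env.get("NEW_GATEWAY_BASE_URL", "")    # hypothetical variable
    flag = env.get("USE_NEW_GATEWAY", "").strip().lower()  # hypothetical flag
    flag_on = flag in {"1", "true", "yes"}
    # Fall back to the legacy URL if the flag is on but no candidate is set.
    chosen = candidate if (flag_on and candidate) else legacy
    if not chosen:
        raise RuntimeError("no gateway base URL configured")
    return chosen.rstrip("/")
```

The result is what a service would pass as base_url when constructing its OpenAI client, so rolling back is a matter of clearing the flag rather than redeploying code.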

Places this breaks: LiteLLM-specific custom headers (x-litellm-*) some teams adopted for routing hints, the litellm-namespaced error envelope (clients that parse it rather than re-raise need a small adapter), and the team_id / user_id model if you embedded LiteLLM IDs in client-side logging. Each is a small mechanical fix.
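For the custom-header case, one mechanical fix is a thin shim that separates x-litellm-* headers from everything else before the request leaves the service. The sketch below assumes a plain header mapping; the function name and the specific header names in the test are hypothetical.

```python
from typing import Dict, Mapping, Tuple

LITELLM_PREFIX = "x-litellm-"


def split_litellm_headers(
    headers: Mapping[str, str],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (kept, dropped), where dropped holds the x-litellm-* headers."""
    kept: Dict[str, str] = {}
    dropped: Dict[str, str] = {}
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        target = dropped if name.lower().startswith(LITELLM_PREFIX) else kept
        target[name] = value
    return kept, dropped
```

Logging the dropped set during shadow traffic shows which routing hints are still in use and need an equivalent on the destination gateway.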

Remapping virtual keys

Three steps:

Inventory: dump the team_id and user_id table via the admin API or directly from Postgres.
Remap table: generate equivalents on the destination. FAGI, Portkey, and Helicone all support per-identity keys; Kong uses consumer + tag; Bifrost’s key model is leaner and may need wiring above the gateway.
Cutover: stand up the new gateway in shadow mode for a week or two, validate parity by replaying production traffic, then flip services one at a time using your existing feature-flag mechanism.

For teams running budgets per team_id, double-check that the destination’s quota model is expressive enough; Helicone and Bifrost offer lighter spend-limit controls than LiteLLM.
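The inventory and remap steps can be sketched as a pure function over the dumped rows. The input shape (pairs of team_id and user_id) and the key_alias format are assumptions for illustration; the actual key creation call depends on the destination gateway and is left out.

```python
from typing import Dict, Iterable, List, Tuple


def build_key_plan(rows: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (team_id, user_id) rows into a deduplicated, ordered key plan."""
    seen = set()
    plan: List[Dict[str, str]] = []
    for team_id, user_id in sorted(rows):  # deterministic order for review
        if (team_id, user_id) in seen:
            continue  # the dump may contain duplicate identities
        seen.add((team_id, user_id))
        plan.append({
            "team_id": team_id,
            "user_id": user_id,
            # Hypothetical alias format for the destination's per-identity key.
            "key_alias": f"{team_id}--{user_id}",
        })
    return plan
```

Keeping the plan as reviewable data lets the team diff it against the source inventory before any keys are issued on the new gateway.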

Decision framework: Choose X if

Choose Future AGI if your reason for leaving is more than supply-chain posture or operational burden and you also want trace data to drive prompt rewrites and routing-policy updates. Pick this when production agent workloads are becoming a significant line item and the OSS instrumentation (traceAI, ai-evaluation, agent-opt, all Apache 2.0) plus the hosted Command Center together justify the migration. Strongest pick for regulated industries that flagged the March incident as a procurement blocker.

Choose Portkey if your reason for leaving is “we want a polished hosted UI with a prompt studio and we don’t want to run a Python proxy anymore”, and you can tolerate roadmap uncertainty from the Palo Alto acquisition.

Choose Kong AI Gateway if your platform team already runs Kong and the path of least resistance is to extend the existing stack. Pick this when SLA, plugin ecosystem, and operational familiarity outweigh AI-specific shallowness.

Choose Maxim Bifrost if you’re leaving for gateway latency at high concurrency, or if your supply-chain concern is specifically Python dependency graphs.

Choose Helicone if you’re leaving for operational burden and run well below 10M requests/month with no need for sophisticated routing or a prompt registry.

What we did not include

Three products from other 2026 listicles we left out: OpenRouter (consumer marketplace, wrong shape for virtual-key and team-quota replacement); Cloudflare AI Gateway (prompt-registry and per-developer chargeback surfaces are thinner as of May 2026); TrueFoundry (no LiteLLM-specific migration tooling published yet).

Sources

LiteLLM PyPI compromise postmortem, March 26, 2026, github.com/BerriAI/litellm/security
Hacker News thread on the LiteLLM 1.82.7/1.82.8 incident, news.ycombinator.com
Reddit /r/LLMDevs migration discussions, March-May 2026
LiteLLM GitHub repository, github.com/BerriAI/litellm
LiteLLM Enterprise documentation, docs.litellm.ai/docs/enterprise
Portkey product page, portkey.ai
Palo Alto Networks press release on Portkey acquisition, April 30, 2026, paloaltonetworks.com/company/press
Kong AI Gateway product page, konghq.com/products/kong-ai-gateway
Maxim Bifrost product page and benchmarks, getmaxim.ai/bifrost
Helicone open-source self-host, github.com/Helicone/helicone
Helicone acquisition of Mintlify, March 2026, helicone.ai/blog
Future AGI Agent Command Center, futureagi.com/platform/monitor/command-center
Future AGI traceAI, github.com/future-agi/traceAI (Apache 2.0)
Future AGI ai-evaluation, github.com/future-agi/ai-evaluation (Apache 2.0)
Future AGI agent-opt, github.com/future-agi/agent-opt (Apache 2.0)
Future AGI Protect latency benchmark, arxiv.org/abs/2510.13351 (67 ms text, 109 ms image)

Frequently asked questions

Why are people moving off LiteLLM in 2026?

Five reasons: the March 24, 2026 PyPI compromise (1.82.7 / 1.82.8); operational burden compounds at 10–20M req/mo; the bundled UI is the weakest in this cohort; no built-in optimization loop; and an open-source Python dependency graph is harder to defend to an external auditor than a single hosted-vendor SBOM.

What is the closest like-for-like alternative?

Portkey for a hosted gateway with virtual keys and a prompt registry. FAGI Agent Command Center for the same plus eval and optimizer. Helicone for the cheapest hosted observation layer.

Can I keep my OpenAI-compatible client code?

Yes — every alternative speaks the OpenAI Chat Completions API. Migration is a `BASE_URL` swap in three places: SDK init, runtime config, and deployment manifest. LiteLLM custom headers and the litellm-namespaced error envelope need small adapters.

Is the March 2026 incident a reason to abandon LiteLLM entirely?

No. The maintainers yanked the compromised releases within twelve hours, rotated signing keys, and published a postmortem. The fundamentals — MIT, OpenAI-compatible, most popular self-hosted AI proxy on GitHub — are unchanged. It is a reason to revisit whether your team's risk profile still fits a self-operated Python proxy.

How do I migrate virtual keys?

Dump the `team_id` / `user_id` table via the admin API or directly from Postgres. Generate equivalents on the destination — FAGI, Portkey, and Helicone all support per-identity keys; Kong uses consumer + tag. Run both gateways in shadow mode for a week, then flip services one at a time.

Which alternative has the lowest operational burden?

The hosted gateways — FAGI, Portkey, Helicone — eliminate the Postgres + Redis + OTel + Grafana stack. Among self-hosted, Bifrost has the tightest footprint (single Go binary).

How does Future AGI Agent Command Center compare to LiteLLM?

LiteLLM is a self-hosted Python proxy with admin UI and per-key budgets. FAGI is a hosted gateway plus eval suite plus optimizer, so trace data drives prompt rewrites and routing-policy updates over time. Both speak OpenAI-compatible; FAGI's instrumentation libraries are Apache 2.0 if you want parts of the stack self-hosted.
