Future AGI vs LiteLLM in 2026: Self-Improving Runtime vs OSS Python Proxy

Future AGI vs LiteLLM scored on routing, observability, cost attribution, security, deployment, and DX. The honest verdict, the March 2026 PyPI compromise, and why the self-improving loop wins.


Deciding between Future AGI and LiteLLM today? Pick Future AGI when you want the runtime itself to close the loop (trace, eval, optimizer, route) so the system gets better at its own job, and you want Apache 2.0 OSS libraries plus enterprise RBAC, BYOC, and inline guardrails in one product. Pick LiteLLM when you want a free, MIT-licensed Python proxy with the broadest provider surface (100+ providers behind one OpenAI-compatible schema) and your platform team is comfortable owning the runtime end-to-end after the March 2026 PyPI compromise.

Six axes, honest scoring, the supply-chain incident on the table, what each falls short on as of May 2026. Future AGI ranks first on five of the six axes. LiteLLM wins one cleanly and we name it.


TL;DR: capability snapshot

| Capability | Future AGI | LiteLLM |
| --- | --- | --- |
| Routing intelligence | Trace-informed, continuously rewritten by agent-opt | Declarative fallbacks, load balancing, OpenAI-compatible schema across 100+ providers |
| Observability | OpenTelemetry-native via traceAI (Apache 2.0), agent-aware spans | Callback-based, Langfuse/Datadog/etc. pluggable |
| Cost attribution | Per-session, per-developer, per-repo span attributes, joined with eval scores | Per-key, per-team via virtual keys (proxy server) |
| Security and guardrails | Protect guardrails (~67 ms text, ~109 ms image), RBAC, BYOC | Bring-your-own; virtual-key budgets, rate limits |
| Deployment | SaaS, BYOC, Apache 2.0 OSS libraries | Self-hosted Python proxy, MIT-licensed |
| Developer experience | OpenAI-compatible, agent-aware SDKs, eval and optimizer UIs | Pip install, drop-in OpenAI client, huge provider list |
| Closed-loop optimization | Native via agent-opt (ProTeGi, Bayesian, GEPA) | Not part of the product |
| Supply-chain posture | No known incidents | March 2026 PyPI compromise (1.82.7 / 1.82.8) |
| Pricing entry point | Free tier (100K traces/mo), Scale at $99/mo, Enterprise custom | Free OSS; Enterprise license for SSO, audit |

One-line verdict: Future AGI is the runtime that closes the loop neither LiteLLM nor any other proxy in this category implements. LiteLLM is the cleanest source-readable Python proxy in OSS, with the caveat that the March 2026 supply-chain incident is on record and your team owns the response.


What each product actually is

Future AGI is a self-improving runtime for LLM agents. The Agent Command Center is the hosted control plane. The building blocks are three Apache 2.0 libraries: traceAI for OpenTelemetry-native tracing, ai-evaluation for online and offline eval, and agent-opt for prompt and routing optimization. The wedge is the loop. Every trace gets scored. Low-scoring sessions cluster into failure modes. The optimizer rewrites prompts or routing policies. The gateway applies the update on the next request. Auto-rollback fires if scores regress. ProTeGi, Bayesian, and GEPA optimizers are available. Protect, the inline guardrail, runs at approximately 67 ms p50 for text and 109 ms p50 for image (arXiv 2510.13351). BYOC and AWS Marketplace are live. SOC 2 Type II, HIPAA (BAA), GDPR, and CCPA are all certified.

LiteLLM is an MIT-licensed Python proxy maintained by BerriAI. It normalizes 100+ model providers behind one OpenAI-compatible schema. pip install litellm, point your client at the proxy, and it handles provider differences, retries, fallbacks, key management, and cost tracking. The proxy server adds virtual keys, per-team budgets, and a dashboard. The Enterprise tier adds SSO, audit logs, and a paid SLA. It’s the default choice when “no SaaS dependency, source-readable Python codebase” is the requirement. In March 2026, two malicious versions (1.82.7 and 1.82.8) were briefly published to PyPI after a maintainer-credential compromise. Both were yanked within 9 hours; BerriAI posted a postmortem in 36 hours and added SLSA-3 provenance attestations by April. Safe pins: 1.82.6 or 1.83.7+. Rotate keys if you installed either bad version.

Future AGI gives you a runtime that updates itself. LiteLLM gives you a proxy you own end-to-end.


Head-to-head on the six axes

1. Routing intelligence

LiteLLM’s routing is declarative and well-documented. Fallback chains, load balancing, context-window-based routing, cooldowns on rate-limits. One of the cleanest routers in OSS. The OpenAI-compatible schema lets you swap providers without rewriting client code. Custom logic drops into Python pre/post hooks. Cost-based, latency-based, or semantic-cache-aware routing are all valid patterns. The trade-off: anything beyond canned patterns means shipping code in your deploy, and routing rules are static after configuration. If gpt-4o is over-used for turns gpt-4o-mini would have handled at 1/15th the cost, a human has to notice and edit the YAML.
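A fallback chain with cooldowns can be pictured in a few lines of plain Python. This is a minimal sketch under assumed names: `FallbackRouter`, `call_with_fallbacks`, the `RateLimited` exception, and the 60-second cooldown are all hypothetical. It is not LiteLLM's implementation or API, only an illustration of the static pattern the paragraph above describes.

```python
import time
from typing import Callable, Dict, List


class RateLimited(Exception):
    """Hypothetical error a provider call raises when it is rate-limited."""


class FallbackRouter:
    """Illustrative fallback chain with per-model cooldowns (not LiteLLM's code)."""

    def __init__(self, chain: List[str], cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.chain = chain            # models in priority order
        self.cooldown_s = cooldown_s  # assumed cooldown length
        self.clock = clock            # injectable clock for testing
        self.cooling_until: Dict[str, float] = {}

    def call_with_fallbacks(self, send: Callable[[str], str]) -> str:
        """Try each model in order, skipping any still in cooldown."""
        now = self.clock()
        for model in self.chain:
            if self.cooling_until.get(model, 0.0) > now:
                continue  # this model was rate-limited recently
            try:
                return send(model)
            except RateLimited:
                # Park the model and move on to the next one in the chain.
                self.cooling_until[model] = now + self.cooldown_s
        raise RuntimeError("every model in the chain is cooling down or failed")
```

The chain order never changes unless someone edits it, which is the "static after configuration" trade-off in practice.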

Future AGI accepts the same declarative policies, but agent-opt continuously rewrites them against your eval data. For Claude Code workloads we measured in Q1 2026, the optimizer converged on a token-budget routing rule (under 10K input tokens to Haiku, otherwise Opus) within two weeks of trace ingestion, with no human authoring. ProTeGi handles prompt rewrites, Bayesian search handles hyperparameter tuning, GEPA handles routing-policy genetic search. Each runs against the cluster of low-scoring sessions, proposes a candidate, the runtime ships it on the next request, and the eval system watches for regression with automatic rollback.
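The token-budget rule described above is small enough to write down. The sketch below assumes a hypothetical `pick_model` function and placeholder model labels (`"haiku"`, `"opus"`); the 10K threshold comes from the text, and nothing here is agent-opt's actual output format.

```python
def pick_model(input_tokens: int, budget: int = 10_000) -> str:
    """Route short prompts to the cheaper model, everything else to the larger one.

    The model labels are placeholders, not real provider model IDs.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return "haiku" if input_tokens < budget else "opus"
```

For example, `pick_model(2_500)` selects the cheaper model and `pick_model(40_000)` selects the larger one. The point of the optimizer is that the threshold is learned from eval data rather than hand-picked.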

Verdict. LiteLLM wins on provider breadth (100+ adapters) and declarative routing maturity. Future AGI wins on routing that updates itself from outcomes. If “intelligent” means “configurable across the most providers,” LiteLLM wins. If it means “improves over time,” Future AGI wins.

2. Observability

LiteLLM observability runs through its callbacks system. Wire in Langfuse, Helicone, Datadog, Sentry, traceAI, or your own logger; LiteLLM fans out to each. Flexible, no vendor lock-in. The cost: you wire observability yourself, semantics are whatever the downstream sink defines, agent-aware spans aren’t the default lens, and joining traces to evals is a custom build. The built-in SQL transaction log is functional for spend reporting. It isn’t a queryable trace store.

Future AGI’s traceAI is OpenTelemetry-native from the first byte. Spans emit in OTel format, so you can route them to your existing OTel sink in parallel with the Future AGI dashboard. Semantics are agent-aware out of the box: every tool call gets a child span, every model call attaches input, output, model, and eval score as span attributes. Sub-agents and retries appear as a parent-child tree so you can find the exact tool call that caused a failure. Apache 2.0 means you can read the instrumentation and fork it. The same library runs under LiteLLM today without migration. traceAI is gateway-agnostic by design.
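To see why a parent-child span tree helps, here is a minimal sketch that walks one to find the failing tool call. The `Span` class and `failing_path` function are hypothetical stand-ins, not traceAI's or OpenTelemetry's data model, and the sketch assumes a failure is propagated upward so that every ancestor of a failed span is also marked as failed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Hypothetical, simplified span: a name, a status flag, and child spans."""
    name: str
    ok: bool = True
    children: List["Span"] = field(default_factory=list)


def failing_path(span: Span) -> Optional[List[str]]:
    """Return span names from this span down to the deepest failing descendant."""
    if span.ok:
        return None
    for child in span.children:
        sub = failing_path(child)
        if sub is not None:
            return [span.name] + sub
    return [span.name]  # failed here; no failing child explains it
```

With a flat log, the same question means correlating request IDs by hand; with a tree, it is one depth-first walk.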

Verdict. Future AGI wins on observability. OTel-native, agent-aware spans, joined to eval scores. LiteLLM’s flexibility is real, but flexibility you assemble yourself isn’t the same product as agent-aware spans that ship correct by default.

3. Cost attribution

Both products solve “who spent what” at different levels of polish. LiteLLM’s proxy issues virtual keys per developer, team, or feature. The dashboard groups spend by key, budget, and tag. Solid for OSS. Per-key budgets set hard caps with webhooks firing at 80% and 100% thresholds. That’s closer to spend governance than spend reporting. The chargeback view your CFO will accept means exporting to a warehouse and building Grafana or Metabase, which is easily two weeks of platform time.
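The 80% and 100% alerts amount to a threshold-crossing check on each charge. The following is a sketch of that logic only, with a hypothetical `crossed_thresholds` helper; it does not reproduce LiteLLM's budget code or webhook payloads.

```python
from typing import List, Sequence


def crossed_thresholds(spent_before: float, spent_after: float, budget: float,
                       thresholds: Sequence[float] = (0.8, 1.0)) -> List[float]:
    """Return the budget fractions newly crossed by this charge, lowest first."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [t for t in sorted(thresholds)
            if spent_before < t * budget <= spent_after]
```

A caller would fire one alert per returned fraction, and block further requests once `1.0` appears. Comparing the before and after totals ensures each alert fires once rather than on every request past the line.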

Future AGI attributes through span attributes. Defaults are fi.attributes.user.id, fi.attributes.session.id, plus arbitrary metadata you wire into the forwarding rule. The Agent Command Center surfaces aggregations natively and joins them against eval scores. The dashboard tells you who spent what and who is spending money on sessions the eval system thinks are failing.
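The join between spend and quality can be sketched as a single aggregation over spans. In the example below, `fi.attributes.user.id` is the attribute named above, while the `cost_usd` and `eval_score` keys, the `spend_by_user` function, and the 0.5 failure cutoff are assumptions made for illustration, not the Agent Command Center's schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def spend_by_user(spans: Iterable[Mapping], fail_below: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Sum spend per user, and separately the spend on sessions scored as failing."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"total_usd": 0.0, "failing_usd": 0.0}
    )
    for span in spans:
        user = span["fi.attributes.user.id"]
        cost = span["cost_usd"]          # hypothetical attribute name
        totals[user]["total_usd"] += cost
        if span["eval_score"] < fail_below:  # hypothetical attribute name
            totals[user]["failing_usd"] += cost
    return dict(totals)
```

The `failing_usd` column is the part a spend-only dashboard cannot produce, because it needs the eval score on the same record as the cost.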

Verdict. LiteLLM wins on free-tier cost tracking if virtual keys are the unit you care about and you have a warehouse team. Future AGI wins on cost-plus-quality joined attribution out of the box. For any team where finance asks “are we paying for traffic that is also working,” Future AGI is the only one in this comparison that answers it without a warehouse build-out.

4. Security and guardrails

LiteLLM’s security surface is intentionally thin. Virtual-key budgets, rate limits, basic audit logs. Guardrails are bring-your-own via callbacks. RBAC and SSO ship under the Enterprise license, not OSS. The March 2026 PyPI compromise is the supply-chain story you can’t skip. On March 24, 2026, two PyPI releases (1.82.7 and 1.82.8) were briefly published from a compromised maintainer token and shipped a payload that scraped provider API keys at proxy startup. BerriAI yanked both within 9 hours, posted a postmortem in 36 hours, and added SLSA-3 provenance attestations by April. Defensive posture going forward: pin to 1.82.6 or 1.83.7+, run pip-audit in CI, load provider keys from a vault at boot, rotate any key that was on a proxy running either bad version. The relevance to a buyer: whether your security team is comfortable owning the supply-chain dependency end-to-end.
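One piece of that defensive posture is easy to automate: refusing to start on a known-bad version. The sketch below uses the standard library's `importlib.metadata`; the function names and the idea of running it as a startup or CI gate are assumptions, and the version list contains only the two releases named above.

```python
from importlib import metadata
from typing import Optional

# The two releases identified in this article as compromised.
COMPROMISED = {"1.82.7", "1.82.8"}


def is_compromised(version: str) -> bool:
    """True if the given version string is one of the known-bad releases."""
    return version.strip() in COMPROMISED


def installed_litellm_version() -> Optional[str]:
    """Return the installed litellm version, or None if it is not installed."""
    try:
        return metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return None
```

A CI step could call `installed_litellm_version()` and fail the build when `is_compromised` returns true. This complements exact pins and `pip-audit`; it does not replace them, and it does nothing about keys that were already exposed.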

The Future AGI Protect model family runs inline at approximately 67 ms p50 for text and 109 ms p50 for image per arXiv 2510.13351. Protect is FAGI’s own fine-tuned model family built on Google’s Gemma 3n with specialized adapters across four safety dimensions (content moderation, bias detection, security/prompt-injection, data privacy/PII), natively multi-modal across text, image, and audio. RBAC and audit logs ship in the Agent Command Center default. SOC 2 Type II, HIPAA (BAA), GDPR, and CCPA are all certified. ISO 27001 is in active audit. No known supply-chain incidents.

Verdict. Future AGI wins on out-of-the-box guardrails plus certified compliance posture. LiteLLM wins on “I own the runtime and the supply chain, by design” if that’s the explicit requirement. The March 2026 incident doesn’t disqualify LiteLLM but adds operational weight to the self-hosted path. Future AGI hasn’t had a comparable disclosure in the same period.

5. Deployment posture

LiteLLM is self-hosted by definition. pip install litellm, run the proxy, point clients. The Enterprise license adds SSO, audit, and support. The runtime stays in your VPC. No managed SaaS from the project. If “no SaaS dependency” is the explicit requirement, LiteLLM is the cleanest fit in the category. The hardened setup is a PyPI mirror through your internal Artifactory, exact version pins, and vault-loaded secrets.

Future AGI offers three on-ramps. SaaS, BYOC, and Apache 2.0 OSS libraries you can deploy without the hosted product. If you want the Agent Command Center inside your VPC, BYOC handles it. AWS Marketplace is live. The Apache 2.0 libraries are a source-readable on-ramp without procurement. They run under LiteLLM today, and they run under any other gateway you choose to keep.

Verdict. LiteLLM wins on pure self-host posture. That’s the project’s identity. Future AGI wins on deployment flexibility because OSS plus BYOC plus SaaS gives you three on-ramps. For “no hosted dependency whatsoever,” LiteLLM is the cleanest answer in OSS.

6. Developer experience

LiteLLM’s DX is one of the strongest reasons for adoption. pip install litellm, swap your OpenAI base URL, done. Python ergonomics are first-class, docs exhaustive, provider list the broadest in the category. The proxy mode is a single container with a YAML config. The cost is operational complexity at scale: the YAML grows, Postgres needs care, upgrades require pinning discipline, the admin UI lags hosted competitors.

Future AGI’s DX is a different shape. SDKs are clean and OpenAI-compatible across Python and TypeScript. traceAI has a low-friction local-dev story. The eval and optimizer UIs are strong. LiteLLM optimizes for “make every model look the same”; Future AGI optimizes for “make every agent run loop introspectable.” If your workflow is “pip install and call a model,” LiteLLM is faster to first request. If it’s “trace an agent, score it, fix it,” Future AGI is faster to first improvement.

Verdict. LiteLLM wins on time-to-first-request and Python ergonomics. Future AGI wins on agent-aware tracing and the eval-to-optimization workflow. The tie-breaker depends on whether the workload is single-call or agentic.


Pricing snapshot

Pulled from each vendor’s pricing page on May 17, 2026.

| Tier | Future AGI | LiteLLM |
| --- | --- | --- |
| Free | 100K traces/month, basic eval + routing, no SSO | OSS (MIT), unlimited self-host, no SSO, community support |
| Mid | $99/mo Scale, 10M traces, full eval suite, agent-opt, RBAC | Enterprise license (custom), SSO, audit, JWT auth, premium support |
| Top | Custom; SOC 2 Type II, HIPAA (BAA), GDPR, CCPA certified; ISO 27001 in active audit; BYOC; AWS Marketplace | Custom; on-prem support contracts typically $1,500+/mo |

LiteLLM and Future AGI cross at different price shapes. LiteLLM is free at the runtime. You pay for SSO, audit, and support, plus your own infrastructure. Future AGI bundles runtime, eval, and optimizer at $99/mo Scale. Enterprise adds BYOC and compliance. Procurement: LiteLLM is GitHub and OSS distribution; Future AGI is on AWS Marketplace.

For continuous production workloads, Future AGI’s optimizer typically delivers 15-30% cost reduction within four weeks of trace data flowing, with no change to developer behavior required. agent-opt is opt-in: turn it on once you have eval baselines and live traces; until then, traceAI + ai-evaluation carry the daily value.


Where each one falls short

Future AGI: three deliberate tradeoffs

  • Provider catalog is focused on production endpoints. 100+ providers wire through a single OpenAI-compatible base_url swap. Every major hyperscaler plus the long tail of OSS endpoints production teams actually run. LiteLLM’s adapter directory goes wider on niche endpoints. FAGI’s focus is on the providers buyers actually deploy, not the count for its own sake.
  • agent-opt is opt-in and learns from live traces. Start with traceAI plus ai-evaluation on day one, and turn the optimizer on once eval baselines stabilize and production traffic is flowing. The optimizer gets stronger as your trace data accumulates. That’s the design, not a setup tax.
  • Federal procurement runs through BYOC. FedRAMP authorization is on the partner roadmap. Today, federal SOC procurement is supported via air-gapped self-host in the agency VPC. Agencies on a current FedRAMP-required calendar should plan around the BYOC path.

Three deliberate tradeoffs in pursuit of the closed loop. Every one has a clear path or workaround for buyers who need it today.

LiteLLM: four honest limitations

  • Supply-chain hygiene is on you. The March 24, 2026 PyPI incident is on record. SLSA-3 attestations help, but pinning, vulnerability scanning, vault-loaded secrets, and re-image-on-incident sit with your team. Production deployments need internal Artifactory mirrors and exact version pins, not auto-upgrade-to-latest.
  • No optimizer, no continuous-eval system. The proxy routes and tracks. It doesn’t learn. Continuous scoring on task completion, faithfulness, and tool-use accuracy isn’t in the proxy. Wire in agent-opt and ai-evaluation alongside it, or accept static routing rules.
  • Observability is intentionally thin. Built-in transaction UI and Prometheus metrics tell you what happened. They don’t produce a CFO-ready chargeback view without a warehouse build-out. Production teams wire a second observability tool behind LiteLLM.
  • No prompt library worth shipping. LiteLLM tracks model configs but has no versioned prompt-management UI. Teams either ship prompts in the app repo or pair LiteLLM with a separate prompt-management product.

Decision framework: choose X if

Choose Future AGI if you need:

  • A runtime that closes the loop: trace, eval, optimize, route, all in one product.
  • OpenTelemetry-native instrumentation under Apache 2.0 with agent-aware span semantics out of the box.
  • Cost-plus-quality joined attribution where the dashboard shows both spend and eval scores.
  • RBAC, audit logs, BYOC, AWS Marketplace, and inline runtime guardrails.

Choose LiteLLM if you need:

  • A free, MIT-licensed, self-hosted Python proxy with the broadest provider list in the category (100+ providers).
  • A runtime your security team owns end-to-end with the responsibility of pinning versions and monitoring upstream advisories.
  • Drop-in OpenAI-compatible swap across many providers with no SaaS dependency.

Look at Portkey, Kong AI Gateway, or Helicone if you need:

  • A hosted gateway with mature RBAC, virtual keys, and a polished prompt library (Portkey, now part of Palo Alto Networks).
  • An existing Kong stack extending into LLM traffic with unified policy (Kong AI Gateway).
  • A lightweight per-request observability layer with one-line setup (Helicone, now in Mintlify maintenance mode).

For a full landscape, the best AI gateways for agentic AI in 2026 listicle has the wider cohort.


When to look elsewhere

If the situation is one of these, neither Future AGI nor LiteLLM is the right pick today:

  • Hosted gateway with polished RBAC and prompt library. Portkey is the cleanest fit, especially if your security stack is already on Palo Alto Networks after the April 2026 acquisition.
  • Existing Kong stack for REST APIs. Kong AI Gateway extends what your platform team already runs. AI-specific shallowness is the tradeoff. Operational familiarity is the win.
  • High-throughput Go-native proxy with microsecond-class overhead. Maxim Bifrost prioritizes raw throughput and a Go runtime if the proxy is a performance-critical hop.

How the loop changes the math

LiteLLM is a static proxy. Routing and prompts get better only when humans update the YAML. Future AGI is a self-improving runtime. The system updates itself.

The loop: traceAI emits a span tree per request. ai-evaluation scores each turn against rubrics drawn from a catalog of 50+ built-in evaluators plus any custom evaluator your team authors (generated and tuned by an in-product eval-authoring agent that uses tool calling on your code). Every evaluator self-improves from live production traces, and FAGI’s in-house classifier models score continuously at very low cost-per-token (Galileo Luna-2 parity on cost economics). Low-scoring sessions cluster by failure mode, and agent-opt rewrites the system prompt or routing policy using ProTeGi, Bayesian, or GEPA. The Agent Command Center applies the update on the next request and rolls back automatically if scores regress. Protect guardrails enforce policy inline at approximately 67 ms p50 for text and 109 ms p50 for image (arXiv 2510.13351).
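The rollback step is the safety valve of the loop, and its core is a comparison of eval scores before and after a candidate ships. The sketch below is an assumption-laden illustration: `should_rollback`, the 0.02 tolerance, and the 20-sample minimum are hypothetical and do not describe the Agent Command Center's actual regression test.

```python
from statistics import mean
from typing import Sequence


def should_rollback(baseline: Sequence[float], candidate: Sequence[float],
                    tolerance: float = 0.02, min_samples: int = 20) -> bool:
    """Roll back when the candidate's mean eval score regresses past the tolerance.

    `baseline` must be non-empty; with too few candidate scores, keep waiting.
    """
    if len(candidate) < min_samples:
        return False  # not enough evidence yet to judge the candidate
    return mean(candidate) < mean(baseline) - tolerance
```

A production check would more plausibly use a statistical test than a fixed tolerance, but the shape is the same: a candidate policy stays live only while its scores hold up against the baseline.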

Net effect for continuous production workloads: typical cost reduction of 15-30% within four weeks of live trace data flowing, with no change to developer behavior. The router learns to pick the cheaper model for easy turns; the optimizer rewrites over-prompting; eval data tells the loop where to focus.

LiteLLM doesn’t implement this loop by design. Its identity is “thin Python proxy that normalizes providers.” Every Future AGI surface ships against concrete features. traceAI is OpenTelemetry-native with 35+ framework integrations, OpenInference compatibility, and Apache 2.0 source. ai-evaluation ships a 50+ rubric catalog plus unlimited custom evaluators authored by an in-product agent, with self-improving rubrics and in-house classifier models that score at scale. Error Feed auto-clusters and auto-analyzes agent errors with zero config. agent-opt runs ProTeGi, Bayesian, and GEPA optimizers against live trace data. The Future AGI Protect model family enforces inline at ~67 ms p50 text and ~109 ms p50 image across four safety dimensions on its own Gemma 3n + fine-tuned adapter stack. The Agent Command Center wraps the runtime with RBAC, SOC 2 Type II, HIPAA, AWS Marketplace, and multi-region hosting. Uniquely, FAGI closes the self-improving loop: trace, eval, cluster, optimize, route. For a single-call workload or pure provider-normalization with no SaaS dependency, LiteLLM is the right pick.


Where LiteLLM fits in a Future-AGI stack

These products compose. The cleanest 2026 path for teams already on LiteLLM is to keep LiteLLM at the gateway layer, drop traceAI (Apache 2.0) into application code, layer ai-evaluation on captured traces, and graduate to agent-opt for the closed loop. The Future AGI libraries are gateway-agnostic. They work under LiteLLM, Portkey, Kong, and a vanilla OpenAI client. If your team already runs LiteLLM in production, you don’t have to migrate to get the self-improving loop.



Sources

  • Future AGI Agent Command Center, futureagi.com/platform
  • Future AGI Protect latency benchmark, arxiv.org/abs/2510.13351
  • traceAI (Apache 2.0), github.com/future-agi/traceAI
  • ai-evaluation (Apache 2.0), github.com/future-agi/ai-evaluation
  • agent-opt (Apache 2.0), github.com/future-agi/agent-opt
  • AWS Marketplace listing for Future AGI, aws.amazon.com/marketplace
  • LiteLLM project and pricing, litellm.ai
  • LiteLLM proxy, github.com/BerriAI/litellm (MIT)
  • LiteLLM PyPI compromise advisory (versions 1.82.7 / 1.82.8), github.com/BerriAI/litellm/security (March 2026)

Frequently asked questions

What is the main difference between Future AGI and LiteLLM?
Future AGI is a self-improving runtime that adds eval and optimization on top of routing — trace data feeds back into prompt rewrites and routing-policy updates. LiteLLM is a thin, MIT-licensed Python proxy that normalizes 100+ providers behind one OpenAI-compatible schema. Future AGI gives you a proxy wired to a feedback loop; LiteLLM gives you a proxy you own end-to-end.
Is Future AGI open-source? Is LiteLLM open-source?
Future AGI's three building blocks (`traceAI`, `ai-evaluation`, `agent-opt`) are Apache 2.0. The hosted Agent Command Center is the closed-source control plane on top. LiteLLM's proxy is MIT-licensed; proxy and SDK are fully open. Enterprise features (SSO, audit, premium support) are separately licensed.
What happened with the March 2026 PyPI compromise?
Two versions (1.82.7 and 1.82.8) were published to PyPI after a maintainer-credential compromise. The malicious code attempted to exfiltrate provider keys. BerriAI yanked the bad versions within 9 hours, posted a postmortem in 36 hours, and added SLSA-3 provenance attestations by April. Documented remediation: pin 1.82.6 or upgrade to 1.83.7+. Audit installed versions and dependency pinning if you run LiteLLM in production.
Which one has better routing intelligence?
LiteLLM wins on declarative routing maturity and provider breadth (100+). Future AGI wins on routing that updates itself from eval outcomes via `agent-opt`. 'Configurable across the most providers': LiteLLM. 'Improves over time': Future AGI.
Can I self-host either?
LiteLLM is self-host by design — no managed SaaS from the project. Future AGI offers BYOC for enterprise and Apache 2.0 libraries you can run without the hosted product.
How does pricing compare?
LiteLLM's runtime is free under MIT; the Enterprise license adds SSO, audit, JWT auth, premium support. Future AGI's free tier covers 100K traces/month; Scale is $99/mo for 10M traces, the full eval suite, and `agent-opt`; Enterprise is custom with BYOC.
Can I run Future AGI alongside LiteLLM instead of replacing it?
Yes. Keep LiteLLM at the gateway layer, drop `traceAI` (Apache 2.0) into application code for OTel-native instrumentation, layer `ai-evaluation` on captured traces, and graduate to `agent-opt` for the closed loop. The Future AGI libraries are gateway-agnostic.
Alternatives if neither fits?
Portkey (now PAN) for a hosted gateway with polished RBAC and prompt library. Kong AI Gateway for an existing Kong stack. Maxim Bifrost for a Go-native, throughput-first proxy. Helicone for lightweight per-request observability (maintenance mode under Mintlify).