LiteLLM Compromised in 2026: Developer Guide to Incident Response, Alternatives, and Gateway Migration
Full breakdown of the March 24 2026 LiteLLM supply chain attack: timeline, three-stage payload, detection commands, and a managed-gateway migration path.
LiteLLM Compromised in 2026: What Happened and What To Do Next
On March 24, 2026, the LiteLLM Python package was published with a credential-stealing backdoor. Versions 1.82.7 and 1.82.8, uploaded by threat actor TeamPCP, harvested SSH keys, cloud credentials, Kubernetes secrets, and cryptocurrency wallets from every machine where the package was installed. PyPI quarantined the package at 11:25 UTC the same day. This guide covers the timeline, detection commands, the credential rotation list, and the migration path to a managed gateway that does not put a Python proxy in your dependency tree.
TL;DR: LiteLLM March 2026 Supply Chain Attack at a Glance
| Question | Short Answer |
|---|---|
| Affected versions | LiteLLM 1.82.7 and 1.82.8, published March 24, 2026 |
| Attack vector | PyPI publish token stolen via compromised Trivy GitHub Action on March 19 |
| Worst payload | litellm_init.pth runs on every Python process startup, not only on import litellm |
| What to rotate | Every SSH key, cloud token, LLM API key, DB credential, Kubernetes SA token, wallet file on affected machines |
| Detection one-liner | pip show litellm + find / -name "litellm_init.pth" + egress logs for models.litellm[.]cloud |
| Permanent fix | Move LLM routing to a managed gateway with no client-side dependency tree |
| Future AGI position | Agent Command Center (gateway.futureagi.com) is a managed BYOK gateway with built-in evals + observability |
How the LiteLLM Supply Chain Attack on March 24 2026 Backdoored AI Infrastructure at Scale
On March 24, 2026, threat actor TeamPCP published LiteLLM versions 1.82.7 and 1.82.8 to PyPI with a three-stage credential-stealing payload that harvested SSH keys, cloud credentials, Kubernetes secrets, and cryptocurrency wallets from every machine where the package was installed. PyPI quarantined both malicious versions at approximately 11:25 UTC the same day.
This was the third strike in a coordinated supply chain campaign that started with Aqua Security’s Trivy scanner on March 19, escalated through Checkmarx’s KICS GitHub Actions on March 23, and reached LiteLLM on March 24. The attack worked because LiteLLM sits at the center of AI infrastructure: a self-hosted Python proxy present in 36% of cloud environments, often pulled in as a transitive dependency by agent frameworks developers never audited.
This guide covers a full technical breakdown, an incident response playbook, the structural case against self-hosted Python LLM proxies, and a complete migration path to Future AGI’s Agent Command Center gateway.
What Happened to LiteLLM: Timeline, Weaponized Versions, and Blast Radius of the March 2026 Attack
Timeline: How the Trivy Compromise on March 19 Led to LiteLLM Backdoor on March 24 in Five Days
| Date | Time (UTC) | Target | Attack Method | Official Source |
|---|---|---|---|---|
| March 19 | 17:43 to 20:38 | Aqua Security Trivy | Force-pushed 76/77 version tags in trivy-action and all 7 setup-trivy tags to credential-stealing malware; published malicious binary v0.69.4 | GHSA-69fq-xp46-6x23 |
| March 23 | 12:58 to 16:50 | Checkmarx KICS GitHub Actions | Hijacked all 35 tags via compromised cx-plugins-releases service account; deployed stealer via checkmarx[.]zone C2. Also compromised OpenVSX extensions at 12:53 UTC | kics-github-action#152, Checkmarx Update |
| March 24 | 10:39 to 11:25 | LiteLLM | Used stolen PyPI token (from Trivy compromise in LiteLLM’s CI/CD) to publish v1.82.7 at 10:39 and v1.82.8 at 10:52. PyPI quarantined at ~11:25 UTC | litellm#24512, LiteLLM Update |
As confirmed in LiteLLM’s official security update, the project’s CI/CD pipeline ran Trivy without a pinned version. The compromised Trivy action exfiltrated the PYPI_PUBLISH token from the GitHub Actions runner, which TeamPCP used to publish malicious packages directly to PyPI.
How Versions 1.82.7 and 1.82.8 Were Weaponized: Three-Stage Payload with Credential Harvesting and Persistent Backdoor
Version 1.82.7 embedded the payload in proxy/proxy_server.py, activating on import. Version 1.82.8 was more aggressive: it included a .pth file named litellm_init.pth that executed on every Python process startup, not just when LiteLLM was imported. Python’s site module processes all .pth files in site-packages during interpreter initialization, as documented in the GitHub issue. The payload used double base64 encoding:
import os, subprocess, sys
subprocess.Popen([
sys.executable, "-c",
"import base64; exec(base64.b64decode('...'))"
])
Once executed, the payload ran a three-stage attack:
- Stage 1 (Credential Harvesting): Collected SSH keys, AWS/GCP/Azure tokens, environment variables including API keys and tokens, .env files, Kubernetes configs, Docker configs, database credentials, shell history, browser cookies, and cryptocurrency wallets. The harvester scanned every common credential storage location on the host.
- Stage 2 (Kubernetes Lateral Movement): Deployed privileged Alpine pods into the kube-system namespace on every reachable node, accessing cluster secrets and service account tokens for broader infrastructure compromise.
- Stage 3 (Persistent Backdoor): Installed sysmon.py as a systemd service that polled checkmarx[.]zone/raw for additional binaries, providing persistent access even after the initial malware was discovered.
All harvested data was encrypted and exfiltrated via POST request to models.litellm[.]cloud, a lookalike domain controlled by TeamPCP. The irony is hard to miss: LiteLLM is an API key management gateway. The attacker targeted the one package that, by design, has access to every LLM API key in the organization.
Blast Radius Beyond Direct LiteLLM Users: How the pth Mechanism Infected Every Python Process on Affected Machines
The .pth mechanism means the malware fired on every Python process on any machine where LiteLLM 1.82.8 was installed, even if the user never ran import litellm. A data scientist running Jupyter, a DevOps engineer running Ansible: all compromised if the package existed anywhere in their Python environment. The key distinction is that you did not need to install it yourself. If another package in your dependency tree pulled it in, the malware still executed. As reported in GitHub issue #24512, the researcher who discovered the attack found it because their Cursor IDE pulled LiteLLM in through an MCP plugin without explicit installation.
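The mechanism can be demonstrated harmlessly. site.addsitedir() processes .pth files in a directory the same way the site module processes site-packages during interpreter startup, and any .pth line beginning with "import" is executed rather than treated as a path. The directory and file names below are arbitrary:

```python
import os
import shutil
import site
import sys
import tempfile

demo_dir = tempfile.mkdtemp()
# A .pth line that starts with "import" is exec'd, not added to sys.path.
with open(os.path.join(demo_dir, "demo_init.pth"), "w") as f:
    f.write("import sys; sys.pth_demo_ran = True\n")

# site.addsitedir() processes .pth files exactly as the interpreter does
# for site-packages at startup -- no explicit import of anything required.
site.addsitedir(demo_dir)
print(getattr(sys, "pth_demo_ran", False))  # True

shutil.rmtree(demo_dir)
```

Replace the harmless attribute assignment with a base64-decoded exec call and you have the 1.82.8 payload: arbitrary code on every interpreter launch, with no import statement anywhere in the victim's own code.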
How to Check If You Are Affected by the LiteLLM Compromise: Version Detection, Log Scanning, and Dependency Audit
Check Installed Versions: How to Detect LiteLLM 1.82.7 and 1.82.8 Across Local, CI/CD, Docker, and Production
pip show litellm | grep Version
pip cache list litellm
find / -name "litellm_init.pth" 2>/dev/null
Run this across local machines, CI/CD runners, Docker images, staging, and production. Check Docker layer histories too.
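The same check can be scripted for fleet-wide sweeps. This sketch uses only the standard library, so it can run on machines where you do not want to install anything new:

```python
from importlib import metadata

# The two versions quarantined by PyPI on March 24, 2026.
COMPROMISED = {"1.82.7", "1.82.8"}

def check_litellm() -> str:
    """Return a verdict string for the current Python environment."""
    try:
        version = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm not installed in this environment"
    verdict = "COMPROMISED - isolate this host" if version in COMPROMISED else "ok"
    return f"litellm {version}: {verdict}"

print(check_litellm())
```

Run it under every interpreter on the host (system Python, virtualenvs, conda environments), since each has its own site-packages.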
Scan Egress Logs for Exfiltration: How to Identify Traffic to models.litellm.cloud and checkmarx.zone
Any traffic to models.litellm[.]cloud or checkmarx[.]zone is a confirmed breach:
# CloudWatch
fields @timestamp, @message | filter @message like /models\.litellm\.cloud|checkmarx\.zone/
# Nginx
grep -E "models\.litellm\.cloud|checkmarx\.zone" /var/log/nginx/access.log
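For log formats your grep tooling does not cover, the same IOC check is a one-file Python script. The sample lines below are illustrative; point the scan at your real access or flow logs:

```python
import re

# Escaped dots keep "." literal, so only the exact C2 domains match.
IOC = re.compile(r"models\.litellm\.cloud|checkmarx\.zone")

def scan(lines):
    """Return every log line that touched a known C2 domain."""
    return [line for line in lines if IOC.search(line)]

sample_log = [
    "10.0.0.5 POST https://models.litellm.cloud/upload 200",
    "10.0.0.5 GET https://api.openai.com/v1/chat/completions 200",
]
print(scan(sample_log))  # only the first line is flagged
```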
Audit Transitive Dependencies: How to Find LiteLLM Pulled in Without Explicit Installation
pip show litellm # Check "Required-by" field
If the Required-by field lists other packages, LiteLLM entered your environment as a transitive dependency, without your explicit consent.
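The Required-by check can also be done programmatically with the standard library, which is convenient inside containers and CI runners:

```python
import re
from importlib import metadata

def dependents_of(target: str) -> list[str]:
    """List installed distributions that declare `target` as a requirement."""
    found = []
    for dist in metadata.distributions():
        for req in dist.requires or []:
            # Requirement strings look like "litellm>=1.0; extra == 'proxy'";
            # the leading token is the distribution name.
            name = re.match(r"[A-Za-z0-9_.-]+", req)
            if name and name.group(0).lower() == target.lower():
                found.append(dist.metadata["Name"])
    return sorted(set(found))

# Anything printed here pulled litellm into your environment for you.
print(dependents_of("litellm"))
```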
Incident Response Playbook: How to Isolate, Rotate Credentials, and Remove All LiteLLM Artifacts
Isolate and Rotate: How to Kill LiteLLM Docker Containers and Scale Down Kubernetes Deployments Immediately
docker ps | grep litellm | awk '{print $1}' | xargs docker kill
kubectl scale deployment litellm-proxy --replicas=0 -n your-namespace
The first step is to stop all running LiteLLM containers and scale down any Kubernetes deployments that use the compromised package. These commands kill active Docker containers matching “litellm” and set the proxy deployment replica count to zero, which immediately halts all traffic flowing through the infected gateway.
Credential Rotation Checklist: How to Rotate Cloud Tokens, SSH Keys, API Keys, and Kubernetes Service Accounts
Credential rotation means replacing every secret, key, and password that existed on an affected machine with a new one, then revoking the old value. Because the malware harvested everything it could find, any credential that was stored on or accessible from the compromised environment should be treated as known to the attacker.
| Credential Type | What to Rotate |
|---|---|
| Cloud Provider Tokens | AWS access keys, GCP service account keys, Azure AD tokens |
| SSH Keys | All keys in ~/.ssh/, regenerate and redistribute |
| Database Credentials | Connection strings, passwords in .env files |
| API Keys | OpenAI, Anthropic, Gemini, all LLM provider keys |
| Service Account Tokens | Kubernetes service accounts, CI/CD tokens, PyPI tokens |
| Crypto Wallets | Move funds immediately if wallet files were on the machine |
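As a starting point for the rotation sweep, this sketch checks common on-disk locations of the kind the stage-1 harvester targeted; anything that exists on an affected machine should be assumed stolen. The path list is illustrative, not exhaustive:

```python
from pathlib import Path

# Common credential locations of the kind the harvester scanned.
# Extend this list for your own environment.
CANDIDATE_PATHS = [
    "~/.ssh/id_rsa", "~/.ssh/id_ed25519",
    "~/.aws/credentials",
    "~/.config/gcloud/application_default_credentials.json",
    "~/.kube/config",
    "~/.docker/config.json",
    ".env",
]

def found_credentials() -> list[Path]:
    """Return credential files present on this machine."""
    return [p for raw in CANDIDATE_PATHS
            if (p := Path(raw).expanduser()).exists()]

for path in found_credentials():
    print(f"ROTATE: {path}")
```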
Audit Kubernetes and Remove All Artifacts: How to Find Lateral Movement Pods and Delete Persistent Backdoor Files
# Check for lateral movement
kubectl get pods -n kube-system | grep -i "node-setup"
find / -name "sysmon.py" 2>/dev/null
# Full removal
pip uninstall litellm -y && pip cache purge
rm -rf ~/.cache/uv
find $(python -c "import site; print(site.getsitepackages()[0])") \
-name "litellm_init.pth" -delete
rm -rf ~/.config/sysmon/ ~/.config/systemd/user/sysmon.service
docker build --no-cache -t your-image:clean .
The malware deployed privileged pods into the kube-system namespace and installed a persistent backdoor (sysmon.py) as a systemd service, so you need to check for both before cleaning up. The commands above scan for unauthorized Kubernetes pods and persistence artifacts, then fully remove LiteLLM, its cached packages, the malicious .pth file, and all backdoor files before rebuilding your Docker images from clean base layers.
Do not downgrade. Remove entirely and replace.
Why Self-Hosted Python LLM Proxies Are a Structural Risk After the LiteLLM Attack
The Dependency Tree Problem: How Hundreds of Transitive Dependencies Create an Unauditable Trust Surface
LiteLLM’s Python proxy inherits hundreds of transitive dependencies spanning ML frameworks, data processing libraries, and provider SDKs. Every dependency is a trust decision most teams make automatically with pip install --upgrade. When you add LiteLLM to your project, you are not just trusting LiteLLM. You are trusting every package it depends on, every package those packages depend on, and every maintainer account associated with each one.
The .pth attack vector is particularly dangerous because most supply chain scanning tools focus on setup.py, __init__.py, and entry points. The .pth mechanism is a legitimate Python feature for path configuration that has been largely overlooked as an injection vector for Python dependency attacks. Expect this technique in future attacks against open source LLM gateway packages. Traditional security scanning would not have caught this.
You Own the Blast Radius: Why Slow Maintainer Response to the Trivy Disclosure Made LiteLLM Compromise Inevitable
The LiteLLM maintainers did not rotate their CI/CD credentials for five days after the Trivy disclosure on March 19. If the maintainers themselves could not respond fast enough, most teams running their software had no chance. This is a structural security problem inherent to the self-hosted gateway model: you own the blast radius of someone else's release pipeline.
How Future AGI Agent Command Center Eliminates the LiteLLM Supply Chain Risk
Managed Gateway with Zero Client Dependencies: How Agent Command Center Routes Across Major LLM Providers Without a Python Proxy
Agent Command Center is Future AGI’s managed LLM gateway. It sits between your application and LLM providers as a hosted proxy layer at gateway.futureagi.com. Instead of installing a Python package like LiteLLM to route requests across multiple providers, Agent Command Center handles that routing as a cloud service.
In practical terms, it does what LiteLLM did (route requests to major LLM providers like OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Cohere, Mistral, Groq, and self-hosted Ollama or vLLM through a single API) without requiring you to run anything in your own infrastructure. You send requests using the standard OpenAI API format, and the gateway handles provider translation, failover, caching, guardrails, cost tracking, and streaming on its end.
The key difference from LiteLLM: your attack surface is an API key and a URL, not a Python environment with hundreds of transitive dependencies. You can read the full docs.
Agent Command Center works like any other LLM provider from your application’s perspective. You point your existing OpenAI SDK at a new base URL, swap in a gateway API key, and your code runs without any other changes. There is no library to install, no proxy to deploy, and no dependency tree to audit. The gateway is exposed inside the Future AGI platform at /platform/monitor/command-center.
One Config Change to Migrate from LiteLLM: Python and TypeScript Code Examples
Before (LiteLLM):
from litellm import completion
response = completion(
model="gpt-5-2025-08-07",
messages=[{"role": "user", "content": "Hello"}],
)
After (Agent Command Center):
from openai import OpenAI
client = OpenAI(
base_url="https://gateway.futureagi.com/v1",
api_key="sk-prism-your-key",
)
response = client.chat.completions.create(
model="gpt-5-2025-08-07",
messages=[{"role": "user", "content": "Hello"}],
)
Same OpenAI SDK format, same model naming, same response schema. Here is the TypeScript equivalent:
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://gateway.futureagi.com",
apiKey: "sk-prism-your-key",
});
const response = await client.chat.completions.create({
model: "gpt-5-2025-08-07",
messages: [{ role: "user", content: "Hello" }],
});
Provider keys are configured once in the Agent Command Center dashboard. No environment variables scattered across your codebase or stored in .env files on developer machines where credential theft malware can reach them.
Semantic Caching and Guardrails: How Agent Command Center Reduces LLM Costs and Applies Built-In Safety Checks
Every LLM API call costs money and adds latency. When your application sends the same question (or a slightly different version of it) hundreds of times a day, you are paying the provider for every single call and waiting for a full inference cycle each time. Agent Command Center caching reduces that waste by storing LLM responses at the gateway level and returning them instantly on repeat queries, without ever hitting the provider.
The gateway supports two caching modes. Exact match caching returns a stored response when the request parameters are identical, which works well for deterministic queries like template-based prompts or FAQ bots. Semantic caching goes further: it uses vector embeddings to match queries that mean the same thing but are worded differently. For example, “What is your return policy?” and “How do I return an item?” would hit the same cache entry even though the words are different.
Namespace isolation keeps caches partitioned across environments like prod, staging, and dev, so test data never leaks into production responses. Cached responses return with X-Prism-Cost: 0, which means zero provider charges for every cache hit. Caching mode, TTL, and namespace are configured in the Agent Command Center dashboard or via request headers, so the application code does not need any additional Python dependency:
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
-H "Authorization: Bearer sk-prism-your-key" \
-H "x-prism-cache-mode: semantic" \
-H "x-prism-cache-ttl: 5m" \
-H "x-prism-cache-namespace: prod" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"What is AI?"}]}'
Agent Command Center applies a built-in guardrail layer before requests reach the LLM provider, including PII detection, prompt injection prevention, and content moderation for output safety. For a deeper look at this layer, see the AI guardrailing tools comparison.
Cost Tracking, Major Provider Coverage, and How Agent Command Center Compares to Self-Hosted LiteLLM
Every response includes X-Prism-Cost. Budget limits block requests when exceeded. OpenAI, Anthropic, Gemini, Bedrock, Azure, Cohere, Groq, Mistral, Ollama, vLLM: switch providers by changing the model name. The gateway translates non-OpenAI APIs automatically.
When evaluating LiteLLM alternatives after this incident, here is how the key options compare:
| Feature | LiteLLM (Pre-Compromise) | Agent Command Center (Future AGI) | Other Managed Gateways |
|---|---|---|---|
| Deployment | Self-hosted Python proxy | Managed HTTPS endpoint | Varies (self-hosted/managed) |
| Dependency Risk | Hundreds of transitive Python deps | No gateway-specific client dependency (standard OpenAI SDK only) | Varies by architecture |
| Built-in Guardrails | Limited | Built-in layer (PII, injection, output safety) | Typically minimal |
| Evaluation Integration | None | Observability via traceAI (Apache 2.0) + ai-evaluation SDK (Apache 2.0) wired into the platform | Requires third-party tools |
| Semantic Caching | Basic exact match | Exact + semantic with namespaces | Some offer semantic |
| Supply Chain Exposure | Full Python dependency tree | API key + URL only | Depends on deployment model |
Agent Command Center is the managed LLM gateway from Future AGI that pairs routing with built-in evaluation and observability in a single platform. For a wider field comparison, see the best LLM gateways in 2026 guide.
Migration Scenarios: How to Move from LiteLLM Proxy Server to Future AGI in Production
From LiteLLM Proxy Server on Docker and Kubernetes: How to Remove the Proxy and Update Base URL Configuration
Remove the proxy infrastructure entirely. Update your application’s base URL:
env:
- name: LLM_BASE_URL
value: "https://gateway.futureagi.com" # was http://litellm-proxy:4000
- name: LLM_API_KEY
value: "sk-prism-your-key"
Delete the LiteLLM pod, its service, Postgres, and Redis. That is infrastructure you no longer maintain, patch, or worry about during the next AI infrastructure security incident.
Adding Post-Migration Controls: How to Use Gateway Headers for Cache Refresh and Request Management
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
-H "Authorization: Bearer sk-prism-your-key" \
-H "x-prism-cache-force-refresh: true" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"What is AI?"}]}'
Wire In Evaluations Alongside the Gateway
Future AGI’s open source SDKs let you evaluate gateway traffic without adding dependencies to the live gateway request path. The evaluation SDK runs in a separate offline or async worker that pulls traffic from logs, so the production hot path remains the standard OpenAI SDK call to the gateway:
from fi.evals import evaluate
result = evaluate(
"faithfulness",
output="Paris is the capital of France.",
context="France is a country in Western Europe. Its capital is Paris.",
)
print(result.score, result.reason)
ai-evaluation and traceAI are both Apache 2.0. Set FI_API_KEY and FI_SECRET_KEY to forward results to the platform.
What the LiteLLM Compromise Changes for AI Infrastructure: Compliance, Dependency Pinning, and Architecture
Compliance Consequences: How EU Cyber Resilience Act and SOC 2 Hold Teams Liable for LiteLLM Credential Exposure
The EU Cyber Resilience Act has tightened software supply-chain accountability for products that ship to the EU market, and SOC 2 Type II audits routinely scrutinize dependency management practices in 2026. “We install the latest version from PyPI” is a much harder answer to defend during a controls review than it was a year ago. If your product uses LiteLLM and your customers’ credentials were exfiltrated, the operational and reputational consequences fall on you, not on the open-source maintainer. Specific legal obligations vary by jurisdiction and product category, so confirm with counsel. For our take on AI compliance and LLM security controls, see the enterprise compliance guide.
Dependency Pinning Is Not Enough: Why Hash Verification and Managed Gateways Are the Real Security Controls
Pinning blocks pulling a new malicious version, but it does not protect against an already-compromised pinned version, a compromised transitive dependency, a mutable source reference (Git tags, branches), or an unverified artifact pulled at install time. Hash verification (pip install --hash=sha256:<exact_hash>) is the control that actually constrains the artifact. A managed LLM gateway removes the LiteLLM proxy dependency from the application path entirely, but teams should still pin and verify the remaining application dependencies.
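The control is worth seeing in miniature: hash verification compares the digest of the artifact actually downloaded against a pinned value, so a swapped artifact fails closed even when the version number still matches. This sketch mirrors what pip's hash-checking mode does; the wheel bytes and filename are stand-ins:

```python
import hashlib

def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Fail closed unless the artifact matches the pinned digest exactly."""
    return hashlib.sha256(data).hexdigest() == pinned_sha256

wheel = b"pretend this is litellm-1.82.6-py3-none-any.whl"  # stand-in bytes
pin = hashlib.sha256(wheel).hexdigest()  # the value you commit to requirements

print(verify_artifact(wheel, pin))            # genuine artifact passes
print(verify_artifact(wheel + b"\x00", pin))  # tampered artifact fails
```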
The Architecture Decision: How to Choose Between Self-Hosted Proxy Risk and Managed Gateway Safety After March 2026
For most teams, the practical choice in 2026 is between accepting self-hosted proxy supply-chain ownership (private mirrors, locked builds, hash-verified installs, vendor appliances) or moving routing to a managed gateway with a smaller operational trust boundary. Hybrid patterns are also fine: a managed gateway for production plus a hardened self-hosted proxy for an air-gapped environment, for example. After March 24, 2026, the risk calculus has shifted. The question is not whether your open source LLM gateway will be targeted by a PyPI supply chain attack. It is whether your architecture limits the damage when it happens.
Why Migrating to a Managed LLM Gateway Is the Permanent Fix for Supply Chain Risk After LiteLLM
The LiteLLM compromise is not a one-off event. Teams that self-host Python LLM proxies are inheriting supply chain risk they cannot realistically manage. The dependency trees are too deep, the release cadence is too fast, and pulling the latest version from PyPI is exactly the behavior attackers exploit.
Rotating credentials and pinning to a safe version solves today’s problem. Migrating to a managed gateway that removes the dependency chain entirely solves the category of problem.
Future AGI’s Agent Command Center gateway handles routing to major providers, caching, guardrails, and cost tracking without requiring a Python proxy or trust that every package in a dependency tree has not been tampered with. Because the gateway is part of Future AGI’s evaluation and observability platform, teams get a closed loop from request routing to response evaluation, with full tracing through traceAI at every step.
If your team was using LiteLLM behind the OpenAI SDK, the migration to Agent Command Center is a single config change: swap the base URL and key. Teams that used the LiteLLM Python SDK directly (from litellm import completion) replace those calls with the OpenAI SDK equivalents shown above, which is a small, mechanical code change. The risk profile shifts in either case.
Get started with Agent Command Center | Request a demo | Explore Future AGI
Frequently asked questions
Which LiteLLM versions are compromised in the March 2026 supply chain attack?
Versions 1.82.7 and 1.82.8, published to PyPI on March 24, 2026 and quarantined at approximately 11:25 UTC the same day. Any environment that installed either version should be treated as fully compromised.
How did the attacker get a PyPI token for LiteLLM?
LiteLLM's CI/CD pipeline ran the Trivy GitHub Action without a pinned version. The compromised action exfiltrated the PYPI_PUBLISH token from the Actions runner during the March 19 Trivy attack, and TeamPCP used it to publish the malicious versions five days later.
I never installed LiteLLM directly. Am I still at risk?
Yes. If any package in your dependency tree pulled LiteLLM in transitively, the litellm_init.pth file executed on every Python process startup on that machine. Check the Required-by field of pip show litellm.
What credentials should I rotate after a LiteLLM 1.82.7 or 1.82.8 detection?
Everything on the affected machine: SSH keys, cloud provider tokens, LLM API keys, database credentials, Kubernetes service account tokens, CI/CD and PyPI tokens, and any cryptocurrency wallet funds (move them immediately).
Is Future AGI Agent Command Center an open source LiteLLM alternative?
The gateway itself is a managed service, not a self-hosted package; that is the point, since there is no client-side dependency tree to compromise. The companion observability and evaluation SDKs, traceAI and ai-evaluation, are open source under Apache 2.0.
Does Future AGI Agent Command Center support the same providers as LiteLLM?
Yes: OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Cohere, Mistral, Groq, and self-hosted Ollama or vLLM, all through the standard OpenAI API format.
How long does migration from LiteLLM to a managed gateway actually take?
If your application already talks to LiteLLM through the OpenAI SDK, migration is a single config change: swap the base URL and API key. Teams that call the LiteLLM Python SDK directly replace those calls with OpenAI SDK equivalents, a small, mechanical change.
What is the right architecture decision after the March 24 2026 LiteLLM compromise?
Either accept full supply-chain ownership of a self-hosted proxy (pinned, hash-verified, privately mirrored builds) or move routing to a managed gateway whose attack surface is an API key and a URL. For most teams, the managed gateway is the lower-risk default.