LiteLLM Compromised: A Developer's Guide to Incident Response, Alternatives, and LLM Gateway Migration

Last Updated: Mar 25, 2026

By Rishav Hada

1. Introduction

On March 24, 2026, LiteLLM was backdoored with credential-stealing malware. Versions 1.82.7 and 1.82.8, published by threat actor TeamPCP, contained a three-stage payload that harvested SSH keys, cloud credentials, Kubernetes secrets, and cryptocurrency wallets from every machine where the package was installed. The entire LiteLLM package is now quarantined on PyPI.

This was the third strike in a coordinated supply chain campaign that started with Aqua Security's Trivy scanner on March 19, escalated through Checkmarx's KICS GitHub Actions on March 23, and reached LiteLLM on March 24. The attack worked because LiteLLM sits at the center of AI infrastructure: a self-hosted Python proxy present in 36% of cloud environments, often pulled in as a transitive dependency by agent frameworks developers never audited.

This guide covers a full technical breakdown, an incident response playbook, the structural case against self-hosted Python LLM proxies, and a complete migration path to Future AGI's Prism gateway.

2. What Happened to LiteLLM

2.1 Timeline: From Trivy to LiteLLM in Five Days

| Date | Time (UTC) | Target | Attack Method | Official Source |
| --- | --- | --- | --- | --- |
| March 19 | 17:43 - 20:38 | Aqua Security Trivy | Force-pushed 76/77 version tags in trivy-action and all 7 setup-trivy tags to credential-stealing malware; published malicious binary v0.69.4 | GHSA-69fq-xp46-6x23 |
| March 23 | 12:58 - 16:50 | Checkmarx KICS GitHub Actions | Hijacked all 35 tags via compromised cx-plugins-releases service account; deployed stealer via checkmarx[.]zone C2. Also compromised OpenVSX extensions at 12:53 UTC | kics-github-action#152, Checkmarx Update |
| March 24 | 10:39 - 11:25 | LiteLLM | Used stolen PyPI token (exfiltrated from LiteLLM's CI/CD during the Trivy compromise) to publish v1.82.7 at 10:39 and v1.82.8 at 10:52. PyPI quarantined the package at ~11:25 UTC | litellm#24512, LiteLLM Update |

As confirmed in LiteLLM's official security update, the project's CI/CD pipeline ran Trivy without a pinned version. The compromised Trivy action exfiltrated the PYPI_PUBLISH token from the GitHub Actions runner, which TeamPCP used to publish malicious packages directly to PyPI.

2.2 How Versions 1.82.7 and 1.82.8 Were Weaponized

Version 1.82.7 embedded the payload in proxy/proxy_server.py, activating on import. Version 1.82.8 was more aggressive: it included a .pth file named litellm_init.pth that executed on every Python process startup, not just when LiteLLM was imported. Python's site module processes all .pth files in site-packages during interpreter initialization, as documented in the GitHub issue. The payload used double base64 encoding:

import os, subprocess, sys
subprocess.Popen([
    sys.executable, "-c",
    "import base64; exec(base64.b64decode('...'))"
])

Once executed, the payload ran a three-stage attack:

  • Stage 1 (Credential Harvesting): Collected SSH keys, AWS/GCP/Azure tokens, environment variables including API keys and tokens, .env files, Kubernetes configs, Docker configs, database credentials, shell history, browser cookies, and cryptocurrency wallets. The harvester scanned every common credential storage location on the host.

  • Stage 2 (Kubernetes Lateral Movement): Deployed privileged Alpine pods into the kube-system namespace on every reachable node, accessing cluster secrets and service account tokens for broader infrastructure compromise.

  • Stage 3 (Persistent Backdoor): Installed sysmon.py as a systemd service that polled checkmarx[.]zone/raw for additional binaries, providing persistent access even after the initial malware was discovered.

All harvested data was encrypted and exfiltrated via POST request to models.litellm[.]cloud, a lookalike domain controlled by TeamPCP. The irony is hard to miss: LiteLLM is an API key management gateway. The attacker targeted the one package that, by design, has access to every LLM API key in the organization.

2.3 The Blast Radius Extends Beyond Direct LiteLLM Users

The .pth mechanism means the malware fired on every Python process on any machine where LiteLLM 1.82.8 was installed, even if the user never ran import litellm. A data scientist running Jupyter, a DevOps engineer running Ansible: all compromised if the package existed anywhere in their Python environment. The key distinction is that you did not need to install it yourself. If another package in your dependency tree pulled it in, the malware still executed. As reported in GitHub issue #24512, the researcher who discovered this attack found it because their Cursor IDE pulled LiteLLM through an MCP plugin without explicit installation.
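The startup execution path is easy to demonstrate safely. The sketch below, using only the standard library, writes a benign one-line .pth file into a temporary directory and registers that directory the same way the interpreter registers site-packages at startup; the line runs without any import of the package that shipped it. The file name demo_init.pth and the marker file are illustrative, not taken from the malware.

```python
import os
import site
import tempfile

# The site module exec()s any .pth line that starts with "import".
# The malicious litellm_init.pth used this to launch a base64-decoded
# payload; here the line just drops a harmless marker file.
d = tempfile.mkdtemp()
marker = os.path.join(d, "pth_ran.txt")
with open(os.path.join(d, "demo_init.pth"), "w") as f:
    f.write(f"import os; open({marker!r}, 'w').write('executed')\n")

# addsitedir() processes .pth files exactly as interpreter startup
# does for the real site-packages directories.
site.addsitedir(d)

print(os.path.exists(marker))  # True: the line ran with no explicit import
```

The takeaway: once a malicious .pth lands in site-packages, every interpreter launch on that machine runs the payload, which is why "we never imported litellm" offers no protection.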

3. How to Check If You're Affected

3.1 Check Installed Versions

pip show litellm | grep Version
pip cache list litellm
find / -name "litellm_init.pth" 2>/dev/null

Run this across local machines, CI/CD runners, Docker images, staging, and production. Check Docker layer histories too.
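For fleet-wide automation, the same version check can be scripted in Python with importlib.metadata. A minimal sketch; the compromised-version set comes from the advisory above and should be extended if further releases are flagged.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions named in the advisory; extend if more are identified.
COMPROMISED = {"1.82.7", "1.82.8"}

def installed_litellm_version() -> Optional[str]:
    """Return the installed litellm version, or None if absent."""
    try:
        return version("litellm")
    except PackageNotFoundError:
        return None

def verdict(v: Optional[str]) -> str:
    """Classify an environment given its installed litellm version."""
    if v is None:
        return "litellm not installed"
    if v in COMPROMISED:
        return f"COMPROMISED ({v}): isolate and rotate credentials"
    return f"installed ({v}): not a known-bad version, remove anyway"

print(verdict(installed_litellm_version()))
```

Run it under the same interpreter your workloads use; a clean system Python says nothing about the virtualenvs inside your containers.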

3.2 Scan Egress Logs for Exfiltration

Any traffic to models.litellm[.]cloud or checkmarx[.]zone is a confirmed breach:

# CloudWatch
fields @timestamp, @message
| filter @message like /models\.litellm\.cloud|checkmarx\.zone/

# Nginx
grep -E "models\.litellm\.cloud|checkmarx\.zone" /var/log/nginx/access.log

3.3 Audit Transitive Dependencies

pip show litellm  # Check "Required-by" field

If other packages appear in the Required-by field, LiteLLM entered your environment as a transitive dependency, without your explicit consent.
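To enumerate which installed packages pulled LiteLLM in, you can walk every distribution's declared requirements with importlib.metadata. A minimal sketch; the requirement-string parsing here is deliberately crude, and a production script might use the packaging library instead:

```python
import re
from importlib.metadata import distributions

def packages_requiring(target: str):
    """Return installed distributions that declare `target` as a dependency."""
    dependents = set()
    for dist in distributions():
        for req in (dist.requires or []):
            # Requirement strings look like "litellm>=1.0; extra == 'proxy'";
            # grab the leading project name and normalize it.
            m = re.match(r"[A-Za-z0-9._-]+", req)
            if m and m.group(0).lower().replace("_", "-") == target:
                dependents.add(dist.metadata["Name"])
    return sorted(dependents)

print(packages_requiring("litellm"))
```

An empty list means nothing currently installed declares it; a non-empty list is the set of packages that will silently reinstall LiteLLM on your next dependency resolution.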

4. Incident Response Playbook

4.1 Isolate and Rotate

docker ps | grep litellm | awk '{print $1}' | xargs docker kill
kubectl scale deployment litellm-proxy --replicas=0 -n your-namespace

The first step is to stop all running LiteLLM containers and scale down any Kubernetes deployments that use the compromised package. These commands kill active Docker containers matching "litellm" and set the proxy deployment replica count to zero, which immediately halts all traffic flowing through the infected gateway.

4.2 Credential Rotation Checklist

Credential rotation means replacing every secret, key, and password that existed on an affected machine with a new one, then revoking the old value. Because the malware harvested everything it could find, any credential that was stored on or accessible from the compromised environment should be treated as known to the attacker.

| Credential Type | What to Rotate |
| --- | --- |
| Cloud Provider Tokens | AWS access keys, GCP service account keys, Azure AD tokens |
| SSH Keys | All keys in ~/.ssh/, regenerate and redistribute |
| Database Credentials | Connection strings, passwords in .env files |
| API Keys | OpenAI, Anthropic, Gemini, all LLM provider keys |
| Service Account Tokens | Kubernetes service accounts, CI/CD tokens, PyPI tokens |
| Crypto Wallets | Move funds immediately if wallet files were on the machine |

4.3 Audit Kubernetes and Remove All Artifacts

# Check for lateral movement
kubectl get pods -n kube-system | grep -i "node-setup"
find / -name "sysmon.py" 2>/dev/null

# Full removal
pip uninstall litellm -y && pip cache purge
rm -rf ~/.cache/uv
find $(python -c "import site; print(site.getsitepackages()[0])") \
    -name "litellm_init.pth" -delete
rm -rf ~/.config/sysmon/ ~/.config/systemd/user/sysmon.service
docker build --no-cache -t your-image:clean .

The malware deployed privileged pods into the kube-system namespace and installed a persistent backdoor (sysmon.py) as a systemd service, so you need to check for both before cleaning up. The commands above scan for unauthorized Kubernetes pods and persistence artifacts, then fully remove LiteLLM, its cached packages, the malicious .pth file, and all backdoor files before rebuilding your Docker images from clean base layers.

Do not downgrade. Remove entirely and replace.

5. Why Self-Hosted Python LLM Proxies Are a Structural Risk

5.1 The Dependency Tree Problem

LiteLLM's Python proxy inherits hundreds of transitive dependencies spanning ML frameworks, data processing libraries, and provider SDKs. Every dependency is a trust decision most teams make automatically with pip install --upgrade. When you add LiteLLM to your project, you are not just trusting LiteLLM. You are trusting every package it depends on, every package those packages depend on, and every maintainer account associated with each one.

The .pth attack vector is particularly dangerous because most supply chain scanning tools focus on setup.py, __init__.py, and entry points. The .pth mechanism is a legitimate Python feature for path configuration that has been largely overlooked as an injection vector for Python dependency attacks. Expect this technique in future attacks against open source LLM gateway packages. Traditional security scanning would not have caught this.
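Until scanners catch up, a quick audit of your own environments is straightforward: any .pth line that both starts with import (and is therefore executed at startup) and references decoding or process-spawning primitives deserves a manual look. A minimal sketch, with an illustrative keyword list that will miss more careful obfuscation:

```python
import re
import site
from pathlib import Path

# Illustrative red flags; real obfuscation may evade a keyword list.
SUSPICIOUS = re.compile(r"b64decode|exec\(|eval\(|subprocess")

def suspicious_pth_lines(dirs=None):
    """Scan .pth files for executable lines that also look obfuscated."""
    dirs = dirs if dirs is not None else site.getsitepackages()
    findings = []
    for d in dirs:
        for pth in Path(d).glob("*.pth"):
            for line in pth.read_text(errors="ignore").splitlines():
                # The site module exec()s any .pth line starting with "import"
                if line.startswith("import") and SUSPICIOUS.search(line):
                    findings.append((str(pth), line[:80]))
    return findings

print(suspicious_pth_lines())
```

Note this only reads the files; it never executes them. Run it in every virtualenv and container image, not just the system interpreter.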

5.2 You Own the Blast Radius

The LiteLLM maintainers did not rotate their CI/CD credentials for five days after the Trivy disclosure on March 19. If the maintainers could not respond fast enough, most teams using their software had no chance. This is an inherent LLM gateway security problem with the self-hosted model.

6. How Future AGI's Prism Gateway Eliminates This Risk

6.1 Managed Gateway, Zero Dependencies with Prism

Prism is Future AGI's AI Gateway. It sits between your application and LLM providers as a managed proxy layer hosted at gateway.futureagi.com. Instead of installing a Python package like LiteLLM to route requests across multiple providers, Prism handles that routing as a cloud service.

In practical terms, it does what LiteLLM did (route requests to 100+ LLM providers like OpenAI, Anthropic, Gemini, Bedrock, etc. through a single API), but without requiring you to run anything in your own infrastructure. You send requests using the standard OpenAI API format, and Prism handles provider translation, failover, caching, guardrails, cost tracking, and streaming on its end.

The key difference from LiteLLM: your attack surface is an API key and a URL, not a Python environment with hundreds of transitive dependencies. You can read the full docs.

Prism works like any other LLM provider from your application's perspective. You point your existing OpenAI SDK at a new base URL, swap in a Prism API key, and your code runs without any other changes. There is no library to install, no proxy to deploy, and no dependency tree to audit.

6.2 One Config Change to Migrate

Before (LiteLLM):

from litellm import completion
response = completion(model="gpt-5", messages=[{"role": "user", "content": "Hello"}])

After (Prism):

from openai import OpenAI
client = OpenAI(base_url="https://gateway.futureagi.com", api_key="sk-prism-your-key")
response = client.chat.completions.create(
    model="gpt-5", messages=[{"role": "user", "content": "Hello"}]
)

Same OpenAI SDK format, same model naming, same response schema. Here is the TypeScript equivalent:

import OpenAI from "openai";
const client = new OpenAI({
    baseURL: "https://gateway.futureagi.com",
    apiKey: "sk-prism-your-key"
});
const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }]
});

Provider keys are configured once in the Prism dashboard. No environment variables scattered across your codebase or stored in .env files on developer machines where credential theft malware can reach them.

6.3 Semantic Caching and Guardrails

Every LLM API call costs money and adds latency. When your application sends the same question (or a slightly different version of it) hundreds of times a day, you are paying the provider for every single call and waiting for a full inference cycle each time. Prism caching eliminates that waste by storing LLM responses at the gateway level and returning them instantly on repeat queries, without ever hitting the provider.

Prism supports two caching modes. Exact match caching returns a stored response when the request parameters are identical, which works well for deterministic queries like template-based prompts or FAQ bots. Semantic caching goes further: it uses vector embeddings to match queries that mean the same thing but are worded differently. For example, "What is your return policy?" and "How do I return an item?" would hit the same cache entry even though the words are completely different.
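The matching idea behind semantic caching is easy to illustrate. The toy sketch below substitutes a bag-of-words cosine similarity for the neural embeddings a real gateway uses, so unlike Prism's semantic cache it only matches rewordings that share vocabulary; the class, threshold, and examples are illustrative, not Prism's API:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (real caches use neural embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.6):
        self.entries = []          # (embedding, response) pairs
        self.threshold = threshold

    def put(self, query, response):
        self.entries.append((embed(query), response))

    def get(self, query):
        q = embed(query)
        for emb, resp in self.entries:
            if cosine(q, emb) >= self.threshold:
                return resp        # near-duplicate query: serve the cached reply
        return None                # miss: forward to the provider

cache = SemanticCache()
cache.put("what is your return policy", "30-day returns")
print(cache.get("what is the return policy"))   # reworded query: cache hit
print(cache.get("how do I reset my password"))  # unrelated query: miss
```

Every hit in a setup like this skips a full inference round-trip, which is where the latency and cost savings come from.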

Namespace isolation keeps caches partitioned across environments like prod, staging, and dev, so test data never leaks into production responses. Cached responses return with X-Prism-Cost: 0, which means zero provider charges for every cache hit.

from prism import Prism, GatewayConfig, CacheConfig
client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        cache=CacheConfig(enabled=True, mode="semantic", ttl="5m", namespace="prod"),
    ),
)

Prism applies 18+ built-in guardrails at the gateway layer before requests reach the LLM provider. These include PII detection, prompt injection prevention, and content moderation for output safety. 
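To make the idea of gateway-side screening concrete, here is a toy sketch with two regex-based PII checks. This illustrates the concept only, not Prism's implementation; real PII detection covers far more types and uses more robust methods than regexes:

```python
import re

# Toy patterns for two common PII types; a production guardrail stack
# is far more extensive and context-aware.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def screen_prompt(prompt: str):
    """Return the PII types found; a gateway would block or redact
    the request before it ever reaches the LLM provider."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(prompt)]

print(screen_prompt("Email john@example.com my SSN 123-45-6789"))  # ['email', 'ssn']
print(screen_prompt("What is AI?"))                                # []
```

The point of doing this at the gateway rather than in application code is that every client gets the same checks with no per-service integration work.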

6.4 Cost Tracking, 100+ Providers, and How Prism Compares

Every response includes X-Prism-Cost. Budget limits block requests when exceeded. OpenAI, Anthropic, Gemini, Bedrock, Azure, Cohere, Groq, Mistral, Ollama, vLLM: switch providers by changing the model name. Prism translates non-OpenAI APIs automatically.

When evaluating LiteLLM alternatives after this incident, here is how the key options compare:

| Feature | LiteLLM (Pre-Compromise) | Prism (Future AGI) | Other Managed Gateways |
| --- | --- | --- | --- |
| Deployment | Self-hosted Python proxy | Managed HTTPS endpoint | Varies (self-hosted/managed) |
| Dependency Risk | Hundreds of transitive Python deps | Zero client dependencies | Varies by architecture |
| Built-in Guardrails | Limited | 18+ (PII, injection, secrets) | 0-5 typically |
| Evaluation Integration | None | Full platform (traceAI + evals) | Requires third-party tools |
| Semantic Caching | Basic exact match | Exact + semantic with namespaces | Some offer semantic |
| Supply Chain Exposure | Full Python dependency tree | API key + URL only | Depends on deployment model |

Among LiteLLM alternatives, Prism is the only managed LLM gateway that pairs routing with built-in evaluation and observability in a single platform.

7. Migration Scenarios

7.1 From LiteLLM Proxy Server (Docker/Kubernetes)

Remove the proxy infrastructure entirely. Update your application's base URL:

env:
  - name: LLM_BASE_URL
    value: "https://gateway.futureagi.com"  # was http://litellm-proxy:4000
  - name: LLM_API_KEY
    value: "sk-prism-your-key"

Delete the LiteLLM pod, its service, Postgres, and Redis. That is infrastructure you no longer maintain, patch, or worry about during the next AI infrastructure security incident.

7.2 Adding Post-Migration Controls

curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "x-prism-cache-force-refresh: true" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"What is AI?"}]}'

8. What This Changes for AI Infrastructure

8.1 Compliance Consequences

The EU Cyber Resilience Act makes organizations legally responsible for the security of open-source components in their products. SOC 2 Type II audits scrutinize dependency management practices. "We install the latest version from PyPI" is no longer an acceptable answer during a controls review. If your product uses LiteLLM and your customers' credentials were exfiltrated, the liability is yours, not the open-source maintainer's. For details on AI compliance and LLM security, see our enterprise guide.

8.2 Dependency Pinning Is Not Enough

Pinning prevents pulling a new malicious version but not a compromised maintainer overwriting an existing tag. Hash verification (pip install --hash=sha256:<exact_hash>) is the real control. A managed LLM gateway eliminates the need for pinning entirely because there is no dependency to pin.
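The same verification can be scripted for artifacts outside pip's flow (a downloaded wheel, a vendored binary). A minimal sketch using only the standard library:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream-hash a file for comparison against a pinned digest,
    without loading the whole artifact into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the result against the digest pinned in requirements.txt, e.g.
#   package==X.Y.Z --hash=sha256:<expected>
# and refuse to install on mismatch.
```

A mismatch means the artifact is not the one you pinned: stop and investigate rather than proceeding.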

8.3 The Architecture Decision

Every team running LLM applications now faces a binary choice: own your proxy infrastructure and inherit every supply chain risk, or use a managed gateway and reduce your trust boundary to an API endpoint. After March 24, 2026, the risk calculus has permanently shifted. The question is not whether your open source LLM gateway will be targeted by a PyPI supply chain attack. It is whether your architecture limits the damage when it happens.

9. Conclusion

The LiteLLM compromise is not a one-off event. Teams that self-host Python LLM proxies are inheriting supply chain risk they cannot realistically manage. The dependency trees are too deep, the release cadence is too fast, and pulling the latest version from PyPI is exactly the behavior attackers exploit.

Rotating credentials and pinning to a safe version solves today's problem. Migrating to a managed gateway that removes the dependency chain entirely solves the category of problem.

Future AGI's Prism gateway handles routing to 100+ providers, caching, guardrails, and cost tracking without requiring a Python proxy or trust that every package in a dependency tree has not been tampered with. Because Prism is part of Future AGI's evaluation and observability platform, teams get a closed loop from request routing to response evaluation, with full tracing through traceAI at every step.

If your team was using LiteLLM, the migration to Prism is a single config change. The code stays the same. The risk profile does not.

Get started with Prism | Request a demo | Explore Future AGI

Frequently Asked Questions

Is it safe to install any version of LiteLLM from PyPI right now?

Can dependency pinning alone prevent a PyPI supply chain attack like this?

How does a managed LLM gateway differ from a self-hosted proxy for credential security?

Were teams using LiteLLM's official Docker image affected?

Rishav Hada is an Applied Scientist at Future AGI, specializing in AI evaluation and observability. Previously at Microsoft Research, he built frameworks for generative AI evaluation and multilingual language technologies. His research, funded by Twitter and Meta, has been published in top AI conferences and earned the Best Paper Award at FAccT’24.

Ready to deploy Accurate AI?

Book a Demo