
LiteLLM Was Compromised: How to Audit and Harden Your Local AI Stack

By Charlotte Stewart · 9 min read

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.


**TL;DR: If you have LiteLLM installed anywhere — a server, a Docker container, a CI pipeline, a dev machine — run `pip show litellm` right now. If the version shows 1.82.7 or 1.82.8, your SSH keys, cloud API credentials, and .env files were exfiltrated to an attacker-controlled server on March 24, 2026. Rotate every key immediately. For most self-hosters, migrating to Ollama + local models removes cloud API keys from the architecture entirely — no middleware, no exposure.**

---

On March 24, 2026, at 10:39 UTC, threat actor TeamPCP published litellm 1.82.7 to PyPI. Thirteen minutes later, they pushed 1.82.8. For the next five and a half hours, anyone who ran `pip install litellm` — or used any tool that pulled it as a dependency — installed a credential harvester that auto-executed on every Python startup and silently shipped their secrets to a lookalike domain: `models.litellm[.]cloud`.

LiteLLM is downloaded roughly 3.4 million times per day and sits in 36% of cloud environments. That's not a niche package. That's the API routing layer of half the serious local AI builds running today.

This isn't a "patch and move on" situation. The malware didn't need LiteLLM to be running — it embedded itself in a `.pth` file that Python processes automatically at interpreter startup. Your secrets left your machine every time anything Python ran.

Here's exactly what to do about it.

---

## What LiteLLM Is (and Why the Compromise Hits Hard)

[LiteLLM](https://docs.litellm.ai) is a Python proxy that routes API calls across multiple LLM providers — OpenAI, Anthropic, Gemini, local Ollama endpoints — through a single unified interface. It handles rate limiting, load balancing, and credential management.

The reason it's in so many stacks is the same reason the compromise is so damaging: it sits at the layer where all your API keys live. You configure one place with your OpenAI key, your Anthropic key, your cloud credentials. LiteLLM handles the rest.

That centralization is the attack surface.
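
To make that concrete, here is the shape of a typical proxy config, sketched from LiteLLM's documented config format (the model entries are examples):

```yaml
# Sketch of a typical LiteLLM proxy config. Every provider credential
# is referenced from one place; the model entries here are examples.
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
```

One process, one file, every key: exactly why a compromise at this layer is so costly.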

### Who Runs LiteLLM in Local AI Setups

The people most exposed here are running hybrid stacks: local models for most inference (fast, private, cheap), plus Claude API or GPT-4 as a fallback for tasks that need frontier reasoning. LiteLLM manages that routing and holds the keys to both sides. Power users managing rate limits across providers. Teams sharing API endpoints without burning through individual credentials.

Beyond intentional users: LiteLLM is a transitive dependency for a growing number of AI agent frameworks, MCP servers, and orchestration tools. If you didn't install it directly, check whether something else did.

---

## How the Attack Actually Worked

TeamPCP didn't find a vulnerability in LiteLLM's code. They compromised the supply chain upstream.

On March 19, they poisoned Trivy — the open-source security scanner that LiteLLM uses in its CI/CD pipeline — by backdooring a Trivy GitHub Action. That gave them LiteLLM's PyPI publish credentials. Four days later, on March 23, they used the same playbook against Checkmarx VS Code extensions. On March 24, they hit LiteLLM directly.

Versions 1.82.7 and 1.82.8 included two malicious files: `litellm_init.pth` and a modified `proxy_server.py`. Python processes every `.pth` file in a site-packages directory automatically when the interpreter starts. Not when LiteLLM runs. When Python runs. At all.
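
The `.pth` mechanism is easy to demonstrate with a harmless file. This is a sketch, not the actual payload; the directory and variable names are invented for the demo:

```bash
# Demonstration only: a benign .pth line that runs when Python processes
# a site directory. In the real attack the .pth sat inside site-packages,
# which the interpreter adds automatically; site.addsitedir() simulates
# that here with a throwaway directory.
mkdir -p /tmp/pth_demo
printf 'import os; os.environ["PTH_DEMO_RAN"] = "1"\n' > /tmp/pth_demo/demo.pth

# Any line in a .pth file that starts with "import" is executed verbatim:
python3 -c 'import site, os; site.addsitedir("/tmp/pth_demo"); print(os.environ.get("PTH_DEMO_RAN"))'
# prints: 1
```

The takeaway: uninstalling LiteLLM doesn't undo this. Any `.pth` left in site-packages keeps running on every interpreter start until the file itself is deleted.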

Once triggered, the payload ran a three-stage attack:

| Stage | What It Did |
|---|---|
| **Credential harvest** | Collected SSH keys, cloud API tokens, Kubernetes secrets, crypto wallet files, and all .env files it could reach |
| **Lateral movement** | Attempted to deploy privileged pods across every node in any Kubernetes cluster it could reach |
| **Persistence** | Installed a systemd backdoor that polls for additional payloads from attacker infrastructure |

Everything collected in stage one was encrypted and exfiltrated to `models.litellm[.]cloud` — a domain designed to look like legitimate LiteLLM infrastructure.

The malicious versions were available for approximately five and a half hours before PyPI quarantined the packages.

> [!WARNING]
> LiteLLM can also arrive as a transitive dependency — pulled in by tools like AI agent frameworks, MCP servers, or orchestration libraries. Run `pip show litellm` on every environment, not just machines where you intentionally installed it.

---

## Check Your Version Right Now

This takes two minutes.

### Step 1: Find Every LiteLLM Installation

On a standard Python environment:

```bash
pip show litellm
```

Look for the `Version:` field. Safe: **1.82.6 or earlier** (or any verified release the LiteLLM team has cleared after March 24). Compromised: **1.82.7 or 1.82.8**.

For Docker containers:

```bash
docker exec <container-name> pip show litellm
```

Run this against every container that runs Python and check the version field on each one.

For systemd or PM2-managed services: run the command from the same user and shell that owns the service process — version mismatches between your local machine and the actual running service are common.
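
If you juggle multiple virtualenvs, a filesystem sweep catches installs that `pip show` in your current environment would miss. A sketch: the search root is an assumption, so point it at wherever your environments actually live:

```bash
# Sweep a directory tree for litellm installs and print each one's version.
# SEARCH_ROOT is an assumption; adjust it to where your venvs live.
SEARCH_ROOT="${SEARCH_ROOT:-$HOME}"
find "$SEARCH_ROOT" -type d -name litellm -path '*/site-packages/*' 2>/dev/null |
while read -r d; do
  # The dist-info directory next to the package encodes the installed version
  ls "$(dirname "$d")" | grep -i '^litellm-.*\.dist-info$'
done
```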

### Step 2: Check for Transitive Dependencies

Even if you never ran `pip install litellm` yourself:

```bash
pip show litellm 2>/dev/null && echo "FOUND" || echo "not installed"
```

If it returns FOUND, trace what pulled it in:

```bash
pip show litellm | grep "Required-by"
```

The `Required-by:` field tells you which package installed it as a dependency.

### Step 3: Run pip-audit

pip-audit is a free tool maintained by the Python Packaging Authority that scans your environment for known vulnerabilities. Install and run it:

```bash
pip install pip-audit
pip-audit
```

It will flag CVE-2025-45809 (the LiteLLM compromise) if your environment is affected.
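
If you deploy through CI, a cheap extra guard is a script that fails the build whenever a known-bad version is present. A sketch, with the bad-version list hardcoded from this incident:

```bash
# Fail fast if a known-compromised litellm version is installed (sketch).
BAD_VERSIONS="1.82.7 1.82.8"
ver="$(pip show litellm 2>/dev/null | awk '/^Version:/ {print $2}')"
for bad in $BAD_VERSIONS; do
  if [ "$ver" = "$bad" ]; then
    echo "COMPROMISED: litellm $ver is installed" >&2
    exit 1
  fi
done
echo "litellm check passed (installed: ${ver:-none})"
```

Drop it into any pipeline stage that runs before deployment; it exits nonzero only on a hit, so it composes with `set -e`.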


---

## Rotate Every Exposed Credential — Today

Do this even if you've already updated LiteLLM. The exfiltration happened when the malicious version ran, not when you patched it. Rotating today closes the exposure window.

Estimated time: 20–30 minutes for most setups.

### OpenAI Keys

1. Go to [platform.openai.com/api-keys](https://platform.openai.com/api-keys)
2. Find the key(s) used with LiteLLM — new project-scoped keys use the `sk-proj-` prefix; legacy user keys use `sk-`
3. Delete each affected key
4. Create a new key and copy it
5. Test it: `curl -H "Authorization: Bearer <new-key>" https://api.openai.com/v1/models`
6. Confirm the old key shows no further activity in the usage dashboard

### Anthropic/Claude Keys

1. Go to console.anthropic.com/dashboard/keys
2. Delete the key(s) that connected through LiteLLM — Anthropic invalidates deleted keys immediately
3. Create a replacement and update your config
4. Test it: `curl -H "x-api-key: <new-key>" -H "anthropic-version: 2023-06-01" https://api.anthropic.com/v1/models`

### AWS, GCP, Kubernetes

If LiteLLM had access to cloud provider credentials or ran inside a Kubernetes cluster, assume those credentials were collected too. Rotate IAM keys, revoke service account tokens, and audit your cluster for unfamiliar pods or systemd unit files — the attack attempted lateral movement and persistence installation.
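
A quick way to surface persistence artifacts is to list unit files modified since the compromise window opened. A sketch: the date and directory are parameterized so you can also point it at a mounted disk image:

```bash
# List systemd unit files modified on or after the compromise date (sketch).
# UNIT_DIR and SINCE are parameterized so you can point this at a mounted image.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"
SINCE="${SINCE:-2026-03-24}"
find "$UNIT_DIR" -name '*.service' -newermt "$SINCE" 2>/dev/null || true
```

Anything that appears and that you don't recognize deserves a close look; inspect it with `systemctl cat <unit>` before deleting.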

### .env Files and Shell History

```bash
# Check shell history for hardcoded keys
history | grep -i "api_key\|api_secret\|sk-\|anthropic"

# Check for keys baked into Docker image layers
docker history <image-name> --no-trunc | grep -i "key\|secret\|token"
```

Any key that appeared in those outputs on a compromised machine should be treated as exposed.
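
Shell history only covers interactive use; a pattern scan over your project tree also catches keys committed to configs. A sketch: the regexes cover OpenAI- and Anthropic-style prefixes and are illustrative, not exhaustive:

```bash
# Scan a project tree for strings shaped like API keys (sketch; the
# patterns are illustrative, not exhaustive). Prints matching file paths.
SCAN_DIR="${SCAN_DIR:-.}"
grep -rElI 'sk-(proj-|ant-)?[A-Za-z0-9_-]{20,}' "$SCAN_DIR" 2>/dev/null || true
```

Treat every file it flags the same way as the history hits above: any key in it that lived on a compromised machine gets rotated.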

> [!NOTE]
> Check your billing dashboards on OpenAI and Anthropic for unexpected API activity after March 24, 2026. Unusual charges — especially in regions you don't operate in — are a signal the stolen keys were used before you rotated them.


---

## Update or Remove LiteLLM

You have two paths. Pick one based on whether you actually need cloud API routing.

### Path A: Pin to a Safe Version (20 minutes)

If your architecture requires cloud API fallback and you need to stay on LiteLLM:

```bash
# Back up your config first
cp ~/.litellm/config.yaml ~/.litellm/config.yaml.backup

# Pin to the last verified clean release
pip install litellm==1.82.6

# Or upgrade once a post-incident clean release is confirmed
pip install --upgrade litellm

# Verify
pip show litellm

# Restart your service
systemctl restart litellm
# or
pm2 restart litellm
```

But don't stop there. If you're staying on LiteLLM, storing API keys in `.env` files or environment variables is no longer acceptable. Move them to a secret manager — HashiCorp Vault, 1Password CLI (`op read op://vault/openai/key`), or systemd credentials. The point is that if the process is ever compromised again, keys aren't sitting in plaintext config files.
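
As a concrete sketch of the 1Password CLI approach (the vault path is hypothetical), the guard lets the snippet degrade gracefully on machines where `op` isn't installed:

```bash
# Resolve the key at service start instead of persisting it in .env (sketch).
# The op:// path is hypothetical; substitute your own vault/item/field.
if command -v op >/dev/null 2>&1; then
  OPENAI_API_KEY="$(op read 'op://vault/openai/key')"
  export OPENAI_API_KEY
else
  echo "1Password CLI (op) not installed; key not loaded" >&2
fi
```

Run it in the service's startup wrapper so the key exists only in that process's environment, never on disk.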

### Path B: Migrate to Ollama (1–2 hours)

If you're running models for coding assistance, summarization, or internal tooling — most workloads that don't require frontier reasoning — you probably don't need cloud APIs at all. Ollama handles this entirely locally.

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start the server (binds to localhost:11434 by default)
ollama serve

# Pull a model
ollama pull llama3.1:70b    # for higher quality
ollama pull qwen2.5:14b     # for coding tasks

# Test
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen2.5:14b","messages":[{"role":"user","content":"Hello"}]}'
```

No API keys. No cloud endpoints. No middleware with access to your credentials.

For the tradeoffs between local and cloud inference, see our local vs cloud LLM comparison — the short version is that 14B+ models handle the majority of developer workloads at acceptable quality.

> [!WARNING]
> Ollama defaults to localhost (127.0.0.1) only — that's correct and safe. But if you're running it in Docker or on a VPS and have set `OLLAMA_HOST=0.0.0.0`, your inference API is accessible to the internet with no authentication. Verify with `ss -tlnp | grep 11434` and lock it down with a firewall rule or reverse proxy before anything else.
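
To make that check scriptable, filter the listener list for port 11434 bound to anything but loopback. A sketch: the function name is ours, and the live `ss` invocation is left commented because it needs a running system:

```bash
# Print listener addresses on port 11434 that are NOT loopback (sketch).
flag_exposed_11434() {
  awk '$4 ~ /:11434$/ && $4 !~ /^127\.0\.0\.1:/ && $4 !~ /^\[::1\]:/ {print $4}'
}

# Live usage (requires a running system with ss available):
#   ss -tln | flag_exposed_11434
```

Empty output means nothing but loopback is listening; any printed address is an exposure to fix before doing anything else.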


---

## Safer Alternatives to LiteLLM for Hybrid Stacks

If you need multi-provider routing and can't go fully local, here's how the alternatives compare:

*(Comparison table unrecoverable from the source; only its "Key Exposure Risk" column survived, with values ranging from "None — no keys needed" and "None — no cloud routing" to "Low if keys are in Vault" and "Delegated to provider.")*

One architecture worth building toward: Ollama handles 80% of workloads (fast local 8B–14B models for coding and summarization), and you call the Claude API directly — no proxy — only when the task genuinely needs frontier reasoning. Fewer components, fewer attack surfaces, lower API spend.

For teams that can't avoid LiteLLM, pair it with the local LLM upgrade ladder to figure out how much of your cloud workload you can shift locally.


---

## Rethinking API Key Management in Self-Hosted AI

The LiteLLM compromise is a specific incident. The underlying pattern — centralizing credentials in a middleware layer with a deep dependency chain and no pinned versions — is everywhere in the AI tooling space right now.

A few principles worth adopting regardless of which tools you use:

**Never store keys in `.env` files checked into version control.** Use `.env.example` with placeholder values and a secrets manager for the real credentials.

**Pin dependency versions in production.** `pip install litellm` with no version constraint installs whatever's latest. That's how you get 1.82.8. Use `pip install litellm==1.82.6` or a lockfile.

**Audit transitive dependencies.** The package you installed isn't the whole picture. Run `pip-audit` quarterly and after any major dependency update.

**Minimize the blast radius.** If you're running LiteLLM with an OpenAI key that has full account permissions and no spend cap, that's a very expensive key to have stolen. Create scoped keys with spend limits at each provider.

> [!TIP]
> The cleanest long-term security posture for self-hosted AI: run as many workloads as possible on local models (zero key exposure), create tightly scoped keys with spend caps for the cloud fallback cases, and store those keys in a dedicated secret manager rather than `.env` files or environment variables.

For a deeper dive on securing self-hosted AI infrastructure, the dual-GPU local LLM stack guide covers network isolation patterns for production setups.


---

## FAQ

### Which LiteLLM versions were compromised?

Specifically 1.82.7 and 1.82.8, published to PyPI on March 24, 2026. Version 1.82.6 is the last verified clean release. The LiteLLM team audited all releases back to 1.78.0 and cleared them. If `pip show litellm` shows anything other than those two versions, you're not directly affected — but check for transitive installs anyway.

### What exactly was stolen?

SSH private keys, cloud API tokens, Kubernetes secrets, crypto wallet files, and entire `.env` files accessible to the Python process. The payload encrypted the collection and sent it to `models.litellm[.]cloud`. It also attempted to deploy privileged Kubernetes pods for lateral movement and installed a systemd backdoor for persistence.

### Should I stop using LiteLLM entirely?

If your architecture actually requires multi-provider cloud routing, pin to 1.82.6 or the latest cleared release and move keys into Vault or 1Password CLI. If you're running local models for most of your workload — which describes most CraftRigs readers — Ollama eliminates the need for LiteLLM entirely. Fewer moving parts, zero cloud key exposure.

### Is my machine still backdoored even if I uninstalled LiteLLM?

Possibly. The attack installed a persistent systemd backdoor (`systemctl list-units --type=service | grep -i llm` is a starting point, but the service name may vary). If you ran 1.82.7 or 1.82.8, treat the machine as compromised: rotate credentials, audit systemd units, and consider a clean OS reinstall for any machine handling sensitive credentials.

### How often should I audit my local AI stack?

After every major dependency update, and quarterly as a baseline. Run `pip-audit` in CI if you're deploying Python services, and set up GitHub Dependabot or similar for automatic CVE notifications on packages you depend on.
