How do I know if I'm running a compromised version of LiteLLM?

Run `pip show litellm | grep Version`. If the version is 1.82.7 or 1.82.8 (released March 24, 2026), you're affected. Any version outside this range is safe. Check your active Python environment and all virtual environments in CI/CD pipelines.

What data could have been stolen from my system?

The malicious .pth file harvested SSH keys, AWS/GCP/Azure credentials, Kubernetes configs, API keys from .env files, and database passwords on every Python startup. If you ran versions 1.82.7–1.82.8 for any duration, assume all local secrets were exfiltrated.

Do I need to migrate away from LiteLLM entirely, or can I just upgrade?

Upgrading to 1.82.9+ removes the malicious code and is safe. But we recommend migrating to vLLM or Ollama because it eliminates the supply chain dependency entirely. Migration takes 2–3 hours; the security gain is permanent.

Which is better: vLLM or Ollama as a LiteLLM replacement?

vLLM if you need high concurrency and multi-model routing (production workloads, power users). Ollama if you want simplicity and single-model serving (development, consumer hardware, privacy-first). Both are supply-chain-safe alternatives with OpenAI-compatible APIs.

How long were the compromised versions available on PyPI?

Versions 1.82.7 and 1.82.8 were available for approximately 2–3 hours (March 24, 2026, starting at 10:52 UTC) before PyPI quarantined the package. The short exposure window limited the impact, but millions of downloads still occurred during this time.

LiteLLM Compromised: Audit Your Local AI Stack [CVE-2026]

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.

TL;DR

Two LiteLLM versions (1.82.7 and 1.82.8) released on March 24, 2026, contained malicious code that stole SSH keys, cloud credentials, and Kubernetes secrets on every Python startup. If you're running LiteLLM in production, audit your version now. Safe versions: everything except 1.82.7–1.82.8. Our recommendation: migrate to vLLM 0.6.0+ (high-performance routing) or Ollama (simplicity). Both are supply-chain-safe and cost $0; migration takes about 30 minutes for Ollama and 2–3 hours for vLLM. Rotate all credentials after migration.

What LiteLLM Does (And Why It Matters for Your Stack)

LiteLLM is an API proxy layer that orchestrates multiple LLM models—it routes requests across Ollama, OpenAI, vLLM, and llama.cpp servers from a single endpoint. It handles routing logic, rate limiting, fallback strategies, and cost tracking. On paper, it's compelling: one proxy, unified interface, flexible backend switching.

But that centralization is also a liability. Every request and every credential flows through LiteLLM's process. If the proxy itself is compromised, attackers get access to your entire AI infrastructure—not just the models, but the keys to your cloud accounts, Kubernetes clusters, and databases.

The Attack Surface: Where Secrets Live

LiteLLM typically stores API keys for authentication to your inference endpoints. Every request passes through its logging and routing layers, potentially exposing prompts, API keys, or sensitive queries. A compromised proxy doesn't break your models—it silently exfiltrates your secrets while requests continue flowing normally. You'd never know until you check your cloud bills or see unauthorized access.

The March 24 Supply Chain Attack: What Actually Happened

On March 24, 2026, at 10:52 UTC, a threat actor group known as TeamPCP published two malicious versions of LiteLLM to PyPI: version 1.82.7 and 1.82.8. They obtained the maintainer's credentials by compromising Trivy, an open-source security scanner used in LiteLLM's CI/CD pipeline. It's a perfect example of supply chain attack nesting: compromise a security tool, use it to compromise the thing it's supposed to protect.

The malicious code was elegant in its persistence. Instead of injecting directly into source files (which might be spotted in code review), the attackers added a .pth file called litellm_init.pth to the site-packages directory. Python's site module processes every .pth file there at interpreter startup and executes any line that begins with import, with no import of the package itself required. Every time you ran any Python script in an environment with LiteLLM installed, the malicious code fired.
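To make that startup behavior concrete, here is a minimal sketch that lists the .pth lines Python would execute in the current interpreter's site-packages directories. It relies on the site module's rule that a line starting with "import" followed by a space or tab is executed. The function names are illustrative, not part of any tool mentioned here, and a hit is not proof of compromise because some legitimate packages ship such lines.

```python
import site
import sysconfig
from pathlib import Path


def find_executable_pth_lines(directory):
    """Return (file name, line number, text) for each .pth line that the
    site module would execute, i.e. lines starting with 'import'."""
    hits = []
    for pth in sorted(Path(directory).glob("*.pth")):
        text = pth.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if line.startswith(("import ", "import\t")):
                hits.append((pth.name, number, line.strip()))
    return hits


def site_package_dirs():
    """Collect the site-packages directories of the current interpreter."""
    dirs = {sysconfig.get_paths()["purelib"]}
    # getsitepackages is missing in some older virtualenv layouts, so guard it.
    if hasattr(site, "getsitepackages"):
        dirs.update(site.getsitepackages())
    dirs.add(site.getusersitepackages())
    return sorted(d for d in dirs if Path(d).is_dir())


if __name__ == "__main__":
    for directory in site_package_dirs():
        for name, number, line in find_executable_pth_lines(directory):
            print(f"{directory}/{name}:{number}: {line[:120]}")
```

Run it once per environment (each virtualenv has its own interpreter) and review anything you don't recognize, starting with a file named litellm_init.pth.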

What Did It Do?

The .pth file executed a three-stage credential harvester targeting over 50 categories of secrets:

Stage 1: SSH keys from ~/.ssh/, AWS credentials from ~/.aws/credentials, GCP application default credentials, Azure tokens, Kubernetes configs from ~/.kube/, API keys in .env files, database passwords, Discord/Slack tokens.
Stage 2: Kubernetes lateral movement toolkit—once it had cluster configs, it attempted to compromise the entire cluster through API server vulnerabilities.
Stage 3: Persistent backdoor for ongoing remote code execution.

All collected credentials were encrypted with a hardcoded RSA-4096 key and exfiltrated to attacker infrastructure.

Scope and Duration

Compromised versions available: ~2–3 hours (March 24, 2026, 10:52 UTC until PyPI quarantine)
LiteLLM's reach: 95 million monthly downloads, present in 36% of cloud environments
Estimated exposure: Millions of installations triggered the malicious code during that window
Detection: Security researchers noticed unusual network traffic patterns exfiltrating credentials; PyPI quarantined the package

Who's Actually at Risk?

Anyone running LiteLLM version 1.82.7 or 1.82.8 on March 24, 2026 (or later, if the package wasn't updated)
Professional deployments with sensitive infrastructure: cloud accounts, Kubernetes clusters, databases
Multi-model setups where LiteLLM is the central routing hub
CI/CD pipelines using LiteLLM for automated inference workflows

What Was NOT Compromised

Ollama (unaffected—separate codebase)
vLLM (unaffected—separate codebase)
llama.cpp (unaffected—runs natively without proxy)
Direct model inference—only the proxy layer was compromised
Your model weights or fine-tuned data (the attack targeted infrastructure secrets, not model data)

Step 1: Audit Your Current Stack

Before deciding on a path forward, you need to know: Is LiteLLM in your deployment? Which version? What secrets does it have access to?

Check If LiteLLM Is Even Running

pip list | grep litellm

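If nothing returns, LiteLLM is not installed in this environment. Repeat the check in every virtual environment, container image, and CI runner before concluding you're safe.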

Look for imports in your main application files:

grep -r "from litellm import\|import litellm" .

Check PM2 processes or systemd services for anything running litellm proxy:

pm2 list | grep litellm
systemctl list-units --type service | grep litellm

If LiteLLM is in your requirements.txt or pyproject.toml but not actively used, remove it now. It gives you no benefit, and removing it costs nothing.

Identify Your LiteLLM Version

pip show litellm | grep Version

Version 1.82.7 or 1.82.8: You're affected. Proceed to Step 2 immediately.
Version 1.82.9 or later: The malicious code is gone, but credentials may have been exfiltrated if you ever ran an affected version. Follow Step 4 (credential rotation) and consider migrating anyway to eliminate the dependency.
Any other version: You're safe from this attack.
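Since pip show only sees the active environment, a short script can sweep a whole directory tree instead. The sketch below is one way to do that, assuming installs leave a standard litellm-<version>.dist-info directory behind. The function names and the example root path are hypothetical.

```python
import re
from pathlib import Path

COMPROMISED = {"1.82.7", "1.82.8"}  # the two versions named in this article
DIST_INFO = re.compile(r"^litellm-(?P<version>[^-]+)\.dist-info$", re.IGNORECASE)


def find_litellm_installs(root):
    """Return sorted (site-packages path, version) pairs for every
    litellm install found anywhere under root."""
    found = []
    for entry in Path(root).rglob("litellm-*.dist-info"):
        match = DIST_INFO.match(entry.name)
        if match and entry.is_dir():
            found.append((str(entry.parent), match.group("version")))
    return sorted(found)


def classify(version):
    """Flag the compromised releases; anything else is only 'not compromised',
    which says nothing about what was installed there earlier."""
    if version in COMPROMISED:
        return "affected"
    return "not a compromised release"


if __name__ == "__main__":
    # Hypothetical root: point this at wherever your virtualenvs live.
    for location, version in find_litellm_installs(Path.home()):
        print(f"{location}: litellm {version} -> {classify(version)}")
```

A clean result for the current version does not clear a machine that ran 1.82.7 or 1.82.8 earlier, so treat the output as a starting point for the credential rotation step rather than a verdict.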

What Data Did LiteLLM Have Access To?

Review your configuration files or environment variables:

cat ~/.litellm/config.yaml  # if it exists
env | grep -i "api\|key\|token\|cred"

Check LiteLLM's log directory for files created on March 24, 2026:

ls -la ~/.litellm/logs/ 2>/dev/null | head -20
find ~/.litellm/logs/ -newermt "2026-03-24" -type f

Assume any data in those logs was exfiltrated. Prompts, model names, request routing decisions, API keys—all potentially compromised.

Step 2: Decide Your Path (Upgrade + Harden vs. Migrate)

You have three options:

Option A: Upgrade LiteLLM to 1.82.9+ (Short-term fix)

Pros:

Minimal code changes—existing applications continue working
Quick deployment (30 minutes)
Removes the malicious .pth file immediately

Cons:

Still trusts LiteLLM's supply chain and dependencies
Removes this attack's payload but doesn't address the underlying risk
Doesn't prevent future vulnerabilities in LiteLLM or its own dependencies

When to choose: Development environments, non-critical inference, or if your infrastructure is heavily dependent on LiteLLM-specific features (advanced scheduling, cost tracking across 50+ providers).

Cost: 30 minutes, $0

Action: pip install litellm==1.82.9 and rotate credentials (Step 4).

Option B: Migrate to vLLM 0.6.0+ (Recommended for production)

vLLM is a high-performance inference server that handles batching, scheduling, and multi-model routing. It's actively maintained by UC Berkeley's Sky Computing Lab with a strong security posture.

Pros:

Drop-in OpenAI-compatible API replacement
Higher concurrency throughput (793 TPS vs. Ollama's 41 TPS at scale)
No dependency on LiteLLM's supply chain
Excellent for multi-GPU setups, production workloads

Cons:

More configuration than Ollama (30–60 minutes to learn)
Requires understanding vLLM's scheduling and model serving concepts

Performance example: A dual-GPU setup (RTX 4090 + RTX 4090) running vLLM achieves ~150–200 tokens/second on Llama 3.1 70B with quantization, versus ~120–150 tok/s without the proxy overhead. The routing cost is negligible.

When to choose: Professional deployments, power users, high-concurrency production inference, multi-model orchestration.

Cost: 2–3 hours, $0

Option C: Migrate to Ollama Direct API (Recommended for simplicity)

Ollama is a single-command local inference server. No proxy, no routing, no complexity—just one model at a time.

Pros:

Zero configuration; runs on localhost:11434 by default
Smallest attack surface (fewer dependencies, fewer moving parts)
API authentication is easy to add with a reverse proxy (Ollama has none built in)
Excellent for single-model deployments

Cons:

Maxes out at ~4 parallel requests by default (sufficient for most use cases)
If you need simultaneous access to multiple models, custom routing logic required
Less flexible than vLLM for advanced scheduling

When to choose: Single-model deployments, development, privacy-first stacks where you want the absolute minimum surface area, consumer hardware.

Cost: 30 minutes, $0

Step 3: Migrate Away From LiteLLM

I'll walk through both migration paths. Pick one.

Path A: Migrate to vLLM Proxy

Step 1: Install vLLM

pip install vllm

(The base package includes the OpenAI-compatible API server.)

Step 2: Map each model to its own server

A vLLM OpenAI-compatible server process serves one base model, so there is no multi-model routing file to write. Pick a served name and a port for each model you were routing through LiteLLM:

llama-3.1-70b   meta-llama/Llama-3.1-70B-Instruct    port 8000
mistral-7b      mistralai/Mistral-7B-Instruct-v0.2   port 8001

(Replace with your actual models. See vLLM docs for the full list of server arguments.)

Step 3: Start vLLM

python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --served-model-name llama-3.1-70b \
  --gpu-memory-utilization 0.9 \
  --port 8000

Start another process the same way for each additional model, changing --model, --served-model-name, and --port. Each process needs its own GPU memory budget.

Step 4: Test a single endpoint

curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "prompt": "Hello world",
    "max_tokens": 50
  }'

You should get a completion back.

Step 5: Update your application

Find any references to LiteLLM and swap them to vLLM:

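# Before (LiteLLM)
from litellm import completion
response = completion(model="llama-3.1-70b", messages=[...])

# After (vLLM, using the openai>=1.0 client)
from openai import OpenAI
# vLLM ignores the key unless the server was started with --api-key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[...]
)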

vLLM's API is fully OpenAI-compatible, so your existing code often works with just the endpoint change.

Step 6: Verify all models load

curl http://localhost:8000/v1/models

You should see your model list. Test inference with your actual workload. Measure latency before/after to confirm there's no regression.
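For the before/after latency comparison, a small timing harness is enough. This is a rough sketch using only the standard library against the OpenAI-compatible chat endpoint. The function names, the sample count, and the prompt are arbitrary choices, not a benchmark methodology.

```python
import json
import statistics
import time
import urllib.request


def time_chat_request(base_url, model, prompt, timeout=60):
    """Send one chat completion request and return the elapsed seconds."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 50,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # wait for the full body, not just the headers
    return time.perf_counter() - start


def summarize(latencies):
    """Return the median and worst-case latency for a list of samples."""
    return {"median_s": statistics.median(latencies), "max_s": max(latencies)}


if __name__ == "__main__":
    samples = [
        time_chat_request("http://localhost:8000/v1", "llama-3.1-70b", "Hello")
        for _ in range(10)
    ]
    print(summarize(samples))
```

Run the same script against your old LiteLLM endpoint and the new one, with the same prompt and max_tokens, so the two sets of numbers are comparable.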

Step 7: Tear down LiteLLM

Once vLLM is stable and handling your workload:

pip uninstall litellm -y
rm -rf ~/.litellm/

Remove LiteLLM from requirements.txt and pyproject.toml.

Path B: Migrate to Ollama Direct API

Step 1: Ensure Ollama is running

ollama serve

(Runs on localhost:11434 by default.)

Step 2: Update your client code

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Hello, world!",
        "stream": False
    }
)
print(response.json()["response"])

Or use Ollama's OpenAI-compatible endpoint, which is served under /v1 on the same port:

from openai import OpenAI
# Ollama ignores the API key, but the client requires a non-empty value
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama2",
    messages=[{"role": "user", "content": "Hello"}]
)

Step 3: Add API authentication (optional)

If you're exposing Ollama remotely, wrap it with a reverse proxy for authentication. Using nginx:

server {
    listen 8000;
    auth_basic "Ollama API";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://localhost:11434;
    }
}

(Create .htpasswd with htpasswd -c /etc/nginx/.htpasswd username.)

Step 4: Test your actual inference workload

Run your application against Ollama. Measure latency. Confirm all prompts are working.

Step 5: Uninstall LiteLLM

pip uninstall litellm -y

Step 4: Audit Your Logs and Rotate Credentials

Now that you've migrated away from the compromised proxy, you need to assess what was exposed.

What to Check

Find LiteLLM's logs (if available):

find ~/.litellm \( -name "*.log" -o -name "*.jsonl" \) 2>/dev/null
find ~/.litellm/logs/ -type f -newermt "2026-03-24" ! -newermt "2026-03-25" 2>/dev/null

Any logs created on March 24 were potentially exfiltrated. Review them for:

API keys or authentication tokens
Full request/response bodies (which might include user prompts)
Model names or orchestration logic that could hint at your infrastructure
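Reading logs by eye misses things, so a pattern scan helps with the first item in that list. The sketch below flags lines that look like they contain credentials. The three patterns are illustrative assumptions (real key formats vary by provider), and both function names are made up for this example.

```python
import re

# Illustrative patterns only; extend them for the providers you actually use.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sk_style_api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),
    "bearer_token": re.compile(r"(?i)\bauthorization:\s*bearer\s+\S+"),
}


def scan_text(text):
    """Return the sorted names of the patterns that appear in text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def scan_file(path):
    """Return (line number, pattern names) for every suspicious line in a file.
    The matched secret itself is never returned, so the report is safe to share."""
    findings = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            names = scan_text(line)
            if names:
                findings.append((number, names))
    return findings
```

Every hit is a credential to add to your rotation list. A clean scan does not mean the log was harmless, since prompts and routing details can be sensitive too.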

Check LiteLLM's configuration:

cat ~/.litellm/config.yaml 2>/dev/null

Every API key stored here was potentially compromised. Note each one.

Check .env files:

find ~ -maxdepth 3 -name ".env*" -type f 2>/dev/null

Any secrets in .env files on the same machine as LiteLLM were harvested.

Rotate All Credentials (Critical)

Assume every secret on your machine during March 24 was exfiltrated. Rotate proactively:

AWS credentials: Delete the old keys from IAM console, create new ones
GCP: Rotate service account keys, revoke old ADC tokens
Azure: Regenerate connection strings and access keys
Kubernetes: Regenerate cluster certificates and service account tokens
API keys (OpenAI, Anthropic, etc.): Revoke old keys, generate new ones
SSH keys: Generate new key pairs, update authorized_keys everywhere
Database passwords: Change all passwords for databases accessible from your machine

For each credential:

Note where it was used (which scripts, which config files)
Rotate the credential
Update your application or config to use the new one
Deploy the change
Monitor for access errors (indicates a place you missed)
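The first step in that checklist, finding where each credential is used, is the one most often done from memory. Here is a minimal sketch that searches a project tree for the name of the variable holding a credential. Search for the variable name rather than the secret value so the secret never lands in your shell history. The suffix list and function name are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical set of file types worth searching; adjust for your stack.
TEXT_SUFFIXES = {".py", ".sh", ".yaml", ".yml", ".toml", ".json", ".service"}


def find_references(root, name):
    """Return (relative path, line number) for each line under root that
    mentions name, e.g. the environment variable holding a credential."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        is_candidate = path.suffix in TEXT_SUFFIXES or path.name.startswith(".env")
        if not (path.is_file() and is_candidate):
            continue
        try:
            lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        except OSError:
            continue  # unreadable file: skip rather than abort the inventory
        for number, line in enumerate(lines, start=1):
            if name in line:
                hits.append((str(path.relative_to(root)), number))
    return hits
```

For example, find_references(".", "OPENAI_API_KEY") gives you the list of files to update after you rotate that key.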

Step 5: Harden Your New Stack

Whether you chose vLLM or Ollama, apply these hardening practices to prevent future supply chain risks.

API Key Management

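Never hardcode API keys in application config files:

# Bad: key committed alongside the rest of config.yaml
api_key: "sk-1234567890abcdef"

# Good: key lives in an untracked .env file and is loaded into the environment
# .env contains: export API_KEY="sk-1234567890abcdef"
source .env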

Use python-dotenv to load from .env (make sure .gitignore includes .env):

from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv("API_KEY")

For highly sensitive credentials, use OS-level secrets:

# macOS
security add-generic-password -a "$USER" -s "api_key" -w "secret_value"

# Linux with systemd: prompt at startup so the key is never written to disk
API_KEY="$(systemd-ask-password "Enter API key")"

Rotate keys on a schedule: Monthly minimum, more frequently for production.

Request Logging — What to Log, What NOT to

DO log:

Timestamps
Model name
Token counts
Latency
Error codes
HTTP status

DO NOT log:

Full prompts or responses
API keys or authentication headers
User identities (unless anonymized/hashed)
Inference details that hint at your workload

If you must log prompts for auditing, encrypt them at rest and store them on a separate, secured machine—not your inference server.
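One way to enforce the two lists above is to build log records from an explicit allow-list of fields, so prompts and headers have no path into the log. The sketch below assumes JSON-lines logging. The field names, the salt handling, and the function name are illustrative choices rather than a standard.

```python
import hashlib
import json
import time


def build_log_record(model, prompt_tokens, completion_tokens, latency_ms,
                     status, user_id=None, salt="change-me"):
    """Build one JSON log line from allow-listed fields only.
    Prompts, responses, and auth headers are not parameters, so they
    cannot end up in the record by accident."""
    record = {
        "timestamp": time.time(),
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
        "http_status": status,
    }
    if user_id is not None:
        # Salted hash: lets you correlate requests without storing the identity.
        digest = hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()
        record["user_hash"] = digest[:16]
    return json.dumps(record, sort_keys=True)
```

Load the salt from your secret store rather than leaving the placeholder default, or the hashes can be reversed by anyone who guesses the user IDs.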

Dependency Scanning

Check for known vulnerabilities weekly:

pip install pip-audit
pip-audit --desc

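Add it to your CI/CD pipeline. pip-audit exits with a non-zero status when it finds any known vulnerability, which fails the job:

pip-audit --desc -r requirements.txt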

Pin dependencies to specific versions in requirements.txt to prevent automatic upgrades to compromised versions:

# Good
litellm==1.82.9
vllm==0.6.0

# Risky (auto-upgrades to potentially compromised versions)
litellm>=1.82.0
vllm>=0.6.0
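Pinning only helps if nothing slips through unpinned, so it is worth checking in CI. This is a minimal sketch that reports requirement lines without an exact == pin. It handles plain requirements.txt syntax only (no URLs or editable installs), and the function name is made up for this example.

```python
import re

# A pinned line is "name==version", optionally with extras, and no wildcard.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;*]+(?=\s|;|$)")


def find_unpinned(requirements_text):
    """Return the requirement lines that are not pinned to one exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or --index-url
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Fail the build when the returned list is non-empty, for example by passing it the contents of requirements.txt and exiting non-zero on any result.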

Network Isolation

Run vLLM or Ollama on a private interface, not 0.0.0.0:

# vLLM: edit config or run with --host
python -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 8000

# Ollama: default is already localhost
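If you generate server commands from config, a quick guard can catch an accidental 0.0.0.0 before it ships. The sketch below classifies a bind address with the standard ipaddress module. The labels and the function name are arbitrary, and it expects a literal IP address or the string localhost, not a DNS name.

```python
import ipaddress


def bind_exposure(host):
    """Describe who can reach a server bound to host."""
    if host == "localhost":
        return "loopback only"
    address = ipaddress.ip_address(host)  # raises ValueError for non-IP input
    if address.is_unspecified:  # 0.0.0.0 or ::
        return "all interfaces"
    if address.is_loopback:
        return "loopback only"
    if address.is_private:
        return "private network"
    return "public address"
```

Anything other than "loopback only" means the firewall and TLS steps in this section are not optional.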

Use firewall rules to whitelist only your application IPs:

sudo ufw allow from 192.168.1.100 to any port 8000

If you expose the API remotely, always use TLS + authentication (never plain HTTP):

# Example: nginx reverse proxy with TLS
server {
    listen 8443 ssl;
    ssl_certificate /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    auth_basic "vLLM API";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}

Why This Matters: Supply Chain Lessons for Local AI

Here's the hard truth: Local AI isn't automatically safer just because it's running on your hardware.

Many builders believe "local = secure" because data doesn't leave the machine. But LiteLLM proved that a local service can silently exfiltrate credentials and secrets without you knowing. The moment you install a dependency, you're trusting not just that library's code, but every library it depends on. And every library those depend on. It's dependencies all the way down.

The False Security of "It's Running Locally"

Your security model for local AI isn't "the machine is trusted because it's mine." It's "every process running on the machine is suspect until proven otherwise."

A compromised proxy can steal your AWS keys while serving inference requests correctly. You'd never notice. The infrastructure breach would appear months later when an attacker spins up expensive instances in your account.

Evaluating New Tools (Going Forward)

Before adopting a new proxy, orchestrator, or wrapper, ask:

Is it actively maintained? (Last commit in the last month?)
Does it have security audits or bug bounties? (Community trust signals)
What dependencies does it pull in? (Use pip-tree or pipdeptree to visualize the dependency tree—the deeper and wider, the higher your supply chain risk)
Could I replace it with a simpler alternative? (vLLM vs. LiteLLM — vLLM is simpler and more focused)
Who maintains it? (Individual maintainer = bus factor risk; organization = better bus factor)
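The dependency question in that list can be answered with a number. As a rough sketch, the code below walks the installed metadata and collects everything a package pulls in transitively. It uses importlib.metadata from the standard library, skips optional extras with a simple string heuristic, and only sees packages installed in the current environment. The function names are illustrative.

```python
import re
from importlib import metadata

NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def installed_requires(name):
    """Return the direct dependency names of an installed distribution.
    Requirements that only apply to optional extras are skipped (heuristic)."""
    try:
        requirements = metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []
    names = []
    for requirement in requirements:
        if "extra ==" in requirement:
            continue
        match = NAME.match(requirement)
        if match:
            names.append(match.group(0).lower().replace("_", "-"))
    return names


def transitive_dependencies(name, requires=installed_requires):
    """Return every package reachable from name, sorted, excluding name itself."""
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for dependency in requires(current):
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    seen.discard(name)
    return sorted(seen)
```

Comparing len(transitive_dependencies("litellm")) with the same count for a candidate replacement gives you a concrete measure of how much code you are agreeing to trust.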

Simple is more secure. Fewer moving parts = fewer attack surfaces = fewer places for attackers to hide malicious code.

Open Source Doesn't Mean Safe

LiteLLM is open source and popular. That didn't prevent the attack. Audited code is safer than unaudited code, but audits aren't perfect, and audited code can still be hit through its supply chain. Here the entry point wasn't LiteLLM's source repository: it was Trivy, the scanner in LiteLLM's CI/CD pipeline, whose compromise gave the attackers the credentials to publish malicious releases.

Trust the process, not the fact of being open-source: active maintenance, rapid response to security issues, transparency about vulnerabilities.

FAQ

"Do I need to take my system offline?"

No, but you should migrate away from LiteLLM immediately (within 24 hours if you ran 1.82.7–1.82.8). You can run both LiteLLM and vLLM side-by-side during migration, gradually shifting traffic. Once vLLM or Ollama is handling all requests, uninstall LiteLLM and rotate credentials.

"What if I've already upgraded to 1.82.9 or later?"

The patch removed the malicious .pth file. The attacker's code is gone. But anything harvested before the patch (SSH keys, cloud credentials, API keys) was potentially exfiltrated. Still rotate credentials as a precaution. Still audit your logs for March 24 activity. Consider migrating to vLLM or Ollama anyway—it eliminates LiteLLM's supply chain dependency entirely.

"Can I trust vLLM now?"

vLLM was not affected by this attack (separate codebase). It's maintained by UC Berkeley's Sky Computing Lab with good security practices and a strong community. Like any dependency, monitor security advisories: use pip-audit weekly and keep it updated.

"How do I know if attackers accessed my AWS account?"

Check CloudTrail for unauthorized API calls during and after March 24, 2026. Look for:

Unusual EC2 instance launches in unexpected regions
IAM role assumption from unfamiliar IP addresses
S3 bucket access from unknown principals
VPC flow logs showing exfiltration to external IPs

If you see suspicious activity, isolate your environment immediately and contact AWS support.
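If you export CloudTrail records as JSON, a small filter can do the first pass over them. The sketch below assumes each record is a dict with the standard CloudTrail fields eventTime, eventName, awsRegion, and sourceIPAddress. The watched event names, the function name, and the idea of an allow-list of regions and IPs are illustrative choices, not AWS guidance.

```python
from datetime import datetime, timezone

# Illustrative subset of API calls an attacker with stolen keys might make.
WATCHED_EVENTS = {"RunInstances", "AssumeRole", "CreateAccessKey"}
# Publication time of the malicious releases, per this article.
ATTACK_START = datetime(2026, 3, 24, 10, 52, tzinfo=timezone.utc)


def suspicious_events(records, expected_regions, known_ips):
    """Return watched events since the attack began that came from an
    unexpected region or an unrecognized source address."""
    flagged = []
    for record in records:
        when = datetime.strptime(
            record["eventTime"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if when < ATTACK_START or record["eventName"] not in WATCHED_EVENTS:
            continue
        odd_region = record["awsRegion"] not in expected_regions
        odd_source = record["sourceIPAddress"] not in known_ips
        if odd_region or odd_source:
            flagged.append(record)
    return flagged
```

Calls made by AWS services on your behalf show a service name instead of an IP in sourceIPAddress, so expect some false positives and review each flagged record by hand.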

"Should I use a different proxy layer instead?"

If you need multi-model routing, vLLM is the best-in-class option (actively maintained, audited, high-performance, no supply chain bloat). If you don't need routing, Ollama's direct API is simpler and more secure. There are other orchestrators (LocalAI, SGLang), but they solve different problems and may introduce their own supply chain risks. Stick with actively maintained projects with clear security practices.

The Bottom Line

The LiteLLM compromise shows that "local" doesn't mean "safe." Supply chain attacks are now a baseline risk for self-hosted infrastructure. The lesson isn't "never use third-party tools." It's "use them carefully, understand your dependencies, rotate credentials regularly, and be ready to migrate."

vLLM and Ollama are solid, maintained alternatives. Both cost $0. Both take a few hours to migrate to. The security upside—eliminating LiteLLM's supply chain risk—is permanent.

Audit your version today. If you're running 1.82.7 or 1.82.8, upgrade or migrate now and rotate every credential. If you're on 1.82.9+, rotate credentials and consider migrating. Either way, apply the hardening steps above and you'll sleep better.
