
Used Server GPUs for Local LLMs: Tesla P40, A100, and What's Actually Worth It

By Georgia Thomas · 5 min read

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.

TL;DR: The Tesla P40 at $150–200 is the best value server GPU for budget builds. The A100 is legitimately excellent but priced out of most hobbyist budgets. The H100 is a waste of money for local inference unless you're running an operation that generates revenue and needs to scale.

Data center GPUs have flooded eBay over the last two years as companies rotate out older hardware. For local LLM builders, this looks like a deal — and sometimes it is. But there are real gotchas that nobody warns you about before you pull the trigger. This guide covers what to buy, what to avoid, and what you're actually getting into.


Why Server GPUs at All?

Consumer GPUs like the RTX 3090 and 4090 are the default recommendation for local LLMs. They're well-supported, easy to set up, and have display outputs. But server GPUs offer something consumer cards can't match at the price: raw VRAM.

  • Tesla P40: 24GB GDDR5 for ~$150–200
  • A100 40GB: 40GB HBM2 for ~$2,000–3,000
  • A100 80GB: 80GB HBM2e for ~$5,000–8,000
  • H100 80GB: 80GB HBM3 for $15,000+ (not realistic for individuals)

The P40 especially stands out. $150–200 for 24GB of VRAM is a deal that's hard to match anywhere in the consumer market. The RTX 3090 gives you the same 24GB but at $700–900 used.
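
If you reduce it to cost per gigabyte of VRAM, the gap is stark. Here's a quick sketch using the ballpark prices above; the A100 figures are assumed midpoints within the ranges quoted in this guide, and actual eBay prices drift constantly:

    # Rough cost-per-gigabyte-of-VRAM comparison using the ballpark used
    # prices quoted in this article; actual eBay prices drift constantly.
    cards = {
        "Tesla P40 (24GB)": (24, 200),
        "RTX 3090 (24GB)":  (24, 800),
        "A100 40GB":        (40, 2500),
        "A100 80GB":        (80, 6500),
    }

    for name, (vram_gb, price_usd) in cards.items():
        print(f"{name:18s} ~${price_usd / vram_gb:5.1f} per GB of VRAM")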


Tesla P40: The Budget Hero

The Tesla P40 is a Pascal-era (2016) data center GPU with 24GB of GDDR5. It was designed for inference workloads, and it still handles LLM inference respectably well for its age.

What's good:

  • 24GB VRAM at $150–200 on eBay
  • Solid FP32 performance for LLM inference
  • Quiet by default, since the card has no fan of its own, and it stays cool with proper airflow (it's a passively cooled card designed for server racks)
  • Well-supported by llama.cpp and most local LLM software
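
Because llama.cpp's CUDA backend works on Pascal cards, the usual workflow is to drop a GGUF model onto the headless box and drive it remotely. A minimal sketch using the llama-cpp-python bindings, assuming they were built with CUDA support; the model path, quant, and prompt are placeholders, not recommendations:

    # Minimal sketch: load a GGUF model fully onto the P40 and run one prompt.
    # Assumes llama-cpp-python was built with CUDA; the model path and settings
    # below are placeholders, not recommendations.
    from llama_cpp import Llama

    llm = Llama(
        model_path="/models/llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical path
        n_gpu_layers=-1,  # offload all layers; an 8B Q4 model fits easily in 24GB
        n_ctx=4096,       # context window, raise only if VRAM allows
    )

    out = llm("Q: Why buy a used server GPU for local LLMs?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])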

What's bad:

  • No display output. Zero. You cannot plug a monitor into it. You need a separate GPU for display, or you SSH into the machine.
  • Requires forced airflow or it will thermal throttle. These cards were designed to sit in server racks with front-to-back chassis airflow. In a standard desktop case, you need a dedicated fan ducted straight through the heatsink, or an aftermarket cooler.
  • Needs a power adapter: the card takes an 8-pin EPS (CPU-style) connector rather than the 8-pin PCIe plug consumer GPUs use, so budget for an adapter cable.
  • Pascal architecture means no Tensor Cores, badly crippled FP16, and no Flash Attention optimizations, so it's slower per token than an equivalent RTX card.

Who should buy it:

  • Someone building a dedicated inference machine that they'll SSH into anyway
  • Budget builders who want 24GB VRAM and can't afford a 3090
  • People adding a second GPU to an existing system for extra VRAM

Who should skip it:

  • Anyone who wants a simple plug-and-play experience
  • Builders who need display output from the AI GPU
  • Anyone who wants modern architecture optimizations

eBay tips for the P40:

  • Buy from sellers with 98%+ feedback and a return policy
  • Search "Tesla P40 24GB tested" — avoid listings that say "pulled from server, untested"
  • Check for physical damage to the heatsink fins and the PCIe edge connector
  • Budget $30–50 for a Noctua or Arctic fan mod if you're putting it in a desktop case

A100: The Real Deal

The A100 is a different category entirely. This is what cloud providers run for production inference. The 40GB and 80GB variants are both legitimate powerhouses.

A100 40GB (~$2,000–3,000):

  • 40GB HBM2 with insane memory bandwidth (1,555 GB/s)
  • Tensor Cores with BF16 support — runs quantized models very efficiently
  • FP16 and INT8 inference support with hardware acceleration
  • 70B models fit only at ~4-bit quants, and even then it's tight: a 70B Q4_K_M file runs roughly 40–43GB. For 70B at Q8 (about 70GB of weights alone) you need the A100 80GB; the sizing arithmetic is sketched below
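
The sizing math is simple: VRAM needed is roughly parameter count times bits per weight divided by 8, plus a few gigabytes of overhead for the KV cache, activations, and CUDA context. A rough sketch; the bits-per-weight values approximate common GGUF quants, and the 4GB overhead figure is an assumption, not a measurement:

    # Rough VRAM estimate: params * bits_per_weight / 8, plus overhead for the
    # KV cache, activations, and CUDA context. Bits-per-weight values below are
    # approximations for common GGUF quants; the 4GB overhead is an assumption.
    def vram_needed_gb(params_billion, bits_per_weight, overhead_gb=4.0):
        weights_gb = params_billion * bits_per_weight / 8
        return weights_gb + overhead_gb

    for quant, bpw in [("Q4_K_M", 4.8), ("Q8_0", 8.5), ("FP16", 16.0)]:
        print(f"70B at {quant:7s} -> ~{vram_needed_gb(70, bpw):3.0f} GB")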

A100 80GB (~$5,000–8,000):

  • 80GB HBM2e with even higher bandwidth
  • Comfortably runs 70B models at Q8 (about 70GB of weights) with room left for context, which no 40GB card can do; note that a fully unquantized FP16 70B is ~140GB and still needs more than one card
  • The card that makes "what's the largest model I can run" a boring question

What's bad about A100s:

  • Same no-display-output issue as the P40
  • Comes in two form factors, SXM4 and PCIe; most eBay listings are PCIe, which is what you want for desktop builds
  • SXM4 variants are essentially useless without a full server board
  • Power draw is significant: A100 PCIe is 300W, SXM4 is 400W
  • No driver support for gaming or desktop use whatsoever

eBay risks with A100s:

  • High counterfeit/rebadged risk. At $2,000–3,000 there are fake listings. Look for sellers with verified photos of the actual card.
  • "SXM4" variants are a common mistake purchase — they require a DGX server, not a desktop motherboard
  • Verify PCIe vs SXM form factor before buying. The listing should say "PCIe" explicitly.
  • Run nvidia-smi immediately when the card arrives to verify VRAM and model number
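
If you want that arrival check to be a one-keystroke habit, here's a minimal sketch that shells out to nvidia-smi and flags a card reporting less VRAM than the listing promised. The threshold is a loose example for an A100 40GB; adjust it for whatever you actually bought:

    # Arrival check: ask nvidia-smi for the model name and total VRAM, and flag
    # a card that reports less memory than expected. The threshold is a loose
    # example for an A100 40GB; adjust it for the card you actually bought.
    import subprocess

    EXPECTED_MIN_MIB = 39 * 1024  # just under 40GB to allow for reporting slack

    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )

    for line in result.stdout.strip().splitlines():
        name, mem_mib = (field.strip() for field in line.split(","))
        status = "OK" if int(mem_mib) >= EXPECTED_MIN_MIB else "LESS VRAM THAN EXPECTED"
        print(f"{name}: {mem_mib} MiB total -> {status}")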

Who should buy an A100:

  • Small teams or businesses running shared local inference servers
  • Researchers who need to run full-precision 70B models
  • Anyone who has already hit a wall with 24GB and needs a real upgrade path

H100: Almost Never Worth It for Individuals

The H100 80GB runs ~$15,000–20,000 on the used market in 2026. For local inference, it's faster than the A100 but not proportionally faster at that price.

The H100's advantages are mostly in training throughput and multi-node distributed inference. For single-machine local inference, the gains over an A100 80GB don't justify a 3x price premium.

Skip it unless you're building something that generates revenue and needs to scale.


The Practical Recommendation

Here's the honest breakdown:

  • Under $500 budget for GPU: Tesla P40. Accept the tradeoffs, set it up with SSH access, mod the cooling, and enjoy 24GB VRAM for $200.
  • $700–1,000 budget: RTX 3090 used consumer card. Easier setup, better architecture, display output. Worth the premium over the P40 for most people.
  • $2,000–3,500 budget: A100 40GB PCIe. Legitimate step-change in what you can run. Memory bandwidth alone makes 70B+ inference dramatically faster (a back-of-the-envelope calculation follows this list).
  • $5,000+ for GPU: A100 80GB PCIe. The highest practical tier for individual builders.
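
To put numbers on the bandwidth point: single-stream token generation is memory-bound, so a crude ceiling on decode speed is bandwidth divided by the bytes of weights read per token. A back-of-the-envelope sketch; bandwidth figures are published specs (PCIe variants for the A100s), the 42GB model size assumes a 70B Q4-class quant, and real throughput lands well below these ceilings:

    # Crude decode-speed ceiling: tokens/s <= memory bandwidth / model size,
    # since every weight is read once per generated token. Bandwidth figures
    # are published specs; 42GB assumes a 70B Q4-class quant.
    MODEL_GB = 42

    bandwidth_gb_s = {
        "RTX 3090":   936,   # for comparison only; 24GB can't hold this model alone
        "A100 40GB": 1555,
        "A100 80GB": 1935,
    }

    for card, bw in bandwidth_gb_s.items():
        print(f"{card:10s} ceiling ~{bw / MODEL_GB:4.1f} tok/s on a 70B Q4 model")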

General eBay Safety Tips for Server Hardware

  • Always buy with PayPal Goods & Services or a credit card with purchase protection
  • Test immediately on arrival. Don't wait two weeks to run nvidia-smi (a crude VRAM smoke test is sketched after this list).
  • Server GPUs often come dirty. Expect to hit them with compressed air and replace the thermal paste.
  • Check the PCIe power connector pins for bending before plugging in
  • Cooling mods are often necessary — budget for them upfront
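
One way to go beyond reading nvidia-smi output is to actually fill the card. Here's a crude smoke test with PyTorch, assuming it's installed with CUDA support; it is not a substitute for a proper memory tester, just a quick check that the advertised VRAM holds data:

    # Crude VRAM smoke test: allocate ~90% of the card, write a known value,
    # and read it back. Not a substitute for a real memory tester; just a
    # quick "does the advertised VRAM actually hold data" check on arrival.
    import torch

    assert torch.cuda.is_available(), "No CUDA device visible"
    dev = torch.device("cuda:0")
    total = torch.cuda.get_device_properties(dev).total_memory
    target = int(total * 0.90)            # leave headroom for the driver/context
    chunk_elems = 256 * 1024 * 1024       # 256M float32 elements, ~1 GiB per chunk

    chunks, filled = [], 0
    while filled + chunk_elems * 4 <= target:
        chunks.append(torch.full((chunk_elems,), 1.0, device=dev))
        filled += chunk_elems * 4

    ok = all(bool(torch.all(t == 1.0)) for t in chunks)
    print(f"Wrote ~{filled / 2**30:.1f} GiB; read-back {'passed' if ok else 'FAILED'}")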
