TL;DR: Both cards have 24GB VRAM. The 4090 is roughly 40% faster for inference but costs significantly more even used. The 3090 is the better value pick if you're on a budget. The 4090 is worth it if you do this daily and speed matters.
The used market for high-VRAM GPUs is currently the best way to source a serious local LLM rig. Both the RTX 3090 and RTX 4090 have settled at price points where they're accessible without being cheap, and both offer the 24GB of VRAM that unlocks serious model inference. This guide helps you pick the right one.
Current Used Market Prices (Early 2026)
Prices shift week to week on eBay and Facebook Marketplace, but here's where things stand:
RTX 3090 24GB:
- eBay average: $700–900 for clean reference or AIB cards
- Low end (good condition, minor cosmetic wear): $650
- High end (rare triple-fan OC variants, box included): $950
- Local Facebook Marketplace: $600–750 if you're patient
RTX 4090 24GB:
- eBay average: $1,400–1,700 for most listings
- Low end (Founders Edition, minor wear): $1,300
- High end (AIB OC cards in great condition): $1,800+
- Local deals are rarer but possible at $1,200–1,400
The price gap is roughly $600–900 depending on exact condition and timing. That gap is the whole question.
VRAM: The Tie
Both cards have 24GB of VRAM. For local LLM inference, this is the most important number. They're equal here.
What that 24GB gets you:
- Llama 3.1 70B at aggressive 2-bit quants like IQ2_XS (noticeable quality loss, but it fits entirely in VRAM; Q2_K and Q3_K need partial CPU offload)
- Any 34B model at Q4 or higher (good quality)
- Models up to roughly 11B fully unquantized at FP16 (a 13B needs ~26GB at FP16, so run it at Q8_0, which is nearly lossless)
- Mistral, Qwen, Phi-4, Gemma 3 — all the popular mid-range models run beautifully
Neither card edges the other on VRAM. That battle is a draw.
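The arithmetic behind that list is simple enough to sanity-check yourself. Here's a back-of-the-envelope sketch, assuming GGUF-style effective bits per weight (roughly 2.3 for IQ2_XS up to 16 for FP16) and a small fixed budget for the CUDA context and a modest KV cache; real usage varies by inference engine:

```python
def est_vram_gb(params_b: float, bits_per_weight: float,
                overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: weight memory plus a fixed budget for
    the CUDA context and a modest KV cache."""
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb + overhead_gb

# Effective bits per weight are approximate for each GGUF quant level.
for name, params_b, bpw in [
    ("8B  FP16  ", 8, 16.0),
    ("13B Q8_0  ", 13, 8.5),
    ("34B Q4_K_M", 34, 4.85),
    ("70B IQ2_XS", 70, 2.31),
]:
    gb = est_vram_gb(params_b, bpw)
    print(f"{name} ~{gb:5.1f} GB  {'fits' if gb <= 24 else 'too big'} in 24 GB")
```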
Architecture: Where They Diverge
The RTX 3090 uses the Ampere architecture (GA102 chip, 2020). The RTX 4090 uses Ada Lovelace (AD102, 2022). A full generation of GPU design and two years separate them.
Practical differences for LLM inference:
- The 4090 has 4th-gen Tensor Cores with native FP8 support; the 3090's 3rd-gen cores top out at FP16/INT8. This matters for inference engines with FP8 kernels, such as TensorRT-LLM.
- The 4090's much larger L2 cache (72MB vs 6MB) speeds up attention kernels, which helps with longer context lengths.
- Memory bandwidth: 3090 has 936 GB/s. 4090 has 1,008 GB/s. The difference is real but not enormous.
- Raw inference speed: The 4090 is roughly 35–45% faster for tokens/second on equivalent models at equivalent quantization.
A 3090 running Llama 3 8B Q4_K_M does ~85–110 tokens/sec. A 4090 on the same model does ~120–150 tokens/sec. That's the gap in practical terms.
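If you'd rather measure than trust published numbers, here's a minimal tokens-per-second probe using llama-cpp-python (assuming a CUDA-enabled build; the GGUF filename is a placeholder for whatever model you're testing). Note that the timing below includes prompt processing, so it's a crude end-to-end figure rather than pure generation speed:

```python
import time
from llama_cpp import Llama

# Placeholder path: point this at any GGUF you want to benchmark.
llm = Llama(model_path="llama-3-8b-instruct.Q4_K_M.gguf",
            n_gpu_layers=-1,   # offload every layer to the GPU
            n_ctx=4096,
            verbose=False)

prompt = "Explain PCIe lane allocation in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```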
The Real Question: Is 40% Faster Worth the Price Premium?
This depends entirely on how you use the machine.
Get the RTX 3090 if:
- You're experimenting or building your first serious rig
- Your primary use is running medium models (7B–34B) where both cards are more than fast enough
- Budget matters and you'd rather put the $600–900 savings elsewhere (more NVMe, a UPS, a second 3090 later)
- You're planning to dual-GPU anyway: two 3090s with NVLink give you 48GB of pooled VRAM for roughly the price of one 4090 (which dropped NVLink entirely)
- You're okay with slightly slower performance on 70B models
Get the RTX 4090 if:
- You use this machine daily for work and faster inference directly affects productivity
- You're doing anything with long context (>8k tokens), where the 4090's architecture advantages compound (see the KV-cache sketch after this list)
- You want the best single-GPU inference performance available on the consumer market
- You don't want the complexity of a dual-GPU setup and want to stay single card as long as possible
- You run fine-tuning experiments, where the 4090's training throughput is meaningfully better
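Why long context bites either card: the KV cache grows linearly with context length, on top of the model weights. A rough sizing sketch, assuming Llama 3 8B's published shape (32 layers, 8 KV heads via GQA, head dimension 128) and an FP16 cache:

```python
def kv_cache_gib(ctx_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per: int = 2) -> float:
    # Two tensors (K and V) per layer, one vector per KV head per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per
    return ctx_len * per_token / 2**30

for ctx in (4096, 8192, 32768):
    print(f"{ctx:>6} tokens -> {kv_cache_gib(ctx):.2f} GiB of KV cache")
# ~0.5 GiB at 4k, ~1 GiB at 8k, ~4 GiB at 32k, all on top of the weights.
```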
Red Flags When Buying Used
For both cards:
- Fan bearing noise — listen to the card under load before committing to a purchase. Failing fans are a $50–100 repair.
- Thermal pads dried out from years of running hot. Expect to repaste the die, and possibly replace the pads, on any used high-end GPU.
- Bent PCIe power connector pins — inspect before plugging in.
- GPU sag damage on the PCIe slot — look at the card at an angle to see if the connector is bent.
RTX 3090 specific:
- Coil whine under load is more common on the 3090 due to its power delivery design
- The Founders Edition 3090 runs hot and loud under sustained load; AIB variants with beefier coolers are preferable for long inference sessions
- Check for signs of mining use: sustained high GPU load for months shows up as worn thermal pads and fan wear
RTX 4090 specific:
- The 16-pin 12VHPWR "melting connector" issue from 2022–2023. Most affected cards have been repaired or replaced by now, but inspect the connector for any discoloration or deformation.
- Ask the seller if the adapter (if included) has been replaced with the updated version
- Verify the card has its full 450W power limit; some firmware-modified cards ship with reduced limits. A quick check script follows this list.
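When testing in person, a quick query against nvidia-smi (it ships with the NVIDIA driver) will surface a firmware-capped card. A stock 4090 should report a 450W default limit; the max limit varies by vendor BIOS. A minimal sketch:

```python
import subprocess

# power.default_limit and power.max_limit are standard nvidia-smi query fields.
out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,power.default_limit,power.max_limit",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True)
print(out.stdout.strip())
# Expected on a stock card: something like
#   NVIDIA GeForce RTX 4090, 450.00 W, 600.00 W
```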
Where to Buy
- eBay: Largest selection. Use filters for "sold" listings to check actual transaction prices vs asking prices. Buy from sellers with 98%+ feedback. Stick to listings with return policies.
- Facebook Marketplace: Lower prices but no buyer protection. Best for local pickup where you can test before paying.
- r/hardwareswap: Reddit's GPU trading community. Generally honest sellers. Check flair for verified traders.
- Local Craigslist: Hit or miss. Bring a laptop to test before paying cash.
The Short Version
If you're building a first serious LLM rig and budget matters: RTX 3090. You're not leaving money on the table; you're getting 24GB of VRAM that handles everything up to heavily quantized 70B models, and you can always upgrade later.
If you've already been doing this and you want the best single-GPU local inference experience money can buy on the consumer market: RTX 4090. It's worth the premium if you're using it constantly.