Buy a used RTX 3090 at $450–$550 if you need a local LLM rig this week. It's the only sub-$600 option with 24 GB of VRAM, and it holds larger models than every RTX 50-series card below the RTX 5090. If you're set on new, the RTX 5060 Ti 16 GB at MSRP ($429) is the only Blackwell card worth targeting. Street prices currently run about $56 above that, at $485. Computex 2026 won't fix GDDR7 supply; wait only if you believe AMD RDNA 4 pricing will force NVIDIA's hand by late June.
Why Your RTX 50-Series Price Tag Never Matches NVIDIA's
NVIDIA cut RTX 50-series production by ~40% in April 2026 due to GDDR7 supply constraints, creating a structural shortage that persists weeks after launch. That 40% figure isn't a rumor; it's baked into every price tag you'll see from now through at least Q3 2026. Samsung and SK Hynix both signaled tight GDDR7 supply through the third quarter, and yields are running 20–30% below initial projections thanks to the complexity of 16-Hi TSV stacking. NVIDIA's response was to prioritize data-center HGX B200/B300 boards for what limited GDDR7 allocation exists, leaving consumer GeForce cards as the leftovers. This isn't a launch-week inventory hiccup like we saw with Ampere in 2020 or Ada Lovelace in 2022. Those shortages resolved in 4–6 months as TSMC capacity caught up. This is different: the memory itself is the bottleneck, and memory fabs don't turn on a dime. AIB partners receive fewer GPU dies than they ordered. They pair those dies with GDDR7 modules that cost 35–50% more per gigabyte than the GDDR6X they replace.
The result is a permanent premium, not a temporary markup. Third-party marketplace sellers price at 115–140% of MSRP for an in-stock guarantee. First-party retail channels show backorder queues of 7–21 days with no same-day fulfillment on any RTX 50-series card. AIB reference-design cards, the ones that hit NVIDIA's MSRP, sell out within 2–4 hours of restock, and custom-design cards carry a $50–$150 permanent premium even when inventory stabilizes. The RTX 5060 Ti 16 GB has gone from "add to cart" to "temporarily unavailable" in under 90 minutes at Best Buy, while the same SKU sits at $485 from marketplace vendors with immediate shipping. Jensen's keynote MSRP is a marketing number. The street price is the real price, and the gap between them is structural.
The Complete MSRP vs. Street Price Table (May 2026)
| Card | VRAM | MSRP | Street (May 23) | Premium | $/GB-VRAM |
|---|---|---|---|---|---|
| RTX 5060 | 8 GB | $299 | $354 | 18% | $44.25 |
| RTX 5060 Ti | 8 GB | $379 | $449 | 18% | $56.13 |
| RTX 5060 Ti | 16 GB | $429 | $485 | 13% | $30.31 |
| RTX 5070 | 12 GB | $549 | $689 | 26% | $57.42 |
| RTX 5070 Ti | 16 GB | $749 | $880 | 17% | $55.00 |
| RTX 5080 | 16 GB | $999 | $1,349 | 35% | $84.31 |
| RTX 5090 | 32 GB | $1,999 | $3,340 | 67% | $104.38 |
RTX 50-series MSRP-to-street gaps range from 13% (RTX 5060 Ti 16 GB) to 67% (RTX 5090) as of May 23, 2026, and the steepest premiums sit at the top of the stack, where VRAM capacity and AI demand peak. The pattern at the high end is unmistakable: more VRAM, worse gouging. The RTX 5090's 32 GB of GDDR7 makes it the prime target for AI labs and prosumers running 70B models, and sellers price accordingly. At $104.38 per gigabyte of VRAM, it's 3.4× the cost-per-GB of the RTX 5060 Ti 16 GB at street, and about 5× the used RTX 3090's $20.83/GB. The RTX 5070 Ti sits at $880 street despite NVIDIA's $749 positioning. At $55/GB, it costs 81% more per gigabyte than the 5060 Ti 16 GB and is barely better than the 8 GB RTX 5060 Ti at $56.13/GB.
The table reveals the one bright spot: the RTX 5060 Ti 16 GB. At 13% over MSRP, it carries the smallest premium in the stack. At $30.31/GB-VRAM, it's the only new card under $500 that clears the 13B model threshold at usable quantization. But that $485 street price is itself a trap. Pay it, and you've erased most of your value advantage against the used market. Amazon and Best Buy average a 14-day backorder. Patient buyers can queue at $429 MSRP, but you'll need tracking tactics to catch a restock. Everything else is either overpriced for its VRAM (RTX 5070 at $57.42/GB) or expensive enough to compare against used RTX 4090s. The RTX 5080 hits $1,349 street; used RTX 4090 24 GB runs $1,400–$1,600.
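The Premium and $/GB-VRAM columns follow directly from the MSRP and street figures, so you can recompute them whenever street prices move. Here is a minimal Python sketch using the table's May 23 snapshot; the `Card` class and helper names are illustrative, not part of any retailer API.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class Card:
    name: str
    vram_gb: int
    msrp: int    # USD
    street: int  # USD, May 23, 2026 snapshot from the table


CARDS = [
    Card("RTX 5060", 8, 299, 354),
    Card("RTX 5060 Ti", 8, 379, 449),
    Card("RTX 5060 Ti", 16, 429, 485),
    Card("RTX 5070", 12, 549, 689),
    Card("RTX 5070 Ti", 16, 749, 880),
    Card("RTX 5080", 16, 999, 1349),
    Card("RTX 5090", 32, 1999, 3340),
]


def premium_pct(card: Card) -> int:
    """Street premium over MSRP, rounded to the nearest whole percent."""
    ratio = Decimal(card.street) / Decimal(card.msrp) - 1
    return int((ratio * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def usd_per_gb(card: Card) -> Decimal:
    """Street price per gigabyte of VRAM, rounded to cents."""
    per_gb = Decimal(card.street) / Decimal(card.vram_gb)
    return per_gb.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for card in CARDS:
    print(f"{card.name} {card.vram_gb} GB: "
          f"+{premium_pct(card)}% over MSRP, ${usd_per_gb(card)}/GB")
```

`Decimal` with half-up rounding is used so that values ending in exactly half a cent round the way the table does; Python's built-in `round` on floats rounds such ties to the nearest even digit instead.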
Which RTX 50 Card Actually Runs Local LLMs?
VRAM, not tensor-core generation, determines which model sizes run locally. 8 GB hits a hard ceiling at 7B Q5-Q6. 16 GB reaches 13B Q6 or 30B with CPU offload. 32 GB enables 70B Q4 without splitting. Blackwell's NVFP4 compression offsets this only for supported models, which remain a minority in the open-weights ecosystem. Blackwell tensor cores are faster, with FP4 throughput at 2× Ada Lovelace's FP8, but fewer than 15% of Hugging Face GGUF repositories offer NVFP4 variants as of May 2026, and the llama.cpp NVFP4 backend remains experimental with known correctness issues on long-context generations. On a 7900 XTX, admittedly AMD, the problem is the same: we couldn't find a single production-ready 70B model in NVFP4 that we trusted for anything beyond toy prompts. The quantization levels that work, today, with tools that don't crash, are Q4_K_M, Q5_K_M, Q6_K, and Q8_0. Those are the numbers that matter for capacity planning.
The RTX 5060 8 GB at $354 street is a 7B-max card, and 7B models won't handle serious work. Llama 3.1 8B fits, barely, at Q5_K_M. Anything larger requires CPU offload. Expect 2–4 tok/s versus 15–25 tok/s fully on-GPU. The RTX 5060 Ti 16 GB at $485 street is the only sub-$500 new card that clears the 13B threshold at usable quantization. 13B is where local LLMs become useful for coding assistance, document analysis, and creative writing. At 16 GB you can also run 30B models with 8–12 GB CPU offload, though throughput drops to 4–8 tok/s. The RTX 5070 Ti and RTX 5080, both 16 GB, offer no capacity advantage over the 5060 Ti, just speed. Only the RTX 5090's 32 GB opens 70B Q4 without splitting, and at $3,340 street, you're in a different budget universe.
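For a rough capacity check before you buy, you can estimate weight size from parameter count and quantization level. The sketch below is an approximation under stated assumptions: the bits-per-weight values are approximate averages for GGUF quantizations (they vary slightly by model), and the 1.5 GB `overhead_gb` default for KV cache and runtime buffers is a placeholder of ours, not a measured figure. Longer contexts need more.

```python
# Approximate bits per weight for common GGUF quantization levels.
# ASSUMPTION: rough averages; real files differ slightly per model.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q6_K": 6.56,
    "Q8_0": 8.50,
}


def weights_gb(params_billion: float, quant: str) -> float:
    """Estimated size of the quantized weights in GB."""
    # billions of parameters * bits per weight / 8 bits per byte = GB
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def fits_on_gpu(params_billion: float, quant: str, vram_gb: float,
                overhead_gb: float = 1.5) -> bool:
    """True if weights plus the assumed overhead fit in VRAM."""
    return weights_gb(params_billion, quant) + overhead_gb <= vram_gb


for params, quant, vram in [(8, "Q5_K_M", 8), (13, "Q6_K", 16),
                            (30, "Q4_K_M", 16), (30, "Q4_K_M", 24)]:
    verdict = "fits" if fits_on_gpu(params, quant, vram) else "needs offload"
    print(f"{params}B {quant} on {vram} GB: "
          f"~{weights_gb(params, quant):.1f} GB weights, {verdict}")
```

Treat the result as a first filter only; actual headroom depends on context length, batch size, and the inference runtime.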
The Used RTX 3090 Problem — And Why It's Still the King
Used RTX 3090 24 GB cards at $400–$600, with a $500 median ($20.83/GB-VRAM), undercut every RTX 50-series card on VRAM-per-dollar by 31–80%, making the 3090 the unambiguous value king for inference-only buyers who prioritize model capacity over efficiency. That $20.83 figure is the anchor you should be comparing every other card against. At $485 street, the RTX 5060 Ti 16 GB costs $30.31/GB, 45% more per gigabyte. At $880, the RTX 5070 Ti 16 GB is $55/GB, a 164% premium. The RTX 5090 runs five times the cost per gigabyte. The 3090's 24 GB VRAM lets you run 30B Q4_K_M fully on-GPU at 15–25 tok/s, or 70B with 8–12 GB CPU offload at 4–8 tok/s. No RTX 50-series card below the 5090 can match that capacity, and the 5090 costs 6.7× as much.
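To rerun the comparison against the used-3090 anchor as prices change, a short sketch like the following works. It takes the $500 reference price from this section and the street $/GB figures from the May 23 table; the function names are our own.

```python
USED_3090_PRICE = 500    # USD, reference price used in this section
USED_3090_VRAM_GB = 24

# Street $/GB-VRAM figures from the May 23 table.
STREET_USD_PER_GB = {
    "RTX 5060 Ti 16 GB": 30.31,
    "RTX 5070 Ti 16 GB": 55.00,
    "RTX 5090 32 GB": 104.38,
}

anchor = USED_3090_PRICE / USED_3090_VRAM_GB  # about $20.83/GB


def premium_over_3090(usd_per_gb: float) -> float:
    """Percent more a card costs per GB than the used 3090."""
    return (usd_per_gb / anchor - 1) * 100


def saving_vs_card(usd_per_gb: float) -> float:
    """Percent by which the used 3090 undercuts a card per GB."""
    return (1 - anchor / usd_per_gb) * 100


for name, per_gb in STREET_USD_PER_GB.items():
    print(f"{name}: {premium_over_3090(per_gb):.0f}% more per GB; "
          f"3090 undercuts it by {saving_vs_card(per_gb):.0f}%")
```

The two functions describe the same gap from opposite directions, which is why a 164% premium over the 3090 corresponds to the 3090 undercutting that card by roughly 62%.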
But the 3090 isn't free of problems, and honest coverage means naming them. The 350 W TDP versus the RTX 5060 Ti 16 GB's 180 W adds roughly $37 per year in excess electricity at $0.15/kWh running four hours daily. Not catastrophic, but real over a 3–5 year ownership window. Reference designs throttle at 83°C memory junction; you'll want an AIB card that runs 10–15°C cooler, which means reading reviews from Igor's Lab or Gamers Nexus before buying. Mining cards show 2–3× higher capacitor degradation, so avoid any listing below 30% discount without provenance documentation. Ampere driver support is guaranteed through 2028 per NVIDIA's enterprise commitment, and non-mining cards show under 4% annual defect rate at 3+ years, better odds than most electronics. Our 3090 from r/hardwareswap at $480 has logged 2,400+ hours of inference without incident. It's hot, it's loud, and it's two generations old. It's also the only way to run 30B models on a Budget Builder's paycheck.
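The electricity figure is simple arithmetic, and it's worth plugging in your own rate and usage. A minimal sketch, assuming both cards draw their full rated TDP for the whole session (an upper bound, since inference rarely pins a card at TDP continuously):

```python
def annual_excess_cost(watts_a: float, watts_b: float,
                       hours_per_day: float, usd_per_kwh: float,
                       days: int = 365) -> float:
    """Yearly cost of card A's extra power draw over card B, in USD."""
    extra_kwh = (watts_a - watts_b) / 1000 * hours_per_day * days
    return extra_kwh * usd_per_kwh


# RTX 3090 (350 W) vs RTX 5060 Ti 16 GB (180 W), 4 h/day at $0.15/kWh
cost = annual_excess_cost(350, 180, hours_per_day=4, usd_per_kwh=0.15)
print(f"${cost:.2f} per year")
```

At the article's inputs this comes to a little over $37 per year, or roughly $110–$190 across a 3–5 year ownership window.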
Computex 2026: Buy Now or Wait Two Weeks?
AMD's RDNA 4 desktop launch is set for June 2 at Computex with the RX 9060 XT 16 GB and RX 9070 XT 24 GB. Historical precedent shows AMD competitive launches push NVIDIA AIB partners to restock reference cards at MSRP within 7–14 days. Street prices on custom designs don't drop. The RX 9070 XT 24 GB is interesting. If AMD prices it aggressively, it could be the first new card to challenge the used 3090's VRAM-per-dollar dominance. But "if" is doing heavy lifting here. AMD's ROCm support for local LLM inference is spotty. The llama.cpp Vulkan backend improves monthly, but CUDA remains the path of least resistance. Even a well-priced AMD card is a gamble for inference workflows until the software stack matures.
NVIDIA's Computex keynote is June 3 with Jensen Huang. The company hasn't signaled consumer price cuts or new SKUs. NVIDIA has no incentive to reduce MSRPs while every card sells above them. GDDR7 supply won't loosen by June 15; Samsung and SK Hynix haven't revised their guidance. The pattern from RX 6000 and RX 7000 launches is clear: reference cards restock at MSRP briefly, custom cards stay inflated, and the window closes before most buyers notice. If you're targeting a reference-design RTX 5060 Ti 16 GB at $429, Computex week is your best window. Set alerts and rehearse your checkout flow. For used RTX 3090s, the Computex window changes nothing. Ampere prices decoupled from new-product launches years ago. They're driven by mining selloffs and corporate depreciation cycles, not NVIDIA's keynotes. Wait if you believe AMD will force NVIDIA's hand by late June. Buy now if you need a rig running before July.
Where to Track Real Prices and Actually Find Stock
Amazon price history from camelcamelcamel and the Keepa browser extension reveals MSRP-to-street gaps and restock timing patterns. Set alerts at MSRP + 5% to catch reference-design drops before bot networks clear inventory in 2–4 hours. That sellout window is real; restock trackers show it repeatedly. Use this workflow:
Step 1: Install Keepa. The browser extension for Chrome, Firefox, and Edge (https://keepa.com/#!extension) provides price history graphs directly on Amazon product pages.
Step 2: Configure camelcamelcamel SMS alerts. Sign up at camelcamelcamel.com, paste the Amazon ASIN for your target card, and set the desired price at MSRP × 1.05. SMS alerts beat email by 30–60 seconds, which matters when a StockDrops alert fires and you have about 15 seconds to respond.
Step 3: Identify your target SKUs. Bookmark the specific ASINs for reference-design cards. You want the exact ASUS Dual, MSI Ventus, or PNY model that hits MSRP. Custom designs with factory overclocks and RGB carry a permanent $50–$150 premium. They don't drop to reference pricing.
Step 4: Learn restock patterns. Peak restocks hit Tuesday through Thursday, 10am–2pm ET. Freight schedules explain it. Cards clear customs over weekends, hit distribution on Monday, and reach retail fulfillment centers by midweek morning. Friday restocks are rare; weekend restocks are mythical.
Step 5: Pre-stage checkout. Log into Amazon, Best Buy, and Newegg. Save payment methods. Enable one-click where possible. The 90 seconds you spend re-entering a CVV costs you the card.
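Two numbers from the workflow above are easy to encode if you script your own reminders: the MSRP + 5% alert price and the Tuesday-to-Thursday, 10am–2pm ET restock window. This is a sketch only. It does not talk to Keepa, camelcamelcamel, or any retailer, and it assumes the `datetime` you pass in is already expressed in US Eastern time.

```python
from datetime import datetime

ALERT_MARKUP = 1.05             # alert at MSRP + 5%
RESTOCK_WEEKDAYS = {1, 2, 3}    # Tuesday-Thursday (Monday is 0)
RESTOCK_HOURS = range(10, 14)   # 10am up to, but not including, 2pm ET


def alert_price(msrp: float) -> float:
    """Price ceiling to enter in a price-drop alert, in USD."""
    return round(msrp * ALERT_MARKUP, 2)


def in_restock_window(now_et: datetime) -> bool:
    """True if the given Eastern-time moment falls in the peak window."""
    return (now_et.weekday() in RESTOCK_WEEKDAYS
            and now_et.hour in RESTOCK_HOURS)


print(alert_price(429))                                  # RTX 5060 Ti 16 GB
print(in_restock_window(datetime(2026, 5, 26, 11, 0)))   # a Tuesday, 11am
print(in_restock_window(datetime(2026, 5, 29, 11, 0)))   # a Friday, 11am
```

For the RTX 5060 Ti 16 GB, the alert ceiling works out to $450.45, comfortably below the $485 street price, so an alert at that level fires only on reference-design restocks.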
For used market tracking, set eBay saved searches with the "sold listings" filter for RTX 3090 24 GB. Set the price ceiling at $550 and seller rating at 98%+. Check r/hardwareswap daily. The best deals post between 10am and 12pm ET and vanish within 15 minutes. Using this workflow, we've scored two 3090s at $450 and $480, both non-mining, both with original purchase receipts.
The Verdict by Budget Tier
Budget Builder ($0–$600): Buy used RTX 3090 24 GB at $400–$500 immediately; at $20.83/GB-VRAM it undercuts every RTX 50-series card by 31–80% per gigabyte, and the Computex window will not lower used Ampere prices. Queue the RTX 5060 Ti 16 GB at $429 MSRP only if you need new-warranty certainty and can tolerate a 14-day backorder. Don't pay $485 street; it erases the value case against the used 3090. The warranty is worth something. Ampere's <4% annual defect rate is low but non-zero, and EVGA and ASUS 3-year warranties transfer to second owners. If you're risk-averse, the backorder queue is your path. If you're price-conscious, the used market is your only rational choice.
Power User ($900–$1,500): You're caught in the worst part of the stack. The RTX 5070 Ti 16 GB at $880 street offers no capacity advantage over the 5060 Ti 16 GB. It only runs the same 13B/30B-offload models faster. At $55/GB, it's terrible value. The RTX 5080 16 GB at $1,349 is even worse at $84.31/GB. Consider a used RTX 4090 24 GB at $1,400–$1,600 instead; you gain 8 GB of VRAM and skip the GDDR7 premium. Dual used RTX 3090s at $900–$1,000 total give you 48 GB across two cards, viable for tensor-parallel 70B inference with vLLM. Power and cooling become significant concerns.
High-Budget ($2,000+): Wait for Computex, but not for NVIDIA. The RTX 5090 at $3,340 street is insult pricing. AMD's RX 9070 XT 24 GB announcement on June 2 could reshape the $800–$1,200 tier. Even if ROCm isn't ready for your workflow, competitive pressure improves everyone's odds. If you must buy now, imported H100 PCIe cards at $6,000–$8,000 are the pro move, though that's a different article. For inference-only buyers, the used RTX 3090's combination of capacity, price, and maturity makes it the right tool for the job.