CraftRigs
Technical Report

Don't Wait for RTX 50 Super — Buy the 5070 Ti Now

By Charlotte Stewart · 9 min read

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.

If you've been waiting for RTX 50 Super to drop and make your local AI dreams affordable, I have bad news: it's not coming. NVIDIA indefinitely delayed it. RTX 60 won't arrive before late 2028 at the earliest.

This is not a rumor. This is not speculation. This is the current state of NVIDIA's roadmap, and it means you have a choice: sit tight and wait 18+ months for a GPU that might not exist in its promised form, or buy the RTX 5070 Ti today and start running local AI right now.

The cost of waiting is much higher than you think.

RTX 50 Super: Cancelled, Delayed, or Just Forgotten?

Let's be precise about what happened. NVIDIA never made an official statement saying "RTX 50 Super is cancelled." What happened instead is that industry reports — sourced from AIC (add-in card) partner communications and supply-chain leaks — revealed that RTX 50 Super was "put on hold indefinitely" back in early 2026. The card simply dropped from NVIDIA's public roadmap.

Initial leaked timelines from mid-2025 promised a mid-2026 RTX 50 Super refresh. That date has come and gone. New roadmaps, obtained by VideoCardz and other hardware analysts in February 2026, show no RTX 50 Super at all. Instead, the next major architecture is Rubin — which powers RTX 60 — and that's slated for H2 2027 at absolute earliest, with late 2028 increasingly likely based on memory shortage constraints.

Why would NVIDIA skip an entire generational refresh? Simple: the RTX 5070 Ti already owns the mid-range market. A $600-700 RTX 50 Super would cannibalize RTX 5070 Ti sales. RTX 60 won't arrive for another year-and-a-half, so NVIDIA decided to milk the RTX 50 series longer and skip the "Super" tier entirely.

CraftRigs take: RTX 50 Super was a product that made sense on paper — a mid-cycle refresh to fill the price gap. But that gap filled itself with the RTX 5070 Ti. There's no void to fill, so the product died before it was born.

RTX 60 Won't Save You — And Might Not Even Exist as Planned

The tempting story is "RTX 60 will be a massive jump, so waiting is smart." The reality is messier.

RTX 60 will use the Rubin architecture — not Blackwell. Blackwell IS the RTX 50 series, and it delivered roughly a 30% improvement in shader performance per watt over Ada (RTX 40). But Rubin? NVIDIA has released zero architectural details. Zero performance targets. Zero public roadmap commitments.

Here's what we know: NVIDIA historically refreshes GPU architectures every 18-24 months. RTX 50 (Blackwell) launched in January 2025. By that math, RTX 60 should arrive in mid-to-late 2026 or early 2027. But memory supply constraints — DRAM/HBM shortages that hit the entire industry in 2025-2026 — have pushed that timeline to H2 2027 or even 2028.

Warning

Every month you wait for RTX 60 is a month you're NOT running local Llama 3.1, NOT fine-tuning your own models, and NOT saving money on API calls. Opportunity cost is real.

Even if RTX 60 arrives in late 2027 and delivers a 30% performance boost over RTX 5070 Ti, that speedup doesn't unlock new use cases. You don't wait 18 months for "faster inference." You wait for architectural changes that enable something genuinely new — like new quantization methods, better low-precision support, or architectural features that reduce power draw.

Rubin hasn't been announced with any of those features. So what, exactly, are you waiting for?

The Math: 12-18 Months of Lost ROI

Let me show you the real cost of waiting.

You're deciding between two paths:

Path A: Buy RTX 5070 Ti today ($749 MSRP, $855-1,200 street price)

  • Month 1: Learn Ollama, run Llama 3.1 8B, experiment with system prompts
  • Months 2-3: Graduate to 13-14B models, integrate with your workflow (RAG pipelines, code assist)
  • Months 4-12: Run production workloads — save $100-200/month in ChatGPT API costs
  • Months 13-18: You've saved $1,200-2,400 on API calls alone, plus learned local LLM engineering

Path B: Wait for RTX 60 (late 2027 or 2028)

  • Year 1: No local AI. ChatGPT bills keep coming. No learning. Watching the hype cycle.
  • Late 2027: RTX 60 launches (maybe). Buy it. Start learning.
  • You've lost 12-18 months of ROI and are now 18 months behind on LLM engineering knowledge

The financial math: 12 months × $150/month API savings = $1,800 opportunity cost. Add in the 12 months of learning you've forgone, and the real cost of waiting approaches $2,500-3,000 in combined actual and opportunity cost.

That's almost 4x the price difference between today's RTX 5070 Ti and what RTX 60 will probably cost.
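The opportunity-cost arithmetic above can be sketched in a few lines of Python. The $150/month savings figure and the $749 MSRP come from this article; the functions themselves are an illustrative model, not financial advice.

```python
# Hypothetical "buy now vs. wait" calculator using the article's rough figures.

def cost_of_waiting(months_waited, monthly_api_bill=150.0):
    """Dollars spent on hosted API calls while waiting for a future GPU."""
    return months_waited * monthly_api_bill

def net_position(gpu_price, months_owned, monthly_api_savings=150.0):
    """Cumulative API savings minus hardware cost after owning the card."""
    return months_owned * monthly_api_savings - gpu_price

print(cost_of_waiting(12))      # → 1800.0 (one year of API bills while waiting)
print(net_position(749.0, 12))  # → 1051.0 (the card has already paid for itself)
```

Under these assumptions the crossover point is five months of ownership; everything after that is net savings.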

RTX 5070 Ti vs. RTX 5070: Which One Bridges the Gap?

If you're buying today, the choice is simple: RTX 5070 Ti or RTX 5070.

RTX 5070 Ti ($749 MSRP, currently $855-1,200 street price, 16 GB VRAM):

  • Handles 30B-class models at 18-20 tokens/second
  • Runs Qwen 14B at ~35 tok/s
  • Can run Mistral 7B at 40+ tok/s
  • Llama 3.1 70B at Q5 quantization runs with system RAM assist (~3-5 tok/s due to offloading)
  • Future-proof for 2-3 years of model development

RTX 5070 ($549 MSRP, $680-850 street price, 12 GB VRAM):

  • Sweet spot for 14B models and below
  • Mistral 7B, Qwen 7B, Llama 3.1 8B: all run at 40+ tok/s
  • 13-14B models hit ~25 tok/s
  • Handles 70B models only if you offload to system RAM (same speed hit as 5070 Ti)

The honest take: If your ceiling is 14B models, RTX 5070 is the budget move and it's smart. If you ever want to step up to 30B-class models (and you will), RTX 5070 Ti earns its $200 premium by keeping them in GPU VRAM without offloading.

Tip

The single best predictor of "which GPU do I need?" is whether you want to run 30B-class quantized models fully in GPU VRAM without CPU offload. If yes: RTX 5070 Ti. If no: RTX 5070.

All benchmarks cited above are from Ollama + CUDA 12.4 running in April 2026 with standard inference settings (greedy sampling, no batching).
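A rough way to sanity-check which models fit which card is to estimate weight size as parameters × bits-per-weight ÷ 8, plus an allowance for the KV cache and runtime. The bit widths and the flat 1.5 GiB overhead below are simplifying assumptions of mine, not Ollama internals.

```python
# Back-of-envelope VRAM estimator for quantized LLM weights.
GiB = 1024 ** 3

def weight_bytes(params_b, bits_per_weight):
    """Approximate weight footprint: parameter count (billions) at a given quant width."""
    return params_b * 1e9 * bits_per_weight / 8

def fits_in_vram(params_b, bits_per_weight, vram_gib, overhead_gib=1.5):
    """True if weights plus a flat KV-cache/runtime allowance fit on the card."""
    need_gib = weight_bytes(params_b, bits_per_weight) / GiB + overhead_gib
    return need_gib <= vram_gib

# 14B at ~4.5 bits (Q4_K_M-style) on a 12 GB RTX 5070:
print(fits_in_vram(14, 4.5, 12))  # → True
# 30B at a tighter ~3-bit quant on the 16 GB RTX 5070 Ti:
print(fits_in_vram(30, 3.0, 16))  # → True
# 70B at ~4.5 bits on the 16 GB RTX 5070 Ti:
print(fits_in_vram(70, 4.5, 16))  # → False: offload to system RAM required
```

The estimate lines up with the pattern above: 12 GB comfortably holds 14B-class models, 16 GB stretches to 30B-class at tighter quantization, and 70B doesn't fit either card.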

AMD's RX 9070 XT: The Smarter Buy by Value (If You're Comfortable with Risk)

Here's where this gets interesting.

AMD's RX 9070 XT hits $599 MSRP with 16 GB VRAM — that's $150 cheaper than RTX 5070 Ti's MSRP while matching the core spec (16 GB). Performance-wise, GamersNexus benchmarks show RX 9070 XT at roughly 95% of RTX 5070 Ti's rasterization performance. For LLM inference workloads specifically, the cards trade blows depending on the model.

The catch: ROCm support (AMD's compute platform) is still experimental as of April 2026. Ollama supports AMDGPU, yes, but users report frequent initialization failures, hanging processes during ROCm driver discovery, and general instability on Linux. Windows support is somewhat more stable but not production-ready.

CraftRigs take: If you're on Windows and comfortable with tweaking driver settings, RX 9070 XT is the smarter financial choice — same performance, $150 cheaper, better value trajectory as AMD matures ROCm. If you're on Linux or need guaranteed stability, stick with RTX 5070 Ti.

The risk calculus: RX 9070 XT might be the GPU of the future, but RTX 5070 Ti is the GPU that works today. Choose based on your tolerance for beta software.

Should You Wait? The Decision Tree

Here's the filter:

Wait for RTX 60 (2028) if:

  • You can genuinely wait until late 2027 or 2028 without FOMO
  • Your primary use case for local AI is still hypothetical (you haven't built yet)
  • You have no cash flow pressure to deploy local LLM systems today
  • You're cool with 18+ months of opportunity cost

Buy RTX 5070 Ti today if:

  • You want to run Llama 3.1 models in the next month
  • You're tired of ChatGPT API bills and want to save money immediately
  • You have a specific local LLM use case (RAG pipeline, code assist, fine-tuning research)
  • You want to learn how this stuff actually works before RTX 60 lands

There is no middle ground. A 3-4 month wait for some hypothetical mid-year refresh? That's leaving money on the table. An 18-month wait for a real architectural refresh? That's a defensible decision if you can actually commit to waiting.

Note

Waiting 6 months hoping for a price drop is the worst option. Street prices won't move more than 5-10%. Waiting 18 months for RTX 60 is defensible. Waiting 6 months for nothing is purgatory.

The Verdict: Buy Now, Upgrade in 2029 If You Want

RTX 5070 Ti isn't perfect. It costs more than you'd like. It can't run Llama 3.1 70B in GPU VRAM without offloading. It's not the Goldilocks GPU that handles everything at perfect speed.

But it does something RTX 60 cannot do: it exists right now, it costs less than waiting, and it will run 30B models beautifully for the next three years.

Here's what I'd do: Buy the RTX 5070 Ti today. Use it hard for 12-18 months. Run a 14B model for code assist. Build a RAG pipeline. Fine-tune a smaller model on your proprietary data. Save $1,500 on API bills.

When RTX 60 arrives in late 2027 or 2028 — IF it's everything NVIDIA promises — you'll have 18 months of ROI already in the bank. At that point, upgrading is a choice, not desperation. You'll know exactly what you need from the new GPU because you'll have lived with the old one.

RTX 50 Super will never exist. RTX 60 might not arrive for two years. But RTX 5070 Ti is here, it's solid, and waiting for something better costs you more than the upgrade ever will.

FAQ

Will RTX 5070 Ti prices drop when RTX 60 launches?

Probably 5-10% lower, but not a crash. The 5070 Ti will still be a capable mid-range card. Previous generations' mid-range cards didn't crater when their successors arrived, and this one won't either. Plan for modest depreciation, not a fire sale.

Can I actually run Llama 3.1 70B on RTX 5070 Ti?

Technically yes, but with caveats. Llama 3.1 70B at Q4 quantization needs ~35 GB of VRAM to stay fully on GPU. The 5070 Ti has 16 GB, so you're offloading to system RAM, which drops throughput to 3-5 tokens/second instead of the 15-18 tok/s you'd get with a 5090 or dual-GPU setup. If you need to run 70B frequently, RTX 5070 Ti is a compromise, not a solution. If you want to experiment occasionally, it works.
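Using the article's ~35 GB figure for 70B Q4 weights, a back-of-envelope split shows why throughput collapses: less than half the model stays on the GPU, so system RAM bandwidth dominates. The 2 GB VRAM reservation for cache and activations is my assumption, not a measured value.

```python
# Illustrative GPU/CPU split for partial offload of an oversized model.

def offload_split(model_gb, vram_gb, reserved_gb=2.0):
    """Fraction of model weights resident on GPU after reserving VRAM for cache."""
    usable = max(vram_gb - reserved_gb, 0.0)
    return min(usable / model_gb, 1.0)

share = offload_split(35.0, 16.0)
print(f"{share:.0%} of the model on GPU")  # → 40% of the model on GPU
```

With a majority of the weights living in system RAM, the 3-5 tok/s figure above follows: every token has to stream the CPU-resident layers over a far slower memory bus.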

Should I buy the RTX 5070 instead of the 5070 Ti?

Yes, if your ceiling is 14B models. The RTX 5070's $549 MSRP and 12 GB VRAM are perfect for Mistral 7B, Qwen 14B, and Llama 3.1 8B workloads. You lose 4 GB of VRAM and pay less for it. The only reason to jump to the 5070 Ti is to keep 30B-class quantized models in VRAM without CPU offload.

Is RX 9070 XT more stable now?

Not yet. As of April 2026, ROCm support is still in beta territory. Ollama integration is improving but users report periodic hangs, driver initialization failures, and general flakiness on Linux. Windows is more stable but still experimental. Give it 6-12 months for production stability.

What if I wait 3-4 months and RTX 60 actually launches?

It won't. Every leaked roadmap shows Q4 2027 at earliest, with 2028 increasingly likely. A 3-4 month wait is gambling with known odds — you're betting on a timeline that's already been broken multiple times. Waiting 18 months for a real architectural refresh is sensible. Waiting a few months for hope is wasting opportunity cost.

Will my RTX 5070 Ti become obsolete when RTX 60 lands?

No. Historical precedent: RTX 40 series cards are still solid in 2026, and RTX 50 outpaced them by "only" 30%. RTX 5070 Ti will still run 30B-class models beautifully in 2028. You're not future-proofing with this card; you're present-enabling. The upgrade path is optional, not mandatory.


The bottom line: Don't wait for a product that doesn't exist. RTX 50 Super is gone. RTX 60 is 18+ months away. The RTX 5070 Ti is here, it's competent, and the opportunity cost of waiting outweighs the potential gain.

Buy it. Use it. Upgrade in 2029 if you want. You'll thank yourself for starting the local AI journey now instead of watching from the sidelines for another year-and-a-half.

nvidia gpu-news local-llm buying-guide rtx-5070
