
FSR 4.1 vs DLSS 4.5: Which GPU Should You Buy If You Want to Game AND Run AI?

By Chloe Smith · 6 min read

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.

Everyone arguing about FSR 4.1 vs DLSS 4.5 is having the wrong conversation. The upscaling gap matters, sure, but if you're the kind of person who wants to game at 1440p and run Qwen or Llama locally on the same machine, the real question is the $870 price gap between the two cards.

That's the street price difference between an RX 9070 XT ($729) and an RTX 4080 Super ($1,597) right now, in March 2026, when GPU prices still haven't come back down to earth. For that gap you could buy a decent monitor, an NVMe drive, and still have money left. So before we get into upscaling frame analysis and token-per-second benchmarks, keep that number in mind.

See also: RX 9070 XT vs RTX 4080 Super →

What We're Actually Comparing

Quick spec reality check, because the marketing numbers obscure the situation:

RX 9070 XT — RDNA 4 (Navi 48), TSMC 4nm, 16GB GDDR6, 640 GB/s bandwidth, 304W TDP. Launched March 2025 at $599 MSRP. Good luck finding one at that.

RTX 4080 Super — Ada Lovelace (AD103), TSMC 5nm, 16GB GDDR6X, 736 GB/s bandwidth, 320W TDP. Launched January 2024 at $999 MSRP. Also good luck.

Both cards sit at 16GB VRAM. That's not a coincidence — it's the number that matters most for running local AI models, and both teams know it. The memory bandwidth gap (640 vs 736 GB/s) is real but secondary to what ROCm vs CUDA does to your workflow.

Gaming: The 9070 XT Is Better Than It Has Any Right to Be

In pure rasterization — the thing GPUs have been doing since forever — the 9070 XT punches well above its price. 3DMark puts it at 29,992 vs the 4080 Super's 28,531. The newer, cheaper card beats it in synthetic raster benchmarks.

Real-game numbers tell a more mixed story. At 1440p with no ray tracing, the gap is narrow enough that you'd need a side-by-side to notice. At 4K with full ray tracing or path tracing, the 4080 Super pulls ahead more meaningfully. Cyberpunk 2077 in path-trace mode at 1440p shows the 9070 XT getting around 43 FPS with FSR 4 Quality. Playable. Not comfortable.

Warning

Path tracing is where the 9070 XT's software stack currently lets it down. In Cyberpunk, running the NVIDIA rendering pipeline via OptiScaler produces 58 FPS on the same AMD hardware — a 35% uplift over native FSR 4. AMD's pipeline for heavy RT workloads is just less efficient right now. FSR Redstone (with Radiance Caching) should improve this, but it's not in a released game yet.

So: rasterization gaming at 1440p, the 9070 XT is excellent. Path-traced 4K gaming, the 4080 Super has a real advantage.

FSR 4.1 vs DLSS 4.5: The Honest Assessment

FSR 4.1 shipped (technically leaked, then officially adopted via Adrenalin 26.3.1) in March 2026. The main change was reducing motion blur — a genuine problem in FSR 4.0 where vegetation and fine details smeared during camera movement. In Cyberpunk, Star Wars Outlaws, and Mafia: The Old Country, FSR 4.1 resolves noticeably more detail in Performance mode.

But it traded one problem for another. The algorithm now tries to render fine detail instead of blurring it, which introduces edge flickering and graininess — noticeable in STALKER 2 and AC Shadows. No free lunch.

DLSS 4.5 still leads. Hardware Unboxed's testing put it plainly: "DLSS 4.5 gives the impression that the game is running at a higher resolution." Vegetation, hair, moving objects — NVIDIA's reconstruction holds edges that FSR 4.1 wobbles on.

Note

FSR 4.x is exclusive to RDNA 4 GPUs (RX 9000 series). It does not backport to older Radeon cards. If you're on a 7900 XTX, you get FSR 3 or OptiScaler. DLSS 4.5 runs on all RTX 40 and 50 series cards.

The gap between FSR 4.1 and DLSS 4.5 is real but smaller than it was six months ago. At 1440p Quality mode, most people won't notice in motion. At 4K Performance mode, DLSS 4.5 is still clearly better. If you're already on team AMD, FSR 4.1 is a meaningful upgrade over 4.0. But it hasn't closed the gap.

The AI Side: Where It Actually Gets Complicated

Both cards have 16GB VRAM. For AI inference, VRAM is the primary constraint — if the model fits entirely in VRAM, you get fast inference. If it spills to system RAM, you get a 10x performance cliff.
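As a back-of-envelope check (my sketch, not any vendor's formula), you can estimate whether a model fits in 16GB from its parameter count and quantization — weights are roughly parameters × bytes per weight, plus overhead for the KV cache and runtime buffers:

```python
# Rough VRAM estimate for LLM inference. The 20% overhead factor for KV
# cache, activations, and runtime buffers is an assumption, not a spec.

def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float = 16.0, overhead: float = 1.2) -> bool:
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8-bit ≈ 1 GB
    return weights_gb * overhead <= vram_gb

print(fits_in_vram(14, 4))  # 14B at Q4: ~8.4 GB needed -> fits comfortably
print(fits_in_vram(14, 8))  # 14B at Q8: ~16.8 GB -> borderline on 16 GB
print(fits_in_vram(30, 4))  # 30B at Q4: ~18 GB -> spills, and you hit the cliff
```

This is exactly why the 30B results below diverge so sharply: the estimate lands just past the 16GB line, and how gracefully each stack handles that spill is what separates the two cards.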

With a 14B-class model (Qwen3 14B, for example) running via Ollama, the RTX 4080 (non-Super) clocks around 60 tokens/second. The RX 9070 XT lands at 47–49 tokens/second. Call it a 20–25% speed penalty for AMD. Annoying, but workable — both feel responsive for interactive use.
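If you want to reproduce these tokens-per-second numbers yourself, Ollama's generate API returns the raw counters: `eval_count` (tokens generated) and `eval_duration` (nanoseconds). A minimal sketch — the sample values below are illustrative, not measured:

```python
# Compute tokens/second from an Ollama /api/generate response (stream=False).
# eval_count and eval_duration are real fields in Ollama's response JSON;
# the sample numbers here are made up for illustration.

def tokens_per_second(response: dict) -> float:
    return response["eval_count"] / (response["eval_duration"] / 1e9)

sample = {"eval_count": 512, "eval_duration": 10_400_000_000}  # 10.4 seconds
print(round(tokens_per_second(sample), 1))  # -> 49.2
```

Run the same prompt on both cards with the same quantization and you get a fair apples-to-apples number, which is more than most benchmark screenshots give you.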

The 30B model is where things fall apart for AMD. The RTX 4080 holds around 39 tokens/second. The RX 9070 XT drops to roughly 11-12 tokens/second. Same VRAM, wildly different results. Why? The NVIDIA side keeps more of the 30B model in VRAM via more efficient quantization handling and CUDA's mature inference stack. The AMD side starts offloading layers to CPU far earlier.
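You can watch this offloading behavior directly: Ollama exposes a `num_gpu` option that controls how many transformer layers stay in VRAM. A sketch of the request payload (no request is sent here; the model tag and layer count are illustrative):

```python
# Build an Ollama generate payload that pins GPU layer count via num_gpu
# (a real Ollama option: the number of layers offloaded to the GPU).
# Payload construction only -- nothing is sent to a server.
import json

def generate_payload(model: str, prompt: str, gpu_layers: int) -> str:
    return json.dumps({
        "model": model,          # illustrative tag, e.g. "qwen3:30b"
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": gpu_layers},  # layers kept in VRAM
    })

print(generate_payload("qwen3:30b", "hello", 40))
```

Dial `num_gpu` down until generation is stable, and the layer count where each card stops degrading tells you how early its stack starts spilling.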

Tip

On a 16GB GPU, the practical sweet spot for local LLMs is roughly 7B models at FP16 up to 14B at Q8, or ~24B with aggressive 4-bit quantization. The GPT-OSS 20B model hits around 140 tokens/second on a 16GB RTX 4080 when it fits entirely in VRAM — that's genuinely usable for coding assistance. The 9070 XT hits similar speeds on the same fully-in-VRAM workloads, but its ceiling collapses faster when models start spilling.

For Stable Diffusion and image generation, both cards handle SDXL comfortably at 1024x1024. AMD's DirectML works on Windows but is CPU-intensive — you'll want a reasonably fast processor alongside it. On Linux with ROCm, the experience is much closer to parity.
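If you're testing the Windows route on Radeon, one common path for image generation is ONNX Runtime's DirectML backend. A guarded availability check (assumes the `onnxruntime-directml` package; falls through cleanly where it isn't installed):

```python
# Check whether ONNX Runtime's DirectML execution provider is available.
# "DmlExecutionProvider" is the real provider name in onnxruntime-directml;
# the try/except lets this run on machines without onnxruntime at all.
try:
    import onnxruntime as ort
    has_dml = "DmlExecutionProvider" in ort.get_available_providers()
except ImportError:
    has_dml = False

print("DirectML available:", has_dml)
```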

The Software Gap Is Real, But 2026 Is Different

This is the nuance most comparisons skip. ROCm 7.2.0 dropped in January 2026, and it's genuinely the first release where recommending AMD for local AI on Linux is not a compromise. PyTorch with ROCm support now installs via pip on RDNA 4 cards without driver spoofing or janky patches.
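A quick way to verify the setup actually worked: ROCm builds of PyTorch reuse the `torch.cuda` API (HIP under the hood), and expose `torch.version.hip` so you can tell which backend you got. A guarded sketch that also runs where torch isn't installed:

```python
# Detect whether the installed PyTorch is a ROCm (HIP) or CUDA build and
# whether it sees a GPU. On ROCm builds, torch.cuda.is_available() returns
# True on a working setup -- the cuda namespace is reused for HIP.
try:
    import torch
    backend = "hip" if getattr(torch.version, "hip", None) else "cuda"
    gpu_ok = torch.cuda.is_available()
except ImportError:
    backend, gpu_ok = None, False

print(backend, gpu_ok)
```

On a correctly configured RDNA 4 + ROCm 7.x box you should see `hip True`; anything else means the install fell back to a CPU-only or CUDA wheel.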

Windows is still messier. DirectML works but requires more CPU headroom. ROCm on Windows is technically possible but not as smooth.

The raw performance gap between CUDA and ROCm is 10-30% depending on the workload, according to Thunder Compute's March 2026 benchmarks. That's down from the 50%+ gap that existed 18 months ago. AMD closed real ground. They didn't close all of it.

If you're on Linux and running 7B–14B models, the 9070 XT is a legitimate choice for AI work. If you're on Windows and doing anything beyond light Ollama inference — fine-tuning, image model training, larger quantized models — CUDA's ecosystem friction advantage is still meaningful.

The $870 Question

At MSRP, this comparison is interesting. At street price, it's almost no contest.

The RTX 4080 Super is sitting at $1,544–$1,597 in March 2026. That's $545–$600 above its $999 launch price, nearly 60% over MSRP. The RX 9070 XT at $729 is $130 over its $599 MSRP — annoying, but nothing like the NVIDIA premium.

For gaming alone, paying $870 more for the 4080 Super gets you better path tracing, better DLSS, and better ray tracing. In rasterization — which covers 90%+ of gaming — you're getting similar performance at best.

For the dual-use gaming + AI person, the 9070 XT at $729 runs 14B models at 47–49 tokens/second, games at 1440p with competitive FSR 4.1, and leaves ~$870 in your pocket. The 4080 Super runs 14B models ~20% faster and games better in RT-heavy titles.

Ask yourself what you actually run. If your "AI use" is Ollama with Qwen3 14B, the 9070 XT is fine. If you're running 30B+ models, doing LoRA fine-tuning, or building applications that depend on CUDA libraries, the 4080 Super's ecosystem advantage starts to justify the premium — but only if you can find one below $1,200, which currently means used market.

The Verdict

Buy the RX 9070 XT if you game at 1440p, run models up to 14B parameters, are on Linux (or patient with Windows DirectML), and aren't willing to pay an $870 premium for marginal improvements.

Buy the RTX 4080 Super only if you find it close to $999 — used market, lucky retail drop, whatever — and you have a specific reason to need CUDA (30B+ inference, training, fine-tuning, PyTorch projects). At $1,597 retail, the 4080 Super is making a promise it can't keep for most dual-use buyers.

FSR 4.1 vs DLSS 4.5 is a real gap. But it's not an $870 gap.

fsr-4-1 dlss-4-5 rx-9070-xt rtx-4080-super gaming local-ai upscaling rdna4 cuda
