
How Much Does NVMe Speed Actually Affect LLM Load Times? Benchmarks

By Georgia Thomas · 3 min read

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.

TL;DR: PCIe 3.0 to 4.0 is a meaningful upgrade for model loading — roughly 30-40% faster load times. PCIe 4.0 to 5.0 barely matters — under 10% improvement in real-world testing. PCIe 4.0 is the sweet spot. Don't spend extra on Gen 5 for LLM work.

The Test Setup

We tested model loading times using llama.cpp on the same system, swapping only the NVMe drive. All models in GGUF format at Q4_K_M quantization. Benchmarks as of March 2026.

System: Ryzen 7 7800X3D, 64GB DDR5-6000, RTX 4090 24GB

Drives tested:

  • PCIe 3.0: Intel 670p 2TB (~3,500 MB/s sequential read)
  • PCIe 4.0: Samsung 990 Pro 2TB (~7,450 MB/s sequential read)
  • PCIe 5.0: Crucial T705 2TB (~14,500 MB/s sequential read)

Each model was loaded from a cold start (OS page cache flushed between runs) five times, and the results were averaged.
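A minimal version of that procedure can be sketched in Python. This is illustrative, not our exact harness: the `llama-cli` invocation is one plausible way to time a llama.cpp load, and dropping the page cache via `/proc/sys/vm/drop_caches` is Linux-specific and requires root.

```python
import statistics
import subprocess
import time

def drop_page_cache():
    """Flush dirty pages, then evict the OS page cache (Linux, run as root)."""
    subprocess.run(["sync"], check=True)
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

def timed_load(model_path):
    """Time one llama.cpp load; -n 0 asks it to exit right after loading."""
    start = time.perf_counter()
    subprocess.run(["llama-cli", "-m", model_path, "-n", "0"], check=True)
    return time.perf_counter() - start

def summarize(times):
    """Mean plus sample stdev, so run-to-run noise stays visible."""
    return statistics.mean(times), statistics.stdev(times)

def benchmark(model_path, runs=5):
    times = []
    for _ in range(runs):
        drop_page_cache()  # force a true cold start before each run
        times.append(timed_load(model_path))
    return summarize(times)
```

Skipping the cache flush is the classic mistake here: the second load comes from RAM, not the drive, and every SSD "benchmarks" identically.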

The Results

Llama 3 8B (Q4_K_M) — 4.9GB file:

  • PCIe 3.0: 3.2 seconds
  • PCIe 4.0: 2.1 seconds
  • PCIe 5.0: 1.9 seconds

Llama 3 13B (Q4_K_M) — 7.9GB file:

  • PCIe 3.0: 5.1 seconds
  • PCIe 4.0: 3.3 seconds
  • PCIe 5.0: 3.0 seconds

DeepSeek 33B (Q4_K_M) — 19.6GB file:

  • PCIe 3.0: 11.8 seconds
  • PCIe 4.0: 7.4 seconds
  • PCIe 5.0: 6.8 seconds

Llama 3 70B (Q4_K_M) — 40.5GB file:

  • PCIe 3.0: 24.1 seconds
  • PCIe 4.0: 15.2 seconds
  • PCIe 5.0: 13.8 seconds
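Turning the tables above into percentage improvements (load times copied verbatim from the results) makes the pattern easy to see:

```python
# (Gen 3, Gen 4, Gen 5) load times in seconds, from the tables above
results = {
    "Llama 3 8B":   (3.2, 2.1, 1.9),
    "Llama 3 13B":  (5.1, 3.3, 3.0),
    "DeepSeek 33B": (11.8, 7.4, 6.8),
    "Llama 3 70B":  (24.1, 15.2, 13.8),
}

def improvement(old, new):
    """Percent reduction in load time going from `old` to `new`."""
    return (old - new) / old * 100

for model, (g3, g4, g5) in results.items():
    print(f"{model}: 3.0->4.0 {improvement(g3, g4):.1f}%, "
          f"4.0->5.0 {improvement(g4, g5):.1f}%")
```

Every row lands in the mid-30s for the Gen 3 to Gen 4 step and under 10% for Gen 4 to Gen 5.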

What the Numbers Mean

The jump from PCIe 3.0 to 4.0 is real and consistent — roughly 34-37% faster across all four model sizes. If you're still on a Gen 3 drive, upgrading makes a noticeable difference, especially with larger models.

The jump from PCIe 4.0 to 5.0 is disappointing. Despite roughly double the rated sequential read speed, real-world model loading improves by only 8-10%. Why? Because llama.cpp (and most inference engines) don't issue pure sequential reads when loading a model. Metadata parsing, weight rearrangement, memory allocation, and the transfer to VRAM all add overhead that becomes the bottleneck long before raw drive speed does.
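A back-of-the-envelope check illustrates this: divide file size by rated sequential speed to get the time a pure read would take, then compare against the measured load. The numbers come from the drive list and the 70B results above; "overhead" here is simply everything that isn't the raw read.

```python
FILE_GB = 40.5  # Llama 3 70B Q4_K_M file size, from the results above

def ideal_read_s(file_gb, rated_gb_per_s):
    """Time a pure sequential read of the file would take at rated speed."""
    return file_gb / rated_gb_per_s

# (rated GB/s, measured 70B load in seconds)
drives = {
    "PCIe 3.0": (3.5, 24.1),
    "PCIe 4.0": (7.45, 15.2),
    "PCIe 5.0": (14.5, 13.8),
}

for name, (rated, measured) in drives.items():
    ideal = ideal_read_s(FILE_GB, rated)
    print(f"{name}: ideal read {ideal:.1f}s, measured {measured}s, "
          f"overhead {measured - ideal:.1f}s")
```

On the Gen 5 drive the pure read accounts for under 3 of the 13.8 measured seconds; the overhead term sits around 10-12 seconds on every drive, which is why doubling sequential speed again buys so little.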

For a 70B model, the difference between Gen 4 and Gen 5 is about 1.4 seconds. That's not worth the $70+ price premium on the drive.
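Put that in cost terms, using the approximate premium and the 70B delta from above:

```python
def cost_per_second_saved(premium_usd, old_s, new_s):
    """Dollars paid per second of load time saved by the upgrade."""
    return premium_usd / (old_s - new_s)

# Gen 4 -> Gen 5 on the 70B model: a ~$70 premium buys 1.4 seconds per load
print(f"${cost_per_second_saved(70.0, 15.2, 13.8):.0f} per second saved")
```

Fifty dollars per second of load time is a hard sell for any workload.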

The Threshold

PCIe 4.0 is where faster storage stops meaningfully improving LLM load times. Beyond that, the bottleneck shifts to:

  1. Memory mapping overhead in the inference engine
  2. PCIe bus transfer to GPU VRAM
  3. Model initialization and warmup

No amount of storage speed fixes those. Your money is better spent on the components that actually affect inference performance — GPU VRAM and compute.

When Storage Speed Matters (and Doesn't)

It matters if you:

  • Swap between models frequently (testing, comparing, serving multiple models)
  • Run a model server that cold-starts models on demand
  • Are still on a PCIe 3.0 or SATA SSD

It doesn't matter if you:

  • Load one model and run it all day (load once, forget about it)
  • Already have a PCIe 4.0 drive
  • Are deciding between a faster SSD and more VRAM (always pick VRAM)

The Bottom Line

Upgrade from Gen 3 to Gen 4 if you haven't already — either the Samsung 990 Pro or the WD Black SN850X, at around $150 for 2TB, is the right call. For the complete storage breakdown, including capacity and endurance recommendations, see our NVMe SSD guide.

Skip Gen 5 entirely for LLM workloads. That $70+ premium buys you 1-2 seconds of load time improvement. Put it toward your GPU budget instead.

For how storage fits into a complete LLM build, check our ultimate hardware guide.


nvme benchmarks model-loading pcie storage-speed
