Does the CPU matter for local LLM inference speed?

For models that fit entirely in GPU VRAM, the CPU is almost irrelevant — inference speed is set by VRAM bandwidth. The CPU becomes a significant bottleneck only when CPU offloading is used (model too large for VRAM), where more cores and DDR5 bandwidth directly affect tokens per second.

What CPU is best for running 70B models with CPU offloading?

High core count and fast DDR5 matter most. The Ryzen 9 7950X and 9950X are strong picks — 16 cores and support for DDR5-6000 dual-channel. They offer roughly 90 GB/s of memory bandwidth, which directly feeds the offloaded layers. Budget alternatives like the Ryzen 9 7900X still perform well.

How many PCIe lanes do I need for a dual-GPU LLM build?

A dual-GPU setup needs at least x8/x8 electrical connectivity, which requires 24+ PCIe lanes from the CPU or platform. Consumer Intel Core Ultra and AMD Ryzen 9000 on X670E both provide 24–28 lanes — enough for two GPUs at x8/x8. For x16/x16 or three-GPU setups, you need Threadripper.

Is Threadripper worth it for a local LLM build?

Only if you're running three or more GPUs, or need guaranteed x16/x16 bandwidth for two GPUs. For single or dual GPU builds, a mainstream Ryzen 9 or Core Ultra 9 platform is more cost-effective. Threadripper systems carry a significant platform cost premium.

Best CPU for Local LLMs 2026: Ryzen vs Intel vs Cache [2026]

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.

TL;DR: For single-GPU inference, almost any modern CPU works — the CPU is not the bottleneck. For CPU offloading (running 70B+ models on underpowered GPU setups), you want high core count and fast DDR5: the Ryzen 9 7950X or 9950X are the sweet spots. For multi-GPU builds, choose a platform with enough PCIe lanes: Ryzen 9000 series on X670E or Intel Core Ultra 9 for mainstream, Threadripper 9000 series for maximum lane count.

The Honest Answer Nobody Wants to Hear

The CPU is the least important component in a local LLM rig. Almost everything that determines your inference speed comes from your GPU — specifically VRAM size and memory bandwidth. The CPU sits in the background managing the OS, loading files from disk, and handling whatever tiny fraction of work doesn't fit in VRAM.

That said, "least important" isn't "irrelevant." The CPU matters in three specific scenarios: CPU offloading, multi-GPU lane distribution, and how fast models load from disk. Get those three things right and you can stop overthinking the CPU decision.

When the CPU Actually Matters

Scenario 1: CPU Offloading

If your GPU VRAM can't hold the full model, the inference engine splits layers between GPU and CPU. Every layer on the CPU runs at system RAM bandwidth instead of VRAM bandwidth — and your RAM bandwidth is determined by your CPU's memory controller.

A Ryzen 9 7950X running DDR5-6000 in dual-channel gets roughly 90 GB/s of memory bandwidth. An older i7-12700K on DDR4-3200 gets around 51 GB/s. When you're offloading 20 layers to CPU on a 70B model, that 40 GB/s difference translates directly to maybe 2-4 extra tokens per second. Not nothing.

Core count also matters here because llama.cpp and Ollama spread CPU computation across cores. A 16-core CPU running offloaded layers is genuinely faster than an 8-core at the same clock speed.

Scenario 2: PCIe Lane Count

Each GPU needs PCIe bandwidth to transfer data from system RAM. A dual-GPU build needs at least x8/x8 electrical connectivity — which requires 24+ lanes from the CPU/platform.

Consumer Intel Core Ultra (Arrow Lake) provides 24 CPU lanes. AMD Ryzen 9000 on X670E gives 24-28 CPU lanes. Both are fine for dual x8/x8.

If you want x16/x16 (both GPUs at full bandwidth) or you're running three GPUs, you need HEDT: Threadripper 9000 series with up to 160 PCIe lanes, or used Threadripper PRO 5000 series.

Scenario 3: Model Load Times

Getting a 40GB model from NVMe into VRAM involves the CPU orchestrating the transfer. A faster CPU means slightly faster load times. We're talking 3-8 seconds on a modern system vs 10-15 seconds on a slower one for a 70B model. Meaningful if you're frequently switching models, irrelevant if you load once and leave it running.

Note

In a normal single-GPU inference workflow — model fits in VRAM, running 7B or 13B models — the CPU contribution to tokens-per-second is effectively zero. The GPU is doing all the work. A $150 Ryzen 5 7600 performs identically to a $700 Ryzen 9 9950X in this scenario.

AMD Picks

Budget + Single GPU: AMD Ryzen 5 7600 or 9600

Around $180-200 as of March 2026. 6 cores, DDR5 support, PCIe 5.0 on the top slot. For a single GPU build where everything fits in VRAM, this is genuinely the right answer. Don't spend 4x more on a 16-core chip to get the same inference speed.

The Workhorse: AMD Ryzen 9 7950X or 9950X

Around $450-550 as of March 2026. 16 cores, 32 threads, supports DDR5-6000 comfortably, 24 PCIe lanes. This is the CPU for serious LLM workloads on consumer AM5 platform.

The 7950X and 9950X are effectively equivalent for LLM purposes — the 9950X (Zen 5 architecture) is ~10-15% faster in CPU-bound tasks, which shows up in CPU offloading scenarios. Both handle dual-GPU setups at x8/x8 without issue.

The 7950X frequently goes on sale and is excellent value. If you're price-sensitive and don't need the Zen 5 improvements, grab it.

When You Need Maximum PCIe: Threadripper 9960X or 9970X

The 9960X (24 cores, $1,399) and 9970X (32 cores, $2,499) on the WRX90 platform give you 160 PCIe lanes. That's enough for multiple GPUs at x16/x16, plus NVMe storage, plus networking cards, all without touching chipset lanes.

Tip

Threadripper and EPYC platforms support quad-channel DDR5, which doubles your memory bandwidth compared to dual-channel consumer platforms. For heavy CPU offloading — running 70B models primarily on CPU — this matters significantly. Quad-channel DDR5-6000 approaches 180 GB/s.

This matters for triple-GPU builds, professional inference servers, or anyone who needs maximum CPU offloading throughput. For most users, it's expensive overkill.

Intel Picks

Budget + Single GPU: Intel Core i5-14600K or Core Ultra 5 245K

Around $200-220 as of March 2026. 14 cores (P+E), DDR5 support on 700/800-series boards. Handles single-GPU builds perfectly fine.

The Core Ultra 5 245K (Arrow Lake) is slightly better for LLM workloads specifically — better memory controller, improved efficiency cores that contribute more to parallel workloads.

The Workhorse: Intel Core Ultra 9 285K

Around $500 as of March 2026. 24 cores (8 P + 16 E), DDR5-6400 support, 24 CPU PCIe lanes. Intel's answer to the Ryzen 9 7950X for dual-GPU builds.

The Core Ultra 9 285K is competitive with Ryzen 9 9950X in multi-threaded workloads. For CPU offloading, results are close — AMD's memory controller efficiency gives it a slight edge in practice, but we're talking 5-10% at most.

The Core Ultra 9 285K has one advantage: the broader ecosystem of LGA 1851 boards with specific workstation features like Thunderbolt 4/5 ports, which matters for some workflows.

Caution

Intel Core i9-14900K and i9-13900K have well-documented instability issues at stock clocks on certain boards. If you're building an always-on inference server, these CPUs are higher risk than they look. The Core Ultra 9 285K (Arrow Lake) does not have this issue.

The Platform Decision

AMD AM5 (Ryzen 7000/8000/9000) — best choice for most LLM builds. Excellent memory controller, EXPO support for easy DDR5 overclocking, competitive multi-core performance, and the X670E platform has solid multi-GPU support. Platform will be supported through at least 2027.

Intel LGA 1851 (Core Ultra 200 series) — viable alternative. Good single-core performance, broad board ecosystem, Thunderbolt integration. Slightly behind on memory bandwidth compared to AMD at the same speeds.

AMD TRX50/WRX90 (Threadripper 9000) — for serious multi-GPU or multi-model deployments. High platform cost ($500+ for CPU alone, $700+ for boards) but the lane count and quad-channel bandwidth are unmatched at this price.

What to Actually Buy

Single GPU, model fits in VRAM (the majority of people): Ryzen 5 7600 or 9600. Spend the saved money on GPU or RAM.

Single GPU + heavy CPU offloading (running 70B models on a 24GB GPU): Ryzen 9 7950X — best bang for money on this specific workload.

Dual GPU build: Ryzen 9 9950X or Core Ultra 9 285K. Either works for x8/x8 dual GPU.

Multi-GPU (3+) or production inference server: Threadripper 9960X on WRX90. Accept the platform cost.

Don't let YouTube builds with i9-14900KS and heavy overclocking convince you that more CPU = faster LLMs. That relationship is tenuous at best. The GPU decides your tokens per second. The CPU decides almost nothing — until you're offloading, and then it matters exactly as much as your RAM bandwidth allows.