NVLink
A high-bandwidth NVIDIA interconnect that lets two GPUs share data directly, bypassing the slower PCIe bus.
NVLink is NVIDIA's proprietary GPU-to-GPU interconnect, designed to move tensors between cards far faster than PCIe can. For local AI builders, it's the reason a pair of RTX 3090s remains one of the most cost-effective ways to run large dense models that don't fit in a single card's VRAM.
How It Works
NVLink uses a dedicated bridge connector between two compatible cards, creating a point-to-point link. On the RTX 3090 the bridge provides about 112.5 GB/s of bidirectional bandwidth (roughly 56 GB/s each way), a little under twice the roughly 32 GB/s each way of a PCIe 4.0 x16 slot. The gap is often wider in practice, because many consumer motherboards run the second GPU slot at x8 or lower. On the RTX 3090 and 3090 Ti, the last consumer GeForce cards to support it, this matters most for tensor parallelism, where each layer's weight matrices are split across GPUs and partial results have to be exchanged between cards several times on every forward pass. Without NVLink, those transfers fall back to PCIe and the slower link becomes the bottleneck. Newer consumer cards (RTX 4090, RTX 5090) dropped the connector entirely; NVLink now lives only on data-center parts like H100 and B200.
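To make the bandwidth difference concrete, here is a rough back-of-the-envelope sketch in Python. The bandwidth constants are approximate per-direction peak figures assumed for illustration, the function names are hypothetical, and the model ignores latency, protocol overhead, and the fact that a real all-reduce moves data in several steps. Treat the result as a lower bound on transfer time, not a benchmark.

```python
# Approximate peak bandwidth per direction, in bytes per second (assumed figures).
NVLINK_3090_BPS = 56.25e9  # RTX 3090 NVLink bridge, one direction
PCIE4_X16_BPS = 31.5e9     # PCIe 4.0 x16 slot, one direction


def activation_bytes(tokens: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """Size of one activation tensor: tokens x hidden_size values (FP16 by default)."""
    return tokens * hidden_size * bytes_per_value


def transfer_ms(num_bytes: float, bandwidth_bps: float) -> float:
    """Idealized time in milliseconds to move num_bytes over a link."""
    return num_bytes / bandwidth_bps * 1e3


# Illustrative workload: a 2048-token prompt through a model with hidden size 8192.
payload = activation_bytes(tokens=2048, hidden_size=8192)
nvlink_ms = transfer_ms(payload, NVLINK_3090_BPS)
pcie_ms = transfer_ms(payload, PCIE4_X16_BPS)

print(f"payload: {payload / 2**20:.0f} MiB")
print(f"NVLink:  {nvlink_ms:.2f} ms per exchange")
print(f"PCIe:    {pcie_ms:.2f} ms per exchange")
print(f"PCIe is {pcie_ms / nvlink_ms:.2f}x slower in this idealized model")
```

Because tensor parallelism repeats an exchange like this for every layer, a per-exchange difference of a fraction of a millisecond adds up across a deep model and a long prompt.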
Tradeoffs and Limits
NVLink is a two-card affair on consumer hardware: you can bridge a pair of 3090s, but you can't chain three or four together over NVLink. Once you scale past two cards, every additional GPU communicates over PCIe, and PCIe lane allocation on the motherboard becomes the real constraint. This is why multi-GPU rigs at three or more 3090s often push builders toward Threadripper or EPYC platforms with abundant PCIe lanes, rather than chasing more NVLink bridges. Software support also varies: tensor-parallel frameworks like vLLM and ExLlamaV2 can exploit NVLink, while many single-process llama.cpp workflows don't benefit much because they split the model by layer and run the cards one after another, so only a small activation crosses between GPUs for each token.
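The lane-allocation constraint can be sketched with simple arithmetic. The snippet below is a simplified model, not a description of any specific motherboard: the lane counts (24, 64, 128) are illustrative stand-ins for a mainstream desktop, a workstation, and a server platform, the `reserved_lanes` default is an assumption covering NVMe storage and similar devices, and `widest_uniform_link` is a hypothetical helper name. Real boards hard-wire slots in fixed configurations, so check the manual for the actual layout.

```python
def widest_uniform_link(cpu_lanes: int, num_gpus: int, reserved_lanes: int = 4) -> int:
    """Widest PCIe link width every GPU could get if lanes were split evenly.

    Returns 16, 8, 4, 2, or 1, or 0 if the GPUs cannot all be attached.
    """
    available = cpu_lanes - reserved_lanes
    for width in (16, 8, 4, 2, 1):
        if width * num_gpus <= available:
            return width
    return 0


# Illustrative (assumed) lane budgets for three classes of platform.
platforms = {"desktop": 24, "workstation": 64, "server": 128}

for name, lanes in platforms.items():
    for gpus in (2, 4):
        width = widest_uniform_link(lanes, gpus)
        print(f"{name:12s} {gpus} GPUs -> x{width} each")
```

Under these assumptions the desktop budget drops to x4 per card at four GPUs, while the server budget keeps all four at x16, which is the pattern that pushes larger rigs toward high-lane-count platforms.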
Why It Matters for Local AI
For a 70B-class dense model in 4-bit, two 3090s give you 48 GB of combined VRAM (24 GB per card, with the inference framework sharding the model across both rather than presenting one unified pool), and NVLink adds enough inter-GPU bandwidth that tensor parallelism actually pays off in tokens per second. Without NVLink, the same two cards still run the model, but you'll feel the PCIe tax on prompt processing and any tensor-parallel decode. It's a major reason used 3090s have held their resale value against newer, NVLink-less GPUs.
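The memory arithmetic behind the 70B example can be checked with a short estimate. This is a minimal sketch under stated assumptions: exactly 4 bits per weight (real quantization formats carry some overhead above that), a Llama-2-70B-style layout of 80 layers with 8 key/value heads of dimension 128, an FP16 KV cache, and an 8192-token context. The function names are hypothetical, and runtime buffers and framework overhead are not counted.

```python
GIB = 1024 ** 3


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights at a given quantization width."""
    return num_params * bits_per_weight / 8


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Memory for the KV cache; the leading 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed model shape and settings (see the paragraph above).
weights = weight_bytes(num_params=70e9, bits_per_weight=4.0)
kv = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, context_tokens=8192)
total_gib = (weights + kv) / GIB

print(f"weights:  {weights / GIB:.1f} GiB")
print(f"KV cache: {kv / GIB:.1f} GiB")
print(f"total:    {total_gib:.1f} GiB")
print("fits on one 24 GiB card: ", total_gib <= 24)
print("fits on two 24 GiB cards:", total_gib <= 48)
```

With these numbers the model overflows a single card but leaves headroom across two, which is why the two-card configuration is the natural fit for this model class.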