Vera Rubin — Local AI Glossary | CraftRigs

Vera Rubin is NVIDIA's post-Blackwell datacenter GPU platform, unveiled at GTC 2026 as the successor to the Blackwell generation. It is a hyperscaler product, built for cloud providers and AI labs rather than home builders, but its announcement reshapes the buying window for everyone running models locally.

What Vera Rubin Actually Is

Vera Rubin is a rack-scale GPU platform aimed at frontier-model training and inference at hyperscale. Unlike consumer cards, it is sold as integrated systems with proprietary interconnects, liquid cooling, and pricing that makes sense only when amortized across thousands of concurrent users. The platform integrates LPU (Language Processing Unit) inference technology from Groq, which NVIDIA acquired for roughly $20 billion in December 2025, its largest deal ever. Jonathan Ross and Groq's senior leadership joined NVIDIA specifically to fold inference-optimized silicon into Vera Rubin alongside the training-optimized GPU dies.

Why Home Builders Won't Buy One

Vera Rubin parts are not coming to consumer rigs. The chips, the interconnect fabric, and the cooling requirements are all designed around datacenter deployment. What does matter for local builders is the downstream effect: every generational jump at the top pushes the previous generations down the stack. Hopper-class hardware in cloud fleets gets retired or repriced, and consumer Blackwell cards become the practical ceiling for home AI for the next several years. The Groq acquisition also signals that NVIDIA now treats inference as a high-value market distinct from training, a split that will eventually shape consumer product lines too.

Why It Matters for Local AI

The Vera Rubin announcement is a buying-window signal, not a product to chase. Cloud-side Hopper depreciation and Blackwell mainstreaming mean used H100 and refurbished workstation cards will keep falling in price, while consumer 24GB–32GB VRAM cards stabilize as the sweet spot for running quantized dense 70B-class models at usable tokens per second. At that size, expect roughly 3 to 4 bits per weight, with partial CPU offload on the smaller cards. Watch the inference-vs-training split: it predicts which consumer features (LPU-style decode acceleration, larger KV-cache headroom) trickle down next.
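
The VRAM sizing behind a claim like this can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a benchmark: it counts only the bytes needed for model weights at a given quantization level, then adds a flat allowance for KV cache and runtime buffers. The function names and the 2 GiB `overhead_gib` default are assumptions made for illustration; real overhead depends on context length, batch size, and the inference runtime.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus a flat overhead allowance fit in VRAM.

    overhead_gib is a hypothetical placeholder for KV cache and
    runtime buffers, not a measured value.
    """
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Compare a 70B dense model at several quantization levels
    # against 24 GiB and 32 GiB cards.
    for bits in (16, 8, 4, 3):
        size = weight_gib(70, bits)
        print(f"70B @ {bits}-bit: {size:6.1f} GiB weights | "
              f"fits 24 GiB: {fits(70, bits, 24)} | "
              f"fits 32 GiB: {fits(70, bits, 32)}")
```

Under these assumptions, a 70B model at 4 bits per weight needs about 32.6 GiB for weights alone, so it does not fit entirely on a 32 GiB card. Around 3 bits per weight it fits on 32 GiB but not on 24 GiB, which is where partial CPU offload or a second card comes in.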