WSL2 — Local AI Glossary | CraftRigs

WSL2 is Microsoft's second-generation Windows Subsystem for Linux, running a full Linux kernel inside a managed Hyper-V VM. For local AI builders, it's the bridge that lets a Windows desktop run Linux-first inference stacks — CUDA toolkits, ROCm builds, and Linux-only llama.cpp features — without giving up the Windows host.

How It Works for GPU Workloads

WSL2 exposes the host GPU through a paravirtualized driver. NVIDIA cards work through the standard Windows driver plus the CUDA-on-WSL stack, so there is no separate Linux GPU driver to install inside the distro. AMD support runs through ROCm on WSL2. It is functional but rougher than CUDA: most workloads run, edge cases break, and while Ollama handles it out of the box, custom builds may need patching. Either way, this is a major upgrade over WSL1, which had no real GPU passthrough at all.
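A quick check from inside the distro can confirm which pieces of that GPU path are actually visible. The sketch below is a heuristic, not an official detection API. It assumes that WSL2 kernels report a `microsoft-standard` version string in /proc/version (WSL1 reports a different `-Microsoft` string), that the paravirtualized GPU appears as `/dev/dxg`, and that the Windows NVIDIA driver maps `libcuda.so` into `/usr/lib/wsl/lib`. The function names are hypothetical.

```python
from pathlib import Path

# Heuristic markers (assumptions, not a documented detection API):
PROC_VERSION = Path("/proc/version")    # kernel version string
DXG_DEVICE = Path("/dev/dxg")           # paravirtualized GPU device seen in WSL2
WSL_LIB_DIR = Path("/usr/lib/wsl/lib")  # where the Windows driver maps libcuda.so


def is_wsl2_kernel(version_text: str) -> bool:
    """WSL2 kernels carry 'microsoft-standard' in their version; WSL1 reports '...-Microsoft'."""
    return "microsoft-standard" in version_text.lower()


def check_wsl_gpu() -> dict[str, bool]:
    """Report which pieces of the WSL2 GPU path are visible from inside the distro."""
    version = PROC_VERSION.read_text() if PROC_VERSION.exists() else ""
    return {
        "wsl2_kernel": is_wsl2_kernel(version),
        "gpu_pv_device": DXG_DEVICE.exists(),
        "cuda_on_wsl_lib": (WSL_LIB_DIR / "libcuda.so").exists(),
    }


if __name__ == "__main__":
    for name, present in check_wsl_gpu().items():
        print(f"{name:16} {'yes' if present else 'no'}")
```

The library check only covers NVIDIA's CUDA-on-WSL stack; a ROCm setup would need its own check.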

Memory and Performance Tradeoffs

The Hyper-V VM keeps its own memory pool, and that is the source of most pain. On the GPU side, Windows can fragment VRAM badly when desktop apps, browsers, and the WSL2 GPU process all compete for it, a common cause of CUDA out-of-memory errors on rigs whose VRAM should otherwise fit a 27–35B model. Fixes include capping WSL2 RAM in .wslconfig, restarting WSL (wsl --shutdown) before large model loads, and loading the LLM before opening other GPU consumers.

Disk I/O across the Windows/Linux filesystem boundary is slow, so keep models on the Linux side (/home/...) rather than under /mnt/c/.
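The RAM cap and the path rule are both mechanical enough to script. This is a minimal sketch, assuming the `memory=` key under the `[wsl2]` section of `%UserProfile%\.wslconfig` on the Windows side, and treating any `/mnt/<drive letter>/` path as crossing the slow boundary. The helper names and the 24 GB figure are illustrative, not a recommendation.

```python
from pathlib import PurePosixPath


def render_wslconfig(memory_gb: int) -> str:
    """Return .wslconfig text that caps the WSL2 VM's RAM (hypothetical helper)."""
    if memory_gb <= 0:
        raise ValueError("memory_gb must be a positive number of gigabytes")
    # [wsl2] holds VM-wide settings; memory= caps how much RAM the VM may take.
    return f"[wsl2]\nmemory={memory_gb}GB\n"


def on_windows_mount(model_path: str) -> bool:
    """True if a path lives on a Windows drive mounted as /mnt/<letter>/."""
    parts = PurePosixPath(model_path).parts
    # Expect ("/", "mnt", "<single drive letter>", ...); /mnt/wsl and similar don't count.
    return (
        len(parts) >= 3
        and parts[0] == "/"
        and parts[1] == "mnt"
        and len(parts[2]) == 1
        and parts[2].isalpha()
    )


if __name__ == "__main__":
    print(render_wslconfig(24), end="")
    for path in ("/mnt/c/models/model.gguf", "/home/me/models/model.gguf"):
        verdict = "slow: move to the Linux side" if on_windows_mount(path) else "ok"
        print(f"{path}: {verdict}")
```

Changes to .wslconfig only take effect once the VM restarts, which is one more reason wsl --shutdown belongs in the routine.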

When It's the Right Choice

WSL2 is the right call when you want Windows for daily driving but Linux for inference: gaming rig by day, LLM host by night. NVIDIA users get near-native CUDA performance. AMD users on RDNA 3/4 cards like the RX 9070 XT can run local LLMs through WSL2 + ROCm, but native Linux remains the smoother path, and Windows-only AMD setups face the most setup friction. For LM Studio and similar Windows-native apps, WSL2 isn't needed; it only enters the picture when you reach for Linux-only tooling.

Why It Matters for Local AI

Most cutting-edge inference work (new quant formats, custom kernels, multi-GPU experiments) lands on Linux first. WSL2 gives Windows builders access to that ecosystem without a second boot drive, but it adds a memory-management layer that is one of the most common causes of OOM crashes on Windows local-LLM rigs.