CraftRigs
Models & Quantization

Dense

A model architecture where every parameter activates on every token, as opposed to mixture-of-experts designs that only fire a subset.

A dense model is the classic transformer shape: every weight in the network participates in every forward pass. When you load a dense 27B model, you pay for all 27 billion parameters in VRAM, and you pay for all of them again in compute on every token you generate. There are no shortcuts, no routing, no experts sitting idle.

Dense vs MoE

The contrast that matters in 2026 is dense versus MoE. MoE models have a much larger total parameter count but activate only a fraction of it per token, which makes inference faster than a dense model with the same total parameter count, assuming you have the hardware to hold the whole thing. Dense models are simpler: parameter count maps cleanly to VRAM footprint and to compute per token. Qwen3.5's compact lineup (0.8B through 9B) is fully dense and Apache 2.0 licensed, while frontier MoE releases like MiniMax M25 demand multi-GPU rigs to even load.
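
As a rough illustration of that contrast, the sketch below separates what a model costs in memory (total parameters) from what it costs in compute per token (active parameters). It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, which ignores attention over the context window. The class and function names are invented for this example, and the MoE figures are made up rather than taken from any real release.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    name: str
    total_params_b: float   # billions of parameters that must be held in memory
    active_params_b: float  # billions of parameters used for each token

    @property
    def is_dense(self) -> bool:
        # Dense means every stored parameter is also an active one.
        return self.active_params_b == self.total_params_b


def flops_per_token(model: ModelShape) -> float:
    # Assumption: ~2 FLOPs per active parameter per token (forward pass only).
    return 2 * model.active_params_b * 1e9


dense = ModelShape("dense-27B", total_params_b=27, active_params_b=27)
# Hypothetical MoE: 100B stored, 10B active per token (illustrative numbers only).
moe = ModelShape("hypothetical-moe", total_params_b=100, active_params_b=10)

for m in (dense, moe):
    kind = "dense" if m.is_dense else "MoE"
    print(f"{m.name} ({kind}): holds {m.total_params_b}B params, "
          f"~{flops_per_token(m) / 1e9:.0f} GFLOPs per token")
```

Under these assumptions the made-up MoE needs several times the memory of the dense model while doing less arithmetic per token, which is the trade-off described above.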

VRAM Footprint Is Predictable

Because every weight is live, a dense model's parameter count translates directly into a memory budget once you pick a quantization. Qwen3.5 27B Dense at Q4_K_M weighs 16.7GB: it loads completely into a 24GB RTX 3090 with room left for the KV cache, but it will not fit in a 16GB card without offloading layers to system RAM, which craters tokens per second. There is no comfortable middle ground with dense: either the whole model fits in VRAM or you pay the offloading penalty.
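
The arithmetic behind that example can be sketched in a few lines. The 27B parameter count, the 16.7GB file size, and the 24GB and 16GB card sizes come from the text above. The function names and the 2GB headroom reserved for KV cache and runtime overhead are assumptions made for illustration, and GB is treated as 10^9 bytes.

```python
def effective_bits_per_weight(file_size_gb: float, params_b: float) -> float:
    # File size in GB (10^9 bytes) and parameters in billions, so the 10^9 cancels.
    return file_size_gb * 8 / params_b


def estimated_size_gb(params_b: float, bits_per_weight: float) -> float:
    # Inverse of the above: parameters x bits per weight, converted to bytes.
    return params_b * bits_per_weight / 8


def fits_in_vram(model_gb: float, vram_gb: float, headroom_gb: float = 2.0) -> bool:
    # headroom_gb is an assumed allowance for KV cache and runtime overhead;
    # the real figure depends on context length and the inference runtime.
    return model_gb + headroom_gb <= vram_gb


MODEL_GB = 16.7   # Q4_K_M file size quoted above
PARAMS_B = 27     # dense parameter count in billions

bpw = effective_bits_per_weight(MODEL_GB, PARAMS_B)
print(f"effective bits per weight: {bpw:.2f}")

for vram in (24, 16):
    verdict = "fits" if fits_in_vram(MODEL_GB, vram) else "needs offloading"
    print(f"{vram}GB card: {verdict}")
```

The same two functions can be pointed at any dense model: plug in the parameter count and the bits per weight implied by a quant's file size, and the result is a first estimate of whether it loads on a given card.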

Why It Matters for Local AI

Dense models are still the default recommendation for single-GPU local rigs because their VRAM math is honest. You pick a quant, you check the file size, you know whether it loads. The flip side is the ceiling — once you outgrow what one GPU can hold, dense scaling gets brutal fast, which is exactly where MoE architectures start to look attractive despite their multi-GPU hardware bar.