Bits Per Weight (BPW)
The number of bits used to store each model parameter, determining model size in memory.
Every parameter in a language model is a numerical value — a weight that was learned during training. Bits per weight (BPW) describes how many bits are used to store each of those values. Higher BPW means more precise representation, larger file size, more VRAM required, and typically better output quality. Lower BPW means the opposite.
The BPW-to-VRAM Formula
Estimating a model's VRAM footprint from BPW is straightforward:
Model size in GB ≈ (parameter count × BPW) / 8,000,000,000
Examples for a 7B model:
- 16 BPW (FP16): (7,000,000,000 × 16) / 8,000,000,000 = ~14GB
- 8 BPW (Q8_0): ~7GB
- 4 BPW (Q4_K_M): ~3.5–4.5GB (K-quants add slight overhead)
For 70B:
- 16 BPW: ~140GB
- 8 BPW: ~70GB
- 4 BPW: ~35–40GB
This formula is a reliable starting estimate before consulting specific model cards.
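The formula above can be sketched as a small helper function (the function name and defaults are illustrative, not from any particular library):

```python
def model_size_gb(param_count: float, bpw: float) -> float:
    """Estimate model size in GB from parameter count and bits per weight."""
    # bits -> bytes (/ 8), bytes -> GB (/ 1e9), matching the formula:
    # size_GB ≈ (parameter count × BPW) / 8,000,000,000
    return param_count * bpw / 8 / 1e9

# The article's examples:
print(model_size_gb(7e9, 16))   # 14.0  (7B at FP16)
print(model_size_gb(7e9, 8))    # 7.0   (7B at Q8_0)
print(model_size_gb(70e9, 4))   # 35.0  (70B at 4 BPW, before K-quant overhead)
```

Note this estimates weights only; real VRAM usage is somewhat higher once the KV cache and activations are loaded.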
Common BPW Formats
| BPW | Format Name | Use Case |
|---|---|---|
| 16 | FP16 / BF16 | Training, full quality, high VRAM |
| 8 | Q8_0 | Near-lossless, half the FP16 size |
| 6 | Q6_K | High quality, modest savings |
| 5 | Q5_K_M | Good balance for reasoning tasks |
| 4 | Q4_K_M | Standard for most local inference |
| 3 | Q3_K_M | Aggressive, visible quality loss |
| 2 | Q2_K | Extreme compression, significant degradation |
Mixed-Precision Quantization
K-quant formats (Q4_K_M, Q5_K_M, etc.) don't apply the same BPW uniformly across all layers. Certain layers — particularly the attention layers that most affect output quality — are stored at higher precision, while less sensitive layers get more aggressive compression. This is why Q4_K_M often outperforms naive Q4_0 at the same approximate size.
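The effect of mixing precisions is just a parameter-weighted average. A minimal sketch (the layer split and bit allocations below are made-up illustrative numbers, not the actual llama.cpp layer assignment):

```python
def effective_bpw(layers: dict) -> float:
    """Average BPW across layers: {layer_group: (param_count, bpw)}."""
    total_bits = sum(n * b for n, b in layers.values())
    total_params = sum(n for n, _ in layers.values())
    return total_bits / total_params

# Hypothetical 7B model: sensitive attention layers kept at ~6.5 bits,
# the bulk of the feed-forward weights compressed to 4 bits.
mix = {"attention": (2e9, 6.5), "feed_forward": (5e9, 4.0)}
print(effective_bpw(mix))  # ~4.71 effective BPW
```

This is why a "4-bit" K-quant file is slightly larger than the naive formula predicts: the average lands above 4 BPW.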
Why It Matters for Local AI
BPW is the underlying concept behind every quantization format. When you see a model listed as "Q4_K_M," you're seeing roughly 4 BPW with medium-quality K-quant mixed precision (the true average is slightly higher because of the higher-precision layers). Understanding BPW lets you estimate VRAM requirements for any model and helps you reason about which quantization level your hardware can support before downloading a 40GB file.
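That last reasoning step can be inverted: given a VRAM budget, solve the formula for BPW. A rough sketch, assuming a fixed overhead allowance for KV cache and activations (the 1.5GB default here is an assumption and varies with context length):

```python
def max_bpw_that_fits(param_count: float, vram_gb: float,
                      overhead_gb: float = 1.5) -> float:
    """Rough BPW ceiling for a VRAM budget.

    overhead_gb is an assumed allowance for KV cache and activations;
    it grows with context length, so treat the result as optimistic.
    """
    usable_bytes = max(vram_gb - overhead_gb, 0) * 1e9
    return usable_bytes * 8 / param_count  # bytes -> bits, per parameter

# A 70B model on a 24GB GPU:
print(max_bpw_that_fits(70e9, 24))  # ~2.57, so only the most extreme quants fit
```

Running the same check for a 7B model on the same GPU gives ~25.7, meaning even full FP16 fits comfortably.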