
Bits Per Weight (BPW)

The number of bits used to store each model parameter, determining model size in memory.

Every parameter in a language model is a numerical value — a weight that was learned during training. Bits per weight (BPW) describes how many bits are used to store each of those values. Higher BPW means more precise representation, larger file size, more VRAM required, and typically better output quality. Lower BPW means the opposite.

The BPW-to-VRAM Formula

Estimating a model's VRAM footprint from BPW is straightforward:

Model size in GB ≈ (parameter count × BPW) / 8,000,000,000

Examples for a 7B model:

  • 16 BPW (FP16): (7,000,000,000 × 16) / 8,000,000,000 = ~14GB
  • 8 BPW (Q8_0): ~7GB
  • 4 BPW (Q4_K_M): ~3.5–4.5GB (K-quants add slight overhead)

For 70B:

  • 16 BPW: ~140GB
  • 8 BPW: ~70GB
  • 4 BPW: ~35–40GB

This formula is a reliable starting estimate before consulting specific model cards.
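The formula above can be sketched as a small helper. This is a minimal illustration of the estimate only; the function name is made up for this example, and real files add some overhead for metadata and mixed-precision tensors.

```python
def model_size_gb(param_count: float, bpw: float) -> float:
    """Estimate model size in GB from parameter count and bits per weight."""
    # bits -> bytes (divide by 8), bytes -> GB (divide by 1e9)
    return (param_count * bpw) / 8e9

# The 7B examples from above:
print(model_size_gb(7e9, 16))  # 14.0 (FP16)
print(model_size_gb(7e9, 8))   # 7.0  (Q8_0)
print(model_size_gb(7e9, 4))   # 3.5  (nominal 4 BPW, before K-quant overhead)
```

The same call with `70e9` reproduces the 70B numbers (140, 70, and 35 GB respectively).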

Common BPW Formats

| BPW | Format Name | Use Case |
|-----|-------------|----------|
| 16  | FP16 / BF16 | Training, full quality, high VRAM |
| 8   | Q8_0        | Near-lossless, fits in VRAM |
| 6   | Q6_K        | High quality, modest savings |
| 5   | Q5_K_M      | Good balance for reasoning tasks |
| 4   | Q4_K_M      | Standard for most local inference |
| 3   | Q3_K_M      | Aggressive, visible quality loss |
| 2   | Q2_K        | Extreme compression, significant degradation |
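Combining the table with the size formula gives a quick way to answer the practical question: what is the highest-quality format that fits a given VRAM budget? The sketch below is a rough heuristic under the nominal BPW values above; it ignores K-quant overhead, context cache, and OS VRAM usage, so treat its answers as optimistic.

```python
# Nominal BPW per format, taken from the table above
FORMAT_BPW = {
    "FP16": 16, "Q8_0": 8, "Q6_K": 6,
    "Q5_K_M": 5, "Q4_K_M": 4, "Q3_K_M": 3, "Q2_K": 2,
}

def highest_quality_fit(param_count: float, vram_gb: float):
    """Return the highest-BPW format whose estimated size fits in vram_gb."""
    for name, bpw in sorted(FORMAT_BPW.items(), key=lambda kv: -kv[1]):
        if (param_count * bpw) / 8e9 <= vram_gb:
            return name
    return None  # nothing fits, even at 2 BPW

print(highest_quality_fit(7e9, 8))    # Q8_0: a 7B model at 8 BPW is ~7GB
print(highest_quality_fit(70e9, 24))  # Q2_K: even 3 BPW (~26GB) overflows 24GB
```

In practice you would subtract a few GB from the budget before calling this, to leave room for the KV cache and activations.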

Mixed-Precision Quantization

K-quant formats (Q4_K_M, Q5_K_M, etc.) don't apply the same BPW uniformly across all layers. Certain layers — particularly the attention layers that most affect output quality — are stored at higher precision, while less sensitive layers get more aggressive compression. This is why Q4_K_M often outperforms naive Q4_0 at the same approximate size.
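One consequence of mixed precision is that a format's *effective* BPW is a parameter-weighted average across layers, usually a bit above the nominal number in its name. The fractions and bit widths below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical layer groups for a mixed-precision 4-bit quant:
# (fraction of total parameters, bits used for that group)
groups = [
    (0.15, 6.5),  # sensitive attention tensors kept at higher precision
    (0.85, 4.0),  # remaining tensors at the base 4-bit level
]

# Effective BPW is the parameter-weighted average of the per-group bits
effective_bpw = sum(frac * bits for frac, bits in groups)
print(effective_bpw)  # 4.375 -- slightly above the nominal "4 BPW"
```

This is why a "Q4" K-quant file is a bit larger than the plain 4-BPW formula predicts, as noted in the 7B example above.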

Why It Matters for Local AI

BPW is the underlying concept behind every quantization format. When you see a model listed as "Q4_K_M," you're seeing roughly 4 BPW using the medium-quality ("M") K-quant mix. Understanding BPW lets you estimate VRAM requirements for any model, and helps you reason about which quantization level your hardware can support before downloading a 40GB file.