Bits Per Weight (BPW)
The number of bits used to store each model parameter, determining model size in memory.
Every parameter in a language model is a numerical value — a weight that was learned during training. Bits per weight (BPW) describes how many bits are used to store each of those values. Higher BPW means more precise representation, larger file size, more VRAM required, and typically better output quality. Lower BPW means the opposite.
The BPW-to-VRAM Formula
Estimating a model's VRAM footprint from BPW is straightforward:
Model size in GB ≈ (parameter count × BPW) / 8,000,000,000
Examples for a 7B model:
- 16 BPW (FP16): (7,000,000,000 × 16) / 8,000,000,000 = ~14GB
- 8 BPW (Q8_0): ~7GB
- 4 BPW (Q4_K_M): ~3.5–4.5GB (K-quants add slight overhead)
For 70B:
- 16 BPW: ~140GB
- 8 BPW: ~70GB
- 4 BPW: ~35–40GB
This formula is a reliable starting estimate before consulting specific model cards.
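The formula above can be sketched as a small helper function (the function name and defaults are illustrative, not from any particular library):

```python
def model_size_gb(param_count: float, bpw: float) -> float:
    """Estimate model size in GB from parameter count and bits per weight."""
    # bits -> bytes (/ 8), bytes -> GB (/ 1e9), matching the formula:
    # size_GB ≈ (parameter count × BPW) / 8,000,000,000
    return param_count * bpw / 8 / 1e9

# The article's examples:
print(model_size_gb(7e9, 16))   # 14.0  (7B at FP16)
print(model_size_gb(7e9, 8))    # 7.0   (7B at Q8_0)
print(model_size_gb(70e9, 4))   # 35.0  (70B at 4 BPW, before K-quant overhead)
```

Note this estimates weights only; real VRAM usage is somewhat higher once the KV cache and activations are loaded.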
Common BPW Formats
| BPW | Format Name | Use Case |
|---|---|---|
| 16 | FP16 / BF16 | Training, full quality, high VRAM |
| 8 | Q8_0 | Near-lossless, half the FP16 size |
| 6 | Q6_K | High quality, modest savings |
| 5 | Q5_K_M | Good balance for reasoning tasks |
| 4 | Q4_K_M | Standard for most local inference |
| 3 | Q3_K_M | Aggressive, visible quality loss |
| 2 | Q2_K | Extreme compression, significant degradation |
Mixed-Precision Quantization
K-quant formats (Q4_K_M, Q5_K_M, etc.) don't apply the same BPW uniformly across all layers. Certain layers — particularly the attention layers that most affect output quality — are stored at higher precision, while less sensitive layers get more aggressive compression. This is why Q4_K_M often outperforms naive Q4_0 at the same approximate size.
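The effect of mixing precisions is just a parameter-weighted average. A minimal sketch (the layer split and bit allocations below are made-up illustrative numbers, not the actual llama.cpp layer assignment):

```python
def effective_bpw(layers: dict) -> float:
    """Average BPW across layers: {layer_group: (param_count, bpw)}."""
    total_bits = sum(n * b for n, b in layers.values())
    total_params = sum(n for n, _ in layers.values())
    return total_bits / total_params

# Hypothetical 7B model: sensitive attention layers kept at ~6.5 bits,
# the bulk of the feed-forward weights compressed to 4 bits.
mix = {"attention": (2e9, 6.5), "feed_forward": (5e9, 4.0)}
print(effective_bpw(mix))  # ~4.71 effective BPW
```

This is why a "4-bit" K-quant file is slightly larger than the naive formula predicts: the average lands above 4 BPW.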
Why It Matters for Local AI
BPW is the underlying concept behind every quantization format. When you see a model listed as "Q4_K_M," you're seeing roughly 4 BPW with medium-quality K-quant mixed precision (the true average is slightly higher because of the higher-precision layers). Understanding BPW lets you estimate VRAM requirements for any model and helps you reason about which quantization level your hardware can support before downloading a 40GB file.
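That last reasoning step can be inverted: given a VRAM budget, solve the formula for BPW. A rough sketch, assuming a fixed overhead allowance for KV cache and activations (the 1.5GB default here is an assumption and varies with context length):

```python
def max_bpw_that_fits(param_count: float, vram_gb: float,
                      overhead_gb: float = 1.5) -> float:
    """Rough BPW ceiling for a VRAM budget.

    overhead_gb is an assumed allowance for KV cache and activations;
    it grows with context length, so treat the result as optimistic.
    """
    usable_bytes = max(vram_gb - overhead_gb, 0) * 1e9
    return usable_bytes * 8 / param_count  # bytes -> bits, per parameter

# A 70B model on a 24GB GPU:
print(max_bpw_that_fits(70e9, 24))  # ~2.57, so only the most extreme quants fit
```

Running the same check for a 7B model on the same GPU gives ~25.7, meaning even full FP16 fits comfortably.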