Model Parameters (7B, 13B, 70B)
The number of learned numerical weights in a model — the primary predictor of capability and VRAM requirement.
When a model is described as "7B" or "70B," the B stands for billion and refers to the number of trainable parameters — the individual numerical weights that encode everything the model learned during training. A 7B model has 7 billion such values. A 70B model has 70 billion.
Parameters are the model. They're stored in memory during inference, they're what quantization compresses, and they're what determines how much VRAM you need.
Why Parameter Count Predicts VRAM
The math is direct: at FP16 (16 bits per parameter, 2 bytes), VRAM usage in GB is approximately:
parameters (in billions) × 2 = GB of VRAM at FP16
- 7B × 2 = ~14GB
- 13B × 2 = ~26GB
- 70B × 2 = ~140GB
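The arithmetic above is simple enough to sketch in a few lines of Python (the function name is illustrative, not from any library):

```python
def fp16_vram_gb(params_billion: float) -> float:
    """Approximate weight memory at FP16: 16 bits = 2 bytes per parameter."""
    return params_billion * 2

for size in (7, 13, 70):
    print(f"{size}B at FP16: ~{fp16_vram_gb(size):.0f} GB")
# 7B -> 14, 13B -> 26, 70B -> 140
```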
With Q4_K_M quantization (nominally 4 bits per weight, 0.5 bytes per parameter; real files run a bit larger because some tensors are kept at higher precision):
- 7B: ~3.5–4.5GB
- 13B: ~7–8GB
- 70B: ~35–40GB
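The same formula generalizes to any average bits-per-weight. Note that Q4_K_M files in practice average closer to 4.8 bits per weight than a flat 4, since some tensors stay at higher precision — which is why the measured sizes above sit above the naive × 0.5 estimate. A sketch (the 4.85 figure is an approximation and varies by model):

```python
def quant_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight memory in GB for a given average bits-per-weight."""
    return params_billion * bits_per_weight / 8

print(quant_vram_gb(7, 4.0))   # naive 4-bit estimate: 3.5 GB
print(quant_vram_gb(7, 4.85))  # closer to a real Q4_K_M file: ~4.2 GB
```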
These numbers plus KV cache overhead determine whether a model fits in your VRAM.
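The KV cache overhead can be estimated too: each layer caches one K and one V tensor of shape [kv_heads, context_length, head_dim]. A sketch using a Llama-2-7B-like configuration (the specific 32/32/128 numbers are an assumption for illustration):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 tensors (K and V) per layer, FP16 by default."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 1024**3

# Assumed Llama-2-7B-like config: 32 layers, 32 KV heads, head_dim 128
print(kv_cache_gib(32, 32, 128, 4096))  # -> 2.0 GiB at 4K context
```

Models using grouped-query attention have fewer KV heads than attention heads, which shrinks this figure considerably.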
Rough Capability Tiers
Parameter count correlates strongly — but not perfectly — with capability:
- 1B–3B: Suitable for simple tasks, fast on any hardware. Limited reasoning.
- 7B–8B: Capable of most everyday tasks. Writes well, follows instructions, handles basic coding. The "daily driver" tier for most local users.
- 13B–14B: Noticeably better reasoning and instruction following than 7B. Good for coding, analysis, and complex prompts.
- 32B–34B: Strong across the board. Approaches older GPT-4-class performance on many benchmarks.
- 70B–72B: The top of consumer-accessible local models. Matches or approaches frontier model quality on many tasks.
- 405B+: Frontier-class, requires professional GPU clusters or distributed setups.
Parameters vs Quality
Parameter count isn't the only factor. Training data quality, instruction tuning, architecture improvements, and fine-tuning all affect output quality. A well-trained 7B model (like Qwen 2.5 7B) can outperform a poorly trained 13B on specific tasks. But all else being equal, more parameters means more capacity to store knowledge and perform complex reasoning.
Why It Matters for Local AI
Parameter count is the first filter when choosing a model for your hardware. Know your VRAM, know your target quantization level, and use the formula above to determine what fits. Then choose the largest model that fits comfortably, leaving headroom for KV cache.
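That filtering step can be sketched directly: given a VRAM budget, pick the largest candidate whose quantized weights fit with headroom to spare. The candidate list, bits-per-weight default, and 2GB headroom figure are illustrative assumptions, not fixed rules:

```python
def pick_model(vram_gb: float, bits_per_weight: float = 4.85,
               headroom_gb: float = 2.0,
               candidates=(3, 7, 13, 32, 70)):
    """Largest candidate size (billions of params) whose quantized weights
    fit in vram_gb, leaving headroom_gb for KV cache and overhead."""
    fitting = [b for b in candidates
               if b * bits_per_weight / 8 + headroom_gb <= vram_gb]
    return max(fitting) if fitting else None

print(pick_model(24))  # 24GB card: 32B weights ~19.4GB + 2GB headroom fits
print(pick_model(8))   # 8GB card: 7B fits, 13B does not
```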