DeepSeek V4 was confirmed last week: 1 trillion parameters, open weights under an MIT license, and competitive with GPT-5.4 on coding and math benchmarks. The community has been running it on consumer hardware since the weights dropped, and the results confirm what the benchmark scores suggested: this is the most capable open-weight model released to date.
The 1T total parameter count sounds extreme, but DeepSeek V4 is a Mixture-of-Experts model with approximately 37B active parameters per forward pass. The official distilled variants (7B, 14B, 32B, 70B) are dense models trained to replicate the larger model's performance — and they're what most local users will actually be running.
Here's the full breakdown by tier.
Quick Summary
- DeepSeek V4 full model (1T) needs roughly 500GB of VRAM at Q4 (seven or more 80GB data center GPUs), which is not practical for most home builders
- Distilled variants (7B–70B) are where consumer hardware actually runs this model
- The 32B distill on a single RTX 3090 is the sweet spot: competitive with GPT-5.4 on coding at zero API cost
Understanding the DeepSeek V4 Model Family
DeepSeek V4 ships as a model family, not a single monolithic file:
DeepSeek V4 (full, 1T MoE): The flagship. 1T total parameters, 37B active. Requires massive multi-GPU hardware.
DeepSeek V4 Distill 7B: Dense model distilled from V4. Punches above its weight class compared to other 7B models.
DeepSeek V4 Distill 14B: Fits comfortably on 12GB VRAM at Q4. Strong coding performance.
DeepSeek V4 Distill 32B: The consumer sweet spot. 24GB VRAM handles it at Q4. Competitive with GPT-5.4 on coding tasks.
DeepSeek V4 Distill 70B: Requires 48GB VRAM at Q4. Strongest consumer-runnable variant.
The distills are not the same model as V4 — they're smaller models trained on V4's outputs. But because they're trained on a 1T model's reasoning traces, they outperform models of similar size trained on human data alone.
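The gap between total and active parameters is easy to misread, so here is a minimal sketch of the arithmetic using the article's figures (1T total, ~37B active). The 4 bits per weight is an assumed round number for a Q4-style quantization, and the function names are illustrative only.

```python
# Figures from the article: 1T total parameters, ~37B active per forward pass.
TOTAL_PARAMS = 1_000_000_000_000
ACTIVE_PARAMS = 37_000_000_000


def active_fraction(total: int, active: int) -> float:
    """Share of the weights that take part in a single forward pass."""
    return active / total


def weight_gigabytes(params: int, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    return params * bits_per_weight / 8 / 1e9


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active per token: {frac:.1%} of all weights")

# Compute scales with the active 37B, but every expert has to stay loaded,
# so memory scales with the full 1T (4 bits/weight is an assumption here).
print(f"Resident weights at 4-bit: ~{weight_gigabytes(TOTAL_PARAMS, 4):.0f} GB")
print(f"Weights used per token at 4-bit: ~{weight_gigabytes(ACTIVE_PARAMS, 4):.1f} GB")
```

This is why the MoE design helps speed more than it helps memory: only a few percent of the weights do work on each token, but all of them still need a place to live.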
Tier Breakdown by GPU
8GB VRAM — DeepSeek V4 Distill 7B
Compatible GPUs:
- RTX 3070, RTX 3060 Ti, RTX 4060, RTX 4060 Ti 8GB
- RX 7600, RX 6800 (the RX 6800 has 16GB, so it has more headroom than this tier needs)
- Any GPU with 8GB+ VRAM
Performance at 8GB:
- DeepSeek V4 7B at Q4 (~5GB) — fits with room for context
- DeepSeek V4 7B at Q8 (~8GB) — tight, may need to reduce context length
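"Room for context" comes down to the KV cache, which grows linearly with context length. The sketch below uses the standard KV cache size formula; the architecture values (32 layers, 8 KV heads, head dimension 128, 16-bit cache) are hypothetical placeholders for a 7B-class model, since the distill's actual shape is not given here.

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in decimal GB.

    The leading 2 accounts for one key tensor and one value tensor per layer.
    """
    values = 2 * n_layers * n_kv_heads * head_dim * context_tokens
    return values * bytes_per_value / 1e9


# Hypothetical 7B-class shape, NOT the published DeepSeek V4 Distill 7B config.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for ctx in (4096, 16384, 32768):
    print(f"{ctx:>6} tokens -> ~{kv_cache_gb(LAYERS, KV_HEADS, HEAD_DIM, ctx):.2f} GB of KV cache")
```

Under these assumptions the cache costs about 0.5GB at 4K context and about 4.3GB at 32K, which is why Q4 (~5GB) leaves comfortable headroom on an 8GB card while Q8 (~8GB) does not.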
The 7B distill is genuinely useful for coding autocomplete, simple Q&A, and lightweight agent tasks. It is not a GPT-5.4 replacement, but it significantly outperforms earlier-generation 7B models.
Quantization tradeoffs at 7B:
- Q4_K_M: Best quality at 8GB, standard choice
- Q8: Near-lossless quality, requires tight VRAM management
- Q2: Not recommended — meaningful quality degradation at 7B
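To see where the size figures in each tier come from, weight size can be approximated as parameters times bits per weight divided by 8. The bits-per-weight values below are rough assumptions for common GGUF quantization types, not measured numbers for the DeepSeek V4 distills, and the result covers weights only (no KV cache or runtime overhead).

```python
# Approximate effective bits per weight; treat these as assumptions.
APPROX_BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_M": 4.8, "Q2_K": 2.6}

# Distilled variants named in the article, in billions of parameters.
DISTILL_PARAMS_B = {"7B": 7, "14B": 14, "32B": 32, "70B": 70}


def weights_gb(params_billions: float, quant: str) -> float:
    """Weights-only size in decimal GB for a given quantization type."""
    return params_billions * APPROX_BITS_PER_WEIGHT[quant] / 8


for name, params in DISTILL_PARAMS_B.items():
    row = ", ".join(
        f"{quant}: {weights_gb(params, quant):.1f} GB"
        for quant in APPROX_BITS_PER_WEIGHT
    )
    print(f"{name:>3} -> {row}")
```

The estimates land close to the tier figures in this guide (for example, 34GB for the 32B at Q8), with the remaining gap explained by context and runtime overhead.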
16GB VRAM — DeepSeek V4 Distill 14B
Compatible GPUs:
- RTX 4060 Ti 16GB
- RTX 4070, RTX 4070 Super (12GB each: enough for the 14B at Q4, not at Q8)
- RX 9070 XT (16GB)
- RTX 5060 Ti 16GB
Performance at 16GB:
- 14B at Q4 (~10GB) — excellent fit with context headroom
- 14B at Q8 (~14GB) — fits well, near-lossless quality
The 14B distill is meaningfully better than the 7B on complex reasoning chains. At Q8 quantization in 16GB VRAM, it's one of the most capable models per dollar of hardware at this tier.
Token speed (14B Q4 at 16GB VRAM): ~55–70 tokens/second depending on card.
24GB VRAM — DeepSeek V4 Distill 32B (The Sweet Spot)
Compatible GPUs:
- RTX 3090 (24GB) — used ~$500
- RTX 4090 (24GB) — new ~$1,600
- RTX 5090 (32GB): runs 32B at Q4 with room to spare; Q8 (~34GB) still needs a small offload
Performance at 24GB:
- 32B at Q4 (~20GB) — fits cleanly with good context headroom
- 32B at Q8 (~34GB) — doesn't fit, needs CPU RAM offload (15–20GB to RAM)
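Partial offload is usually configured as a number of layers kept on the GPU (llama.cpp exposes this as `--n-gpu-layers`). A rough sketch of how that split works out for the Q8 case above, assuming equal-size layers; the 64-layer count and the 4GB reserve for context and runtime buffers are hypothetical values chosen for illustration.

```python
import math


def gpu_layer_split(model_gb: float, n_layers: int,
                    vram_gb: float, reserve_gb: float) -> tuple[int, float]:
    """Return (layers_on_gpu, gb_left_in_system_ram), assuming equal-size layers."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(n_layers, math.floor(usable_gb / per_layer_gb))
    return on_gpu, (n_layers - on_gpu) * per_layer_gb


# 32B at Q8 (~34GB) on a 24GB card; layer count and reserve are assumptions.
layers, ram_gb = gpu_layer_split(model_gb=34, n_layers=64, vram_gb=24, reserve_gb=4)
print(f"{layers} of 64 layers on GPU, ~{ram_gb:.1f} GB in system RAM")
```

With these assumptions roughly 14GB ends up in system RAM, in line with the offload range quoted above. Every layer that runs from RAM slows generation, so treat Q8 on 24GB as a quality experiment, not a daily driver.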
The 32B distill is where "competing with GPT-5.4 on coding" becomes a realistic claim. On HumanEval and LiveCodeBench, independent testers have put DeepSeek V4 Distill 32B within 5–8% of GPT-5.4. For debugging, code review, and software architecture questions, the gap is often imperceptible.
Token speed (32B Q4 at 24GB VRAM): ~30–50 tokens/second on RTX 3090, ~45–65 on RTX 4090.
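A quick sanity check on these speeds: single-stream decoding on a dense model is typically limited by memory bandwidth, because roughly all the weights are read once per generated token. The sketch below computes that ceiling under this simplification. The 936 GB/s figure is the RTX 3090's published memory bandwidth and is not from this article.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/second if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


RTX_3090_BANDWIDTH_GB_S = 936  # published spec, an outside assumption
WEIGHTS_32B_Q4_GB = 20         # size quoted in this tier

ceiling = decode_ceiling_tokens_per_s(RTX_3090_BANDWIDTH_GB_S, WEIGHTS_32B_Q4_GB)
print(f"RTX 3090 ceiling for 32B Q4: ~{ceiling:.0f} tokens/second")
```

That works out to roughly 47 tokens/second, near the top of the RTX 3090 range above. Real throughput usually lands below the ceiling once context length and runtime overhead are included.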
Tip
An RTX 3090 running the 32B distill at Q4 is the best performance-per-dollar configuration for local DeepSeek V4 inference. You get $0/token inference on a model that competes with GPT-5.4 on most coding tasks, for a hardware investment of $500–$600.
48GB VRAM — DeepSeek V4 Distill 70B
Compatible GPUs:
- 2× RTX 3090 (~$1,000 used)
- 2× RTX 4090 (~$3,200 new)
- Single NVIDIA A6000 48GB (~$2,500 used)
Performance at 48GB:
- 70B at Q4 (~40GB) — fits cleanly
- 70B at Q5 (~50GB): just over 48GB, so it needs a small offload to system RAM or a 64GB VRAM setup
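With two cards, the question is whether the pooled VRAM covers the model once each GPU keeps a little memory back for its own buffers. A minimal sketch of that check, using the sizes quoted above; the 1GB per-GPU reserve is an assumption, and the helper name is illustrative.

```python
def pooled_margin_gb(model_gb: float, gpus_gb: list[float],
                     reserve_per_gpu_gb: float = 1.0) -> float:
    """Pooled usable VRAM minus model size.

    Positive means headroom for context; negative means that much must be offloaded.
    """
    usable = sum(max(gpu - reserve_per_gpu_gb, 0.0) for gpu in gpus_gb)
    return usable - model_gb


DUAL_3090 = [24, 24]

for label, size_gb in (("70B Q4", 40), ("70B Q5", 50)):
    margin = pooled_margin_gb(size_gb, DUAL_3090)
    status = "fits" if margin >= 0 else "needs offload"
    print(f"{label}: {status}, margin {margin:+.0f} GB")
```

Under this assumption Q4 leaves about 6GB across the pair for context, while Q5 comes up about 4GB short.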
The 70B distill is the strongest consumer-accessible variant. At Q4, it benchmarks at approximately 90–93% of GPT-5.4 across coding, math, and reasoning categories. For most professional coding workflows, this is indistinguishable from GPT-5.4.
Token speed (70B Q4 at 48GB): ~15–25 tokens/second over PCIe multi-GPU.
The Full 1T Model: What It Actually Takes
For completeness:
- Q4 quantization: ~500GB VRAM required
- Minimum practical config: 5× NVIDIA A100 80GB (400GB total, needs significant offload) or 7× A100 (560GB for cleaner inference)
- Expected cost (used A100 80GB): $10,000–$15,000 per card × 7 = $70K–$105K
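The card count and cost follow directly from the figures above. A short worked version of the arithmetic, using only numbers from this section (500GB at Q4, 80GB per A100, $10,000–$15,000 per used card):

```python
import math


def gpus_needed(model_gb: float, gpu_gb: float) -> int:
    """Smallest number of GPUs whose combined VRAM covers the model weights."""
    return math.ceil(model_gb / gpu_gb)


MODEL_Q4_GB = 500                            # full 1T model at Q4
A100_GB = 80                                 # VRAM per card
PRICE_LOW, PRICE_HIGH = 10_000, 15_000       # used price per card, USD

count = gpus_needed(MODEL_Q4_GB, A100_GB)
low, high = count * PRICE_LOW, count * PRICE_HIGH
print(f"{count} cards, {count * A100_GB} GB total, ${low:,} to ${high:,}")
```

Six cards would give 480GB, which falls short of the weights alone, so seven is the smallest count that holds the model without offload.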
This is data center territory. The distilled variants are the practical answer for local inference.
Quantization Tradeoffs
Notes, from highest precision to lowest:
- Q8: Near-lossless, high VRAM
- Good balance
- Q4_K_M: Standard choice, good quality/VRAM ratio
- Noticeable degradation on complex reasoning
- Q2: Significant degradation, avoid unless necessary
For the DeepSeek V4 distills, Q4_K_M is the default recommendation. The 32B at Q4 is better than the 70B at Q2: always prefer a better quantization on a smaller model over a worse quantization on a larger one.
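That rule of thumb can be written as a small selection helper: pick the largest distill that fits at Q4 or better, then the highest quantization of that variant that still fits. The sizes are the approximate figures from the tier sections; the 2GB reserve for context and the function name are assumptions made for this sketch.

```python
# Approximate weight sizes in GB, taken from the tier sections of this guide.
SIZES_GB = {
    ("7B", "Q4"): 5, ("7B", "Q8"): 8,
    ("14B", "Q4"): 10, ("14B", "Q8"): 14,
    ("32B", "Q4"): 20, ("32B", "Q8"): 34,
    ("70B", "Q4"): 40, ("70B", "Q5"): 50,
}
VARIANT_ORDER = ["7B", "14B", "32B", "70B"]  # smallest to largest
QUANT_ORDER = ["Q4", "Q5", "Q8"]             # lowest to highest precision


def pick_variant(vram_gb: float, reserve_gb: float = 2.0):
    """Largest variant that fits at Q4 or better, with its best fitting quant.

    Never drops below Q4; returns None when nothing fits.
    """
    budget = vram_gb - reserve_gb
    for variant in reversed(VARIANT_ORDER):
        fitting = [
            quant for quant in QUANT_ORDER
            if SIZES_GB.get((variant, quant), float("inf")) <= budget
        ]
        if fitting:
            return variant, fitting[-1]
    return None


for vram in (8, 16, 24, 48):
    print(f"{vram}GB -> {pick_variant(vram)}")
```

Because Q2 and Q3 never appear in the table, the helper steps down to a smaller model instead of squeezing a larger one into memory at a lower precision.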
FAQ
Can you run DeepSeek V4 on a single consumer GPU?
Yes, but only the distilled variants. The 32B distill runs on a single RTX 3090 at Q4. The full 1T model needs roughly 500GB of VRAM at Q4, which means a multi-GPU data center setup. Most local users will want the 32B or 70B distilled versions.
What is the minimum GPU for running DeepSeek V4 locally?
For the 7B distilled variant: any GPU with 8GB VRAM (RTX 3070, RTX 4060, RX 7600). For the 14B distilled variant: 12GB VRAM at Q4, or 16GB for Q8 (RTX 4060 Ti 16GB). For the 32B distilled variant: 24GB VRAM (RTX 3090 or RTX 4090). For the 70B distilled variant: 48GB minimum (two RTX 3090s or a single A6000).
How does DeepSeek V4 compare to GPT-5.4 on benchmarks?
DeepSeek V4 is competitive with GPT-5.4 on coding tasks and mathematical reasoning, and slightly behind on open-ended reasoning and instruction following. The gap is narrow enough that for coding-focused use cases, the local DeepSeek V4 32B distill is a genuine GPT-5.4 alternative at zero API cost.