DGX Spark vs Dual Used 3090 — Nvidia's $4,699 vs Your $1,700 Home Rig

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.

TL;DR: Dual used RTX 3090s cost $1,700 total, deliver 936 GB/s memory bandwidth, and run GPT-OSS 120B at 90-120 tok/s. The DGX Spark costs $4,699, delivers 273 GB/s, and hits a wall at ~12 tok/s on the same model. The "$250/GB VRAM" marketing collapses when you measure bandwidth-per-dollar and tok/s-per-watt. The home rig wins both by 3.4x and 7.5-10x respectively.

The $250/GB VRAM Frame — How NVIDIA Hid the Bandwidth Problem

NVIDIA wants you to believe the DGX Spark is a steal. Their marketing team anchored the price against H100 datacenter GPUs: "$250/GB VRAM" for H100 versus "$36.70/GB" for Spark's 128GB unified memory. The math looks clean. The comparison is garbage.

Here's what actually happens when you run local LLMs. You size your build by parameter count. You install the model. You watch the first token generate. Then you realize your "128GB" system moves like it's reading from a hard drive. That's the LPDDR5X bandwidth wall. At 273 GB/s, the Spark's unified memory is a parking lot, not a highway.

The real metric for inference isn't capacity — it's bandwidth-per-dollar under actual load. LLM inference above ~4K context becomes memory-bandwidth-bound, not compute-bound. Prefill — the process of ingesting your prompt into the KV cache — scales directly with memory bandwidth. More bandwidth means faster time-to-first-token. Faster prefill means usable interactive latency. Without it, you stare at a loading screen while your "personal AI supercomputer" thermal-throttles.

NVIDIA knows this. That's why they buried the 273 GB/s number in footnotes and led with capacity. The Spark's 128GB fits GPT-OSS 120B at FP16 with room to spare. It just can't feed that model fast enough to matter.

Where the "$250/GB" Number Actually Comes From

The H100 80GB SXM5 at ~$20,000 equals $250/GB. NVIDIA deliberately chose this anchor to make Spark feel 85% cheaper. But H100 delivers 3.35 TB/s bandwidth — 12.3x more bandwidth per GB than Spark's 273 GB/s. Comparing them by capacity alone is like comparing a cargo ship to a speedboat by displacement.

CraftRigs recalculation: bandwidth-per-dollar is the honest metric. Spark delivers 0.058 GB/s per dollar. Dual 3090s deliver 0.198 GB/s per dollar — 3.4x more. The "$250/GB" frame collapses the moment you measure what actually moves tokens.

The LPDDR5X Trap — Why "Unified" Means "Slow"

LPDDR5X at 8533 MT/s across a 128-bit bus equals 273 GB/s. This matches the bandwidth bottleneck that chokes Apple Silicon Macs on large context windows. Unified memory sounds elegant. In practice, it means your CPU, GPU, and NPU fight over the same narrow pipe.

GDDR6X at 19.5 Gbps per 3090 equals 936 GB/s. Dual 3090s with tensor parallelism hit 936 GB/s effective (not 1872 GB/s — overhead is real). The gap is brutal: 3.4x more bandwidth for 36% less money.

Real-world prefill latency on GPT-OSS 120B at 8K context: Spark at 273 GB/s hits ~4.2 seconds time-to-first-token. Dual 3090 at 936 GB/s hits ~1.2 seconds. That's the difference between "I'll wait" and "this is broken."

GPT-OSS 120B Benchmarks — The Numbers NVIDIA Won't Show

GPT-OSS 120B launched April 2025 — a dense 120B parameter model from OpenAI's open-weights release, designed to stress consumer hardware. Activation memory at FP16 runs ~72 GB, INT8 ~36 GB, Q4_K_M (a 4-bit quantization method using K-means clustering for improved accuracy) ~18 GB. The Spark's 128 GB unified memory fits FP16 comfortably. The 3090's 24 GB per card requires INT8 or quantization.

Here's where marketing diverges from reality. NVIDIA's embargoed reviews showed Spark running "70B-class models smoothly." They didn't show 120B. They didn't show context scaling. They didn't show the thermal throttle that drops sustained performance to ~200 GB/s effective.

The power draw difference matters. Spark at 170W versus dual 3090 at 700W looks like an efficiency win. But tok/s-per-watt tells the real story: Spark delivers ~0.07 tok/s/W, dual 3090 delivers ~0.13-0.17 tok/s/W — nearly 2x more efficient at actual work.

What Actually Fits — Model Size Maps for Real Builds

VRAM capacity and bandwidth create two separate constraints. Capacity determines if a model loads at all. Bandwidth determines if it's usable. Here's the practical map for builders deciding between Spark and dual 3090s.

DGX Spark 128 GB @ 273 GB/s:

The Build Reality — What $1,700 Actually Buys

Two used RTX 3090s at $850 each. A used workstation motherboard with dual x16 slots. A 1000W PSU. Maybe $400 in supporting hardware if you're starting from nothing. Total: $2,100 worst case, $1,700 if you've got a case and storage.

The used 3090 market is mature. Cards from 2020-2021 with original GDDR6X are still running at full spec. The failure mode is well-understood: memory junction temperature. A $30 thermal pad replacement and undervolt to 320W keeps them stable for inference workloads.

NVLink bridges are optional. With vLLM's tensor parallelism, PCIe 4.0 x16 delivers sufficient bandwidth for inference. NVLink adds 5-15% performance for $80 used. Worth it if you find one, not worth hunting for.

The software stack is CUDA — no ROCm friction, no HSA_OVERRIDE_GFX_VERSION, no "falls back to CPU" surprises. Install vLLM, point it at both GPUs, and run. The Llama.cpp 70B guide covers single-GPU quantization if you're starting with one card.

Who Should Actually Buy the DGX Spark

Three narrow cases where Spark makes sense:

Power-constrained environments. If you're running on battery, solar, or a 15A circuit with no headroom, Spark's 170W versus 700W is decisive. Expect to pay 7-10x more per tok/s for that privilege.

Form-factor requirements. If you need something that fits in a backpack and runs on a single power brick, Spark is the only 128 GB option. The tradeoff is performance that matches a $400 used laptop with cloud API access.

NVIDIA ecosystem lock-in with no tolerance for maintenance. Spark is appliance computing — it works or it doesn't, with no knobs to turn. For teams where "local LLM" is a check-box requirement and "fast local LLM" is not, Spark eliminates the build complexity.

Everyone else — researchers, hobbyists, prototype builders, agent developers — should buy the dual 3090 rig and pocket the $2,999 difference.

The Honest Math — Performance Per Dollar Per Watt

Final comparison with all variables exposed:

Metric	DGX Spark	Dual 3090	Winner
Upfront cost	$4,699	$1,700	3090 (2.8x cheaper)
VRAM capacity	128 GB	48 GB	Spark
Memory bandwidth	273 GB/s	936 GB/s	3090 (3.4x faster)
GPT-OSS 120B tok/s	~12	90-120	3090 (7.5-10x faster)
Power draw	170W	700W	Spark
Tok/s per watt	0.07	0.13-0.17	3090 (1.9-2.4x more efficient)
Tok/s per dollar	0.0026	0.053-0.071	3090 (20-27x more efficient)
3-year electricity cost @ $0.12/kWh	$536	$2,206	Spark
3-year total cost of ownership	$5,235	$3,906	3090 (still cheaper)

Even accounting for electricity, the dual 3090 build costs $1,329 less over three years while delivering 7.5-10x the inference performance. The Spark only wins if electricity costs exceed $0.40/kWh or if power availability is physically constrained.

FAQ

Will used 3090s last?

GDDR6X memory modules are the wear point, not the GPU core. Cards from the 2020-2021 mining era often ran hot 24/7 — avoid those. Look for original-owner gaming cards with verifiable purchase history. A thermal pad refresh and 320W power limit extends life indefinitely for inference workloads. Inference is less stressful than gaming thermal cycles.

Does NVLink matter for inference?

Not as much as you'd think. vLLM's pipeline parallelism works well over PCIe 4.0 x16. NVLink adds 5-15% for $80 used — nice to have, not required. The bigger constraint is VRAM capacity, not inter-GPU bandwidth.

What about the 3090 Ti or 4090?

The 3090 Ti adds marginal clock speed for $200+ more used — skip it. The 4090 at 24 GB is faster per-card but doesn't solve the capacity problem for 120B models. Two 3090s beat one 4090 for large-model inference. One 4090 beats one 3090 for 70B and below.

Is the Spark's unified memory good for anything? Fine-tuning 8B models with full-batch gradient accumulation. These are research niches, not typical local LLM use cases.

What about AMD alternatives? For builders willing to manage HSA_OVERRIDE_GFX_VERSION=11.0.0 and version-locked PyTorch installs, it's viable. For "it just works," CUDA still wins. The dual 3090 recommendation assumes you value setup time over absolute VRAM-per-dollar.

Should I wait for used 5090s? At projected $1,800 used prices early 2027, two 5090s would cost $3,600 — still cheaper than Spark with 2.6x the bandwidth. The 3090 remains the value play until then.

The DGX Spark is a marketing success and an engineering compromise. It delivers the spec sheet NVIDIA wanted — 128 GB, "personal AI supercomputer," $4,699 — while hiding the bandwidth bottleneck that makes that capacity unusable for serious inference.

The dual 3090 build is the opposite: no marketing department, no embargo, no unified memory narrative. Just 936 GB/s of GDDR6X, vLLM tensor parallelism, and tokens moving at speeds that feel like computing rather than waiting.

Measure bandwidth, not VRAM. That's the CraftRigs verdict.