CraftRigs

Build Guides

Step-by-step rig builds for local AI. From budget setups to multi-GPU workstations, with parts lists and benchmarks.

224 articles
Architecture Guide

Ryzen 9 9950X3D2 for Local LLMs: Does 208MB Cache Pay Off?

208MB of L3 cache speeds up CPU inference, but only when the CPU, not the GPU, is actually doing the work. We benchmarked whether the $899 premium over the 9950X is worth it for local LLMs.

cpu-inference, ryzen-9950x3d2, local-llm-benchmark
Architecture Guide

The 8GB VRAM Trap: Why Your RTX 5060 Ti Might Cost You Twice

RTX 5060 Ti 8GB looks budget-friendly at $379 until you hit the 14B model wall. Here's exactly what fits in 8GB vs 16GB, with benchmarks and the honest upgrade path (quick fit-check sketch below).

rtx-5060-ti, vram-requirements, gpu-buyers-guide
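
A back-of-envelope fit check (our rough numbers, not the article's benchmarks): a Q4_K_M GGUF needs roughly 0.6 bytes per parameter, plus a gigabyte or two for KV cache and runtime overhead, which is exactly why 14B models hit the wall on an 8GB card.

    # Back-of-envelope fit check for Q4-quantized GGUF models (our rough
    # numbers, not the article's benchmarks): ~0.6 bytes per parameter at
    # Q4_K_M, plus headroom for KV cache, context, and runtime overhead.
    def vram_needed_gb(params_billion, bytes_per_param=0.6, overhead_gb=1.5):
        return params_billion * bytes_per_param + overhead_gb

    for size_b in (7, 8, 14):
        needed = vram_needed_gb(size_b)
        verdict = "fits" if needed <= 8 else "does not fit"
        print(f"{size_b}B at Q4: ~{needed:.1f} GB -> {verdict} in 8 GB")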
Architecture Guide

Decode Speed Explained: Tokens Per Second in Local LLMs

Decode speed (tok/s) determines how fast your local LLM feels. Learn what drives it, see real GPU benchmarks, and find out why VRAM bandwidth beats TFLOPS every time (back-of-envelope ceiling sketch below).

local llm, decode speed, tokens per second
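
Why bandwidth wins: every decoded token streams the full weight set through the GPU once, so memory bandwidth divided by model size puts a hard ceiling on tok/s. A rough sketch using published bandwidth specs (measured speeds land below the ceiling):

    # Decode is memory-bound: each generated token streams the entire
    # weight set through the GPU once, so bandwidth / model size gives a
    # hard tok/s ceiling. Bandwidth figures are published specs.
    def decode_ceiling_tok_s(bandwidth_gb_s, model_size_gb):
        return bandwidth_gb_s / model_size_gb

    model_gb = 4.7  # ~7-8B model at Q4
    for name, bw in [("RTX 3090", 936), ("RTX 4060 Ti 16GB", 288)]:
        print(f"{name}: ceiling ~{decode_ceiling_tok_s(bw, model_gb):.0f} tok/s")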
Architecture Guide

Top 5 Budget GPUs for Local AI in 2026: What YouTube Won't Tell You

The 5 best budget GPUs for local AI in 2026, benchmarked on tok/s — not gaming fps. RTX 4060 Ti 16GB, RTX 5060 Ti 16GB, RTX 3060 12GB, RTX 3090 24GB, and RX 9060 XT 16GB tested with real VRAM limits disclosed.

budget-gpu, local-llm, rtx-4060-ti
Architecture Guide

Build the Lenovo ThinkStation P5 Gen 2 for Half the Price

Lenovo's dual RTX Pro 6000 workstation will cost $35,000+. Here's how to build the same 192GB VRAM setup for $22,000 — or a rational dual 4090 build for $10,000.

workstation build, rtx pro 6000, rtx 4090
Architecture Guide

Mistral Small 4 Local Setup: The 119B MoE Hardware Reality

Mistral Small 4 is 119B total parameters despite '6B active' marketing. You need 60–80GB VRAM to run it locally. Here's the exact hardware guide to set it up right (MoE sizing sketch below).

mistral-small-4, local-llm, llama-cpp
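
The MoE wrinkle, sketched with the article's 119B-total / 6B-active figures (the ~0.6 bytes-per-parameter Q4 density is our assumption): total parameters set the VRAM bill, active parameters set the decode speed.

    # MoE sizing rule of thumb, using the article's 119B-total / 6B-active
    # figures (the ~0.6 bytes/param Q4 density is our assumption):
    # total parameters set the VRAM bill, active parameters set the speed.
    total_params_b, active_params_b = 119, 6   # billions
    q4_bytes_per_param = 0.6                   # ~Q4_K_M density
    print(f"Weights alone: ~{total_params_b * q4_bytes_per_param:.0f} GB VRAM")
    # Per token, only the active experts stream from memory, which is why
    # a 119B MoE decodes far faster than a dense 119B model would.
    bandwidth_gb_s = 936                       # e.g. RTX 3090-class memory
    print(f"Decode ceiling: ~{bandwidth_gb_s / (active_params_b * q4_bytes_per_param):.0f} tok/s")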
Architecture Guide

The RTX 3090 Is Now the Best Value Local LLM GPU

Used RTX 3090s are at $650-750 — a 22% drop from six months ago. Here's why this is the floor, what 24GB VRAM actually unlocks, and where to buy safely.

rtx-3090, local-llm, vram
Architecture Guide

Should You Buy a Used RTX 5070 Ti?

New RTX 5070 Ti costs $999, used costs $899 — but it launched at $749 MSRP. Here's what caused this inverted market and whether buying used right now makes sense.

rtx-5070-ti, blackwell, used-gpu
Architecture Guide

Gemma 4 GPU Sweet Spot: Which Card Handles Every Size

Gemma 4 is imminent — and if Gemma 3's trajectory holds, 24GB VRAM covers the sweet spot tier. Here's the VRAM breakdown from Gemma 3 and which GPU tier to target now.

gemma-4, gemma-3, google
Architecture Guide

3 Things to Check Before Buying a Used RTX 4090

Used RTX 4090s at $1,400-1,800 are tempting for 24GB local LLM builds. Here's what to verify before you send money — and what to walk away from.

rtx-4090, used-gpu, buying-guide
Architecture Guide

The $5,000 Ultimate Local LLM Server Build

Full component list for a $5,000 workstation-class local LLM build. Dual GPU options, maximum VRAM, and real part picks for serious researchers and developers.

Architecture Guide

CPU Offloading Explained: When and Why to Use It

What CPU offloading is, how the --n-gpu-layers flag works in llama.cpp, and when splitting model layers between VRAM and RAM is worth the speed hit (Python binding sketch below).

cpu-offloading, llama-cpp, vram
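
A minimal sketch of the same knob through llama-cpp-python, the Python binding for llama.cpp (n_gpu_layers mirrors the --n-gpu-layers / -ngl CLI flag; the model path is a placeholder):

    # Partial GPU offload through llama-cpp-python, the Python binding for
    # llama.cpp; n_gpu_layers mirrors the --n-gpu-layers / -ngl CLI flag.
    # Layers that fit go to VRAM, the rest run on CPU from system RAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/model-q4_k_m.gguf",  # placeholder path
        n_gpu_layers=24,  # offload 24 layers; -1 offloads everything
        n_ctx=4096,       # context window
    )
    out = llm("Explain CPU offloading in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])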
Architecture Guide

ECC RAM for LLM Servers: Do You Actually Need It?

What ECC RAM does, who actually needs it for local LLM workloads, and when it's worth the extra cost. Honest answer for consumer builders and production inference servers.

Architecture Guide

USB4 eGPU for Local LLMs: Does It Actually Work?

USB4 and Thunderbolt 4 eGPUs are bandwidth-limited to ~5 GB/s. Here's what that means for LLM inference throughput and whether it's worth trying (rough load-time math below).

egpu, usb4, thunderbolt4
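
What ~5 GB/s means in practice, roughly sketched: the weights cross the link once at load time, and after that only prompts and tokens do, so steady-state decode speed barely suffers.

    # Rough cost of a ~5 GB/s USB4/Thunderbolt link: the weights cross it
    # once at load time; after that, only prompts and tokens do, so
    # steady-state decode speed is barely affected.
    link_gb_s = 5
    for model_gb in (8, 16, 24):
        print(f"{model_gb} GB model: ~{model_gb / link_gb_s:.1f}s to load over the link")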
Architecture Guide

Local LLM for Small Business: Hardware Setup Under $2,000

Model API costs doubled to $8.4B in 2025. For small businesses spending $150+/month on AI, local hardware pays for itself in about a year. Here's the exact build.

small business, local llm, hardware
Architecture Guide

Gamer to AI Builder: Repurposing Your Gaming PC for Local LLMs

That RTX 3080 or 3090 in your gaming rig already runs local AI. Here's exactly what your hardware can handle, what it can't, and the one upgrade that makes the biggest difference.

gaming pc, local llm, repurpose hardware
Architecture Guide

How to Set Up a Local AI API Server for Your Team

Run a shared local LLM that your whole team can access like an internal ChatGPT. Hardware sizing, Ollama vs vLLM, and deployment options covered (minimal client sketch below).

ollama, vllm, api-server
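
A minimal client sketch, assuming a shared Ollama box reachable as llm-server on Ollama's default port 11434 (hostname and model name are placeholders; the server needs OLLAMA_HOST=0.0.0.0 to listen beyond localhost):

    # Minimal team client against a shared Ollama host (default port
    # 11434). Hostname and model name are placeholders; start the server
    # with OLLAMA_HOST=0.0.0.0 so it listens beyond localhost.
    import requests

    resp = requests.post(
        "http://llm-server:11434/api/generate",
        json={"model": "llama3", "prompt": "Draft a polite follow-up email.", "stream": False},
        timeout=120,
    )
    print(resp.json()["response"])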
Architecture Guide

PCIe Lanes for Local LLM Builds: When It Actually Matters

PCIe x16 vs x8 makes almost no difference once models are in VRAM. Here's when lane count actually bottlenecks your LLM rig, and what to spec for dual or triple GPU builds (per-token traffic math below).

pcie, pcie-lanes, multi-gpu
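
The intuition in numbers (illustrative, assuming a 70B-class hidden size of 8192 and fp16 activations): with layers split across two GPUs, each decoded token ships one hidden-state vector over the link, not the weights.

    # Why x8 rarely matters once weights are resident (illustrative
    # numbers): with layers split across two GPUs, each decoded token
    # ships one hidden-state vector across the link, not the weights.
    hidden_dim = 8192                    # 70B-class model
    bytes_per_token = hidden_dim * 2     # fp16 activations, ~16 KB
    pcie4_x8_gb_s = 16                   # roughly 16 GB/s usable
    limit = pcie4_x8_gb_s * 1e9 / bytes_per_token
    print(f"Link saturates near {limit:,.0f} tok/s")  # ~1,000,000: never the bottleneck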
Architecture Guide

Build a PC to Run Local LLMs: Component Guide for 2026

Building from scratch for local AI is different from a gaming build. This guide covers which components actually matter for LLM inference — and which ones you can save on.

local-llm, pc-build, vram
Architecture Guide

Is 8GB VRAM Enough for Local LLMs in 2026?

The honest answer on whether your 8GB GPU can handle local AI in 2026 — what runs, what doesn't, and when to upgrade.

8gb-vram, budget, local-llm
Architecture Guide

llama.cpp Advanced Guide: Flags That Actually Boost Speed

Default llama.cpp settings leave 40–60% speed on the table. Master -ngl, -c, --tensor-split, mmap behavior, and context tuning to squeeze every token out of your hardware (parameter mapping sketch below).

llama.cpp, quantization, performance
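
The same knobs through llama-cpp-python, with illustrative (not tuned) values for a dual-GPU box; the mapping is -ngl to n_gpu_layers, -c to n_ctx, --tensor-split to tensor_split, and --no-mmap to use_mmap=False:

    # The CLI flags from the article, mapped onto llama-cpp-python's
    # constructor with illustrative (not tuned) values for a dual-GPU box:
    # -ngl -> n_gpu_layers, -c -> n_ctx,
    # --tensor-split -> tensor_split, --no-mmap -> use_mmap=False
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/model-q4_k_m.gguf",  # placeholder path
        n_gpu_layers=-1,          # offload every layer that fits
        n_ctx=8192,               # only the context you actually need
        tensor_split=[0.5, 0.5],  # even weight split across two GPUs
        use_mmap=True,            # map weights instead of copying to RAM
    )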
Architecture Guide

Local AI on a Budget: Every Price Tier Ranked (2026)

What can you actually run locally at $200, $400, $600, and $1,000+? Honest breakdown of every budget tier with real hardware recommendations and what you're giving up.

budget, price, affordable
Architecture Guide

Mac vs PC for Local AI: The Complete Comparison

Apple Silicon vs NVIDIA GPU for running local LLMs — which is actually better? Real benchmarks, use cases, and the honest answer based on what you need.

mac, pc, apple-silicon
Architecture Guide

The $3,000 Dual-GPU LLM Rig: Run 70B Models at Home

A dual-GPU PC build is the most cost-effective way to run 70B models at desktop speed. Two used RTX 3090s with NVLink give you 48GB of combined VRAM for under $3,000.

rtx-3090, dual-gpu, nvlink
Architecture Guide

How to Run Llama 3 70B on a Mac with 128 GB RAM

You need an M4 Max or M3 Ultra Mac with at least 128 GB to run Llama 3 70B comfortably. The best setup is MLX through LM Studio; expect ~11-12 tok/s at Q4, which is conversational speed (bandwidth sanity check below).

llama-70b, apple-silicon, m4-max
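
Sanity-checking the ~11-12 tok/s figure with a rough bandwidth bound (spec math, not a benchmark): decode is memory-bound, so the ceiling is memory bandwidth divided by the quantized model size.

    # Sanity check on the ~11-12 tok/s figure (spec math, not a
    # benchmark): decode is bandwidth-bound, so the ceiling is memory
    # bandwidth divided by the quantized model size.
    m4_max_bandwidth_gb_s = 546   # full M4 Max spec
    llama_70b_q4_gb = 40          # ~Q4 weights for a 70B model
    print(f"Ceiling: ~{m4_max_bandwidth_gb_s / llama_70b_q4_gb:.0f} tok/s")  # ~14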