Local Coding Assistant: Aider + Qwen 3.6 27B + RTX 5080
Build a private, offline coding assistant with Aider, Qwen 3.6 27B, and RTX 5080—40+ tokens/sec throughput, no subscription costs, honest gaps vs. Claude Code documented.
Apr 26, 2026
Hardware prices move daily, new releases drop monthly, and buying the wrong $800 GPU at the wrong moment is a mistake you live with for three years.
Charlotte tracks hardware pricing, model requirements, and market timing so you know exactly what to buy, and when to wait. She monitors public markets, used listings, and manufacturer announcements, covering the used market as seriously as new cards.
You've seen GDDR6X and PAM4 on your GPU spec sheet. Learn the signaling story: why GDDR7 chose PAM3, the bandwidth math (1,008→1,472 GB/s), and how it works.
May 8, 2026
M4 Max 128 GB runs 70B models silently at 14 tok/s but loses 5x on prompt speed. Here's exactly when Mac beats NVIDIA and when it wastes $4,600.
Apr 23, 2026
Weight quants saved 40% VRAM but KV cache still OOMs. Recent llama.cpp patches plus Google's TurboQuant research cut cache 75% with <3% perplexity hit — ROCm build flags inside.
Apr 23, 2026
You paid $1,850 for 24 GB when $650 gets identical VRAM. The 3090 still runs 70B models at 8 tok/s — only one MoE scenario finally breaks it.
Apr 23, 2026
OOM crashes on 'should fit' models? This matrix shows actual VRAM for 16 models × 9 quants — 24 GB runs 70B at Q3_K_L, not Q4. Windows vs Linux included.
Apr 23, 2026
Bought a 24 GB GPU for 70B models but run Q4_0? That wastes $400. 20 GB + Q5_K_M beats it. The 1.25x buffer rule fixes your next buy.
Apr 23, 2026
Your 24 GB card OOMs on '70B models'. Here's what actually fits per tier, with 2026 tok/s numbers for Qwen3-235B MoE (22B active) and Llama 4. 8 GB hits a wall at 13B.
Apr 23, 2026
Your 759 AI TOPS promise crashes into Q4_K_M reality: real-world reports show ~13% gains, not 115%. Here's when FP4 actually works — and the flag that unlocks it.
Apr 18, 2026
Used H100s hit $8,200 after B100, but 700W TDP and $3,200/year power crush value. Dual RTX 3090 hits 89% of 70B perf for 31% cost—here's when 80 GB wins.
Apr 18, 2026
Benchmark RTX 5060 Ti 8GB on 13B-70B models. See why 8GB hits the ceiling for Llama, Qwen, and Mistral at Q4 quantization.
Apr 14, 2026
Buying a 40+ TOPS laptop won't run your LLMs faster. Your GPU does that. Here's which AI PC parts actually matter—and which to ignore.
Apr 9, 2026
RTX 4070 Super + Ryzen 7 5700X3D delivers serious 70B inference for $2,700. Exact parts list, benchmarks, and what 12GB VRAM really handles.
Apr 9, 2026
Paying $47K/quarter in API costs? A single GPU rig breaks even in under 3 months. Here's the math, the hardware tiers, and what most teams get wrong.
Apr 9, 2026
Agent frameworks burn 3x more tokens than chat—your 16GB GPU hits OOM faster. Here's exact VRAM per framework and when local inference beats Claude API.
Apr 9, 2026
AMD GPUs drop 15–20% in the weeks after NVIDIA launches. Here's the exact timing window to buy RX 7900 or 9060 XT for local AI without overpaying.
Apr 9, 2026
R9700's 32GB VRAM runs 70B models that RTX 5090 can't fit. ROCm setup is 2 hours, not 2 days. Real benchmarks and whether it's worth leaving NVIDIA.
Apr 9, 2026
Ryzen AI Max ships with 24GB allocated to GPU by default. A single kernel parameter bumps it to 108GB—enough for 70B Q4 fully in GPU memory.
Apr 9, 2026
OOM crashes with 27B models aren't always about VRAM. Context length, quantization, and Windows memory limits are fixable without buying new hardware.
Apr 9, 2026
24GB fits DeepSeek V4 quantized. 96GB runs full 1M context. This guide breaks down which tier is worth the cost jump—and when the API beats building hardware.
Apr 9, 2026
Two used RTX 3090s give you 48GB VRAM for $1,400–1,500 total. Run Llama 70B at 16+ tok/s. Complete parts list, motherboard gotchas, and benchmarks.
Apr 9, 2026
ExLlamaV2 hits 250 tok/s on RTX 4090 for batch jobs—5x faster than Ollama. Here's the exact setup and when to use it over llama.cpp for production runs.
Apr 9, 2026
Fine-tuning 8B models takes 45 minutes and 14GB VRAM with Unsloth QLoRA. No A100 needed. Complete guide with exact hardware requirements and benchmarks.
Apr 9, 2026
GPU prices are up 40% and stuck there until 2027. Here's exactly when GDDR7 supply normalizes, which GPUs to buy now, and which to wait on.
Apr 9, 2026
GLM-4.7 needs multi-GPU to run locally—single RTX 4090 won't cut it. Here's exact VRAM, the viable hardware paths, and when to use the API instead.
Apr 9, 2026
TurboQuant compresses KV cache 4–5x, turning your 24GB GPU into a 100K+ context window card. Here's when it ships and whether to wait or build now.
Apr 9, 2026
Memory shortages are pushing GPU prices up 15–30% before summer. Here's which cards to lock in now and which to skip while you still have time.
Apr 9, 2026
Browser downloads leave half your VRAM unused. hf_transfer gets you 3–5x speed, resumes mid-download, and integrates directly with Ollama model paths.
Apr 9, 2026