CUDA Driver Version Insufficient Error: What It Means and How to Fix It
"CUDA driver version is insufficient" crashes LLM setup — here's the 550.90.07 driver fix, with DDU rollback if 570.xx broke everything.
Exact fixes for the most common local LLM setup problems — GPU not detected, VRAM errors, ROCm failures, and slow inference. Find your error, get it running.
ROCm doesn't run on Windows—your RX 7900 XTX sits idle at 4 tok/s. Force the Vulkan backend: 18 tok/s on 70B models. Requires one env var, Adrenalin 24.12.1+.
-ngl 99 still shows 0 layers? Your binary was compiled without CUDA/ROCm. Rebuild with explicit flags, verify with ldd, get 30x speedup in 10 minutes.
Dual EPYC crawling? That NUMA warning is why—single-node binding lifts 6.8→14.2 tok/s, if your model fits in 128 GB. The exact numactl command inside.
Your 4090 crawls at 2 tok/s? VRAM spill kills speed—fix with one quantization tweak, verify in 60s. Works for local LLMs only.
Ollama crashes Windows NVIDIA? 40% fail on driver 572.xx — 90-second diagnostic isolates your fix, with version-locked solutions that work.
OLLAMA_FLASH_ATTENTION=1 set but VRAM stuck? Env var hit systemd, not server. Fix scope, verify 35% savings at 32K context—if GPU's new enough.
"File does not exist" at 47%? 73% are disk, path, or permission issues—not remote failures. Map your exact error to the fix in 4 commands.
Two GPUs but only one running? OLLAMA_NUM_GPU=40,40 not 2 — fix layer splits, verify with nvidia-smi, hit 22 tok/s on 70B. Needs Ollama 0.1.38+
Ollama falls back to CPU at 3 tok/s? Run 3 commands, apply the NVIDIA/AMD/WSL2 fix, hit 45+ tok/s — but AMD GPU support needs Linux, not Windows.
Ollama crashes? NUM_CTX pre-allocates ~2.5MB per token on 70B models. Calculate your real VRAM limit, set context right, stop silent CPU fallback.
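The per-token figure follows from the standard KV-cache formula; a minimal sketch (the layer/head counts below assume a 70B-class model *without* grouped-query attention — an illustrative assumption, not the spec of any particular checkpoint):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Bytes pre-allocated for the KV cache: keys and values (the leading 2)
    for every layer, KV head, and context position, at fp16 (2 bytes/elem)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx

# 70B-class MHA model: 80 layers, 64 KV heads, head_dim 128
per_token = kv_cache_bytes(80, 64, 128, n_ctx=1)
print(per_token / 1024**2)   # 2.5 MiB per context token

# An 8192-token context, pre-allocated up front before a single token is generated:
print(kv_cache_bytes(80, 64, 128, n_ctx=8192) / 1024**3)  # 20.0 GiB
```

Models with grouped-query attention shrink this by the ratio of attention heads to KV heads, which is why the same context length fits on some 70B checkpoints and not others.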
32B Q4_K_M OOM'd? Calculate exact VRAM: weights + KV cache + overhead. 8GB cards hit wall at 13B—here's the quant that fits, with tok/s numbers.
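As a rough illustration of that arithmetic (the bits-per-weight value is an approximation for Q4_K_M, and the KV/overhead allowances are placeholder assumptions, not measurements):

```python
def model_vram_gb(n_params_billion, bits_per_weight, kv_gb=1.0, overhead_gb=1.0):
    """Rough VRAM budget: quantized weights + KV cache + runtime overhead.
    Params in billions times bits/8 gives weight size in GB."""
    return n_params_billion * bits_per_weight / 8 + kv_gb + overhead_gb

print(round(model_vram_gb(32, 4.85), 1))  # 21.4 GB: a 32B Q4_K_M overflows 8 GB badly
print(round(model_vram_gb(13, 4.85), 1))  # 9.9 GB: even 13B is past an 8 GB card
```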
ROCm won't detect your RX 7900? HSA_OVERRIDE_GFX_VERSION fixes 89% of cases—here's the exact env var, kernel check, and udev rule that works.
RX 6700 XT not detected by Ollama? One env variable unlocks 38 tok/s—if you pick the right GFX version. Mapping table + syntax for every runtime.
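A sketch of how such an override is applied from the launching process (the mapping values are the commonly reported community ones — treat them as assumptions to verify for your card, not an AMD-published table):

```python
import os

# Commonly reported override targets (assumed mapping; verify for your GPU):
GFX_OVERRIDE = {
    "gfx1031": "10.3.0",  # RX 6700 XT: RDNA2 card, ROCm ships gfx1030 kernels
    "gfx1100": "11.0.0",  # RX 7900 XT / XTX
}

# The variable must be set in the environment of the process that loads ROCm,
# i.e. before launching the runtime from this shell or service.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = GFX_OVERRIDE["gfx1031"]
print(os.environ["HSA_OVERRIDE_GFX_VERSION"])  # 10.3.0
```

The same value works as a plain `export HSA_OVERRIDE_GFX_VERSION=10.3.0` in a shell, as long as the LLM runtime is launched from that same session.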
LLM hangs 30s before output? That's prefill, not VRAM. Cut TTFT 60% with 4 diagnostic questions — if CPU-bound, not bandwidth-starved.
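The distinction can be put in numbers: decode speed is bounded by memory bandwidth, because each new token streams the full weight set, while prefill is compute over the whole prompt before the first token appears. A back-of-envelope sketch (the model size and bandwidth figures are illustrative assumptions):

```python
def decode_tok_s(model_gb, mem_bw_gb_s):
    """Upper bound on generation speed: every decoded token reads the
    entire weight set once, so tok/s <= bandwidth / model size."""
    return mem_bw_gb_s / model_gb

# A 40 GB quantized model on a ~1 TB/s GPU tops out near 25 tok/s of decode.
# A long pause *before* the first token is prefill (compute-bound), not this limit.
print(round(decode_tok_s(40, 1008), 1))  # 25.2
```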