CraftRigs
Architecture Guide

Running AI Offline: Hardware for Air-Gapped Local LLM Setups

By Georgia Thomas · 5 min read

Some links on this page may be affiliate links. We disclose it because you deserve to know, not because it changes anything. Every recommendation here comes from benchmarks, not budgets.

There's a difference between "private" and "air-gapped" that most local AI guides blur past. Private means your data doesn't leave your machine during normal use. Air-gapped means the machine is physically disconnected from all networks — no Wi-Fi, no Ethernet, no Bluetooth — and your data can't leave even if someone compromised the software.

Most people need private. A specific subset needs air-gapped.

If you're in that subset, this guide is for you.

Who Actually Needs Air-Gapped AI

Legal and compliance teams handling attorney-client privileged documents. Medical researchers with patient data under HIPAA. Defense contractors under ITAR regulations. Forensic investigators who cannot risk network contamination. Journalists protecting source identities in high-risk contexts.

And increasingly: security researchers who need uncensored, network-isolated AI for red-team work and vulnerability analysis.

The phrase "air-gapped" comes from physical network security. An air gap is literally a gap of air between your machine and any network connection — no cable, no wireless radio, nothing. It's the only 100% reliable way to prevent data exfiltration over a network, because you've eliminated the network.

Running AI on an air-gapped machine is straightforward in principle but requires specific hardware choices that not all guides address.

The Hardware Requirements (Air-Gapped Specific)

The inference hardware requirements are identical to any local LLM setup. What changes is everything around the machine.

No wireless hardware. This sounds obvious but: many modern motherboards have built-in Wi-Fi. In a true air-gapped setup, you either disable it in BIOS, physically remove the M.2 Wi-Fi card, or buy a motherboard without wireless. Don't rely on software-disabled Wi-Fi — hardware removal is the correct approach for genuine air-gap requirements.
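
After removal, it's worth confirming the OS no longer sees any radio. On Linux, one quick sanity check is to look for interfaces that expose a `wireless` subdirectory in sysfs. This is a minimal sketch, assuming a standard `/sys/class/net` layout; the `list_wireless_ifaces` name is our own helper, not a standard tool:

```shell
# Print any network interface that exposes a wireless extension
# (Linux sysfs). Empty output means no wireless hardware is visible
# to the kernel -- which is what you want after physical removal.
list_wireless_ifaces() {
  sysfs_root="${1:-/sys/class/net}"   # overridable for testing
  for iface in "$sysfs_root"/*; do
    [ -d "$iface/wireless" ] && basename "$iface"
  done
  return 0
}

# On a properly stripped machine this prints nothing:
# list_wireless_ifaces
```

An empty result only proves the kernel can't see a radio; it doesn't replace physically opening the case and checking for an M.2 wireless card.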

Bluetooth off entirely. Bluetooth is another attack surface. The same principle applies — hardware removal is more reliable than disabling it in software.

DVD or USB-only data transfer. Air-gapped machines need a controlled path for getting data in and out. USB drives with physical write-protect switches exist and are appropriate for high-security setups. Some organizations use one-directional data diodes for structured information flow.

For GPU selection: Standard recommendations apply. The key constraint for air-gapped setups is that you download all required drivers, firmware, and models before you disconnect. Driver updates on an air-gapped machine require manual USB transfer — plan for this in advance.

Recommended GPUs

  • RTX 4060 Ti 16GB
  • RTX 4090 24GB
  • Dual RTX 3090
  • RTX 5090 32GB

Note

Model downloads happen before air-gap. Use ollama pull [model] or download GGUF files from HuggingFace while still connected. Then physically disconnect. The models are stored locally and don't require connectivity after download. A 70B model at Q4 is roughly 43GB — budget your NVMe accordingly.

Software for Air-Gapped Operation

Ollama works perfectly offline after initial setup. The model server runs on localhost with no required internet connection. No activation, no licensing server, no telemetry that fails without connectivity.

llama.cpp is the other strong option — even lower overhead than Ollama, and the binary can be compiled once and transferred to the air-gapped machine via USB. See the llama.cpp advanced guide for configuration details.
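
The USB transfer step benefits from a checksum so you know the binary arrived intact. The build commands in the comments reflect llama.cpp's documented CMake flow (the CUDA flag is an assumption about your GPU); `verify_transfer` is our own helper, not part of llama.cpp:

```shell
# Build once on a connected machine, then carry the binary across on USB.
# Standard llama.cpp CMake flow (CUDA build assumed):
#   git clone https://github.com/ggerganov/llama.cpp
#   cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release
#
# On the air-gapped side, verify the transferred file before trusting it:
verify_transfer() {
  file="$1"
  expected="$2"   # sha256 recorded on the build machine
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH for $file"
    return 1
  fi
}
```

Record the hash on the connected machine, carry it across on paper or a separate drive, and compare on the air-gapped side.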

Avoid LM Studio for air-gapped setups. LM Studio makes occasional network calls for model discovery and UI features. It can be used offline, but Ollama or raw llama.cpp give you cleaner guarantees about network behavior.

AnythingLLM for document ingestion works well offline — it stores embeddings in a local SQLite/Chroma database. Good choice if you're processing classified or privileged documents and need a RAG layer.

Caution

Model downloads from HuggingFace or Ollama's registry require internet access. Do all model pulls before disconnecting. Test that models load and inference works before you air-gap the machine. You don't want to discover a missing dependency after you've cut the network.
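
A pre-flight check along these lines can catch a missing model before you cut the network. This is a sketch, not a standard tool; the directory and file names are placeholders for wherever your GGUF files actually live:

```shell
# Confirm every required model file is present on disk before
# disconnecting. Prints each missing file and returns nonzero if
# anything is absent.
preflight_models() {
  model_dir="$1"; shift
  missing=0
  for f in "$@"; do
    if [ ! -f "$model_dir/$f" ]; then
      echo "MISSING: $f"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "All models present."
  return "$missing"
}

# Example (hypothetical paths and file names):
# preflight_models /models llama-3.1-8b-q4.gguf qwen2.5-14b-q4.gguf
```

A file existing isn't the same as a file loading — still run one real inference per model before you disconnect.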

The Practical Air-Gap Procedure

  1. Build or prepare your inference machine (standard local LLM hardware)
  2. Install OS, GPU drivers, Ollama or llama.cpp
  3. Pull all required models
  4. Test that inference works correctly
  5. Install any local interface (Open WebUI, AnythingLLM)
  6. Shut down
  7. Physically remove or disable wireless hardware
  8. Disconnect Ethernet
  9. Reboot and verify everything still works
  10. The machine is now air-gapped
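
Step 9's verification can be partly scripted. A minimal sketch, assuming a Linux system where the kernel lists interfaces under `/sys/class/net`; the `verify_airgap` name is our own:

```shell
# After the post-disconnect reboot, confirm the kernel sees no network
# interface other than loopback. Any other entry means a NIC or radio
# is still present.
verify_airgap() {
  sysfs_root="${1:-/sys/class/net}"   # overridable for testing
  for iface in "$sysfs_root"/*; do
    name=$(basename "$iface")
    if [ "$name" != "lo" ] && [ -e "$iface" ]; then
      echo "STILL PRESENT: $name"
      return 1
    fi
  done
  echo "air-gapped: loopback only"
  return 0
}
```

As with the wireless check, this tells you what the kernel can see, not what's physically in the case.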

From this point, data transfer only happens via physically controlled media — USB drives you've verified, optical disks, or hardware data diodes if you're in a structured environment.

Storage: Plan for Model Size

Air-gapped setups can't pull new models on demand. You need to download everything you'll need in advance. A practical starting library for an air-gapped professional setup:

  • Llama 3.1 8B Q4 (~4.7GB) — fast general assistant
  • Qwen 2.5 14B Q4 (~8.9GB) — stronger reasoning, fits 16GB VRAM
  • One 70B model if your hardware supports it (~43GB)
  • A code-specialized model if relevant (Qwen 2.5 Coder 32B, ~19GB)

Total storage budget: 1-2TB NVMe is appropriate. A fast PCIe 4.0 drive reduces model load times — relevant if you restart the machine between sessions rather than leaving a model resident. The NVMe benchmark guide has timing data for load speeds across drive generations.
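
As a quick sanity check on the starter library above, the approximate Q4 sizes sum to well under 100GB:

```shell
# Approximate Q4 download sizes from the list above, in GB.
total=$(awk 'BEGIN { print 4.7 + 8.9 + 43 + 19 }')
echo "starter library: ${total} GB"   # ~75.6 GB, comfortably inside 1TB
```

Leave headroom for the OS, drivers, embeddings databases, and any extra quantizations you sideload later.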

Use Cases That Justify the Friction

Air-gapping adds complexity. Hardware removal, manual data transfer, no remote access. It's not the right choice for most people running local AI for general privacy.

It is the right choice when:

  • You work with legally privileged materials (attorney-client, medical records)
  • Your regulatory environment explicitly prohibits network-connected AI (ITAR, HIPAA covered entities, certain government clearances)
  • You're conducting security research and need AI assistance that cannot be monitored
  • You're a journalist protecting a source in a targeted country

For everyone else, the privacy-first local AI guide covers the less extreme — but still fully private — approach that works on a standard network-connected machine.

Verdict

Air-gapping is infrastructure, not paranoia. For the specific contexts where it matters — regulated industries, security research, high-risk journalism — it's the right call and the hardware requirements are identical to any local LLM setup. The main difference is in the setup process and the discipline required to maintain the gap.

Build it like any 16GB+ VRAM local LLM rig. Buy a capable workstation. Download your models. Then pull the plug.
