Quick Summary:
- GPU VRAM is the hard limit: 7B models need 6-8GB, 13B need 10-12GB, 34B need 20-24GB, 70B need 40-48GB at Q4_K_M quantization.
- Server mode is one toggle: LM Studio 0.3.x exposes an OpenAI-compatible API at localhost:1234 — any AI tool that supports OpenAI connects to it automatically.
- Network access requires a firewall rule: Binding to 0.0.0.0 lets other devices connect, but you'll need to allow the port through Windows Defender Firewall.
Your gaming PC is probably underutilized. When you're not gaming, that RTX 4070 Ti or RX 7900 XTX is sitting idle with 12-16GB of VRAM that could be running local language models 24/7. LM Studio is the fastest path from "I want to run a local LLM" to actually having one running — no command line required.
This guide covers the full setup: installing LM Studio, choosing the right model for your GPU, loading it, enabling the local API server, and connecting from other devices on your network.
Step 1: Install LM Studio
Download LM Studio from lmstudio.ai. It's available for Windows, macOS, and Linux.
Windows install: Standard .exe installer. Run it; it installs to %LOCALAPPDATA%\Programs\LM Studio. No admin rights required.
macOS install: .dmg file, drag to Applications. Apple Silicon and Intel both supported.
Linux install: .AppImage file. Make it executable and run:
chmod +x LM-Studio-*.AppImage
./LM-Studio-*.AppImage
For GPU acceleration on Linux:
- NVIDIA: Ensure CUDA 12.x drivers are installed. LM Studio detects CUDA automatically.
- AMD: ROCm 6.x. Confirm `rocm-smi` shows your card before launching.
After first launch, LM Studio checks your hardware and shows detected GPU(s) in the bottom status bar. Confirm your GPU is listed — if it shows "CPU only," your drivers need attention.
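Before launching, you can confirm the vendor GPU utility is even on your PATH. A minimal sketch — `detect_gpu_tools` is a hypothetical helper, not part of LM Studio; it only checks for the standard `nvidia-smi` and `rocm-smi` binaries:

```python
import shutil

def detect_gpu_tools() -> dict:
    """Report which vendor GPU utilities are on PATH.

    If neither is found, LM Studio will likely fall back to CPU-only mode.
    """
    tools = {"nvidia-smi": shutil.which("nvidia-smi"),
             "rocm-smi": shutil.which("rocm-smi")}
    for name, path in tools.items():
        print(f"{name}: {path or 'not found'}")
    return tools

if __name__ == "__main__":
    detect_gpu_tools()
```

If both print "not found" on a machine with a discrete GPU, fix the drivers before troubleshooting anything inside LM Studio.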
Step 2: Know Your VRAM Limit Before Downloading
This is the step most tutorials skip, and it's why people download models that won't fit.
LM Studio runs GGUF models. GGUF supports multiple quantization levels, each trading quality for size. At Q4_K_M (the recommended baseline — good quality, reasonable size), memory requirements are:
| Model size | Min VRAM (GPU-only) |
|---|---|
| 3B | 3 GB |
| 7B | 6 GB |
| 8B | 6-7 GB |
| 13B | 10-12 GB |
| 14B | 10-12 GB |
| 34B | 22-24 GB |
| 70B | 42-48 GB |

The VRAM requirement is slightly higher than the model file size because of KV cache and framework overhead. Plan on needing ~1.5-2GB above the file size for comfortable operation. For a deep dive on why VRAM fills up mid-conversation, see our KV cache explainer. For choosing which GGUF quantization variant to download, see our GGUF vs GPTQ vs AWQ vs EXL2 guide.
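The rule of thumb above can be turned into a quick estimate — a sketch, assuming a flat ~2 GB allowance for KV cache and framework overhead on top of the GGUF file size (the 4.4 GB example file size is a ballpark for a 7B Q4_K_M model, not an exact figure):

```python
def estimated_vram_gb(gguf_file_gb: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM needed to run a GGUF model fully on GPU.

    Rule of thumb: file size plus ~1.5-2 GB for KV cache
    and framework overhead.
    """
    return gguf_file_gb + overhead_gb

# A 7B model at Q4_K_M is roughly a 4.4 GB file:
print(f"~{estimated_vram_gb(4.4):.1f} GB VRAM")  # → ~6.4 GB VRAM
```

That lands right in the 6 GB row of the table above, with a little breathing room.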
Your GPU's VRAM:
- RTX 3060 12GB → runs 7B comfortably, 13B at Q3 or smaller
- RTX 4060 Ti 16GB → runs 13B comfortably
- RTX 4070 Ti 12GB → runs 13B at Q4, tight; better at Q3_K_M
- RTX 4070 Ti Super 16GB → runs 13B comfortably
- RTX 3090 / 4090 24GB → runs 34B at Q4_K_M
- Dual RTX 3090 48GB → runs 70B at Q4_K_M
If your model is larger than your VRAM, LM Studio can split it across CPU RAM and GPU (partial GPU offload). This works but is significantly slower — expect 5-20 tokens/sec instead of 50-120 t/s.
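Partial offload works layer by layer: LM Studio keeps as many transformer layers on the GPU as fit and runs the rest on CPU. A rough sketch of that split, assuming equally sized layers — the layer count, file size, and the `offload_split` helper are all illustrative, not taken from any specific model or LM Studio API:

```python
def offload_split(model_gb: float, n_layers: int, free_vram_gb: float,
                  reserve_gb: float = 1.5) -> int:
    """Estimate how many layers fit on the GPU.

    reserve_gb leaves headroom for KV cache and framework overhead.
    """
    per_layer_gb = model_gb / n_layers
    usable = max(0.0, free_vram_gb - reserve_gb)
    return min(n_layers, int(usable / per_layer_gb))

# Illustrative: an 18 GB model with 48 layers on a 12 GB card
print(offload_split(18.0, 48, 12.0))  # → 28
```

In this sketch roughly 28 of 48 layers run on GPU, which is why partial offload still beats CPU-only but can't match full-offload speeds.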
Step 3: Download a Model
In LM Studio's left sidebar, click the magnifying glass (Discover) icon.
Search for the model you want. For a first model, recommended starting points by VRAM tier:
- 6-8GB VRAM: `llama3.2:3b` or `qwen2.5-7b-instruct`
- 10-12GB VRAM: `llama3.1-8b-instruct` or `mistral-7b-instruct`
- 16GB VRAM: `qwen2.5-14b-instruct`
- 24GB VRAM: `qwen2.5-32b-instruct` (at Q3_K_M) or `deepseek-r1-distill-qwen-14b`
When LM Studio shows a model in search results, it lists multiple quantization variants. For most users:
- Q4_K_M — best default. Good quality, practical size.
- Q5_K_M — slightly better quality if you have headroom.
- Q8_0 — near-lossless, use if the model fits comfortably.
- Q2_K or Q3_K_M — only if your VRAM is tight and you need to fit a larger model.
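One way to sanity-check a variant before downloading — a sketch using approximate effective bits-per-weight for common GGUF quant types (the figures are ballpark averages, and `fits` is a hypothetical helper, not an LM Studio API):

```python
# Approximate effective bits per weight for common GGUF quant types
BITS_PER_WEIGHT = {
    "Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.8,
    "Q5_K_M": 5.7, "Q8_0": 8.5, "F16": 16.0,
}

def fits(params_b: float, quant: str, vram_gb: float,
         overhead_gb: float = 2.0) -> bool:
    """Check whether a model at a given quant fits in VRAM.

    params_b is the parameter count in billions; overhead covers
    KV cache and framework allocations.
    """
    weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb <= vram_gb

print(fits(7, "Q4_K_M", 8))    # 7B at Q4_K_M on an 8 GB card → True
print(fits(13, "Q4_K_M", 8))   # 13B on the same card → False
```

This mirrors the tier list above: when `fits` returns False for your preferred quant, step down one level rather than relying on partial offload.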
Click the download arrow on your chosen variant. LM Studio downloads to ~/lm-studio/models/ by default (configurable in Preferences → Storage).
Step 4: Load the Model
Click the home icon (house) in the left sidebar to go to the main view. Click "Select a model to load" and choose your downloaded model.
LM Studio shows a configuration panel before loading:
- GPU Offload: Set this to 100% (all layers on GPU) if the model fits in VRAM. Reduce it only if you're intentionally splitting to CPU RAM.
- Context Length: Default is usually 2048-4096. Higher context = more VRAM used by KV cache. For a 7B model on 8GB VRAM, 4096 context is safe. For 13B on 12GB, cap at 2048-4096.
- Prompt Template: Leave as Auto — LM Studio detects the correct chat template from the model's metadata.
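The context-length knob matters because the KV cache grows linearly with context. A sketch of the standard KV cache size formula, using Llama-3-8B's published geometry (32 layers, 8 KV heads, head dimension 128) as the worked example — the function itself is illustrative, not an LM Studio API:

```python
def kv_cache_bytes(n_layers: int, ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: two tensors (K and V) per layer, one vector
    per token per KV head, fp16 (2 bytes) by default."""
    return 2 * n_layers * ctx * n_kv_heads * head_dim * bytes_per_elem

# Llama-3-8B geometry at 4096 context, fp16 cache:
gib = kv_cache_bytes(32, 4096, 8, 128) / 2**30
print(f"{gib:.2f} GiB")  # → 0.50 GiB
```

Doubling the context to 8192 doubles that to 1 GiB — which is exactly the headroom that disappears when you raise Context Length on a card that barely fits the weights.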
Click Load. The status bar at the bottom shows loading progress. A 7B model loads from an NVMe drive into VRAM in 5-15 seconds. A 70B model may take 30-60 seconds.
Once loaded, the model name and token/sec indicator appear in the bottom bar. You can test it in the Chat tab.
Step 5: Enable the Local API Server
This is what turns LM Studio from a chat toy into a local AI infrastructure component.
In the left sidebar, click the </> (Developer) icon. You'll see the Local Server tab.
Enable the server toggle. The server starts at localhost:1234 by default.
To access from other devices on your network:
- Change the Host field from `localhost` to `0.0.0.0`
- The port stays at `1234` (change it if you need to)
- Click the green "Start Server" button
LM Studio shows the full URL — something like http://192.168.1.105:1234. Note this IP address — you'll use it from other devices.
Windows Firewall (required for network access):
Open PowerShell as Administrator and run:
New-NetFirewallRule -DisplayName "LM Studio" -Direction Inbound -Protocol TCP -LocalPort 1234 -Action Allow
Or manually: Windows Security → Firewall → Advanced Settings → Inbound Rules → New Rule → Port → TCP 1234.
Test it from another device:
curl http://192.168.1.105:1234/v1/models
You should get a JSON response listing your loaded model. If you get connection refused, the firewall rule wasn't applied or the server isn't bound to 0.0.0.0.
Step 6: Connect Other Applications
LM Studio's API is OpenAI-compatible. Any tool that accepts an OpenAI API endpoint can point at your local server.
Open WebUI (browser-based chat frontend):
docker run -d -p 3000:8080 \
-e OPENAI_API_BASE_URL=http://YOUR-PC-IP:1234/v1 \
-e OPENAI_API_KEY=lm-studio \
ghcr.io/open-webui/open-webui:main
Continue.dev (VS Code AI coding assistant):
In ~/.continue/config.json:
{
"models": [{
"title": "Local LM Studio",
"provider": "openai",
"model": "local-model",
"apiBase": "http://localhost:1234/v1",
"apiKey": "lm-studio"
}]
}
Python script:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
response = client.chat.completions.create(
model="local-model",
messages=[{"role": "user", "content": "Explain quantization in one paragraph."}]
)
print(response.choices[0].message.content)
The api_key value doesn't matter — LM Studio accepts any non-empty string.
VRAM Requirements Reference Table
For quick reference when choosing quantization level:
| Quantization | 7B | 13B | 34B | 70B |
|---|---|---|---|---|
| F16 | ~13.5 GB | ~26 GB | ~68 GB | ~140 GB |
| Q4_K_M | 6-8 GB | 10-12 GB | 20-24 GB | 40-48 GB |

For understanding exactly why these numbers are what they are — and why VRAM usage grows during long conversations — see our KV cache and VRAM guide.
Troubleshooting Common Issues
Model loads but inference is slow (< 5 tokens/sec): GPU offload is probably set to 0% or low. In the load dialog, set GPU Offload to 100%. Check the status bar shows your GPU name, not "CPU."
Out of memory error on load: The model is too large for your VRAM. Either choose a more aggressive quantization (Q3_K_M or Q2_K), reduce the context length, or switch to a model with a smaller parameter count.
Can't connect from another device:
- Confirm server is bound to 0.0.0.0, not localhost
- Check Windows Firewall rule is active
- Confirm both devices are on the same subnet (same router)
- Try disabling firewall temporarily to confirm it's the culprit
Model not showing in API response: The server only serves the currently loaded model. Go back to the main view, confirm the model is loaded (green indicator), then return to Developer tab.
For a comparison of LM Studio against Ollama and llama.cpp, see Ollama vs LM Studio vs llama.cpp vs vLLM. For a broader homelab API server setup covering multi-device access, see our gaming PC local LLM server guide.