
LM Studio

Desktop application for downloading and running local LLMs with a graphical interface — the easiest entry point for local AI on Windows and macOS.

LM Studio is a desktop application that makes running local LLMs accessible without any command-line setup. It includes a model browser integrated with Hugging Face, GPU-accelerated inference with llama.cpp under the hood, a chat interface, and an OpenAI-compatible API server for connecting other applications.

Who It's For

LM Studio is the recommended starting point for anyone who wants to run local models without terminal commands, manual configuration, or dependency management. Model downloads, configuration, and chat all live in one GUI. For developers who want to replace OpenAI API calls with local inference, LM Studio's built-in API server works as a drop-in replacement.

Key Features

Model browser: Integrated search and download from Hugging Face, with VRAM requirements shown per quantization level. You can see at a glance whether a model fits your hardware before downloading.
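The per-quantization sizing LM Studio displays follows a simple rule of thumb: file size is roughly parameter count times bits per weight, plus headroom for the KV cache and runtime buffers. A minimal sketch of that arithmetic (the 20% overhead factor is an illustrative guess, not LM Studio's actual formula):

```python
def estimate_model_size_gb(params_billion: float, bits_per_weight: float,
                           overhead: float = 1.2) -> float:
    """Rough VRAM/file-size estimate for a quantized model.

    params_billion: model size in billions of parameters (e.g. 7 for a 7B).
    bits_per_weight: effective bits per weight for the quantization
                     (Q4 variants land around 4.5 bits in practice).
    overhead: fudge factor for KV cache and buffers -- an assumption here.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# A 7B model at ~4.5 bits/weight needs roughly 4-5 GB:
print(round(estimate_model_size_gb(7, 4.5), 1))  # → 4.7
```

This is why a 7B model at Q4 fits comfortably on an 8 GB GPU while the same model at 8-bit or fp16 may not.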

GPU acceleration: Automatically uses CUDA (NVIDIA), ROCm (AMD), or Metal (Apple Silicon) depending on your hardware. No manual configuration needed for most setups.

OpenAI-compatible API: Run LM Studio in server mode on port 1234 and point any OpenAI-compatible application (Continue, Cursor, custom code) to localhost:1234. The API matches the OpenAI chat completions format.
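Because the server speaks the standard chat completions format, any HTTP client works. A stdlib-only sketch of such a request (the `"local-model"` model name is a placeholder — LM Studio serves whichever model you have loaded):

```python
import json
from urllib import request

# LM Studio's server mode defaults to this address.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(messages, model="local-model", temperature=0.7):
    """Build an OpenAI-style chat completions request aimed at LM Studio.

    messages follows the OpenAI format: a list of {"role", "content"} dicts.
    """
    payload = {"model": model, "messages": messages, "temperature": temperature}
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request([
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "What is GGUF?"},
])

# With LM Studio running in server mode, send it like any HTTP POST:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Official OpenAI client libraries also work — point their base URL at `http://localhost:1234/v1` and no other code changes are needed.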

Chat and prompt templates: Built-in chat interface with support for system prompts, conversation history management, and model-specific prompt formatting.
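Model-specific formatting matters because each model family expects its conversation wrapped in particular control tokens. As an illustration of what that step produces, here is ChatML, one common template (used by Qwen and others) — LM Studio applies the appropriate template for each model automatically, so this is only a peek under the hood:

```python
def format_chatml(messages):
    """Render an OpenAI-style message list in the ChatML template.

    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers, and a
    trailing open assistant turn cues the model to generate its reply.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    parts.append("<|im_start|>assistant")  # cue the model to respond
    return "\n".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

Sending a ChatML-formatted prompt to a model trained on a different template (Llama's, Mistral's, etc.) degrades output quality — which is exactly why the automatic per-model templating is useful.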

Platform Support

LM Studio runs on Windows, macOS (Intel and Apple Silicon), and Linux. Apple Silicon support with Metal acceleration is particularly strong — Mac users often find LM Studio the smoothest local AI experience due to tight hardware integration.

Relationship to llama.cpp

LM Studio uses llama.cpp as its inference backend. Models must be in GGUF format. The GUI wraps llama.cpp's functionality with a user-friendly interface, making the underlying power of llama.cpp accessible without CLI knowledge.
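One practical consequence of the GGUF requirement: a downloaded file is easy to sanity-check, because GGUF files begin with the ASCII magic bytes `GGUF`. A small sketch (the self-check writes a fake file, since real models are multi-gigabyte downloads):

```python
import os
import tempfile

def looks_like_gguf(path):
    """Check a file's magic bytes: GGUF files start with ASCII 'GGUF'."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Quick self-check against a fake file with the right magic bytes:
with tempfile.NamedTemporaryFile(delete=False, suffix=".gguf") as f:
    f.write(b"GGUF" + b"\x03\x00\x00\x00")  # magic + a fake version field
    fake = f.name

result = looks_like_gguf(fake)
print(result)  # → True
os.remove(fake)
```

Models distributed as safetensors or PyTorch checkpoints must be converted to GGUF (llama.cpp ships conversion scripts) before LM Studio can load them.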

For power users who need finer control over inference parameters, quantization, or serving configuration, using llama.cpp or Ollama directly may be preferable.