LLM (Large Language Model)
A neural network trained on large amounts of text that can generate, summarize, translate, and reason about language.
A Large Language Model (LLM) is a type of artificial neural network trained on massive text datasets to predict and generate human language. The "large" refers to the number of parameters — the numerical weights learned during training — which typically range from 1 billion to over a trillion in frontier models.
How They Work
LLMs are based on the Transformer architecture, introduced by Google researchers in the 2017 paper "Attention Is All You Need". At inference time (when generating a response), the model processes your input token by token, uses its trained weights to compute attention across the full context window, and predicts the most likely next token. This repeats until the response is complete.
The key hardware demand this creates: all model weights must be loaded into memory (ideally VRAM) before generation can begin, and they stay loaded for the duration of the session.
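The generation loop described above can be sketched in miniature. This is a toy illustration, not a real Transformer: the "model" is a hypothetical lookup table of next-token scores, standing in for the billions of learned weights, and decoding is greedy (always pick the highest-scoring token).

```python
# Toy sketch of autoregressive decoding. TOY_MODEL is a hypothetical stand-in
# for trained weights: it maps the last token to scores for the next token.
TOY_MODEL = {
    "The": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"<eos>": 1.0},  # end-of-sequence: stop generating
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # A real model attends over the whole context window here;
        # our toy only looks at the most recent token.
        scores = TOY_MODEL.get(tokens[-1], {"<eos>": 1.0})
        next_token = max(scores, key=scores.get)  # greedy decoding
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["The"]))  # ['The', 'cat', 'sat']
```

The loop structure is the part that carries over to real LLMs: one forward pass per generated token, with the full context (and the full set of weights) consulted every step — which is why the weights must stay resident in memory for the whole session.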
Parameter Count and Hardware Requirements
Parameter count — measured in billions (7B, 13B, 70B, etc.) — is the primary predictor of both capability and hardware requirements:
| Size | VRAM at Q4 | Capability level |
|---|---|---|
| 7B–8B | ~5GB | Capable assistant, fast on any modern GPU |
| 13B–14B | ~9GB | Strong reasoning, competitive with older GPT-4 on many tasks |
| 30B–34B | ~20GB | High capability, needs 24GB GPU |
| 70B | ~40GB | Near-frontier, needs multi-GPU (e.g., 2×24GB) or high-memory Apple Silicon |
Local vs. Cloud
Running an LLM locally means the model weights are stored on your hardware and inference happens on your CPU or GPU. No data leaves your machine. Contrast this with cloud APIs (ChatGPT, Claude, Gemini) where your prompt is sent to a remote server for processing.
Local LLMs require more upfront hardware investment but offer privacy, no rate limits, no ongoing API costs, and full control over the model and its behavior.
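In practice, local inference usually still goes through an HTTP API — just one served from your own machine. As a minimal sketch, here is the request body for Ollama's documented `/api/generate` endpoint, assuming a server on its default port (localhost:11434); the model tag is an example.

```python
import json

# Request payload for a local Ollama server (model tag is an example).
payload = {
    "model": "llama3.1:8b",
    "prompt": "Explain what a context window is in one sentence.",
    "stream": False,  # return one complete JSON response instead of a stream
}
body = json.dumps(payload)

# To actually send it (requires a running Ollama server):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=body.encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(json.loads(urllib.request.urlopen(req).read())["response"])
print(body)
```

The shape mirrors cloud APIs deliberately: the only architectural difference is where the weights live and where the compute runs.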
Common Local LLMs
- Llama 3.1/3.3 (Meta) — strong general-purpose models; 3.1 at 8B and 70B, 3.3 at 70B
- Qwen 2.5 (Alibaba) — excellent at code and multilingual tasks
- Mistral/Mixtral (Mistral AI) — efficient, strong on reasoning
- Gemma 3 (Google) — compact and capable at 12B and 27B
- Phi-4 (Microsoft) — strong performance at small parameter counts