LLM (Large Language Model)
A neural network trained on large amounts of text that can generate, summarize, translate, and reason about language.
A Large Language Model (LLM) is a type of artificial neural network trained on massive text datasets to predict and generate human language. The "large" refers to the number of parameters — the numerical weights learned during training — which typically range from 1 billion to over a trillion in frontier models.
How They Work
LLMs are based on the Transformer architecture, introduced by Google researchers in the 2017 paper "Attention Is All You Need". At inference time (when generating a response), the model processes your input token by token, uses its trained weights to compute attention across the full context window, and predicts the most likely next token. This repeats until the response is complete.
The key hardware demand this creates: all model weights must be loaded into memory (ideally VRAM) before generation can begin, and they stay loaded for the duration of the session.
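The generation loop described above can be sketched in miniature. This is a toy illustration, not a real Transformer: the "model" is a hypothetical lookup table of next-token scores, standing in for the billions of learned weights, and decoding is greedy (always pick the highest-scoring token).

```python
# Toy sketch of autoregressive decoding. TOY_MODEL is a hypothetical stand-in
# for trained weights: it maps the last token to scores for the next token.
TOY_MODEL = {
    "The": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"<eos>": 1.0},  # end-of-sequence: stop generating
}

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # A real model attends over the whole context window here;
        # our toy only looks at the most recent token.
        scores = TOY_MODEL.get(tokens[-1], {"<eos>": 1.0})
        next_token = max(scores, key=scores.get)  # greedy decoding
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(generate(["The"]))  # ['The', 'cat', 'sat']
```

The loop structure is the part that carries over to real LLMs: one forward pass per generated token, with the full context (and the full set of weights) consulted every step — which is why the weights must stay resident in memory for the whole session.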
Parameter Count and Hardware Requirements
Parameter count — measured in billions (7B, 13B, 70B, etc.) — is the primary predictor of both capability and hardware requirements:
| Size | VRAM at Q4 | Capability level |
|---|---|---|
| 7B–8B | ~5GB | Capable assistant, fast on any modern GPU |
| 13B–14B | ~9GB | Strong reasoning, competitive with older GPT-4 on many tasks |
| 30B–34B | ~20GB | High capability, needs 24GB GPU |
| 70B | ~40GB | Near-frontier, needs multi-GPU (e.g., 2×24GB) or high-memory Apple Silicon |
Local vs. Cloud
Running an LLM locally means the model weights are stored on your hardware and inference happens on your CPU or GPU. No data leaves your machine. Contrast this with cloud APIs (ChatGPT, Claude, Gemini) where your prompt is sent to a remote server for processing.
Local LLMs require more upfront hardware investment but offer privacy, no rate limits, no ongoing API costs, and full control over the model and its behavior.
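In practice, local inference usually still goes through an HTTP API — just one served from your own machine. As a minimal sketch, here is the request body for Ollama's documented `/api/generate` endpoint, assuming a server on its default port (localhost:11434); the model tag is an example.

```python
import json

# Request payload for a local Ollama server (model tag is an example).
payload = {
    "model": "llama3.1:8b",
    "prompt": "Explain what a context window is in one sentence.",
    "stream": False,  # return one complete JSON response instead of a stream
}
body = json.dumps(payload)

# To actually send it (requires a running Ollama server):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:11434/api/generate",
#       data=body.encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(json.loads(urllib.request.urlopen(req).read())["response"])
print(body)
```

The shape mirrors cloud APIs deliberately: the only architectural difference is where the weights live and where the compute runs.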
Common Local LLMs
- Llama 3.1/3.3 (Meta) — strong general-purpose models; 3.1 at 8B and 70B, 3.3 at 70B
- Qwen 2.5 (Alibaba) — excellent at code and multilingual tasks
- Mistral/Mixtral (Mistral AI) — efficient, strong on reasoning
- Gemma 3 (Google) — compact and capable at 12B and 27B
- Phi-4 (Microsoft) — strong performance at small parameter counts