GGML
The predecessor file format to GGUF for storing quantized LLMs, used by early versions of llama.cpp.
GGML was the original file format used by llama.cpp to store quantized language models. Named for the initials of its creator, Georgi Gerganov, plus "ML", it was introduced alongside llama.cpp in early 2023 and became the first widely used format for running LLMs locally on consumer hardware.
Why It Was Replaced
GGML had several practical limitations that became apparent as the ecosystem grew:
- No metadata standard — model architecture, tokenizer settings, and quantization details had to be inferred or stored separately
- Breaking changes — updates to llama.cpp frequently broke compatibility with existing GGML files, forcing users to re-download models
- No versioning — files had no built-in version information, making compatibility tracking difficult
GGUF was introduced in August 2023 specifically to address these issues. It added a structured header for metadata, stable versioning, and a self-describing format that could be parsed without knowing the model architecture in advance.
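The structured, versioned header is the core of the fix. As a rough sketch, here is what reading the fixed part of a GGUF header looks like, based on the published GGUF layout (a 4-byte "GGUF" magic, a uint32 version, then, in version 2 and later, uint64 tensor and metadata counts, all little-endian); verify against the spec before relying on the exact offsets:

```python
import struct

def read_gguf_header(data: bytes):
    """Parse the fixed leading fields of a GGUF header.

    Assumed layout (GGUF v2+): 4-byte magic b"GGUF", uint32 version,
    uint64 tensor count, uint64 metadata key/value count, little-endian.
    """
    magic, version = struct.unpack_from("<4sI", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    return version, tensor_count, kv_count

# Synthetic header for illustration, not a real model file:
header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(header))  # → (3, 2, 5)
```

Because the version number sits right after the magic, a loader can reject or adapt to old files before reading anything model-specific, which is exactly what GGML's headerless layout made impossible.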
Current Status
GGML files are effectively obsolete. The llama.cpp project dropped GGML support in favor of GGUF, and model repositories on Hugging Face have largely replaced GGML files with GGUF equivalents.
If you have a GGML file, you will likely need to convert it to GGUF using the conversion scripts provided in the llama.cpp repository before it will load in current software. Most model downloads from TheBloke and other quantization providers now exclusively offer GGUF.
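Before reaching for a conversion script, it can help to confirm which container a file actually uses. The sketch below guesses the format from the first four bytes; the legacy magic values are taken from historical llama.cpp sources (each was stored as a little-endian uint32, so the on-disk bytes look reversed) and should be treated as illustrative rather than exhaustive:

```python
# Legacy magics from historical llama.cpp sources (assumed, not exhaustive).
LEGACY_MAGICS = {
    b"lmgg": "GGML (original, unversioned)",
    b"fmgg": "GGMF (versioned GGML)",
    b"tjgg": "GGJT (mmap-friendly GGML)",
}

def detect_format(first4: bytes) -> str:
    """Classify a model file by its leading 4 bytes."""
    if first4 == b"GGUF":
        return "GGUF"
    return LEGACY_MAGICS.get(first4, "unknown")
```

In practice you would pass `open(path, "rb").read(4)` to `detect_format`; anything that comes back as a legacy variant needs conversion before current llama.cpp builds will load it.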
Relationship to GGUF
GGUF is a direct successor designed so the format can evolve without breaking existing files. The name follows the same pattern — the creator's initials, GG, plus a descriptor. The key difference is that GGUF files are self-describing: all the information needed to load and run the model is embedded in the file itself.
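The self-describing metadata takes the form of typed key/value pairs. As a rough sketch of how one string-valued entry is laid out on disk, per my reading of the GGUF spec (key as uint64 length plus UTF-8 bytes, a uint32 value-type code where string is 8, then the value encoded the same way; verify the type codes against the spec before relying on them):

```python
import struct

GGUF_TYPE_STRING = 8  # assumed string type code from the GGUF spec

def encode_string_kv(key: str, value: str) -> bytes:
    """Encode one string-valued metadata entry, little-endian throughout."""
    k, v = key.encode(), value.encode()
    return (struct.pack("<Q", len(k)) + k
            + struct.pack("<I", GGUF_TYPE_STRING)
            + struct.pack("<Q", len(v)) + v)

def decode_string_kv(data: bytes):
    """Decode one string-valued metadata entry back into (key, value)."""
    klen = struct.unpack_from("<Q", data, 0)[0]
    key = data[8:8 + klen].decode()
    off = 8 + klen
    vtype = struct.unpack_from("<I", data, off)[0]
    if vtype != GGUF_TYPE_STRING:
        raise ValueError("not a string-valued entry")
    vlen = struct.unpack_from("<Q", data, off + 4)[0]
    value = data[off + 12:off + 12 + vlen].decode()
    return key, value

entry = encode_string_kv("general.architecture", "llama")
print(decode_string_kv(entry))  # → ('general.architecture', 'llama')
```

A loader can walk entries like these to discover the architecture, tokenizer, and quantization details directly from the file, with no out-of-band configuration — the property GGML lacked.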