GGML
The predecessor file format to GGUF for storing quantized LLMs, used by early versions of llama.cpp.
GGML was the original file format used by llama.cpp to store quantized language models. Named for the initials of its creator, Georgi Gerganov, plus "ML", it was introduced alongside llama.cpp in early 2023 and became the first widely used format for running LLMs locally on consumer hardware.
Why It Was Replaced
GGML had several practical limitations that became apparent as the ecosystem grew:
- No metadata standard — model architecture, tokenizer settings, and quantization details had to be inferred or stored separately
- Breaking changes — updates to llama.cpp frequently broke compatibility with existing GGML files, forcing users to re-download models
- No versioning — files had no built-in version information, making compatibility tracking difficult
GGUF was introduced in August 2023 specifically to address these issues. It added a structured header for metadata, stable versioning, and a self-describing format that could be parsed without knowing the model architecture in advance.
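The structured, versioned header is the core of the fix. As a rough sketch, here is what reading the fixed part of a GGUF header looks like, based on the published GGUF layout (a 4-byte "GGUF" magic, a uint32 version, then, in version 2 and later, uint64 tensor and metadata counts, all little-endian); verify against the spec before relying on the exact offsets:

```python
import struct

def read_gguf_header(data: bytes):
    """Parse the fixed leading fields of a GGUF header.

    Assumed layout (GGUF v2+): 4-byte magic b"GGUF", uint32 version,
    uint64 tensor count, uint64 metadata key/value count, little-endian.
    """
    magic, version = struct.unpack_from("<4sI", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    return version, tensor_count, kv_count

# Synthetic header for illustration, not a real model file:
header = struct.pack("<4sIQQ", b"GGUF", 3, 2, 5)
print(read_gguf_header(header))  # → (3, 2, 5)
```

Because the version number sits right after the magic, a loader can reject or adapt to old files before reading anything model-specific, which is exactly what GGML's headerless layout made impossible.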
Current Status
GGML files are effectively obsolete. The llama.cpp project dropped GGML support in favor of GGUF, and model repositories on Hugging Face have largely replaced GGML files with GGUF equivalents.
If you have a GGML file, you will likely need to convert it to GGUF using the conversion scripts provided in the llama.cpp repository before it will load in current software. Most model downloads from TheBloke and other quantization providers now exclusively offer GGUF.
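Before reaching for a conversion script, it can help to confirm which container a file actually uses. The sketch below guesses the format from the first four bytes; the legacy magic values are taken from historical llama.cpp sources (each was stored as a little-endian uint32, so the on-disk bytes look reversed) and should be treated as illustrative rather than exhaustive:

```python
# Legacy magics from historical llama.cpp sources (assumed, not exhaustive).
LEGACY_MAGICS = {
    b"lmgg": "GGML (original, unversioned)",
    b"fmgg": "GGMF (versioned GGML)",
    b"tjgg": "GGJT (mmap-friendly GGML)",
}

def detect_format(first4: bytes) -> str:
    """Classify a model file by its leading 4 bytes."""
    if first4 == b"GGUF":
        return "GGUF"
    return LEGACY_MAGICS.get(first4, "unknown")
```

In practice you would pass `open(path, "rb").read(4)` to `detect_format`; anything that comes back as a legacy variant needs conversion before current llama.cpp builds will load it.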
Relationship to GGUF
GGUF is a direct successor designed so the format can evolve without breaking existing files. The name follows the same pattern — the creator's initials, GG, plus a descriptor. The key difference is that GGUF files are self-describing: all the information needed to load and run the model is embedded in the file itself.
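The self-describing metadata takes the form of typed key/value pairs. As a rough sketch of how one string-valued entry is laid out on disk, per my reading of the GGUF spec (key as uint64 length plus UTF-8 bytes, a uint32 value-type code where string is 8, then the value encoded the same way; verify the type codes against the spec before relying on them):

```python
import struct

GGUF_TYPE_STRING = 8  # assumed string type code from the GGUF spec

def encode_string_kv(key: str, value: str) -> bytes:
    """Encode one string-valued metadata entry, little-endian throughout."""
    k, v = key.encode(), value.encode()
    return (struct.pack("<Q", len(k)) + k
            + struct.pack("<I", GGUF_TYPE_STRING)
            + struct.pack("<Q", len(v)) + v)

def decode_string_kv(data: bytes):
    """Decode one string-valued metadata entry back into (key, value)."""
    klen = struct.unpack_from("<Q", data, 0)[0]
    key = data[8:8 + klen].decode()
    off = 8 + klen
    vtype = struct.unpack_from("<I", data, off)[0]
    if vtype != GGUF_TYPE_STRING:
        raise ValueError("not a string-valued entry")
    vlen = struct.unpack_from("<Q", data, off + 4)[0]
    value = data[off + 12:off + 12 + vlen].decode()
    return key, value

entry = encode_string_kv("general.architecture", "llama")
print(decode_string_kv(entry))  # → ('general.architecture', 'llama')
```

A loader can walk entries like these to discover the architecture, tokenizer, and quantization details directly from the file, with no out-of-band configuration — the property GGML lacked.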