Fine-Tuning
Training a pre-trained model on additional data to specialize its behavior, improve task performance, or adjust its output style.
Fine-tuning is the process of continuing to train a pre-trained language model on new data. The base model (e.g., Llama 3.1 8B) has been trained on trillions of tokens of general internet text. Fine-tuning updates the weights further using a smaller, curated dataset to shift the model's behavior toward a specific task, style, or domain.
Types of Fine-Tuning
Instruction fine-tuning: Training a base model to follow instructions reliably. This is what turns a raw next-token prediction model into a chat assistant. Model variants labeled "Instruct" or "Chat" have been through this process.
Domain fine-tuning: Training on specialized corpora (medical literature, legal documents, code) to improve domain-specific knowledge and vocabulary. A general 7B model fine-tuned on clinical notes can outperform a much larger general model on clinical tasks.
Preference fine-tuning (RLHF / DPO): Training the model to prefer certain outputs over others, based on human feedback or AI-generated preference data. This is how safety guardrails and helpfulness tuning are applied.
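The main practical difference between instruction tuning and preference tuning is the shape of the training data. A sketch of the two record formats, using illustrative field names (Alpaca-style for instruction data, chosen/rejected pairs for DPO; exact names vary by framework):

```python
# One instruction-tuning record: a prompt and the single "correct" response.
instruction_example = {
    "instruction": "Summarize the following clinical note in one sentence.",
    "input": "62-year-old patient presents with chest pain and dyspnea.",
    "output": "An annotator-written one-sentence summary goes here.",
}

# One preference record: the same prompt with a preferred and a
# dispreferred response. DPO trains the model to rank "chosen" higher.
preference_example = {
    "prompt": "Explain what a LoRA adapter is.",
    "chosen": "A clear, accurate explanation of low-rank adapters.",
    "rejected": "A vague or incorrect explanation.",
}
```

Instruction tuning needs one gold answer per prompt; preference tuning needs two responses per prompt plus a judgment about which is better, which is why it usually comes after instruction tuning.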
Hardware Requirements
Full fine-tuning requires storing the entire model in VRAM along with its gradients and optimizer states (Adam keeps two extra values per parameter), totaling roughly 4–6x the model's FP16 footprint. A 7B model occupies about 14GB at FP16 but needs 50–80GB for full fine-tuning, which requires data center hardware.
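The arithmetic behind that multiplier can be sketched as a per-parameter byte count. This is a rough floor under stated assumptions (FP16 weights and gradients, FP32 Adam moments); activations and framework overhead add more on top:

```python
def finetune_vram_gb(params_billions, weight_bytes=2, grad_bytes=2,
                     optim_bytes=8):
    """Rough VRAM floor for full fine-tuning.

    Defaults assume FP16 weights (2 bytes), FP16 gradients (2 bytes),
    and Adam's two FP32 moment estimates (8 bytes per parameter).
    Billions of parameters x bytes per parameter ~= gigabytes.
    """
    per_param = weight_bytes + grad_bytes + optim_bytes
    return params_billions * per_param

print(finetune_vram_gb(7))                 # Adam: 84 GB before activations
print(finetune_vram_gb(7, optim_bytes=2))  # 8-bit optimizer: 42 GB
```

An 8-bit optimizer (as in bitsandbytes) shrinks the optimizer-state term, which is how some full fine-tuning setups land near the lower end of the quoted range.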
Fine-tuning on consumer hardware uses LoRA or QLoRA, which reduce the number of trainable parameters by 99% or more and make fine-tuning 7B–13B models feasible on a single 24GB GPU such as an RTX 3090 or 4090.
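The parameter reduction follows directly from LoRA's low-rank factorization: each adapted weight matrix gets a trainable update B @ A with rank r, so only r * (d_in + d_out) parameters are trained per matrix. A sketch with illustrative Llama-7B-style dimensions (32 layers, hidden size 4096, adapting the q and v projections at rank 16, a common default):

```python
def lora_params(d_in, d_out, r):
    # LoRA factorizes the weight update as B @ A,
    # where A is (r, d_in) and B is (d_out, r).
    return r * d_in + d_out * r

layers, hidden, rank = 32, 4096, 16
matrices_per_layer = 2            # q_proj and v_proj, both (hidden, hidden)

trainable = layers * matrices_per_layer * lora_params(hidden, hidden, rank)
total = 7e9                       # approximate base model size

print(f"{trainable:,} trainable params "
      f"({100 * trainable / total:.2f}% of the base model)")
# -> 8,388,608 trainable params (0.12% of the base model)
```

Only these adapter weights (and their optimizer states) need gradients; the 7B base weights stay frozen, which is what brings the VRAM budget down to a single consumer GPU.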
When Fine-Tuning Is Worth It
Fine-tuning makes sense when:
- You have proprietary data that general models haven't seen
- You need consistent output formatting a base model struggles with
- You're building a specialized product and accuracy in a narrow domain matters
For most local AI use cases, prompt engineering and retrieval augmentation (RAG) accomplish similar goals without training. Reserve fine-tuning for cases where prompt-based approaches have clear ceilings.
Running Fine-Tuned Models
After training, fine-tuned models can be exported to GGUF format for use in llama.cpp, Ollama, LM Studio, or any standard inference software. If trained with LoRA, the adapter can be merged into the base model first or loaded separately at inference time.
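Mathematically, merging a LoRA adapter just folds the scaled low-rank update into the base weight once, so inference afterwards carries no adapter overhead. A minimal NumPy sketch of that step (libraries such as Hugging Face PEFT do this for you, e.g. via `merge_and_unload()`; dimensions here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trained LoRA factor A
B = np.zeros((d_out, r))                # factor B is zero-initialized in LoRA

# Merge: W' = W + (alpha / r) * B @ A, applied once at export time.
W_merged = W + (alpha / r) * (B @ A)

# With B still at its zero init, the merge is a no-op on W:
assert np.allclose(W_merged, W)
```

Keeping the adapter separate instead lets you swap task-specific adapters over one shared base model at inference time, at the cost of a small per-token overhead.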
Related guides:
- Fine-tuning a 7B LLM on a consumer GPU with Unsloth and LoRA — practical walkthrough of the full fine-tuning pipeline on a single RTX 3090 or 4090.
- GGUF vs GPTQ vs AWQ vs EXL2: which quantization format should you use? — choosing the right export format after fine-tuning is complete.