KoboldCpp
Inference server with a web UI designed for creative writing and roleplay — built on llama.cpp with additional sampling controls and story management features.
KoboldCpp is a local inference server that wraps llama.cpp with a web-based UI designed specifically for creative writing, roleplay, and interactive fiction. It exposes the full range of sampling parameters that standard chat interfaces hide (top-k, top-p, mirostat, repetition penalty), giving writers granular control over generation style and randomness.
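As a sketch of what that control looks like in practice, a generation payload in the KoboldAI API convention might carry these sampler fields (field names follow the KoboldAI API; the values shown are illustrative examples, not recommended settings):

```python
# Illustrative request payload for KoboldCpp's KoboldAI-style generate
# endpoint. The values here are examples, not tuned recommendations.
payload = {
    "prompt": "The old lighthouse keeper climbed the stairs",
    "max_length": 200,      # number of tokens to generate
    "temperature": 0.8,     # overall randomness
    "top_k": 40,            # keep only the 40 most likely tokens
    "top_p": 0.9,           # nucleus sampling cutoff
    "rep_pen": 1.1,         # repetition penalty multiplier
    "rep_pen_range": 1024,  # how far back the penalty looks
    "mirostat": 2,          # 0 = off, 1/2 = mirostat v1/v2
    "mirostat_tau": 5.0,    # target "surprise" level
    "mirostat_eta": 0.1,    # mirostat learning rate
}
```

Assistant-style frontends typically surface only temperature (if that); KoboldCpp's UI exposes all of these directly.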
What Makes KoboldCpp Different
Most local AI interfaces (Ollama, LM Studio) optimize for assistant-style chat interactions: a system prompt, user messages, assistant responses. KoboldCpp is designed around a story metaphor — a continuous text that you and the AI build together, with memory, author's notes, and world-building context all managed separately.
Key features that set it apart:
- Adventure mode: AI Dungeon-style text adventure interface
- Author's notes: Background context injected at a configurable position in the context window
- Memory/World info: Dynamic context injection based on keywords in the current text
- Story formatting: Plain text instead of chat bubbles, better for prose and fiction
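The interplay of memory, world info, and author's notes can be sketched as a simple context-assembly routine. This is a simplified model of the behavior described above, not KoboldCpp's actual implementation; the function and parameter names are hypothetical:

```python
def assemble_context(story: str, memory: str, authors_note: str,
                     world_info: dict[str, str], an_depth: int = 3) -> str:
    """Simplified sketch of story-style context assembly:
    - memory is always prepended,
    - world info entries are injected only when their keyword
      appears in the current story text,
    - the author's note is inserted an_depth paragraphs from the end.
    """
    # Keyword-triggered world info: scan the story for each keyword.
    triggered = [entry for keyword, entry in world_info.items()
                 if keyword.lower() in story.lower()]

    # Inject the author's note a few paragraphs before the end,
    # where it strongly influences the next generation.
    paragraphs = story.split("\n")
    cut = max(0, len(paragraphs) - an_depth)
    body = paragraphs[:cut] + [f"[Author's note: {authors_note}]"] + paragraphs[cut:]

    return "\n".join([memory, *triggered, *body])
```

The key design point is that world info costs no context tokens until its keyword actually appears, so large lore collections stay cheap.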
Hardware and Backend
KoboldCpp uses llama.cpp as its backend, supporting GGUF models on NVIDIA (CUDA), AMD (ROCm), Apple Silicon (Metal), and CPU, with Vulkan and CLBlast available as cross-vendor GPU options.
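A typical launch looks like the following. Flag names reflect common KoboldCpp releases and may differ in your version; the model filename is a placeholder:

```shell
# Check `python koboldcpp.py --help` for the flags your release supports.
# --usecublas selects the CUDA backend; --gpulayers offloads N layers to
# VRAM; the web UI and API serve on the given port (5001 by default).
python koboldcpp.py --model mistral-7b-instruct.Q4_K_M.gguf \
    --usecublas --gpulayers 35 --contextsize 8192 --port 5001
```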
The hardware requirements match llama.cpp — any modern GPU with sufficient VRAM for your chosen model. KoboldCpp adds minimal overhead on top of llama.cpp's baseline performance.
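As a back-of-the-envelope sketch (not an exact formula), memory needs for a GGUF model can be estimated from parameter count and quantization width; the helper name and the fixed cache allowance are illustrative assumptions:

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     kv_cache_gb: float = 1.0) -> float:
    """Rough VRAM estimate for a GGUF model.

    Weights take about params * bits/8 bytes; add a flat allowance
    for the KV cache and runtime buffers. Real usage varies with
    context size and backend, so treat this as a ballpark only.
    """
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb + kv_cache_gb

# e.g. a 7B model at ~4.5 bits/weight (Q4_K_M-class quantization):
# 7 * 4.5 / 8 + 1 ≈ 4.9 GB
```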
API Compatibility
KoboldCpp exposes a KoboldAI-compatible API (alongside OpenAI-compatible endpoints) that works with frontends like SillyTavern, the most popular UI for character-based roleplay and interactive fiction. The combination of KoboldCpp as the backend and SillyTavern as the frontend is the standard stack for local roleplay AI.
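A minimal client for the KoboldAI-style generate endpoint might look like this. The endpoint path and response shape follow the KoboldAI API convention, and the sketch assumes a KoboldCpp instance running locally on the default port:

```python
import json
from urllib import request

API_PATH = "/api/v1/generate"  # KoboldAI-compatible completion endpoint

def build_request(prompt: str,
                  base_url: str = "http://localhost:5001") -> request.Request:
    """Build a completion request for a running KoboldCpp instance."""
    payload = json.dumps({"prompt": prompt, "max_length": 120}).encode()
    return request.Request(base_url + API_PATH, data=payload,
                           headers={"Content-Type": "application/json"})

def generate(prompt: str) -> str:
    """POST the request; the server replies {"results": [{"text": ...}]}."""
    with request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["results"][0]["text"]
```

SillyTavern speaks this same API, which is why pointing it at a KoboldCpp URL is all the configuration the stack needs.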
For users focused on productivity tasks (coding, summarization, Q&A), Ollama or LM Studio are simpler choices. KoboldCpp is purpose-built for creative and narrative use cases.