oneAPI — Local AI Glossary | CraftRigs

oneAPI is Intel's unified software stack for programming its CPUs, GPUs, and other accelerators through a single set of libraries and compilers. For local AI builders running Arc cards, it is the Intel-native path to GPU acceleration, playing the role that CUDA plays for NVIDIA and ROCm plays for AMD.

What It Actually Includes

oneAPI bundles the DPC++/SYCL compiler, oneMKL (math kernels), and oneDNN (deep neural network primitives), and it reaches Intel GPUs through the Level Zero runtime. Level Zero is part of the oneAPI specification, though on most systems the runtime itself arrives with the GPU compute driver rather than the toolkit installer. Inference backends like IPEX-LLM and llama.cpp's SYCL backend sit on top of this stack to push tensor math onto Arc silicon. Without oneAPI installed, those backends fall back to CPU or refuse to launch, which leaves Vulkan as the only reasonable alternative for GPU compute on Arc.
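A quick way to see whether the stack described above is reachable from the current shell is to probe for a few of its pieces. The sketch below is a minimal check, not an official diagnostic: it assumes the toolkit's environment script exports `ONEAPI_ROOT` and puts the `icpx` compiler and the `sycl-ls` device lister on `PATH`. The function names `detect_oneapi` and `looks_ready` are made up for this example.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def detect_oneapi(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Report which oneAPI pieces look reachable from this environment."""
    return {
        # Assumption: ONEAPI_ROOT is exported by the toolkit's setvars script.
        "oneapi_root": env.get("ONEAPI_ROOT"),
        # icpx is the DPC++/SYCL compiler driver.
        "compiler": which("icpx"),
        # sycl-ls lists the SYCL devices the runtime can see.
        "device_lister": which("sycl-ls"),
    }


def looks_ready(report: dict) -> bool:
    """True only if every probed component was found."""
    return all(value is not None for value in report.values())


if __name__ == "__main__":
    report = detect_oneapi()
    for name, value in report.items():
        print(f"{name}: {value or 'not found'}")
    print("oneAPI environment looks ready:", looks_ready(report))
```

If everything reports "not found" in a fresh terminal, the usual cause is that the oneAPI environment script has not been sourced in that shell. It does not necessarily mean the toolkit is missing. This check does not confirm that an Arc GPU is visible; running `sycl-ls` itself is the more direct test for that.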

oneAPI vs Vulkan on Arc

This is the practical fork in the road for Arc owners. Vulkan works through llama.cpp's Vulkan backend and needs nothing Intel-specific beyond the standard graphics driver: install and run. oneAPI requires a heavier toolkit install (multi-GB), a GPU driver version that matches the toolkit, and environment setup in every shell (typically by sourcing the toolkit's setvars.sh script). In return it unlocks SYCL-optimized kernels and IPEX-LLM, which generally squeeze more tokens per second out of the same card on dense models. Vulkan is the "it just works" path; oneAPI is the "I want maximum throughput" path.
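To make the difference in setup concrete, here is a small sketch that assembles the CMake configure command for each llama.cpp backend. The flags follow llama.cpp's build documentation as of this writing (`GGML_VULKAN`, `GGML_SYCL`, and the `icx`/`icpx` compilers for SYCL). Flag names have changed before, so treat them as something to verify against the current repo. The path `/opt/intel/oneapi/setvars.sh` is the common default Linux install location and is an assumption here, as are the helper names.

```python
import shlex
from typing import List

# Configure-time flags per backend; verify against llama.cpp's current docs.
BACKEND_FLAGS = {
    "vulkan": ["-DGGML_VULKAN=ON"],
    "sycl": [
        "-DGGML_SYCL=ON",
        "-DCMAKE_C_COMPILER=icx",     # Intel C compiler from oneAPI
        "-DCMAKE_CXX_COMPILER=icpx",  # Intel DPC++/SYCL compiler from oneAPI
    ],
}


def configure_command(backend: str, build_dir: str = "build") -> List[str]:
    """Return the cmake configure invocation for the chosen backend."""
    if backend not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend: {backend!r}")
    return ["cmake", "-B", build_dir, *BACKEND_FLAGS[backend]]


def shell_line(backend: str, setvars: str = "/opt/intel/oneapi/setvars.sh") -> str:
    """Return a copy-pasteable shell line, sourcing oneAPI first for SYCL."""
    command = shlex.join(configure_command(backend))
    if backend == "sycl":
        # Assumed default install path; adjust if oneAPI lives elsewhere.
        return f"source {shlex.quote(setvars)} && {command}"
    return command


if __name__ == "__main__":
    for name in ("vulkan", "sycl"):
        print(shell_line(name))
```

The Vulkan line stands alone, while the SYCL line only works after the oneAPI environment is loaded. That is the install-complexity gap in miniature.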

Why It Matters for Local AI

If you're picking an Arc B580, B65, or B70 specifically to run LLMs, oneAPI is what determines whether your card behaves like a real inference accelerator or a graphics card with extra VRAM. Skipping it means leaving real decode-speed gains on the table, especially on 27B-class dense models where kernel efficiency compounds across every token. The tradeoff is install complexity and a stack that's still maturing compared to CUDA — so the question "do I need oneAPI or can I just use Vulkan?" has a real answer: Vulkan to start, oneAPI when you want the card's actual ceiling.
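Whether the switch is worth it on a given card comes down to measured decode speed. The helper below is a minimal sketch of that arithmetic: it turns two timed runs into tokens per second, a speedup ratio, and the time saved on a response of a given length. The figures in the usage section are made-up placeholders chosen for round numbers, not benchmark results for any Arc card, and the function names are hypothetical.

```python
def tokens_per_second(tokens_generated: int, seconds: float) -> float:
    """Decode throughput for one timed generation run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens_generated / seconds


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """How many times faster the candidate backend decodes than the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return candidate_tps / baseline_tps


def seconds_saved(response_tokens: int, baseline_tps: float, candidate_tps: float) -> float:
    """Wall-clock time saved on one response of the given length."""
    return response_tokens / baseline_tps - response_tokens / candidate_tps


if __name__ == "__main__":
    # Placeholder timings only: substitute numbers from your own runs.
    vulkan_tps = tokens_per_second(tokens_generated=240, seconds=16.0)
    sycl_tps = tokens_per_second(tokens_generated=240, seconds=12.0)
    print(f"speedup: {speedup(vulkan_tps, sycl_tps):.2f}x")
    print(f"saved on 1200 tokens: {seconds_saved(1200, vulkan_tps, sycl_tps):.1f} s")
```

For a fair comparison, use the same model file, quantization, prompt, and context length on both backends, and time only the generation phase.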