hipcc — Local AI Glossary | CraftRigs

hipcc is the compiler driver that turns C++/HIP source into GPU code for AMD hardware, the AMD-side counterpart to NVIDIA's nvcc. If you're building local AI tooling for a Radeon or Instinct card, hipcc is the binary that actually produces the GPU kernels your inference runtime will call.

Where It Sits in the Stack

hipcc ships as part of the ROCm toolchain and is a thin wrapper around the Clang-based HIP compiler that ROCm bundles. When you compile llama.cpp with -DLLAMA_HIPBLAS=ON (newer llama.cpp checkouts rename this option to -DGGML_HIP=ON, so check the build docs for your version), the build compiles the GPU kernels with hipcc, or with the ROCm clang it wraps, and links them against hipBLAS and other ROCm libraries. Seeing those compiler invocations stream past during cmake --build build --verbose is the strongest signal that your binary will have AMD GPU support baked in rather than silently falling back to CPU.
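To make that check repeatable, you can save the verbose build output and count the lines that look like HIP compiles. The sketch below is illustrative only: count_hip_compiles and build.log are hypothetical names, and the pattern (any line mentioning hipcc, or a clang under a path containing "rocm") is an assumption you should adjust to match what your own log shows.

```shell
#!/bin/sh
# Count lines in a saved build log that look like HIP compiler invocations.
# The pattern is an assumption: it matches "hipcc" or a clang binary that
# lives under a path containing "rocm". Adjust it to fit your own log.
count_hip_compiles() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps "set -e" scripts from aborting.
    grep -c -E 'hipcc|rocm[^ ]*/clang' "$1" || true
}

# Typical use (build.log is a hypothetical file name):
#   cmake --build build --verbose 2>&1 | tee build.log
#   count_hip_compiles build.log
```

A count of zero on a build you expected to be GPU-enabled is the cue to recheck your configure flags before going any further.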

How to Verify It Ran

The most common AMD-side failure mode for local LLM rigs is a binary compiled without GPU support, which surfaces as "0 layers offloaded" at runtime no matter what flags you pass. The fix is build-time, not runtime. Run ldd llama-server | grep -E "cuda|hip": empty output means no CUDA or HIP library was linked into the binary. Rebuild with cmake -B build -DLLAMA_HIPBLAS=ON (-DGGML_HIP=ON on newer llama.cpp checkouts), run cmake --build build --verbose, and watch the log for hipcc invocations. Then confirm that VRAM use climbs at load with rocm-smi --showmeminfo vram while the model spins up. No hipcc (or ROCm clang) lines in the build log means no AMD acceleration in the resulting binary, full stop.
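The link check can be wrapped in a small helper so it slots into a setup script. This is a sketch under stated assumptions: classify_backend is a hypothetical name, it reads ldd-style output on standard input, and it only detects backends that are dynamically linked into the binary itself. A build that loads its GPU backend as a separate plugin at runtime would need a different check.

```shell
#!/bin/sh
# Classify ldd-style output read from stdin.
# Prints "hip", "cuda", or "none" depending on the library names it sees.
classify_backend() {
    libs=$(cat)
    if printf '%s\n' "$libs" | grep -q 'hip'; then
        echo hip
    elif printf '%s\n' "$libs" | grep -q 'cuda'; then
        echo cuda
    else
        echo none
    fi
}

# Typical use (the binary path is an assumption about your build layout):
#   ldd ./build/bin/llama-server | classify_backend
```

Reading from standard input keeps the helper independent of where the binary lives, and lets you try it against saved ldd output from another machine.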

Why It Matters for Local AI

For AMD builders, hipcc is the gate between "my 7900 XTX runs models at GPU speed" and "my expensive card sits idle while the CPU chugs through tokens." Unlike CUDA on NVIDIA, where prebuilt wheels and binaries usually just work, ROCm-backed inference frequently requires building from source, which means hipcc has to be on your PATH and actually invoked by your build system. If you're choosing AMD for the VRAM-per-dollar advantage, treat a clean hipcc build log as a non-negotiable checkpoint before you start benchmarking tokens per second.
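A quick pre-flight check for the PATH requirement might look like the following. It is a minimal sketch: check_tool is a hypothetical helper name, and it only confirms that the shell can find the tool, not that the ROCm install behind it is complete or matches your GPU.

```shell
#!/bin/sh
# Report whether a tool is reachable on PATH.
# Prints "found: <path>" and returns 0, or "missing: <name>" and returns 1.
check_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "found: $path"
    else
        echo "missing: $1"
        return 1
    fi
}

# Typical use before configuring a ROCm build:
#   check_tool hipcc && hipcc --version
```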