CraftRigs: AI Hardware Calculator (Beta)

Estimate VRAM requirements, inference speed, and power consumption for Large Language Models across consumer and enterprise hardware.


Target Hardware


CPU offloading reduces the VRAM requirement but significantly slows inference.

Model Parameters (B)

Total parameters loaded into VRAM.

Total VRAM Required

The total GPU memory needed: model weights + KV cache + runtime overhead. If this exceeds your GPU's VRAM, you'll need to quantize further, offload to system RAM, or get a bigger GPU.

42.4 GB

Exceeds Single GPU Capacity (24 GB)
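The arithmetic behind this number is easy to reproduce. Below is a minimal Python sketch of the weights + KV cache + overhead sum; the function name, the ~3.4% overhead factor, and the "70B model at ~4.2 bits per weight" example are illustrative assumptions fitted to the figures shown here, not the calculator's exact internals.

```python
# Total VRAM estimate: model weights + KV cache + runtime overhead.
# A rough sketch; real runtimes differ in how much overhead they add.

def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     kv_cache_gb: float, overhead_frac: float = 0.034) -> float:
    """Hypothetical helper: total GPU memory needed, in GB (decimal)."""
    weights_gb = params_b * bits_per_weight / 8   # billions of params * bytes each
    overhead_gb = (weights_gb + kv_cache_gb) * overhead_frac  # fitted, illustrative
    return weights_gb + kv_cache_gb + overhead_gb

# Roughly a 70B model at ~4.2 bits/weight -> ~36.8 GB of weights;
# add a 4.2 GB KV cache and ~1.4 GB overhead for ~42.3 GB total,
# in line with the 42.4 GB shown above.
print(round(estimate_vram_gb(70, 4.2, 4.2), 1))  # 42.3
```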

Est. Speed

Estimated tokens generated per second. 10+ tok/s feels conversational, 5-10 tok/s is usable with some lag, and below 5 tok/s is best suited to batch or offline use. Speed depends on memory bandwidth, not compute cores.

18.5 tok/s

Good for Chat
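Decode speed is bandwidth-bound: every generated token requires streaming roughly the full set of weights from memory, so throughput is capped at bandwidth divided by weights size. The sketch below, including its ~68% real-world efficiency factor, is an assumption for illustration rather than the calculator's exact model.

```python
# Bandwidth-bound decode throughput for a dense model:
# ceiling = memory bandwidth / bytes read per token (~ the weights).

def estimate_tok_per_s(bandwidth_gb_s: float, weights_gb: float,
                       efficiency: float = 0.68) -> float:
    """Hypothetical helper. `efficiency` (assumed) covers kernel overhead,
    KV-cache reads, and multi-GPU communication; 0.5-0.8 is typical."""
    return efficiency * bandwidth_gb_s / weights_gb

# 1,008 GB/s over 36.8 GB of weights gives a ~27.4 tok/s ceiling;
# at ~68% efficiency that lands near the 18.5 tok/s estimate above.
print(round(estimate_tok_per_s(1008, 36.8), 1))  # 18.6
```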

[VRAM usage chart: 42.4 GB of 48 GB used, with a 24 GB single-GPU limit marker; segments for Model Weights, KV Cache, and Overhead]

Technical Breakdown

Weights Size: 36.8 GB
KV Cache (Context): 4.2 GB
Runtime Overhead: 1.4 GB
Est. Power Draw: ~580 W
Memory Bandwidth: 1,008 GB/s
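Of these, only the KV cache grows with context length; its size follows the standard transformer formula (a K and a V tensor per layer, stored per token). The model dimensions below are assumed Llama-70B-style values used to back out the 4.2 GB figure; the calculator does not state them.

```python
# KV cache size: K and V tensors per layer, each n_kv_heads * head_dim
# wide, stored for every token in the context window.

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """Hypothetical helper; bytes_per_elem=2 assumes an fp16 cache."""
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * context_tokens / 1e9

# 80 layers, 8 KV heads of dim 128 (GQA), fp16: ~0.33 MB per token,
# so a ~12.8K-token context accounts for the 4.2 GB shown above (assumed).
print(round(kv_cache_gb(80, 8, 128, 12_800), 2))  # 4.19
```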

Hardware Notes

All calculations are estimates based on standard quantization overheads and published memory-bandwidth specs. Actual performance depends on quantization format (GGUF, EXL2, AWQ), software (llama.cpp, vLLM, MLX), system configuration, driver version, and workload. MoE models load all expert weights but activate only a subset per token. Apple Silicon estimates use MLX-typical throughput. Used GPU prices fluctuate; check eBay and r/hardwareswap for current market rates.