CraftRigs
CraftRigs / Glossary / HIPAA-Sensitive
Networking & Inference

HIPAA-Sensitive

Describes data, workloads, or environments that handle protected health information (PHI) and must comply with HIPAA's privacy and security rules.

HIPAA-sensitive workloads are any local AI tasks that touch protected health information (PHI) — patient records, clinical notes, imaging, billing data — and therefore fall under the U.S. Health Insurance Portability and Accountability Act. For builders, this is the constraint that pushes inference off cloud APIs and onto hardware you physically control.

Why Healthcare Runs Local

Hospitals, clinics, and research labs that want to use LLMs on patient data cannot send prompts to OpenAI, Anthropic, or any third-party endpoint without a Business Associate Agreement (BAA) and a defensible audit trail. Even with a BAA, many compliance teams reject cloud inference outright because PHI leaving the network creates discovery risk. Running the model on-premises — ideally air-gapped — collapses the threat surface to a single rack.

Hardware and Software Implications

HIPAA-sensitive deployments typically require an air-gapped or strictly segmented network, full-disk encryption, role-based access logs, and reproducible model weights (no silent updates from a vendor cloud). That rules out anything that phones home for telemetry or license checks. Builds like the DGX Spark air-gapped lab and dual RTX 5090 compliance rig are sized to run mid-to-large models (70B-class and up) entirely offline, with full-RAG pipelines pointed at local document stores rather than hosted vector databases. Storage, backups, and even firmware updates get staged through a controlled transfer process.

Why It Matters for Local AI

HIPAA-sensitive is the single biggest reason a clinic will spend $10K+ on a local rig instead of paying $20/month for a hosted model. It directly shapes the build: you need enough VRAM to run a capable model without quantizing past clinical usefulness, enough system RAM to hold a real RAG index, and zero outbound network dependencies at inference time. Compliance, not raw tokens-per-second, becomes the dominant spec.