The AMD Ryzen AI Max series — particularly the AI Max+ 395 — is one of the most compelling platforms for local LLM inference because of its unified memory architecture. With 128GB of LPDDR5X shared between CPU and iGPU, you can theoretically run 70B models entirely in GPU memory. The problem: Linux does not give the GPU all that memory by default.
Out of the box, the iGPU on Ryzen AI Max typically exposes only a fraction of total RAM as usable VRAM through GTT (Graphics Translation Table) memory. If you check rocm-smi after a fresh install, you might see only 8–16GB of VRAM available. The remaining memory is there — the OS just has not been told to hand it to the GPU.
This guide covers exactly how to change that.
Quick Summary
- Default VRAM: Only 8–16GB accessible out of the box on most Linux distros
- Fix: Set the amdttm.pages_limit and amdttm.page_pool_size kernel parameters; amdgpu.gttsize is deprecated and does nothing on modern kernels
- Practical max: ~108GB on AI Max+ 395 with 128GB RAM; 96GB is a stable, safe target
Understanding GTT Memory vs. Dedicated VRAM
On discrete AMD GPUs (RX 7900 XTX, for example), VRAM is physically on the card and GTT memory is an additional pool of system RAM the GPU can address. The two pools are distinct.
On Ryzen AI Max, there is no discrete VRAM. The iGPU uses a portion of the system LPDDR5X directly. GTT memory is the VRAM. The driver needs to know how large that pool can grow, and by default it is conservative — typically set to a fraction of installed RAM.
This is a kernel TTM (Translation Table Manager) decision, not a firmware setting. You cannot change it in BIOS/UEFI. It requires a kernel parameter.
The Deprecated Method (Do Not Use)
Before kernel 6.1, the common advice was:
amdgpu.gttsize=98304
This parameter accepted a size in megabytes. It no longer works. On Linux 6.1+, this parameter is silently ignored. If you search forum posts and find someone telling you to use amdgpu.gttsize, that advice is outdated. You will add it to your kernel command line, reboot, and see no change in rocm-smi.
The Modern Method: amdttm Kernel Parameters
The correct parameters for current kernels are:
- amdttm.pages_limit: the maximum number of 4KB pages the TTM allocator can manage
- amdttm.page_pool_size: the size of the pre-allocated page pool
Both control the same underlying limit from different angles. Setting pages_limit is the primary lever.
The Formula
pages_limit = target_GB × 1024 × 1024 / 4
Breaking that down: you are converting gigabytes to 4KB pages.
- 1 GB = 1,073,741,824 bytes
- 1 page = 4,096 bytes
- Pages per GB = 1,073,741,824 / 4,096 = 262,144
In other words, the multiplier is 262,144 pages per gigabyte; every value in the table below is target_GB × 262,144.
Common targets:
| Target VRAM | pages_limit value |
|---|---|
| 32 GB | 8,388,608 |
| 48 GB | 12,582,912 |
| 64 GB | 16,777,216 |
| 96 GB | 25,165,824 |
| 108 GB | 28,311,552 |
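The table values can be reproduced with shell arithmetic. A quick sketch; TARGET_GB is whatever target you choose, shown here with the 96GB example:

```shell
# pages_limit = target_GB * pages-per-GB, where one GB holds 262,144 pages of 4KB
TARGET_GB=96
PAGES_LIMIT=$(( TARGET_GB * 1024 * 1024 / 4 ))
echo "$PAGES_LIMIT"   # 25165824, matching the 96 GB row above
```

Swapping in 108 for TARGET_GB yields 28,311,552, matching the last row of the table.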
For the AI Max+ 395 with 128GB RAM, 96GB is the recommended target. This leaves ~32GB for the OS, running applications, and kernel overhead. Pushing to 108GB is possible but increases the risk of OOM kills during inference if the model's KV cache grows large.
Applying the Parameters: Fedora
Fedora uses GRUB2. The parameters go in /etc/default/grub.
Step 1: Edit the GRUB config
sudo nano /etc/default/grub
Find the line that starts with GRUB_CMDLINE_LINUX= and add the parameters inside the quotes:
GRUB_CMDLINE_LINUX="... amdttm.pages_limit=25165824 amdttm.page_pool_size=25165824"
Keep any existing parameters intact. Only append.
Step 2: Regenerate the GRUB config
On modern Fedora (34 and later), the same command covers both BIOS and UEFI systems:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Do not write a full config to /boot/efi/EFI/fedora/grub.cfg. On current releases that file is a small stub that redirects to /boot/grub2/grub.cfg, and overwriting it can break boot.
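Alternatively, Fedora ships grubby, which appends kernel arguments to every installed boot entry without hand-editing /etc/default/grub. A sketch for the 96GB target; the grubby call itself needs root, so it is left commented out:

```shell
# Kernel arguments for the 96GB target (values from the table above)
ARGS="amdttm.pages_limit=25165824 amdttm.page_pool_size=25165824"
echo "$ARGS"

# Apply to all boot entries (requires root); uncomment to run:
# sudo grubby --update-kernel=ALL --args="$ARGS"
```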
Step 3: Reboot
sudo reboot
Applying the Parameters: Ubuntu / Debian
sudo nano /etc/default/grub
Add to GRUB_CMDLINE_LINUX_DEFAULT:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdttm.pages_limit=25165824 amdttm.page_pool_size=25165824"
Then:
sudo update-grub
sudo reboot
Applying the Parameters: Arch Linux
Edit /etc/default/grub the same way, then:
sudo grub-mkconfig -o /boot/grub/grub.cfg
sudo reboot
If you use systemd-boot instead of GRUB, add the parameters to your loader entry in /boot/loader/entries/.
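With systemd-boot, the parameters go on the options line of the loader entry. A sketch with a hypothetical entry filename and a placeholder root= value, which you should adapt to your own entry:

```
# /boot/loader/entries/arch.conf (hypothetical filename)
title   Arch Linux
linux   /vmlinuz-linux
initrd  /initramfs-linux.img
options root=UUID=<your-root-uuid> rw amdttm.pages_limit=25165824 amdttm.page_pool_size=25165824
```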
Verifying the Change
After reboot, use two methods to confirm GTT memory is available.
Method 1: dmesg
sudo dmesg | grep -i gtt
You should see a line similar to:
[drm] amdgpu: 98304M of GTT memory ready.
If the value matches your target (approximately — it will be in MB), the parameter took effect.
Method 2: rocm-smi
rocm-smi --showmeminfo gtt
Look for the GTT total. On a correctly configured AI Max+ 395 with 96GB target, you should see approximately 98,304 MB total GTT.
Method 3: /sys filesystem
cat /sys/class/drm/card0/device/mem_info_gtt_total
This returns bytes. Divide by 1,073,741,824 to get GB.
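For example, assuming the limit landed on exactly 96GB, the sysfs value would be 96 × 1,073,741,824 bytes:

```shell
# Hypothetical sysfs reading for a 96GB GTT pool (96 * 1073741824 bytes)
BYTES=103079215104
echo $(( BYTES / 1073741824 ))   # 96
```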
What To Do If It Does Not Work
Kernel too old: amdttm module parameters require a reasonably recent kernel. Linux 6.6+ is recommended for Ryzen AI Max. Check with uname -r. On Fedora 40/41, you should already be on a supported kernel. On older Ubuntu LTS releases, you may need to install a mainline kernel.
ROCm not installed: If ROCm drivers are not installed, the amdgpu kernel module may load in a limited mode that ignores TTM parameters. Install ROCm 6.x before attempting this configuration.
Parameter not being passed: Verify GRUB is actually using your edited config file. Some systems have multiple GRUB configs. Run grep amdttm /proc/cmdline after reboot (no root needed); if the parameters do not appear there, GRUB is not using the file you edited.
Practical VRAM Allocation for Inference
Once GTT memory is unlocked, your inference stack (Ollama, llama.cpp, etc.) needs to know to use it. Most tools detect VRAM automatically via ROCm. A 96GB allocation makes the following models fully GPU-resident:
[Table: model sizes and whether each fits in 96GB GTT; four of the five listed models fit, and only the 405B-class model does not.]
For 405B and larger models, CPU+GPU hybrid inference with llama.cpp is still required; see our guide on CPU+GPU hybrid inference with llama.cpp for layer offloading strategy. For a full review of the hardware that makes best use of this configuration, see our GMKtec EVO-X2 review and the AMD Strix Halo vs Mac Mini M4 comparison.
Token throughput on the AI Max+ 395 with models fully in GTT memory is significantly better than CPU-only inference: expect 15–30 t/s on a 70B Q4_K_M model, depending on context length and batch size. This is competitive with a single RTX 4090 for prompt processing, though slightly behind on generation speed due to LPDDR5X bandwidth versus GDDR6X.
Memory Bandwidth Consideration
One caveat about the AI Max approach: LPDDR5X bandwidth tops out around 273 GB/s on the AI Max+ 395. A discrete RTX 4090 provides 1,008 GB/s of GDDR6X bandwidth. For pure generation speed on large models, the RTX 4090 wins. Where the AI Max shines is total capacity — 96GB of fast-enough unified memory at a fraction of the cost of dual RTX 4090s, in a power envelope under 120W.
If you are running a 70B model and need maximum throughput rather than maximum capacity, a discrete GPU build still wins. But if you need to run 70B+ models on a single laptop or mini PC without external GPUs, the Ryzen AI Max with properly configured GTT memory is the most practical option available today.