Yes, I have an AMD Ryzen AI Max+ chip with the memory split set to allocate 96 GB to the GPU and 32 GB to the CPU. I got it last week, and I've been running gpt-oss-120b at Q5 at ~40 t/s. I run Linux with llama.cpp compiled against ROCm 7.
It works on the GPU. The NPU is supposed to work on Windows through a framework called Lemonade (I haven't tried it), but the same software stack doesn't support the NPU on Linux yet.
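For anyone wanting to reproduce that setup, here is a build-and-launch sketch. It is an assumption-laden outline, not the poster's exact commands: flag names have moved around between llama.cpp releases, gfx1151 is my guess at the Strix Halo GPU target, and the gguf filename is hypothetical.

```bash
# Sketch of a ROCm/HIP build of llama.cpp; verify flags against current docs.
cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j

# Offload all layers to the iGPU and serve a Q5 quant (filename hypothetical).
./build/bin/llama-server -m gpt-oss-120b-Q5_K_M.gguf -ngl 999
```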
> I've been running gpt-oss-120b at Q5 at ~40 t/s.

Did you try the native mxfp4 quant (obviously, Vulkan/ROCm would have to load and upcast it)?
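One way to check whether a given gguf actually carries the native mxfp4 tensors (rather than a requantized Q5) is to dump its tensor table. A sketch using the gguf Python package's dump script; it assumes a recent enough version that knows the MXFP4 tensor type, and the filename is hypothetical.

```bash
pip install gguf
# List tensors and their quant types; native gpt-oss FFN weights should
# report MXFP4 (filename below is hypothetical).
gguf-dump gpt-oss-120b-mxfp4.gguf | grep -i mxfp4
```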
> It works on the GPU.

Barely.