Comment by muyuu
7 hours ago
i have a Strix Halo machine
typically those dense models are too slow on Strix Halo to be practical, expect 5-7 tps
you can get an idea by looking at other dense benchmarks here: https://strixhalo.zurkowski.net/experiments - i'd expect this model to be tested here soon, i don't think i will personally bother
Yep, clocking a run right now that's averaging about 8.7 t/s. But when I don't mind waiting while I go eat a meal or something, it's not bad!
EDIT: I'm running the Unsloth Qwen3.6-27B-Q6_K GGUF on a Corsair Strix Halo 128GB I bought in summer 2025.
https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/blob/main/Qw...
This one is around 250 t/s prefill and 12.4 t/s generation in my testing.
interesting, might be worth having around although it is still pretty slow
similar numbers here - slightly higher PP (prompt processing). slightly better peak speed and retention w/ q8_0 kv cache quants too. llama-bench results here, cba to format for hn: https://pastebin.com/raw/zgJeqRbv
GTR 9 Pro, "performance" profile in BIOS, GTT instead of GART, Fedora 44
If I did a proper benchmark I think the numbers would be what you got. Minimax M2.7 is also surprisingly not that slow, and in some ways faster, as it seems to get things right with less thinking output (around 140 t/s prefill and 23 t/s generation).
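To put those rates in wall-clock terms, here is a rough sketch that adds prefill time to generation time. The two rate pairs are the ones quoted in this thread; the 8000-token prompt and 2000-token reply are made-up illustration values, and the function name is hypothetical. It assumes constant rates, which is optimistic, since speed usually drops as the context fills up.

```python
def estimate_seconds(prompt_tokens, output_tokens, prefill_tps, gen_tps):
    """Rough wall-clock estimate: time to ingest the prompt plus time to generate."""
    return prompt_tokens / prefill_tps + output_tokens / gen_tps


# (prefill t/s, generation t/s) as quoted in the thread, not re-measured.
rates = {
    "dense 27B run": (250.0, 12.4),
    "Minimax M2.7": (140.0, 23.0),
}

# Hypothetical workload, chosen only for illustration.
prompt_tokens, output_tokens = 8000, 2000

for name, (prefill_tps, gen_tps) in rates.items():
    secs = estimate_seconds(prompt_tokens, output_tokens, prefill_tps, gen_tps)
    print(f"{name}: about {secs / 60:.1f} min")
```

With a long reply, generation speed dominates the total, so the model with slower prefill can still finish sooner overall.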