Comment by muyuu

7 hours ago

i have a Strix Halo machine

typically those dense models are too slow on Strix Halo to be practical, expect 5-7 tps

you can get an idea by looking at other dense benchmarks here: https://strixhalo.zurkowski.net/experiments - i'd expect this model to be tested here soon, i don't think i will personally bother

6 comments

rpdillon 2 hours ago

Yep, clocking a run right now that's averaging about 8.7 t/s. But when I don't mind waiting while I go eat a meal or something, it's not bad!

EDIT: I'm running the Unsloth Qwen3.6-27B-Q6_K GGUF on a Corsair Strix Halo 128GB I bought summer 2025.

https://huggingface.co/unsloth/Qwen3.6-27B-GGUF/blob/main/Qw...

hedgehog 7 hours ago

This one is around 250 t/s prefill and 12.4 generation in my testing.
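For a rough sense of what those two rates mean in wall-clock terms, here is a minimal sketch. The function name and the 4,000-token prompt / 1,000-token response sizes are made up for illustration; only the 250 t/s and 12.4 t/s figures come from the comment above. It also assumes both rates stay constant over the run, which real runs won't quite do as the context grows.

```python
def estimate_seconds(prompt_tokens, output_tokens, prefill_tps, gen_tps):
    """Rough wall-clock estimate: prompt processing time plus generation time.

    Assumes both rates are constant for the whole run (a simplification).
    """
    prefill_s = prompt_tokens / prefill_tps   # time to ingest the prompt
    gen_s = output_tokens / gen_tps           # time to emit the response
    return prefill_s + gen_s


# Hypothetical workload: 4,000-token prompt, 1,000-token response,
# at the reported ~250 t/s prefill and ~12.4 t/s generation.
total = estimate_seconds(4000, 1000, prefill_tps=250.0, gen_tps=12.4)
print(f"{total:.1f} s")
```

With these made-up sizes, generation accounts for most of the total, which is why the generation rate is the number people tend to quote first.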

muyuu 3 hours ago

interesting, might be worth having around although it is still pretty slow

anonym29 4 hours ago

similar numbers here - slightly higher PP. slightly better peak speed and retention w/ q8_0 kv cache quants too. llama-bench results here, cba to format for hn: https://pastebin.com/raw/zgJeqRbv

GTR 9 Pro, "performance" profile in BIOS, GTT instead of GART, Fedora 44

- hedgehog 2 hours ago
  
  If I did a proper benchmark I think the numbers would be what you got. Minimax M2.7 is also surprisingly not that slow, and in some ways faster as it seems to get things right with less thinking output. (around 140 t/s prefill and 23 t/s generation).
  