Comment by muyuu

7 hours ago

i have a Strix Halo machine

typically those dense models are too slow on Strix Halo to be practical, expect 5-7 tps

you can get an idea by looking at other dense benchmarks here: https://strixhalo.zurkowski.net/experiments - i'd expect this model to be tested here soon, i don't think i will personally bother

This one is around 250 t/s prefill and 12.4 generation in my testing.

  • interesting, might be worth having around although it is still pretty slow

  • similar numbers here - slightly higher PP. slightly better peak speed and retention w/ q8_0 kv cache quants too. llama-bench results here, cba to format for hn: https://pastebin.com/raw/zgJeqRbv

    GTR 9 Pro, "performance" profile in BIOS, GTT instead of GART, Fedora 44

    • If I did a proper benchmark I think the numbers would be what you got. Minimax M2.7 is also surprisingly not that slow, and in some ways faster as it seems to get things right with less thinking output. (around 140 t/s prefill and 23 t/s generation).

      1 reply →