Comment by muyuu

5 hours ago

i have a Strix Halo machine

typically those dense models are too slow on Strix Halo to be practical, expect 5-7 tps

you can get an idea by looking at other dense benchmarks here: https://strixhalo.zurkowski.net/experiments - i'd expect this model to be tested here soon, i don't think i will personally bother

3 comments


hedgehog 5 hours ago

This one is around 250 t/s prefill and 12.4 generation in my testing.

muyuu 1 hour ago

interesting, might be worth having around although it is still pretty slow

anonym29 2 hours ago

similar numbers here - slightly higher PP. slightly better peak speed and retention w/ q8_0 kv cache quants too. llama-bench results here, cba to format for hn: https://pastebin.com/raw/zgJeqRbv

GTR 9 Pro, "performance" profile in BIOS, GTT instead of GART, Fedora 44
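
To put the numbers quoted above in perspective, here is a rough sketch of what ~250 t/s prefill and ~12.4 t/s generation would mean for end-to-end response time. The function name `response_seconds` and the prompt/output sizes are illustrative assumptions, not anything measured in this thread, and the estimate assumes both rates stay constant, which they generally do not as context depth grows.

```python
def response_seconds(prompt_tokens, output_tokens, prefill_tps=250.0, gen_tps=12.4):
    """Estimate wall-clock seconds for one request.

    Defaults are the approximate rates quoted in the thread.
    Assumes constant throughput; real speeds usually drop at longer context.
    """
    if prefill_tps <= 0 or gen_tps <= 0:
        raise ValueError("throughput values must be positive")
    prefill_time = prompt_tokens / prefill_tps   # time to process the prompt
    generation_time = output_tokens / gen_tps    # time to produce the reply
    return prefill_time + generation_time


if __name__ == "__main__":
    # Hypothetical workloads: (prompt tokens, output tokens)
    for prompt, output in [(1000, 500), (8000, 500), (8000, 2000)]:
        total = response_seconds(prompt, output)
        print(f"{prompt:>5} prompt + {output:>4} output tokens: ~{total:.0f} s")
```

Under these assumptions generation dominates for short prompts, while prefill time becomes noticeable once prompts reach several thousand tokens.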