Comment by muyuu
5 hours ago
i have a Strix Halo machine
typically those dense models are too slow on Strix Halo to be practical; expect 5-7 tps (tokens per second)
you can get an idea by looking at other dense benchmarks here: https://strixhalo.zurkowski.net/experiments. i'd expect this model to be tested there soon; i don't think i will personally bother
This one is around 250 t/s prefill and 12.4 t/s generation in my testing.
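To put those two rates in perspective, here is a rough back-of-the-envelope sketch. It assumes wall-clock time is simply prompt tokens divided by prefill speed plus output tokens divided by generation speed, and it ignores model load time and any slowdown as the context grows. The prompt and reply sizes (4000 and 500 tokens) and the function name `request_seconds` are hypothetical, chosen only for illustration.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, gen_tps: float) -> float:
    """Rough estimate: prompt processing time plus token generation time."""
    return prompt_tokens / prefill_tps + output_tokens / gen_tps

# Rates reported in the thread: ~250 t/s prefill, ~12.4 t/s generation.
# The 4000-token prompt and 500-token reply are hypothetical sizes.
measured = request_seconds(4000, 500, prefill_tps=250.0, gen_tps=12.4)
# Same request at 6 t/s, the middle of the 5-7 tps expectation above.
expected = request_seconds(4000, 500, prefill_tps=250.0, gen_tps=6.0)
print(f"measured rates: {measured:.1f} s, at 6 tps: {expected:.1f} s")
```

With these assumed sizes, generation dominates: about 16 s of prefill versus roughly 40 s of decoding at 12.4 t/s, which is why the generation rate matters most for chat-style use.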
interesting, might be worth having around, although it's still pretty slow
similar numbers here, with slightly higher PP (prompt processing). slightly better peak speed and retention w/ q8_0 KV cache quants too. llama-bench results here, cba to format for HN: https://pastebin.com/raw/zgJeqRbv
GTR 9 Pro, "performance" profile in BIOS, GTT instead of GART, Fedora 44