Comment by cpburns2009
3 hours ago
Results are nearly identical running on a Strix Halo using Vulkan, llama.cpp b8884:
$ llama-batched-bench -dev Vulkan2 -hf unsloth/Qwen3.6-27B-GGUF:IQ4_XS -npp 1000,2000,4000,8000,16000,32000 -ntg 128 -npl 1 -c 34000
| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 1000 | 128 | 1 | 1128 | 3.288 | 304.15 | 9.873 | 12.96 | 13.161 | 85.71 |
| 2000 | 128 | 1 | 2128 | 6.415 | 311.79 | 9.883 | 12.95 | 16.297 | 130.57 |
| 4000 | 128 | 1 | 4128 | 13.113 | 305.04 | 9.979 | 12.83 | 23.092 | 178.76 |
| 8000 | 128 | 1 | 8128 | 27.491 | 291.01 | 10.155 | 12.61 | 37.645 | 215.91 |
| 16000 | 128 | 1 | 16128 | 59.079 | 270.83 | 10.476 | 12.22 | 69.555 | 231.87 |
| 32000 | 128 | 1 | 32128 | 148.625 | 215.31 | 11.084 | 11.55 | 159.709 | 201.17 |
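For anyone reading the table: the derived columns appear to follow directly from the measured times. The sketch below is a sanity check, not llama.cpp code. It assumes N_KV = B * (PP + TG), S_PP = B * PP / T_PP, S_TG = B * TG / T_TG, T = T_PP + T_TG and S = N_KV / T, which is what the numbers above suggest for a single sequence (B = 1). The helper names `derive` and `inconsistent_rows` are made up for illustration.

```python
import math

# Rows copied from the table above:
# (PP, TG, B, N_KV, T_PP s, S_PP t/s, T_TG s, S_TG t/s, T s, S t/s)
ROWS = [
    (1000, 128, 1, 1128, 3.288, 304.15, 9.873, 12.96, 13.161, 85.71),
    (2000, 128, 1, 2128, 6.415, 311.79, 9.883, 12.95, 16.297, 130.57),
    (4000, 128, 1, 4128, 13.113, 305.04, 9.979, 12.83, 23.092, 178.76),
    (8000, 128, 1, 8128, 27.491, 291.01, 10.155, 12.61, 37.645, 215.91),
    (16000, 128, 1, 16128, 59.079, 270.83, 10.476, 12.22, 69.555, 231.87),
    (32000, 128, 1, 32128, 148.625, 215.31, 11.084, 11.55, 159.709, 201.17),
]


def derive(pp, tg, b, t_pp, t_tg):
    """Recompute the derived columns from the measured ones.

    The formulas are assumptions inferred from the table, not taken
    from the llama.cpp source.
    """
    n_kv = b * (pp + tg)   # tokens in the KV cache after the run
    t = t_pp + t_tg        # total wall time in seconds
    return {
        "n_kv": n_kv,
        "s_pp": b * pp / t_pp,  # prompt processing, tokens/s
        "s_tg": b * tg / t_tg,  # text generation, tokens/s
        "t": t,
        "s": n_kv / t,          # overall tokens/s
    }


def inconsistent_rows(rows, rel_tol=1e-3):
    """Return the PP values of rows whose reported columns differ from
    the recomputed ones by more than rel_tol (the printed times are
    rounded, so small differences are expected)."""
    bad = []
    for pp, tg, b, n_kv, t_pp, s_pp, t_tg, s_tg, t, s in rows:
        d = derive(pp, tg, b, t_pp, t_tg)
        reported = {"n_kv": n_kv, "s_pp": s_pp, "s_tg": s_tg, "t": t, "s": s}
        if not all(math.isclose(d[k], reported[k], rel_tol=rel_tol) for k in d):
            bad.append(pp)
    return bad


if __name__ == "__main__":
    print("inconsistent rows:", inconsistent_rows(ROWS))
```

Note that S counts prompt and generated tokens together, so it mostly tracks prompt throughput at long contexts and says little about generation speed on its own.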