Comment by ggerganov
3 hours ago
`llama-batched-bench -hf ggml-org/Qwen3.6-27B-GGUF -npp 512,1024,2048,4096,8192,16384,32768 -ntg 128 -npl 1 -c 36000`

M2 Ultra, Q8_0

| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 512 | 128 | 1 | 640 | 1.307 | 391.69 | 6.209 | 20.61 | 7.516 | 85.15 |
| 1024 | 128 | 1 | 1152 | 2.534 | 404.16 | 6.227 | 20.56 | 8.760 | 131.50 |
| 2048 | 128 | 1 | 2176 | 5.029 | 407.26 | 6.229 | 20.55 | 11.258 | 193.29 |
| 4096 | 128 | 1 | 4224 | 10.176 | 402.52 | 6.278 | 20.39 | 16.454 | 256.72 |
| 8192 | 128 | 1 | 8320 | 20.784 | 394.14 | 6.376 | 20.08 | 27.160 | 306.33 |
| 16384 | 128 | 1 | 16512 | 43.513 | 376.53 | 6.532 | 19.59 | 50.046 | 329.94 |
| 32768 | 128 | 1 | 32896 | 99.137 | 330.53 | 7.081 | 18.08 | 106.218 | 309.70 |

DGX Spark, Q8_0

| PP | TG | B | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s | T s | S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
| 512 | 128 | 1 | 640 | 0.881 | 580.98 | 16.122 | 7.94 | 17.003 | 37.64 |
| 1024 | 128 | 1 | 1152 | 1.749 | 585.43 | 16.131 | 7.93 | 17.880 | 64.43 |
| 2048 | 128 | 1 | 2176 | 3.486 | 587.54 | 16.169 | 7.92 | 19.655 | 110.71 |
| 4096 | 128 | 1 | 4224 | 7.018 | 583.64 | 16.245 | 7.88 | 23.263 | 181.58 |
| 8192 | 128 | 1 | 8320 | 14.189 | 577.33 | 16.427 | 7.79 | 30.617 | 271.75 |
| 16384 | 128 | 1 | 16512 | 29.015 | 564.68 | 16.749 | 7.64 | 45.763 | 360.81 |
| 32768 | 128 | 1 | 32896 | 60.413 | 542.40 | 17.359 | 7.37 | 77.772 | 422.98 |
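
For anyone reading the tables: the derived columns appear to follow directly from the two measured timings. The sketch below is an assumption inferred from the numbers above for the B = 1 case, not taken from the tool's source. The function name `derived_columns` is hypothetical. It treats N_KV as B * (PP + TG), each speed as tokens divided by seconds, and S as N_KV divided by total time.

```python
def derived_columns(pp, tg, b, t_pp, t_tg):
    """Recompute the derived benchmark columns from the raw timings.

    Assumed relations (inferred from the tables, checked only for B = 1):
      N_KV = B * (PP + TG)
      S_PP = B * PP / T_PP
      S_TG = B * TG / T_TG
      T    = T_PP + T_TG
      S    = N_KV / T
    """
    n_kv = b * (pp + tg)
    t = t_pp + t_tg
    return {
        "N_KV": n_kv,
        "S_PP": b * pp / t_pp,
        "S_TG": b * tg / t_tg,
        "T": t,
        "S": n_kv / t,
    }


# First M2 Ultra row: PP=512, TG=128, B=1, T_PP=1.307 s, T_TG=6.209 s
m2_row = derived_columns(512, 128, 1, 1.307, 6.209)

# Last DGX Spark row: PP=32768, TG=128, B=1, T_PP=60.413 s, T_TG=17.359 s
spark_row = derived_columns(32768, 128, 1, 60.413, 17.359)

for name, row in (("M2 Ultra", m2_row), ("DGX Spark", spark_row)):
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Recomputed values can differ from the table in the last digit, since the timings shown are already rounded to three decimals.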