Comment by mft_

2 hours ago

The 27B model is dense, so it is relatively slow. The 35B-A3B model is marginally weaker but, being MoE, is much faster: roughly 4-8x in basic benchmarks on my M1 Max.

For comparison, I just ran a couple of quick benchmarks (default settings) with llama-bench:

Qwen3.6-35B-A3B at Q6_K_XL gave 858 t/s pp512 (prompt processing) and 43 t/s tg128 (token generation).

Qwen3.6-27B at Q4_K_XL gave 103 t/s pp512 and 8 t/s tg128.
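
To put those figures side by side, here is a minimal sketch that computes the speedup ratios from the numbers quoted above. The names `results` and `speedup` are just illustrative, and note the two runs use different quantisations (Q6_K_XL vs Q4_K_XL), so this is a rough comparison rather than a controlled one.

```python
# Throughput figures quoted above (tokens/second, llama-bench default settings).
results = {
    "Qwen3.6-35B-A3B Q6_K_XL": {"pp512": 858, "tg128": 43},
    "Qwen3.6-27B Q4_K_XL": {"pp512": 103, "tg128": 8},
}


def speedup(fast, slow):
    """Return the throughput ratio fast/slow for each test."""
    return {test: fast[test] / slow[test] for test in fast}


ratios = speedup(
    results["Qwen3.6-35B-A3B Q6_K_XL"],
    results["Qwen3.6-27B Q4_K_XL"],
)

for test, ratio in ratios.items():
    print(f"{test}: {ratio:.1f}x")
# pp512: 8.3x
# tg128: 5.4x
```

So on this machine the MoE model comes out roughly 8x faster at prompt processing and a bit over 5x faster at token generation.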

pixelesque 1 hour ago

Thanks for the info.