Comment by busfahrer
1 month ago
I am considering an M5 Pro (18/20C) MacBook with 64GB of RAM, but I'm having a really hard time finding benchmarks with real-world models.
Could somebody please provide some tokens-per-second numbers, for example for Qwen 3.6 35B/A3B, specifically for Q4 and Q6 quants?
My advice: don't just look at tokens per second; also look at time to first token (TTFT).
The local inference space is leaning toward MoE models, and a lot of them have decent tokens/second but horrible TTFT.
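If you want to collect both numbers yourself, here is a minimal sketch that times any streaming client, assuming the client hands you tokens as an iterator. `measure_stream` and `StreamStats` are hypothetical names, not part of any inference tool's API, and `fake_stream` only simulates a model with sleeps. The clock is injectable so the function can be checked without real timing.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamStats(NamedTuple):
    ttft_s: float       # seconds from starting the request to the first token
    decode_tps: float   # tokens/second after the first token arrived
    total_tokens: int


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamStats:
    """Time a token stream (hypothetical helper; pass your client's iterator)."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()          # timestamp each token as it is received
        if first is None:
            first = now
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = last - first
    # The decode rate excludes the first token, whose latency is the TTFT.
    decode_tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return StreamStats(ttft, decode_tps, count)


if __name__ == "__main__":
    def fake_stream(n: int = 20):
        # Simulated model: slow prompt processing, then steady decoding.
        time.sleep(0.5)            # stands in for prefill (the TTFT)
        for i in range(n):
            if i:
                time.sleep(0.02)   # stands in for per-token decode time
            yield f"tok{i}"

    s = measure_stream(fake_stream())
    print(f"TTFT {s.ttft_s:.2f} s, decode {s.decode_tps:.0f} t/s, {s.total_tokens} tokens")
```

Reporting the two numbers separately is the point: a single "average t/s" over the whole response hides a long wait before the first token.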
You can expect around 55-60 t/s with Qwen3.5:35b-a3b or gemma4:26b-a4b at Q4.
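To see how TTFT and t/s combine, a rough back-of-the-envelope model is: total wait ≈ prompt tokens / prefill rate + output tokens / decode rate. This sketch assumes that simplified model (it ignores model load time and other overheads). The 8,000-token prompt and 500 t/s prefill rate are hypothetical placeholders, not measurements; 55 t/s is the low end of the decode figure quoted above.

```python
def response_time_s(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Rough end-to-end latency: prefill (approximately the TTFT) plus decode.

    Simplified model: ignores model load time, tokenization and sampling overhead.
    """
    ttft = prompt_tokens / prefill_tps    # time spent processing the prompt
    decode = output_tokens / decode_tps   # time spent generating the reply
    return ttft + decode


if __name__ == "__main__":
    # Hypothetical numbers: an 8,000-token prompt at 500 t/s prefill,
    # then a 500-token reply at 55 t/s decode.
    total = response_time_s(8000, 500, 500.0, 55.0)
    print(f"{total:.1f} s")  # prints "25.1 s"; 16 s of that is waiting for the first token
```

Under these placeholder numbers, most of the wait happens before the first token appears, which is why a good t/s figure alone can be misleading for long prompts.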