Comment by busfahrer

1 month ago

I am considering an M5 Pro (18/20C) MacBook with 64GB of RAM, but I'm having a really hard time finding benchmarks of real-world models.

Could somebody please share some tokens-per-second numbers, for example for Qwen 3.6 35B/A3B, specifically at Q4 and Q6 quants?

2 comments

busfahrer

My advice: don't just look at tokens per second, but also at time to first token (TTFT).

The local inference space is leaning toward MoE models, and a lot of them have decent tokens per second but horrible TTFT.

You can expect around 55-60 t/s with Qwen3.5:35b-a3b or gemma4:26b-a4b at Q4.
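
To make the TTFT point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes TTFT is dominated by prompt processing (prompt tokens divided by prefill speed) and that generation runs at a steady decode rate. The 55 t/s decode rate is the low end of the range above; the prefill speed, prompt length, and output length are made-up placeholders, not measurements from any machine.

```python
def ttft_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Approximate time to first token: time spent processing the prompt."""
    return prompt_tokens / prefill_tps


def total_seconds(prompt_tokens: int, output_tokens: int,
                  prefill_tps: float, decode_tps: float) -> float:
    """Approximate wall-clock time for a full response: TTFT plus decode time."""
    return ttft_seconds(prompt_tokens, prefill_tps) + output_tokens / decode_tps


if __name__ == "__main__":
    # All four values below are illustrative assumptions, not benchmarks.
    prompt_tokens = 8000   # hypothetical long prompt (e.g. a coding context)
    output_tokens = 300    # hypothetical answer length
    prefill_tps = 500.0    # hypothetical prompt-processing speed
    decode_tps = 55.0      # low end of the 55-60 t/s figure quoted above

    ttft = ttft_seconds(prompt_tokens, prefill_tps)
    total = total_seconds(prompt_tokens, output_tokens, prefill_tps, decode_tps)
    print(f"TTFT: {ttft:.1f} s, total: {total:.1f} s")
```

With these placeholder numbers the wait before the first token is much longer than the generation itself, which is why a good t/s figure alone can be misleading for long prompts. Swap in your own measured prefill and decode speeds to get a realistic estimate.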