Comment by busfahrer

1 month ago

I am considering an M5 Pro (18/20C) MacBook with 64GB of RAM, but I'm having a really hard time finding benchmarks of real-world models.

Could somebody please provide some tokens-per-second numbers, for example for Qwen 3.6 35B/A3B, specifically for the Q4 and Q6 quants?

My advice: don't just look at tokens per second, but also at time to first token (TTFT).

The local inference space is leaning toward MoE models, and a lot of them have decent tokens per second but horrible TTFT.
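If you want to check both numbers yourself, here is a minimal sketch of how they can be separated. It assumes your runtime gives you some iterable that yields tokens as they are generated; `measure_stream` and its `clock` parameter are names I made up for illustration, not part of any particular library.

```python
import time
from typing import Callable, Iterable


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Split a streamed generation into TTFT and decode throughput.

    `tokens` is assumed to be a lazy stream: the timer starts here, so any
    prompt processing must happen while we wait for the first item.
    """
    start = clock()
    first = None  # timestamp of the first token
    last = start  # timestamp of the most recent token
    count = 0

    for _ in tokens:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1

    if first is None:
        # Nothing was generated, so neither metric is defined.
        return {"ttft_s": None, "decode_tps": None, "tokens": 0}

    decode_time = last - first
    # Throughput is measured after the first token, so a long prompt
    # phase does not drag the tokens-per-second figure down.
    decode_tps = (count - 1) / decode_time if count > 1 and decode_time > 0 else None

    return {"ttft_s": first - start, "decode_tps": decode_tps, "tokens": count}
```

Pass it whatever streaming iterator your runtime exposes and compare `ttft_s` across prompt lengths. A model can look fine on `decode_tps` and still make you wait a long time before the first token shows up, which is the case a plain tokens-per-second number hides.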