Comment by stared

1 day ago

I really like Qwen 3.6 27B Q8.

On Apple Silicon with MLX-LM, I am getting 20 tok/s on an M5 Max MacBook. I am not sure how that compares to llama.cpp performance.

In any case, while it is noticeably slower than this Nvidia RTX setup, being able to run such models on a laptop is wild. It does heat up my laptop quickly, though.

