Comment by zozbot234
3 days ago
You can run SOTA local MoE models, very slowly, by streaming the weights in from a fast PCIe 5 SSD. Kimi 2.5 (generally considered in the ballpark of current Sonnet, not Opus of course) has been measured at 2 tok/s on Apple M5 hardware. That is the best case unless you have niche HEDT hardware with lots of PCIe lanes to attach storage to, and can figure out how to use that much parallel transfer throughput.
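For a rough feel of why storage bandwidth is the ceiling here, this is a back-of-envelope sketch, not a benchmark. It assumes decode speed is limited only by how fast the active expert weights for each token can be read from storage, and it ignores compute, read latency, and how well routing reuses experts. The function name and every number in the example calls (active parameter count, bits per weight, drive throughput, cached fraction) are illustrative placeholders, not figures from the comment above.

```python
def streamed_tokens_per_second(
    active_params: float,
    bits_per_weight: float,
    ssd_bytes_per_s: float,
    cached_fraction: float = 0.0,
    n_drives: int = 1,
) -> float:
    """Upper-bound decode rate when weights are streamed from storage.

    All inputs are assumptions supplied by the caller:
      active_params    parameters touched per token (MoE active set)
      bits_per_weight  quantization width
      ssd_bytes_per_s  sustained read throughput of ONE drive
      cached_fraction  share of per-token weights already resident in RAM
      n_drives         drives read in parallel (assumes perfect striping)
    """
    if not 0.0 <= cached_fraction < 1.0:
        raise ValueError("cached_fraction must be in [0, 1)")
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")

    # Bytes of weights needed to produce one token.
    bytes_per_token = active_params * bits_per_weight / 8
    # Only the part not already in memory has to come off the SSD.
    streamed_bytes = bytes_per_token * (1.0 - cached_fraction)
    # Aggregate bandwidth divided by bytes per token gives tokens per second.
    return (ssd_bytes_per_s * n_drives) / streamed_bytes


# Placeholder inputs: 32e9 active params, 4-bit weights, 14e9 B/s per drive.
single_drive = streamed_tokens_per_second(32e9, 4, 14e9)
half_cached = streamed_tokens_per_second(32e9, 4, 14e9, cached_fraction=0.5)
four_drives = streamed_tokens_per_second(32e9, 4, 14e9, n_drives=4)
```

The point of the sketch is the shape of the relationship: throughput scales linearly with aggregate read bandwidth, which is why more PCIe lanes (and software that can actually keep several drives busy) is the only real lever once the model no longer fits in memory.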