Comment by regularfry
5 hours ago
To add more complexity to the picture, you can run MoE models at a higher quant than you might think, because CPU expert offload is less impactful than full layer offload for dense models.
5 hours ago
To add more complexity to the picture, you can run MoE models at a higher quant than you might think, because CPU expert offload is less impactful than full layer offload for dense models.
No comments yet
Contribute on Hacker News ↗