Comment by bigyabai
3 hours ago
A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.
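To put a rough number on the SSD part of that claim, here is a back-of-envelope sketch. Every figure in it is an assumption chosen for illustration, not something stated in the comment: the active parameter count per token, the 4-bit weight size, the SSD read bandwidth, and the idea of a RAM cache hit rate for experts. The function name `ssd_bound_tokens_per_sec` is hypothetical. It ignores compute entirely, so it only gives an upper bound on decode speed set by weight streaming.

```python
def ssd_bound_tokens_per_sec(active_params, bytes_per_param,
                             ssd_gb_per_sec, cache_hit_rate=0.0):
    """Upper bound on decode tokens/sec when the active expert weights
    that are not already cached in RAM must be read from SSD per token.

    All inputs are illustrative assumptions, not measured values.
    """
    # Bytes that have to come off the SSD for one decoded token.
    bytes_per_token = active_params * bytes_per_param * (1.0 - cache_hit_rate)
    if bytes_per_token <= 0:
        return float("inf")  # everything is cached, SSD is not the limit
    return ssd_gb_per_sec * 1e9 / bytes_per_token


# Hypothetical setup: 100B active params per token, 4-bit weights
# (0.5 bytes each), and a 7 GB/s NVMe drive.
cold = ssd_bound_tokens_per_sec(100e9, 0.5, 7.0)
warm = ssd_bound_tokens_per_sec(100e9, 0.5, 7.0, cache_hit_rate=0.9)
print(f"no cache:  {cold:.2f} tok/s")   # about 0.14
print(f"90% cache: {warm:.2f} tok/s")   # about 1.40
```

Under these assumed numbers the drive alone caps decode at well under one token per second unless most expert weights are already resident in memory, and that is before counting any of the prefill or decode compute cost.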