Comment by DavidSJ
8 months ago
An MoE's matmuls have the same arithmetic intensity as a dense model's matmuls, provided they're being multiplied by a batch of activation vectors of equal size.
8 months ago
An MoE's matmuls have the same arithmetic intensity as a dense model's matmuls, provided they're being multiplied by a batch of activation vectors of equal size.
No comments yet
Contribute on Hacker News ↗