Comment by lambda
1 day ago
Yeah, I looked up some models I've actually run locally on my Strix Halo laptop, and it's saying I should get much lower performance than I actually see on the models I've tested.
For MoE models, it should be using the active parameters in the memory-bandwidth computation, not the total parameters, since only the active experts' weights have to be read from memory for each generated token.
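A rough sketch of the difference, assuming decode is purely memory-bandwidth bound (each token streams the weights it uses once, ignoring KV cache reads, compute, and overhead). The bandwidth figure (~256 GB/s theoretical peak for Strix Halo's 256-bit LPDDR5X-8000), the ~4-bit quantization, and the 30B-total / 3B-active model are illustrative assumptions, not measurements:

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, params_billion: float,
                            bytes_per_param: float) -> float:
    """Upper bound on tokens/s if every generated token must read
    params_billion * bytes_per_param GB of weights from memory."""
    gb_read_per_token = params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token

BANDWIDTH_GB_S = 256.0   # assumed peak bandwidth (Strix Halo, 256-bit LPDDR5X-8000)
BYTES_PER_PARAM = 0.5    # assumed ~4-bit quantized weights
TOTAL_PARAMS_B = 30.0    # hypothetical MoE: 30B total parameters
ACTIVE_PARAMS_B = 3.0    # hypothetical MoE: 3B parameters active per token

# Estimate using total parameters (treats the MoE like a dense model).
naive = max_decode_tokens_per_s(BANDWIDTH_GB_S, TOTAL_PARAMS_B, BYTES_PER_PARAM)
# Estimate using only the parameters actually touched per token.
moe = max_decode_tokens_per_s(BANDWIDTH_GB_S, ACTIVE_PARAMS_B, BYTES_PER_PARAM)

print(f"using total params:  {naive:.1f} tok/s")   # 17.1 tok/s
print(f"using active params: {moe:.1f} tok/s")     # 170.7 tok/s
```

Both numbers are ceilings; real throughput will be lower, but using total parameters underestimates an MoE model by roughly the total/active ratio (10x here).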