Comment by brookst
1 day ago
Thanks for the fantastic explanation!
Would it be more efficient to calculate some kind of per-model or per-layer mean, and then only specify standard deviations, maybe by fp8 or smaller?
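To make the idea in the question concrete, here is a minimal sketch, not anything taken from the explanation being replied to. It keeps one mean and one standard deviation per layer and stores each weight as a small integer code for its offset from the mean, measured in standard deviations. The function names, the 8-bit integer codes (a stand-in for fp8, which plain Python does not have), and the clip at 4 standard deviations are all assumptions made for illustration.

```python
import statistics

def quantize_layer(weights, bits=8, clip=4.0):
    """Return (mean, std, codes): two floats per layer plus one small int per weight.

    Hypothetical helper: `clip` (in standard deviations) and the signed-integer
    code format are illustrative assumptions, not a real fp8 encoding.
    """
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights) or 1.0   # fall back to 1.0 for a constant layer
    levels = 2 ** (bits - 1) - 1              # 127 for 8 bits
    codes = []
    for w in weights:
        z = (w - mean) / std                  # offset in units of standard deviation
        z = max(-clip, min(clip, z))          # outliers beyond the clip are saturated
        codes.append(round(z / clip * levels))
    return mean, std, codes

def dequantize_layer(mean, std, codes, bits=8, clip=4.0):
    """Rebuild approximate weights from the per-layer statistics and the codes."""
    levels = 2 ** (bits - 1) - 1
    return [mean + std * (c / levels * clip) for c in codes]

# Toy "layer": 101 evenly spaced values, all well inside the clip range.
layer = [i / 100 for i in range(-50, 51)]
mean, std, codes = quantize_layer(layer)
restored = dequantize_layer(mean, std, codes)
max_error = max(abs(a - b) for a, b in zip(layer, restored))
```

For weights inside the clip range, the reconstruction error is bounded by half a quantization step, i.e. `std * clip / levels / 2`. Whether this beats other formats in practice depends on how heavy the tails of the real weight distribution are, which this sketch does not measure.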