Comment by brookst
16 hours ago
Thanks for the fantastic explanation!
Would it be more efficient to calculate some kind of per-model or per-layer mean, and then store only each weight's deviation from that mean, maybe in fp8 or smaller?
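A minimal sketch of the scheme the question describes, under stated assumptions: each layer keeps one full-precision mean and one full-precision scale, and every weight is stored as a small signed integer code for its deviation from the mean. The 8-bit integer codes stand in for "fp8 or smaller" and are not a real fp8 format. The function names `encode_layer` and `decode_layer` are hypothetical and not taken from any library.

```python
from statistics import fmean


def encode_layer(weights: list[float], bits: int = 8) -> tuple[float, float, list[int]]:
    """Hypothetical encoder: one mean and one scale per layer, plus a small int code per weight."""
    if bits < 2:
        raise ValueError("need at least 2 bits for signed codes")
    mean = fmean(weights)                      # per-layer mean, kept in full precision
    deviations = [w - mean for w in weights]   # what actually gets quantized
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(d) for d in deviations)
    # Scale so the largest deviation maps to +/-qmax; a constant layer gets scale 1.0.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [round(d / scale) for d in deviations]
    return mean, scale, codes


def decode_layer(mean: float, scale: float, codes: list[int]) -> list[float]:
    """Rebuild approximate weights from the stored mean, scale, and integer codes."""
    return [mean + c * scale for c in codes]


# Usage: each restored weight is within scale / 2 of the original (up to float rounding).
layer = [0.5, 0.52, 0.48, 0.51, 0.49]
mean, scale, codes = encode_layer(layer)
restored = decode_layer(mean, scale, codes)
```

In this sketch the mean only pays off when a layer's weights sit noticeably away from zero; the per-layer scale is what lets the small codes cover the layer's actual range of deviations.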