Comment by brookst
10 months ago
Thanks for the fantastic explanation!
Would it be more efficient to calculate some kind of per-model or per-layer mean, and then only specify standard deviations, maybe in fp8 or smaller?
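
To make the question concrete, here is a minimal sketch of one way to read it: keep a single mean and standard deviation per layer, and store each weight only as its distance from the mean in units of standard deviation, packed into a small code. Everything here is an assumption for illustration: the names `encode_layer` and `decode_layer` are hypothetical, the clipping range of 4 standard deviations is arbitrary, and int8 stands in for fp8 because NumPy has no native 8-bit float type.

```python
import numpy as np

CLIP = 4.0  # assumption: z-scores outside +/-4 are clipped


def encode_layer(weights, bits=8):
    """Return (mean, std, codes): per-layer stats plus small integer z-score codes."""
    mean = float(weights.mean())
    std = float(weights.std()) or 1.0  # fall back to 1.0 for a constant layer
    z = (weights - mean) / std         # each weight as "how many stds from the mean"
    levels = 2 ** (bits - 1) - 1       # 127 for 8 bits
    codes = np.round(np.clip(z, -CLIP, CLIP) / CLIP * levels).astype(np.int8)
    return mean, std, codes


def decode_layer(mean, std, codes, bits=8):
    """Approximate the original weights from the per-layer stats and the codes."""
    levels = 2 ** (bits - 1) - 1
    return codes.astype(np.float32) / levels * CLIP * std + mean


if __name__ == "__main__":
    layer = np.linspace(-1.0, 1.0, 101).astype(np.float32)  # toy "layer"
    mean, std, codes = encode_layer(layer)
    restored = decode_layer(mean, std, codes)
    print("max abs error:", float(np.abs(restored - layer).max()))
```

Whether this actually beats storing plain low-precision weights would depend on how the values in a real layer are distributed, which this toy example does not attempt to show.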