Comment by yorwba
7 hours ago
The handwaving required is just to assume a diagonal preconditioner, and the optimal preconditioner under that constraint corresponds to the new update rule. (See section F of the paper.) And of course a diagonal preconditioner works at the per-parameter level.
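To make the per-parameter point concrete, here is a minimal sketch in plain Python. It does not reproduce the paper's update rule; the function names, the learning rate, and the values in `diag` are all made up for illustration. It only shows that multiplying the gradient by a diagonal matrix is the same as scaling each parameter's gradient by its own scalar.

```python
def diagonal_preconditioned_step(params, grads, diag, lr=0.1):
    """Per-parameter update: x_i <- x_i - lr * d_i * g_i."""
    return [x - lr * d * g for x, g, d in zip(params, grads, diag)]


def full_matrix_step(params, grads, matrix, lr=0.1):
    """Same update with an explicit preconditioner matrix P: x <- x - lr * (P g)."""
    n = len(params)
    return [
        params[i] - lr * sum(matrix[i][j] * grads[j] for j in range(n))
        for i in range(n)
    ]


# Hypothetical values, chosen only for illustration.
params = [1.0, -2.0, 0.5]
grads = [0.4, -0.2, 1.0]
diag = [2.0, 0.5, 1.0]

# Build the diagonal matrix P with diag on its main diagonal.
P = [[diag[i] if i == j else 0.0 for j in range(3)] for i in range(3)]

elementwise = diagonal_preconditioned_step(params, grads, diag)
via_matrix = full_matrix_step(params, grads, P)
```

Both paths give the same step (up to floating-point rounding), but the elementwise version never touches another parameter's gradient, which is why a diagonal preconditioner can be applied independently per parameter.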