← Back to context

Comment by tbalsam

4 hours ago

This is the common belief but not quite correct! The Muon update was proposed by Bernstein as the result of a theoretical paper suggesting concrete realizations of the theory, and Keller implemented it and added practical things to get it to work well (input/output AdamW, aggressive coefficients, post-Nesterov, etc).

Both share equal credit I feel (also, the paper's co-authors!), both put in a lot of hard work for it, though I tend to bring up Bernstein since he tends to be pretty quiet about it himself.

(Source: am experienced speedrunner who's been in these circles for a decent amount of time)