Comment by highfrequency
14 days ago
Crazy that there are now five and a half companies that all have roughly state-of-the-art LLMs.
> We developed a new training technique which we refer to as MetaP that allows us to reliably set critical model hyper-parameters such as per-layer learning rates and initialization scales. We found that chosen hyper-parameters transfer well across different values of batch size, model width, depth, and training tokens.
This sounds interesting. Anyone have a link to the paper or other documentation on MetaP?
It's quite similar to muP (Microsoft's Maximal Update Parametrization):
https://github.com/microsoft/mup
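For anyone wondering what that analogy means in practice, here's a rough sketch of the muP-style recipe for Adam: per-layer learning rates and init scales that are rescaled by width so values tuned on a small model transfer to a big one. MetaP itself isn't documented anywhere public, so this is only an assumption based on the muP comparison; the widths, layer sizes, and base LR below are made up for illustration.

    import torch
    import torch.nn as nn

    # muP-style width scaling (a sketch, not Meta's actual MetaP recipe).
    # Idea: tune hyper-parameters once at a small base width, then rescale
    # them per layer when widening the model.
    base_width = 256   # width at which base_lr was tuned (hypothetical)
    width = 4096       # width of the scaled-up model
    base_lr = 3e-4     # learning rate found by tuning the small model
    mult = width / base_width

    model = nn.Sequential(
        nn.Linear(64, width),     # input layer
        nn.ReLU(),
        nn.Linear(width, width),  # hidden layer
        nn.ReLU(),
        nn.Linear(width, 10),     # output layer
    )

    # Init scale: hidden-weight std shrinks as 1/sqrt(fan_in).
    nn.init.normal_(model[2].weight, std=(1.0 / width) ** 0.5)

    # Per-layer LRs for Adam: hidden/output layers get base_lr / mult,
    # while the input layer keeps base_lr, so the tuned value transfers.
    opt = torch.optim.Adam([
        {"params": model[0].parameters(), "lr": base_lr},
        {"params": model[2].parameters(), "lr": base_lr / mult},
        {"params": model[4].parameters(), "lr": base_lr / mult},
    ])

The payoff is that you can run your hyper-parameter sweeps on the cheap small model and reuse the results at scale, which would line up with the "transfer well across different values of batch size, model width, depth, and training tokens" claim in the quote.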