Comment by rndphs
11 days ago
https://arxiv.org/pdf/1912.02292 "We show that a variety of modern deep learning tasks exhibit a 'double-descent' phenomenon where, as we increase model size, performance first gets worse and then gets better." That is the first sentence of the abstract. The first graph shown in the paper backs it up.
Looking into it further, it seems that typical LLMs are in the first descent regime anyway, so my original point is probably not very relevant for them. It also looks like the second descent doesn't always reach a lower loss than the first; that appears to depend on other factors as well.