Comment by an0malous
6 hours ago
Why should the skeptics be reading it? The scaling laws show diminishing returns on more training data and larger models.
From the Kaplan et al. scaling laws paper ("Scaling Laws for Neural Language Models", 2020):
> We have observed consistent scalings of language model log-likelihood loss with non-embedding parameter count N, dataset size D, and optimized training computation C_min, as encapsulated in Equations (1.5) and (1.6). Conversely, we find very weak dependence on many architectural and optimization hyperparameters. Since scalings with N, D, C_min are power-laws, there are diminishing returns with increasing scale.
So the skeptics are right to be skeptical of LLMs being all you need for continued advancement in this space. It seems like the believers are the ones who need to learn about the scaling laws.
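To put rough numbers on "diminishing returns", here is a minimal sketch that assumes a pure power law in parameter count, L(N) = (N_c / N)^alpha. The constants are assumptions on my part: alpha = 0.076 and N_c = 8.8e13 are approximately the values the paper reports for non-embedding parameters, quoted from memory, and the function names are made up for illustration.

```python
# Illustrative sketch of a pure power law L(N) = (N_c / N) ** alpha.
# ALPHA_N and N_C are assumed values, roughly those reported by Kaplan et al.
# for non-embedding parameter count; treat them as illustrative.
ALPHA_N = 0.076
N_C = 8.8e13


def loss(n_params, alpha=ALPHA_N, n_c=N_C):
    """Predicted loss for a model with n_params non-embedding parameters."""
    return (n_c / n_params) ** alpha


def loss_ratio_per_10x(alpha=ALPHA_N):
    """Factor the loss is multiplied by each time N grows 10x."""
    return 10 ** (-alpha)


def scale_factor_to_halve_loss(alpha=ALPHA_N):
    """How many times larger N must get to cut the loss in half."""
    return 2 ** (1 / alpha)


if __name__ == "__main__":
    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"N = {n:.0e}  predicted loss = {loss(n):.3f}")
    print(f"loss multiplier per 10x params: {loss_ratio_per_10x():.3f}")
    print(f"param multiplier to halve loss: {scale_factor_to_halve_loss():.0f}")
```

Under these assumptions, every 10x in parameters multiplies the loss by about 0.84, so the absolute improvement shrinks at each step, and halving the loss takes on the order of 9,000x more parameters.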