Comment by tbrownaw
2 days ago
> A pervasive belief in scaling has resulted in a massive windfall in capital for industry labs and fundamentally reshaped the culture of conducting science in our field.
People spend money on this because it works. It seems odd to call observable reality a "pervasive belief".
> Academia has been marginalized from meaningfully participating in AI progress and industry labs have stopped publishing.
Firstly, I still see news items about new models that are supposed to do more with less. If these are neither from academia nor industry, where are they coming from?
Secondly, "has been marginalized"? Really? Nobody's going to be uninterested in getting better results with less compute spend; the attempts have just had limited effectiveness.
.
> However, it is unclear why we need so many additional weights. What is particularly puzzling is that we also observe that we can get rid of most of these weights after we reach the end of training with minimal loss
I thought the extra weights were because training takes advantage of high-dimensional bullshit to make the math tractable. And that there's some identifiable point where you have "enough" and more doesn't help.
I hadn't heard that anyone had a workable way to remove the extra ones after training, so that's cool.
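If it's the usual magnitude-pruning idea (keep only the largest-magnitude weights after training, drop the rest, maybe fine-tune a bit), a rough sketch of what that looks like is below; the layer shape and sparsity level are made up for illustration:

```python
# Minimal sketch of post-training magnitude pruning: zero out the
# smallest-magnitude weights and keep only the largest ones.
# The layer shape and 90% sparsity here are illustrative, not from the paper.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))            # stand-in for one trained layer
w_pruned = magnitude_prune(w, sparsity=0.9)
kept = np.count_nonzero(w_pruned) / w.size
print(f"kept {kept:.1%} of weights")       # roughly 10% survive
```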
.
The impression I had is that there's a somewhat fuzzy "correct" number of weights and amount of training for any given architecture and data set / information content, and that once you reach that point you stop getting effort-free results by throwing more hardware at the problem.
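One way to make that fuzziness concrete is a Chinchilla-style parametric loss: for a fixed compute budget there's a best trade-off between parameter count and training tokens, and past that sweet spot extra size or extra steps buy very little. A toy sketch, with illustrative (not fitted) coefficients and the usual C ≈ 6·N·D FLOP estimate:

```python
# Toy illustration of a "correct size for the budget" sweet spot, using a
# Chinchilla-style loss L(N, D) = E + A/N^a + B/D^b, where N is parameter
# count and D is training tokens. All coefficients are made-up ballpark values.
import numpy as np

E, A, B, a, b = 1.7, 400.0, 410.0, 0.34, 0.28   # illustrative, not fitted

def loss(N, D):
    return E + A / N**a + B / D**b

def best_split(C, n_grid=2000):
    """For a fixed compute budget C, sweep N and pick the loss-minimizing split."""
    N = np.logspace(6, 13, n_grid)     # candidate parameter counts
    D = C / (6.0 * N)                  # tokens implied by C ~ 6*N*D
    L = loss(N, D)
    i = np.argmin(L)
    return N[i], D[i], L[i]

for C in [1e20, 1e22, 1e24]:
    N_opt, D_opt, L_opt = best_split(C)
    print(f"C={C:.0e}: N*~{N_opt:.2e} params, D*~{D_opt:.2e} tokens, L~{L_opt:.3f}")
```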