Comment by doctorpangloss
1 month ago
More computation cannot improve the quality or domain of data. Maybe the bitter lesson is: lobby bitterly for copyright laws that favor what you are doing, and for weakened antitrust, to give you the insurmountable moat of exclusive data in a walled-garden media network.
A human does not need billions of driving hours to learn how to drive competently. The issue with the current methods is not the quality of the data but the methodology. More computation might unlock newer approaches that do better with less data of worse quality.
A human is not a blank slate. There are millennia of evolutionary history that go into making a brain adapted to, and capable of, learning from its environment.
A human is a mostly blank slate... but it's a really sophisticated slate that, as you say, has taken many millions of years of development.
> A human does not need billions of driving hours to learn how to drive competently.
But humans DO need ~16 years of growth and development "to learn how to drive competently", and by then they will also know how to ride a bicycle, mow grass, build shelves, cook pizza, use a smartphone, ...! There's a lesson in that somewhere ....
You don't need the full 16 years; you can get a much younger person to drive, too. That only supports the point that the amount or quality of data is not the problem.
I think there's a more fundamental problem at play here: what seems to work in 'AI', namely search, gets better by throwing more data into more compute. You then store the results in a model, which amounts to precomputed solutions waiting for a problem. Interacting with the model is then asking questions and getting answers that hopefully fit your needs.
So what we're doing, on the whole, seems to be a lot of coding and decoding, hoping that the data used in training can be adequately mapped to the realities of the problem domain. That would mean the model you end up with is somehow a valid representation of some form of knowledge about the problem domain. The trouble is, more text won't yield an ever-higher-resolution representation of the problem domain. Past some point, you start to introduce noise.
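As a toy sketch of that framing (not anything from this thread; the data, names, and lookup scheme are all hypothetical), you could caricature "training then inference" as precomputing answers for whatever data you happened to have, and then matching new questions against that store:

```python
# Caricature of "model as precomputed solutions waiting for a problem".
# Purely illustrative; real models generalize rather than look up verbatim.
from difflib import SequenceMatcher

# "Training": precompute answers for the data we happened to collect.
precomputed = {
    "how do i parallel park": "Turn the wheel toward the curb and reverse slowly...",
    "how do i change a tire": "Loosen the lug nuts before jacking up the car...",
}

def query(question: str) -> str:
    # "Inference": find the stored question most similar to the new one
    # and return its precomputed answer. If the training data never covered
    # this part of the problem domain, the best match can still be a poor fit.
    best = max(precomputed, key=lambda q: SequenceMatcher(None, q, question).ratio())
    return precomputed[best]

print(query("how should I park parallel to the curb?"))
```

The point of the caricature is the mapping problem: answers are only as good as how well the stored data covers the questions you actually end up asking.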
Yeah, well. That was a bad analogy, and everyone I know who used to say it admits the error.