Comment by jeremyjh
15 hours ago
> It just doesn’t work that way, LLMs need to be generalised a lot to be useful even in specific tasks.
This is the entire breakthrough of deep learning on which the last two decades of productive AI research are based. Massive amounts of data are needed to generalize and prevent over-fitting. GP is suggesting an entirely new research paradigm will win out - as if researchers have not yet thought of "use less data".
> It really is the antithesis to the human brain, where it rewards specific knowledge
No, it's completely analogous. The human brain has vast amounts of pre-training before it starts to learn knowledge specific to any kind of career or discipline, and this fact to me intuitively suggests why GP is baked: You cannot learn general concepts such as the English language, reasoning, computing, network communication, programming, and relational data from a tiny dataset consisting only of code and documentation for one open-source framework and language.
It is all built on a massive tower of other concepts that must be understood first, including ones much more basic than the examples I mentioned but that are practically invisible to us because they have always been present as far back as our first memories can reach.
There is actually a whole lot of research around the "use less data" idea, called data pruning. The goal in a lot of cases is basically to achieve the same performance with less data. For example, [1] received quite some attention in the past.
[1] https://arxiv.org/abs/2206.14486
I clarified my comment - "perhaps researchers have not tried 'use less data'" suggests I might be unaware of this concept, so I changed it to "as if". In fact, "less data" was tried for decades before the first image classifiers were actually working in 2012. My understanding of the paper you are linking to is that it is not a new research paradigm; it is about filtering/pruning less relevant data that is not needed to improve a particular capability in a deep learning model. That is absolutely one likely approach to achieving the goal of smaller, better models in many tasks.
That will not change the fact that a coding model has to learn a vast number of foundational capabilities that will not be present in a dataset as small as all the Python code ever written. It will mean that much less Python than all the Python ever written will be needed, but many other things will be needed too, in representative quantities.