Comment by londons_explore

4 days ago

> These models somehow just generalize dramatically worse than people. It's a very fundamental thing

My guess is we'll discover that biological intelligence is 'learning' not just from your experience, but that of thousands of ancestors.

There are a few weak pointers in that direction. E.g., in mice, a father conditioned to fear a specific odor can pass that fear to his offspring, and even grandoffspring, through sperm alone [1].

I believe this is at least part of the reason humans appear to perform so well with so little training data compared to machines.

[1]: https://www.nature.com/articles/nn.3594

From both an architectural and a learning-algorithm perspective, there is zero reason to expect an LLM to perform remotely like a brain, nor to generalize beyond what was necessary to minimize training error. There is nothing in the loss function of an LLM that incentivizes it to generalize.
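To make that point concrete, here is a toy sketch (names and numbers are illustrative, not any real framework's API) of the next-token cross-entropy objective that LLM pre-training minimizes. Note that it references only the training text; there is no term that rewards performance beyond it:

```python
import math

# Toy sketch: the standard LLM training objective is just cross-entropy
# on the next token of the training text. Nothing here mentions held-out
# data or generalization; a model that memorizes its training corpus
# drives this loss toward zero.

def next_token_loss(predicted_probs, targets):
    """Average negative log-likelihood of the observed next tokens.

    predicted_probs: one dict per position, mapping candidate token -> probability
    targets: the actual next token at each position, taken from the training text
    """
    nll = sum(-math.log(probs[t]) for probs, t in zip(predicted_probs, targets))
    return nll / len(targets)

# The loss only rewards matching the training text:
probs = [{"cat": 0.9, "dog": 0.1}, {"sat": 0.8, "ran": 0.2}]
loss = next_token_loss(probs, ["cat", "sat"])  # ≈ 0.164
```

Any generalization the trained model exhibits is a side effect of the architecture and the optimizer, not something this objective asks for.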

However, for humans/animals the evolutionary/survival benefit of intelligence - of learning from experience - is to correctly predict future action outcomes and the unfolding of external events in a never-same-twice world. Generalization is key, as is sample efficiency. You may not get more than one or two chances to learn that life-saving lesson.

So, what evolution has given us is a learning architecture and learning algorithms that generalize well from extremely few samples.

  • > what evolution has given us is a learning architecture and learning algorithms that generalize well from extremely few samples.

    This sounds magical though. My bet is that either the samples aren't as few as they appear, because humans actually operate in a constrained world where they see the same patterns repeat very many times if you use the correct similarity measures; or the learning that the brain does during a human lifetime is really just fine-tuning on top of accumulated evolutionary learning encoded in the structure of the brain.

    • > This sounds magical though

      Not really, this is just the way that evolution works - survival of the fittest (in the prevailing environment). Given that the world is never the same twice, generalization is a must-have. The second time you see the tiger charging out, you had better have learnt your lesson from the first time, even if everything other than "it's a tiger charging out" is different, else it wouldn't be very useful!

      You're really saying the same thing, except rather than call it generalization you are calling it being the same "if you use the correct similarity measures".

      The thing is that we want to create AI with human-like perception and generalization of the world, etc, etc, but we're building AI in a different way than our brain was shaped. Our brain was shaped by evolution, honed for survival, but we're trying to design artificial brains (or not even - just language models!!) just by designing them to operate in a certain way, and/or to have certain capabilities.

      The transformer was never designed to have brain-like properties; the goal was just to build a better seq2seq architecture, intended for language modelling and optimized to be efficient on today's hardware (the #1 consideration).

      If we want to build something with capabilities more like the human brain, then we need to start by analyzing exactly what those capabilities are (such as quick and accurate real-time generalization), and considering evolutionary pressures (which Ilya seems to be doing) can certainly help in that analysis.

      Edit: Note how different, and massively more complex, the spatio-temporal real world of messy, analog, never-same-twice dynamics is from the 1-D symbolic/discrete world of text that "AI" is currently working on. Language modelling is effectively a toy problem in comparison. If we build something with a brain-like ability to generalize over real-world perceptual data, then naturally it would be able to handle discrete text and language, which is a tiny subset of the real world, but the opposite of course does not apply.
