
Comment by HarHarVeryFunny

3 days ago

> This sounds magical though

Not really, this is just the way that evolution works - survival of the fittest (in the prevailing environment). Given that the world is never the same twice, generalization is a must-have. The second time you see the tiger charging out, you'd better have learnt your lesson from the first time, even if everything other than "it's a tiger charging out" is different, else it wouldn't be very useful!

You're really saying the same thing, except that rather than calling it generalization you're calling it "being the same if you use the correct similarity measures".

The thing is that we want to create AI with human-like perception and generalization of the world, etc, etc, but we're building AI in a different way than our brains were shaped. Our brains were shaped by evolution, honed for survival, but we're trying to design artificial brains (or not even that - just language models!) simply by designing them to operate in a certain way, and/or to have certain capabilities.

The transformer was never designed to have brain-like properties, since the goal was just to build a better seq-2-seq architecture - originally for machine translation - optimized to be efficient on today's hardware (the #1 consideration).

If we want to build something with capabilities more like the human brain, then we need to start by analyzing exactly what those capabilities are (such as quick and accurate real-time generalization), and considering evolutionary pressures (which Ilya seems to be doing) can certainly help in that analysis.

Edit: Note how different, and massively more complex, the spatio-temporal real world of messy, analog, never-same-twice dynamics is from the 1-D symbolic/discrete world of text that "AI" is currently working on. Language modelling is effectively a toy problem in comparison. If we build something with a brain-like ability to generalize over real-world perceptual data, then naturally it'd be able to handle discrete text and language, which is a very tiny subset of the real world, but the opposite of course does not apply.

> Note how different, and massively more complex, the spatio-temporal real world of messy, analog, never-same-twice dynamics is from the 1-D symbolic/discrete world of text that "AI" is currently working on.

I agree that the real world perceived by a human is vastly more complex than a sequence of text tokens. But it's not obvious to me that it's actually less full of repeating patterns, or that learning to recognize and interpolate those patterns (like an LLM does) is insufficient for impressive generalization. I think it's too hard to reason about this stuff when the representations in both LLMs and the brain are so high-dimensional.

  • I'm not sure how they can be compared, but of course the real world is highly predictable and repetitious (if you're looking at the right generalizations and abstractions), with brains being the proof of that. Brains are very costly, but their predictive benefit is big enough to more than offset the cost.

    The difference between brains and LLMs though is that brains have evolved with generality as a major driver - you could consider it as part of the "loss function" of brain optimization. Brains that don't generalize quickly won't survive.

    The loss function of an LLM is just next-token error, with no regard for HOW that error was achieved. The loss is the only thing shaping what the LLM learns, and there is nothing in it that rewards generalization. If the model is under-parameterized (not that today's models really are), that seems to lead to superposed representations rather than forcing generalization.

    No doubt the way LLMs are trained could be changed to improve generalization, maybe together with architectural changes (put an autoencoder in there to encourage compressed representations?!), but trying to take a language model and tweak it into a brain seems the wrong approach, and there is a long list of architectural changes/enhancements that would be needed if that is the path.

    With animal brains, it seems that generalization must have been selected for right from the simplest beginnings of a nervous system and sensory driven behavior, given that the real world demands that.
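To make the "next-token error" point above concrete: the loss is just average cross-entropy against the observed next token. This is an illustrative sketch, not how any particular LLM implements it - shapes and numbers here are made up - but it shows that the loss only scores the predicted distribution against the token that occurred, so a model that memorizes and a model that generalizes get identical loss if they assign the same probabilities:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy over next-token predictions.

    logits:  (seq_len, vocab) unnormalized scores at each position
    targets: (seq_len,) index of the actual next token at each position

    Note the loss only looks at the probability assigned to the observed
    token; nothing in it rewards HOW the model arrived at that probability.
    """
    # numerically stable log-softmax
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # mean negative log-likelihood of the observed next tokens
    return -log_probs[np.arange(len(targets)), targets].mean()

# toy example: 2 positions, vocab of 3 tokens
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
targets = np.array([0, 1])
loss = next_token_loss(logits, targets)
```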
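The autoencoder parenthetical above can also be illustrated with a toy: force data through a bottleneck narrower than its raw dimensionality and reconstruction pressure discovers the compressed structure. A linear bottleneck fit in closed form via SVD (i.e. PCA) stands in here for a trained autoencoder; all names and sizes are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: 100 points that really live on a 2-D subspace of a 10-D space
basis = rng.normal(size=(2, 10))
data = rng.normal(size=(100, 2)) @ basis

# linear "autoencoder" with a 2-unit bottleneck, solved in closed form;
# a trained nonlinear autoencoder would be the gradient-descent analogue
# of this same compression pressure
mean = data.mean(axis=0)
centered = data - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
encode = vt[:2]                      # 10-D -> 2-D bottleneck

codes = centered @ encode.T          # compressed representation
recon = codes @ encode + mean        # decode back to 10-D

err = np.abs(recon - data).max()     # near-zero: 2 dims capture everything
```

The point of the toy is just that a reconstruction objective through a narrow bottleneck rewards finding the underlying regularity, which plain next-token loss does not explicitly do.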