Comment by acdha
14 hours ago
This is definitely complicated—I’m not a neuroscientist, but I worked for some and married one, so I’ve heard quite a few entries from the genre of how our brains fool us or make our conscious experience seem more coherent and linear than it actually is—but the big gaps I see are the inability to learn from experience and the lack of generalized conceptual reasoning. For the latter, I’m not just thinking of the simple “count the r’s in strawberry” failures companies have put so much effort into masking, but the way minor changes to a question can get conflicting answers from even the best models. That suggests that while there’s something truly fascinating about how these models cluster topics, it is not the same as having a conceptual model of the world or a theory of mind. This is the huge problem in the field: all of these companies would love to have a model which is safe to use in adversarial contexts, because then the mass layoffs could begin in earnest, but the technology just isn’t there.
This isn’t a religious argument that there’s something about our brains which can’t be replicated, just an observation that they’re considerably more complex than anything we have currently.
> minor changes in a question can get conflicting answers from even the best models
Humans are notorious for doing this.
Humans can't reliably subitize more than five-ish objects, while chimps can actually do this task better than us. That's our "can't count the R's in strawberry" (a task, general letter counting, which flagship models can now do reliably).
https://en.wikipedia.org/wiki/Subitizing
That’s not a valid analogy: humans reliably perform that task billions of times daily. It’s still routine to find cases which reveal that while models may have improved on some basic tasks (or learned to call a tool), there isn’t a deeper understanding of the underlying concept or the problem they’re being asked to solve.
And AI agents reliably-ish do tasks billions of times a day that humans struggle with, namely regurgitating information at incredible rates across wide breadths of topics. I see it as merely a matter of degree, not category.
How do you measure "deeper understanding" in humans? You usually do it by asking them to show their work, show how the dots connect. Reasoning models are getting there, and when they do, I'm sure the goalposts will move yet again.