Comment by acdha
17 hours ago
This is definitely complicated. I’m not a neuroscientist, but I worked for some and married one, so I’ve heard quite a few entries from the genre of how our brains fool us or make our conscious experience seem more coherent and linear than it actually is. The big limitations I see, though, are the inability to learn from experience and the lack of generalized conceptual reasoning. For the latter, I’m not just thinking about the simple “count the r’s in strawberry” failures that companies have put so much effort into masking, but the way minor changes in a question can get conflicting answers from even the best models. That suggests that while there’s something truly fascinating about how they cluster topics, it is not the same as having a conceptual model of the world or a theory of mind. This is the huge problem in the field: all of these companies would love to have a model which is safe to use in adversarial contexts, because then the mass layoffs could begin in earnest, but the technology just isn’t there.
This isn’t a religious argument that there’s something about our brains which can’t be replicated, simply that the brain is far more complex than anything we can currently build.
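As an aside on the letter-counting example: the counting itself is trivial for ordinary code; the difficulty for a language model comes from seeing subword tokens rather than individual characters. A minimal Python sketch (the token split shown is an illustrative assumption, not real tokenizer output):

    # Counting characters directly is a one-liner for a program.
    print("strawberry".count("r"))  # -> 3

    # A language model, by contrast, sees subword tokens rather than letters.
    # This split is a hypothetical illustration, not actual tokenizer output.
    tokens = ["str", "awberry"]
    # The character-level detail is hidden inside opaque token IDs, which is
    # one reason letter-counting questions have tripped models up.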
> minor changes in a question can get conflicting answers from even the best models
Humans are notorious for doing this.
Not unless you’re referring to significant mental illness, no. Different people may give different answers if, say, I ask for health advice, but if I ask the same doctor they’re not going to flip their answer based on whether I use medical or wellness-influencer phrasing, and that consistency allows them to build a reputation which other people can rely on.
This especially applies to mistakes: the junior developer who drops a database by mistake is unlikely to ever do that again, whereas those same AI companies’ models keep doing it to a small but non-zero number of customers, because they don’t have that higher-level learning process or anything like a fear of consequences.
Humans can't reliably subitize more than five-ish objects, while chimps can actually do that task better than we can. That's our version of "can't count the R's in strawberry" (general letter counting, which flagship models can now do reliably).
https://en.wikipedia.org/wiki/Subitizing
That’s not a valid analogy: humans reliably perform that task billions of times daily. It’s still routine to find cases which reveal that while models may have improved on some basic tasks (or learned to call a tool), there isn’t a deeper understanding of the underlying concept or of the problem they’re being asked to solve.
And AI agents reliably-ish do tasks billions of times a day that humans struggle with, namely regurgitating information at incredible rates across a wide breadth of topics. I see it as merely a difference of degree, not of category.
How do you measure "deeper understanding" in humans? You usually do it by asking them to show their work, show how the dots connect. Reasoning models are getting there, and when they do, I'm sure the goalposts will move yet again.