Comment by 0xBA5ED
4 hours ago
"But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are."
Umm, why doesn't it capture it? Why can't a statistical text generator do amazing things without _actually_ being intelligent (I'm thinking of agency here)? I think it's important to remind ourselves that these things do not reflect on or understand what they're outputting. That is 100% evident from the continuing issues with them outputting nonsense alongside their apparently insightful output. The article itself said the output was poor, but the student noticed something about it that sparked an idea, and he followed that lead.
I reject the premise. I read the outputs I generate carefully (too carefully, probably). They don't "continue to output nonsense". Their success rate exceeds that of humans in some places.
To clarify: the problem I have with "statistical text generator" isn't the word "statistical". It's "text generator". It's been two years now since that stopped being a reasonable way to completely encapsulate what these systems do. The models themselves are now run iteratively, with an initial human-defined prompt cascading into a series of LLM-generated interim prompts and tool calls. That process is not purely, or even primarily, one of "text generation"; it's bidirectional, and involves deep implicit searches.
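To make that concrete, here's a minimal sketch of the kind of loop I mean. The names (`llm_complete`, `TOOLS`) are hypothetical stand-ins for whatever provider and toolset you're using, not any particular API:

    # Minimal agentic loop: the model's output feeds back in as the
    # next prompt, interleaved with tool results. `llm_complete` and
    # `TOOLS` are hypothetical stand-ins, not a real library.
    def run_agent(task, max_steps=10):
        transcript = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = llm_complete(transcript)      # one "text generation" step
            transcript.append({"role": "assistant", "content": reply.text})
            if not reply.tool_calls:              # model decided it's done
                return reply.text
            for call in reply.tool_calls:         # run tools, feed results back in
                result = TOOLS[call.name](**call.args)
                transcript.append({"role": "tool", "content": str(result)})
        return transcript[-1]["content"]

Each pass through the loop is "text generation" in the narrow sense, but the overall computation is a search driven by the model's own intermediate outputs.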
Do you think it's akin to Ilya's [1] claim that predicting the next token well enough requires modeling the underlying reality? E.g., do you make any deeper claims about the structure of that intelligence, or compare it to humans?
To be clear, I'm 100% with you that "next token predictor" is a stupid name for what these machines are now. We are engineers and can shape the capability landscape to give rise to a ton of emergent behavior. It's kind of amazing. In that sense, being precise about what's going on, rather than being essentialist (technically, yes, the 'actual' algorithm, whatever that even means, is text prediction), is just good epistemology.
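(That base algorithm really does fit in a few lines. A sketch assuming a generic trained `model` returning next-token logits and a matching `tokenizer`, both hypothetical here:)

    # Bare autoregressive text prediction: sample one token at a time
    # from the model's next-token distribution. `model` and `tokenizer`
    # are hypothetical stand-ins for any trained LM.
    import torch

    def generate(prompt, steps=50, temperature=0.8):
        tokens = tokenizer.encode(prompt)                        # list of token ids
        for _ in range(steps):
            logits = model(torch.tensor([tokens]))[0, -1]        # next-token logits
            probs = torch.softmax(logits / temperature, dim=-1)
            tokens.append(torch.multinomial(probs, 1).item())    # sample next token
        return tokenizer.decode(tokens)

Everything interesting lives in what the model had to learn to make that sampling step work, not in the step itself.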
I still think it's a very interesting question, though, to ask about deeper emergent structures. To me, this is evidence for a more embedded-cognition kind of theory of intelligence (admittedly this is not very precise). But IDK how into philosophy you are.
[1] https://www.dwarkesh.com/p/ilya-sutskever
I try really hard not to think about this stuff because I've seen how people talk when they get too deep into it. My mental model, or mental superstructure, if you will, for all of this stuff is that we've discovered a fundamentally novel and effective way of doing computing. Computer science is fascinating and I'm there for it, and I get prickly when people are dismissive of it. I'm generally not interested in the theory of human intelligence (it's a super interesting problem, I just happen not to engage with it much), which spares me from a lot of crazy Internet stuff.