Comment by efitz
1 day ago
I haven't seen this new Google model, but now I must try it out.
I will say that other frontier models are starting to surprise me with their reasoning/understanding; I really have a hard time making (or believing) the argument that they are just predicting the next word.
I’ve been using Claude Code heavily since April; Sonnet 4.5 frequently surprises me.
Two days ago I told the AI to read all the documentation from my 5 projects related to a tool I'm building and to create a wiki focused on audience and task.
I'm hand-reviewing the 50 wiki pages it created, but overall it did a great job.
I got frustrated about one issue: I have a GitHub issue to create a way to integrate with issue trackers (like Jira), but it's still TODO, and the AI claimed on the home page that we had issue-tracker integration. It created a page for it and everything; I figured it was hallucinating.
I went to edit the page and replace it with placeholder text, and was shocked to find that the LLM had (unprompted) figured out how to use existing features to integrate with issue trackers, and had written sample code for GitHub, Jira, and Slack (notifications). That truly surprised me.
Predicting the next word is the interface, not the implementation.
(It's a pretty constraining interface, though: the model outputs an entire distribution and then we instantly lose it by choosing only one token from it.)
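For concreteness, here is roughly what that collapse looks like at decoding time; a minimal sketch with a made-up four-word vocabulary and made-up logits, comparing greedy decoding with plain sampling:

```python
import numpy as np

# Made-up vocabulary and next-token logits, purely for illustration.
vocab = ["cat", "dog", "car", "the"]
logits = np.array([2.0, 1.6, 0.3, -1.0])

# The model actually produces a full distribution over the vocabulary...
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(dict(zip(vocab, probs.round(3))))

# ...but decoding collapses it to a single token and the rest is discarded.
print(vocab[int(np.argmax(probs))])        # greedy decoding
print(np.random.choice(vocab, p=probs))    # temperature-1 sampling
```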
It's true, but by the same token our brain is "just" thresholding spike rates.
Predicting the next word requires understanding; they're not separate things. If you don't know what comes after the next word, then you don't know what the next word should be. So the task implicitly forces a longer-horizon understanding of the future sequence.
This is utterly wrong. Predicting the next word requires a large sample of data made into a statistical model. It has nothing to do with "understanding", which implies it knows why rather than what.
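For concreteness, the most literal reading of "a large sample of data made into a statistical model" of next words is an n-gram counter. A toy bigram sketch (the corpus is made up for illustration):

```python
from collections import Counter, defaultdict
import random

# Tiny made-up corpus standing in for "a large sample of data".
corpus = "the killer is the butler and the butler is the killer".split()

# Count which word follows which: pure statistics, no notion of "why".
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    words, weights = zip(*follows[word].items())
    return random.choices(words, weights=weights)[0]

print(predict_next("the"))  # "killer" or "butler", weighted by frequency
```

Whether a transformer trained end to end is doing anything qualitatively beyond this kind of counting is exactly what the rest of the thread argues about.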
Ilya Sutskever was on a podcast and said to imagine a mystery novel where, at the end, it says "and the killer is: (name)". If it's just a statistical model generating the next most likely word, how can it do that in this case without some understanding of all the clues, etc.? A specific name is not statistically likely to appear.
"Understanding" is just a trap to get wrapped up in. A word with no definition and no test to prove it.
Whether or not the models are "understanding" is ultimately immaterial, as their ability to do things is all that matters.
If you're claiming a transformer model is a Markov chain, this is easily disprovable by, e.g., asking the model why it isn't a Markov chain!
But here is a really big one of those if you want it: https://arxiv.org/abs/2401.17377
Modern LLMs are post-trained for tasks other than next-word prediction.
They still output words, though (except for multi-modal LLMs), so that does involve next-word generation.
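To make that concrete: post-training such as RLHF still works through the next-token distribution, it just optimises a different objective than "match the training text". A toy REINFORCE-style sketch, where the vocabulary, reward, and learning rate are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an LLM's next-token head over a tiny made-up vocabulary.
# A real model would compute these logits from the whole context.
vocab = ["yes", "no", "maybe"]
logits = np.zeros(3)

def probs(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def reward(token):
    # Made-up scalar reward standing in for a preference or task score.
    return 1.0 if token == "yes" else 0.0

# The model still emits tokens from its next-token distribution, but the
# update pushes up whatever the reward likes, not whatever text came next.
for _ in range(200):
    p = probs(logits)
    idx = rng.choice(len(vocab), p=p)
    grad = -p
    grad[idx] += 1.0                  # gradient of log p(idx) w.r.t. logits
    logits += 0.1 * reward(vocab[idx]) * grad

print(probs(logits).round(2))         # probability mass shifts toward "yes"
```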
The line between understanding and “large sample of data made into a statistical model” is kind of fuzzy.
> Predicting the next word requires understanding
If we were talking about humans trying to predict the next word, that would be true.
There is no reason to suppose that an LLM is doing anything other than deep pattern prediction pursuant to, and no better than needed for, next-word prediction.
There is plenty of reason. This article is just one example of many. People bring it up because LLMs routinely do things we would call reasoning if we saw them in other humans. Brushing that off as 'deep pattern prediction' is genuinely meaningless: nobody who uses the phrase that way can explain what they mean in a way that can be falsified. It's just vibes, an unfalsifiable conversation-stopper rather than a real explanation. You can replace 'pattern matching' with 'magic' and the argument is identical, because the phrase isn't actually doing anything.
A - A force is required to lift a ball
B - I see Human-N lifting a ball
C - Obviously, Human-N cannot produce forces
D - Forces are not required to lift a ball
Well sir, why are you so sure Human-N cannot produce forces? How is she lifting the ball? Well, of course, Human-N is just using s̶t̶a̶t̶i̶s̶t̶i̶c̶s̶ magic.
How'd you do at the International Math Olympiad this year?
It's trying to maximize a reward function. It's not just predicting the next word.