Comment by energy123

1 day ago

Predicting the next word requires understanding, they're not separate things. If you don't know what comes after the next word, then you don't know what the next word should be. So the task implicitly forces a more long-horizon understanding of the future sequence.

This is utterly wrong. Predicting the next word requires a large sample of data made into a statistical model. It has nothing to do with "understanding", which implies knowing why rather than just what.
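
To make "a large sample of data made into a statistical model" concrete, the crudest possible version is just a table of word-follow counts. A toy sketch in Python (the tiny corpus is made up, and a real LLM is vastly more than this, but it shows what purely statistical next-word prediction means at its simplest):

    from collections import Counter, defaultdict
    import random

    def train_bigram(tokens):
        # Count how often each word follows each other word in the corpus.
        counts = defaultdict(Counter)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
        return counts

    def predict_next(counts, prev_word):
        # Sample the next word in proportion to how often it followed prev_word.
        followers = counts[prev_word]
        words, freqs = zip(*followers.items())
        return random.choices(words, weights=freqs)[0]

    tokens = "the killer is the butler and the killer is revealed".split()
    counts = train_bigram(tokens)
    print(predict_next(counts, "killer"))  # prints "is", the only word seen after "killer" here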

  • Ilya Sutskever made this point on a podcast: imagine a mystery novel where, at the end, it says "and the killer is: (name)". If it's just a statistical model generating the next most likely word, how can it do that in this case without some understanding of all the clues, etc.? A specific name is not statistically likely to appear.

    • I was once chatting with an author of books (very much an amateur) and he said he enjoyed writing because he liked discovering where the story goes. I.e., he starts by building characters and creating scenarios for them, and at some point the story kind of takes over: there is only one way a character can act based on what was previously written, but it wasn't preordained. That's why he liked it; it was a discovery to him.

      I'm not saying this is the right way to write a book, but it is a way some people write, at least! And one that LLMs seem capable of doing. (Though isn't a book outline pretty much the same as a coding plan, and well within their wheelhouse?)

    • Can current LLMs actually do that, though? What Ilya posed was a thought experiment: if it could do that, then we would say that it has understanding. But AFAIK that is beyond current capabilities.

      1 reply →

    • This implies understanding of preceding tokens, no? GP was saying they have understanding of future tokens.

    • It can't do that without the answer to who did it being in the training data. I think the reason people keep falling for this illusion is that they can't really imagine how vast the training dataset is. In all cases where it appears to answer a question like the one you posed, it's regurgitating the answer from its training data in a way that creates an illusion of using logic to answer it.

      3 replies →

  • "Understanding" is just a trap to get wrapped up in. A word with no definition and no test to prove it.

    Whether or not the models are "understanding" is ultimately immaterial, as their ability to do things is all that matters.

    • If they can't do things that require understanding, it's material, bub.

      And just because you have no understanding of what "understanding" means, doesn't mean nobody does.

      1 reply →

  • Modern LLMs are post-trained for tasks other than next-word prediction.

    They still output words, though (except for multi-modal LLMs), so that does involve next-word generation.

  • The line between understanding and “large sample of data made into a statistical model” is kind of fuzzy.

> Predicting the next word requires understanding

If we were talking about humans trying to predict the next word, that would be true.

There is no reason to suppose that an LLM is doing anything other than deep pattern prediction pursuant to, and no better than needed for, next-word prediction.

  • There is plenty of reason. This article is just one example of many. People bring it up because LLMs routinely do things that we call reasoning when we see them in other humans. Brushing that off as 'deep pattern prediction' is genuinely meaningless. Nobody who uses the phrase that way can actually explain what they are talking about in a way that can be falsified. It's just vibes. It's an unfalsifiable conversation-stopper, not a real explanation. You can replace "deep pattern prediction" with "magic" and the argument is identical, because the phrase isn't actually doing any work.

    A - A force is required to lift a ball

    B - I see Human-N lifting a ball

    C - Obviously, Human-N cannot produce forces

    D - Forces are not required to lift a ball

    Well sir, why are you so sure Human-N cannot produce forces? How is she lifting the ball? Well, of course, Human-N is just using s̶t̶a̶t̶i̶s̶t̶i̶c̶s̶ magic.

    • You seem to be ignoring two things...

      First, the obvious one, is that LLMs are trained to auto-regressively predict human training samples (i.e. essentially to copy them, without overfitting), so OF COURSE they are going to sound like the training set - intelligent, reasoning, understanding, etc., etc. The mistake is to anthropomorphize the model because it sounds human, and to attribute that understanding to the model itself rather than seeing it as a reflection of the mental abilities of the humans who wrote the training data.

      The second point is perhaps a bit more subtle, and is about the nature of understanding and the differences between what an LLM is predicting and what the human cortex - also a prediction machine - is predicting...

      When humans predict, what we're predicting is something external to ourselves - the real world. We observe, over time we see regularities, and from this we predict that we'll continue to see those regularities. Our predictions include our own actions as an input - how will the external world react to what we do? - and that is how we learn how to act.

      Understanding something means being able to predict how it will behave, both left alone, and in interaction with other objects/agents, including ourselves. Being able to predict what something will do if you poke it is essentially what it means to understand it.

      What an LLM is predicting is not the external world and how it reacts to the LLM's actions, since it is auto-regressively trained - it is only predicting a continuation of its own output (actions) based on its own immediately preceding output (actions), as the sketch below illustrates. The LLM therefore understands nothing itself, since it has no grounding for what it is "talking about" or for how the external world behaves in reaction to its own actions.

      The LLM's appearance of "understanding" comes solely from the fact that it is mimicking the training data, which was generated by humans who do have agency in the world and an understanding of it. But the LLM has no visibility into the generative process of the human mind - only into the artifacts (words) it produces - so the LLM is doomed to operate in a world of words, where all it might be considered to "understand" is its own auto-regressive generative process.
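
      To put the auto-regressive point in code, here is a sketch of the generation loop (not any particular implementation; predict_next is a stand-in for whatever trained network does the scoring). The only input at each step is the sequence produced so far - the model conditions on its own previous output, not on an external world it can observe or act on:

          def generate(predict_next, prompt_tokens, n_steps):
              # predict_next: any function mapping a token sequence to the next token
              # (in practice a trained LLM; here it is just a placeholder).
              tokens = list(prompt_tokens)
              for _ in range(n_steps):
                  # The model's own previous output is fed back in as the sole input.
                  tokens.append(predict_next(tokens))
              return tokens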

      11 replies →

    • Anything can be euphemized. Human intelligence is atoms moving around the brain. General relativity is writing on a piece of paper.

      1 reply →