Comment by IAmGraydon
1 day ago
This is utterly wrong. Predicting the next word requires a large sample of data made into a statistical model. It has nothing to do with "understanding", which implies it knows why rather than what.
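For what it's worth, here is a minimal sketch of what "a large sample of data made into a statistical model" means in the simplest case: a bigram counter that predicts the most frequent follower of a word. The corpus is invented and absurdly small, just to show the mechanism.

```python
from collections import Counter, defaultdict

# Toy "next word" model: count which word follows which in a corpus,
# then predict the most frequent follower. The corpus is made up.
corpus = "the detective questioned the butler and the butler confessed".split()

follower_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follower_counts[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word after `word`."""
    if word not in follower_counts:
        return None
    return follower_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "butler" (it follows "the" twice, the others once)
```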
Ilya Sutskever was on a podcast and said to imagine a mystery novel that ends with “and the killer is: (name)”. If it’s just a statistical model generating the next most likely word, how can it fill in that name without some understanding of all the clues? A specific name is not statistically likely to appear on its own.
I was once chatting with an author of books (very much an amateur) and he said he enjoyed writing because he liked discovering where the story goes. That is, he starts by building characters and creating scenarios for them, and at some point the story kind of takes over: there is only one way a character can act given what was previously written, but it wasn't preordained. That's why he liked it; it was a discovery to him.
I'm not saying this is the right way to write a book, but it is a way some people write, at least! And it's one LLMs seem capable of. (Though isn't a book outline pretty much the same as a coding plan, and well within their wheelhouse?)
Can current LLMs actually do that, though? What Ilya posed was a thought experiment: if a model could do that, then we would say it has understanding. But AFAIK that is beyond current capabilities.
Someone should try it and create a new "mysterybench": find all mystery novels written after the LLM training cutoff and see how many models unravel the mystery.
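A rough sketch of what that harness could look like, for the curious. `ask_model` is a placeholder to be wired to whichever LLM is under test, and the dataset fields (text with the reveal removed, plus the culprit's name) are assumptions, not a real dataset.

```python
# Hypothetical "mysterybench": strip the final reveal from each post-cutoff
# mystery novel, ask the model to name the culprit, and score the answers.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this up to the LLM under test")

def run_mysterybench(novels: list[dict]) -> float:
    """novels: dicts with 'text_without_reveal' and 'culprit' keys."""
    correct = 0
    for novel in novels:
        prompt = (
            "Below is a mystery novel with the final reveal removed.\n\n"
            f"{novel['text_without_reveal']}\n\n"
            "Based only on the clues above, who is the killer? "
            "Answer with a single name."
        )
        answer = ask_model(prompt).strip().lower()
        correct += novel["culprit"].lower() in answer
    return correct / len(novels)
```

The practical catch is context length: full novels may not fit in a model's window, so some chunking or summarization step would probably be needed.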
This implies understanding of preceding tokens, no? GP was saying they have understanding of future tokens.
It can't do that without the answer to who did it being in the training data. I think the reason people keep falling for this illusion is that they can't really imagine how vast the training dataset is. In all cases where it appears to answer a question like the one you posed, it's regurgitating the answer from its training data in a way that creates an illusion of using logic to answer it.
> It can't do that without the answer to who did it being in the training data.
Try it. Write a simple original mystery story, and then ask a good model to solve it.
This isn't your father's Chinese Room. It couldn't solve original brainteasers and puzzles if it were.
That’s not true, at all.
"Understanding" is just a trap to get wrapped up in. A word with no definition and no test to prove it.
Whether or not the models are "understanding" is ultimately immaterial, as their ability to do things is all that matters.
If they can't do things that require understanding, it's material, bub.
And just because you have no understanding of what "understanding" means, doesn't mean nobody does.
> doesn't mean nobody does
If it's not a functional understanding that lets you replicate the functionality of understanding, is it really understanding?
If you're claiming a transformer model is a Markov chain, this is easily disprovable by, eg, asking the model why it isn't a Markov chain!
But here is a really big one of those if you want it: https://arxiv.org/abs/2401.17377
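One way to make the Markov-chain question concrete: a low-order Markov chain's next-word distribution depends only on the last few tokens, so editing a word far back in the prompt cannot change its prediction, while a transformer conditions on the whole context. A small check along those lines, using GPT-2 via Hugging Face transformers purely as an example (the prompts are made up):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_probs(text: str) -> torch.Tensor:
    """Distribution over the next token given the full text."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

# The two prompts end with the same words; only a token far back differs.
a = "Holmes suspected the butler all along. At last he announced that the killer was the"
b = "Holmes suspected the gardener all along. At last he announced that the killer was the"

# A low-order Markov chain over words would predict identically for both;
# the transformer's top continuations shift with the distant token.
for probs in (next_token_probs(a), next_token_probs(b)):
    top = torch.topk(probs, 3)
    print([tok.decode(int(i)) for i in top.indices])
```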
Modern LLMs are post-trained for tasks other than next-word prediction.
They still output words, though (except for multi-modal LLMs), so that does involve next-word generation.
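To illustrate that point: whatever the post-training objective was, decoding still proceeds one token at a time. A minimal greedy-decoding loop, with GPT-2 standing in for any causal LM:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("And the killer is", return_tensors="pt").input_ids
for _ in range(10):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # scores for the next token only
    next_id = torch.argmax(logits).view(1, 1)  # greedy choice
    ids = torch.cat([ids, next_id], dim=1)     # append and repeat

print(tok.decode(ids[0]))
```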
The line between understanding and “large sample of data made into a statistical model” is kind of fuzzy.