Comment by IAmGraydon
1 day ago
This is utterly wrong. Predicting the next word requires a large sample of data made into a statistical model. It has nothing to do with "understanding", which implies it knows why rather than what.
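For what it's worth, here is a minimal sketch of what "a large sample of data made into a statistical model" means in the simplest case: a bigram counter that predicts the most frequent follower of a word. The corpus is invented and absurdly small, just to show the mechanism.

```python
from collections import Counter, defaultdict

# Toy "next word" model: count which word follows which in a corpus,
# then predict the most frequent follower. The corpus is made up.
corpus = "the detective questioned the butler and the butler confessed".split()

follower_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follower_counts[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word after `word`."""
    if word not in follower_counts:
        return None
    return follower_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "butler" (it follows "the" twice, the others once)
```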
Ilya Sutskever was on a podcast and said to imagine a mystery novel that ends with “and the killer is: (name)”. If it’s just a statistical model generating the next most likely word, how can it fill in that name without some understanding of all the clues? A specific name is not statistically likely to appear on its own.
I was once chatting with an author of books (very much an amateur) and he said he enjoyed writing because he liked discovering where the story goes. That is, he starts by building characters and creating scenarios for them, and at some point the story kind of takes over: there is only one way a character can act given what was previously written, but it wasn't preordained. That's why he liked it; it was a discovery to him.
I'm not saying this is the right way to write a book, but it is a way some people write, at least! And it's one LLMs seem capable of. (Though isn't a book outline pretty much the same as a coding plan, and well within their wheelhouse?)
Can current LLMs actually do that, though? What Ilya posed was a thought experiment: if a model could do that, then we would say it has understanding. But AFAIK that is beyond current capabilities.
Someone should try it and create a new "mysterybench": find all mystery novels written after the LLM training cutoff and see how many models unravel the mystery.
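A rough sketch of what that harness could look like, for the curious. `ask_model` is a placeholder to be wired to whichever LLM is under test, and the dataset fields (text with the reveal removed, plus the culprit's name) are assumptions, not a real dataset.

```python
# Hypothetical "mysterybench": strip the final reveal from each post-cutoff
# mystery novel, ask the model to name the culprit, and score the answers.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this up to the LLM under test")

def run_mysterybench(novels: list[dict]) -> float:
    """novels: dicts with 'text_without_reveal' and 'culprit' keys."""
    correct = 0
    for novel in novels:
        prompt = (
            "Below is a mystery novel with the final reveal removed.\n\n"
            f"{novel['text_without_reveal']}\n\n"
            "Based only on the clues above, who is the killer? "
            "Answer with a single name."
        )
        answer = ask_model(prompt).strip().lower()
        correct += novel["culprit"].lower() in answer
    return correct / len(novels)
```

The practical catch is context length: full novels may not fit in a model's window, so some chunking or summarization step would probably be needed.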
This implies understanding of preceding tokens, no? GP was saying they have understanding of future tokens.
It can't do that without the answer to who did it being in the training data. I think the reason people keep falling for this illusion is that they can't really imagine how vast the training dataset is. In all cases where it appears to answer a question like the one you posed, it's regurgitating the answer from its training data in a way that creates an illusion of using logic to answer it.
> It can't do that without the answer to who did it being in the training data.
Try it. Write a simple original mystery story, and then ask a good model to solve it.
This isn't your father's Chinese Room. It couldn't solve original brainteasers and puzzles if it were.
That’s not true, at all.
"Understanding" is just a trap to get wrapped up in. A word with no definition and no test to prove it.
Whether or not the models are "understanding" is ultimately immaterial, as their ability to do things is all that matters.
If they can't do things that require understanding, it's material, bub.
And just because you have no understanding of what "understanding" means, doesn't mean nobody does.
> doesn't mean nobody does
If it's not a functional understanding that lets you replicate the functionality of understanding, is it really understanding?
If you're claiming a transformer model is a Markov chain, this is easily disprovable by, eg, asking the model why it isn't a Markov chain!
But here is a really big one of those if you want it: https://arxiv.org/abs/2401.17377
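One way to make the Markov-chain question concrete: a low-order Markov chain's next-word distribution depends only on the last few tokens, so editing a word far back in the prompt cannot change its prediction, while a transformer conditions on the whole context. A small check along those lines, using GPT-2 via Hugging Face transformers purely as an example (the prompts are made up):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_probs(text: str) -> torch.Tensor:
    """Distribution over the next token given the full text."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

# The two prompts end with the same words; only a token far back differs.
a = "Holmes suspected the butler all along. At last he announced that the killer was the"
b = "Holmes suspected the gardener all along. At last he announced that the killer was the"

# A low-order Markov chain over words would predict identically for both;
# the transformer's top continuations shift with the distant token.
for probs in (next_token_probs(a), next_token_probs(b)):
    top = torch.topk(probs, 3)
    print([tok.decode(int(i)) for i in top.indices])
```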
Modern LLMs are post-trained for tasks other than next-word prediction.
They still output words, though (except for multi-modal LLMs), so that does involve next-word generation.
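To illustrate that point: whatever the post-training objective was, decoding still proceeds one token at a time. A minimal greedy-decoding loop, with GPT-2 standing in for any causal LM:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tok("And the killer is", return_tensors="pt").input_ids
for _ in range(10):
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # scores for the next token only
    next_id = torch.argmax(logits).view(1, 1)  # greedy choice
    ids = torch.cat([ids, next_id], dim=1)     # append and repeat

print(tok.decode(ids[0]))
```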
The line between understanding and “large sample of data made into a statistical model” is kind of fuzzy.