Comment by fnordpiglet

21 hours ago

However, it’s disingenuous to say the inference is on the next token, because it actually isn’t: it happens in the model’s parameter space, across a set of nonlinear activation functions, and is then effectively projected into the token. The idea that it’s predictive of the token isn’t really the case; it is a much more complex and more semantic relationship that ends in the series of tokens through the attention mechanism.
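A toy sketch of the computation being described follows. It rests on heavy assumptions: mean-pooling stands in for attention, there is a single tanh layer with a residual connection, and every weight is a random, hypothetical value. The context is first transformed into a hidden vector, and only the final step is a linear projection onto vocabulary scores. It shows the general shape of the computation, not how any particular LLM is built.

```python
import math
import random

# Hypothetical toy vocabulary and sizes, far smaller than any real model.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]
D = 8  # hidden width

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

EMBED = rand_matrix(len(VOCAB), D)    # token id -> vector
W_HIDDEN = rand_matrix(D, D)          # one nonlinear layer
UNEMBED = rand_matrix(D, len(VOCAB))  # hidden vector -> vocabulary logits

def matvec(vec, mat):
    # Row vector (length = rows) times matrix (rows x cols) -> length cols.
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]

def hidden_state(token_ids):
    # Mean-pool the context embeddings (a crude stand-in for attention),
    # then apply a tanh layer and add a residual connection.
    pooled = [sum(EMBED[t][k] for t in token_ids) / len(token_ids) for k in range(D)]
    transformed = [math.tanh(x) for x in matvec(pooled, W_HIDDEN)]
    return [p + t for p, t in zip(pooled, transformed)]

def next_token_distribution(token_ids):
    h = hidden_state(token_ids)   # the nonlinear work happens in vector space
    logits = matvec(h, UNEMBED)   # a single linear projection onto the vocabulary
    m = max(logits)               # subtract the max for a numerically stable softmax
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = next_token_distribution([0, 1])  # context: "the cat"
print({w: round(p, 3) for w, p in zip(VOCAB, probs)})
```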

The article also asserts that it replays everything over and over again to create each character one at a time, as a way to demonstrate the autoregressive self-attention mechanism, but that’s really not accurate at all, and it trivializes what is going on.
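A minimal sketch of autoregressive generation follows. It assumes a toy single-head attention with identity key/value projections and random, hypothetical weights. Each step emits one token (not one character). The sketch also keeps a cache of per-position keys and values, which many implementations use so the whole prefix is not recomputed for every new token. With a 2-token prompt and 3 steps, each position is computed exactly once (5 in total). Naively re-running the full prefix at every step would process 2 + 3 + 4 = 9 positions.

```python
import math
import random

# Hypothetical toy sizes and random weights, for illustration only.
VOCAB_SIZE = 6
D = 8
rng = random.Random(1)
EMBED = [[rng.gauss(0.0, 0.5) for _ in range(D)] for _ in range(VOCAB_SIZE)]
UNEMBED = [[rng.gauss(0.0, 0.5) for _ in range(VOCAB_SIZE)] for _ in range(D)]

def attend(query, keys, values):
    # Scaled dot-product attention of one query over all cached positions.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(D) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]

def generate(prompt_ids, steps):
    cache_k, cache_v = [], []  # per-position cache: earlier positions are never recomputed
    tokens = list(prompt_ids)
    positions_computed = 0

    def add_position(token_id):
        nonlocal positions_computed
        x = EMBED[token_id]
        cache_k.append(x)  # toy assumption: identity key/value projections
        cache_v.append(x)
        positions_computed += 1
        return x

    for t in tokens[:-1]:
        add_position(t)
    query = add_position(tokens[-1])
    for _ in range(steps):
        context = attend(query, cache_k, cache_v)
        logits = [sum(c * UNEMBED[i][j] for i, c in enumerate(context))
                  for j in range(VOCAB_SIZE)]
        next_id = max(range(VOCAB_SIZE), key=lambda j: logits[j])  # greedy choice
        tokens.append(next_id)
        query = add_position(next_id)  # only the new position is computed
    return tokens, positions_computed

tokens, n = generate([0, 1], steps=3)
print(tokens, n)  # n is 5: each of the 5 positions was computed once
```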

I am not asserting LLMs are aware or conscious; that’s, on the surface, profoundly absurd. And I do understand your point that the fact it emits words that seem to speak to us gives it an air of humanity that isn’t real. However, there is a very real emergent reality: our language alone appears to lead to embedding a form of thought and understanding that is latent in how we use language to communicate, and that is in fact coming through the model. It is not regurgitating its corpus and pattern matching, because the patterns you input and it emits are not where the inference operates; it’s within this enormous vector space, through these complex nonlinear activation functions with learned residuals, not in the language corpus.

It is not conscious or aware. It is something else, not human. But if you can not see it as amazing you have lost the capacity to dream.

> But if you can not see it as amazing you have lost the capacity to dream.

I completely disagree. I think if you think these things are amazing, your dreams are incredibly limited and boring.

I remember the first time I talked to a chatbot. Not an LLM, just a regular chatbot, like ELIZA or any other dumb bot.

For a few seconds, it felt magical, like I was talking to a computer that understood me, as it made replies that were sensible to what I was saying. Then it said something incredibly stupid and jarring that made no sense, and that took the magic away. Oh, this is just a dumb computer program.

I remember the first time I talked to an LLM-powered chatbot. It was the exact same thing, except the magic feeling lasted a tiny bit longer and was a tiny bit more convincing. But it went away in the exact same way, for the exact same reason. Once you've seen the emperor without clothes, nothing brings back the magic.

> it’s disingenuous to say the inference is on the next token because it’s actually not, it’s in the model’s parameter space across a set of nonlinear activation functions then effectively projected into the token. The idea it’s predictive of the token isn’t actually the case, it really is a much more complex and more semantic relationship

Do you, or anyone reading, have any worthwhile links that make a strong case for this (that there is a stronger semantic relationship than simply next token prediction)? I would like to read more about this.