Comment by dnautics

18 days ago

isn't intellectual reasoning just pattern recognition + a forward causal token generation mechanism?

You can replicate an LLM:

You and a buddy are going to play “next word”, but it’s probably already known by a better name than I made up.

You start with one word, ANY word at all, and say it out loud, then your buddy says the next word in the yet unknown sentence, then it’s back to you for one word. Loop until you hit an end.

Let’s say you start with “You”. Then your buddy says the next word out loud, also whatever they want. Let’s go with “are”. Then back to you for the next word, “smarter” -> “than” -> “you” -> “think.”

Neither of you knew what you were going to say, you only knew what was just said so you picked a reasonable next word. There was no ‘thought’, only next token prediction, and yet magically the final output was coherent. If you want to really get into the LLM simulation game then have a third person provide the first full sentence, then one of you picks up the first word in the next sentence and you two continue from there. As soon as you hit a breaking point the third person injects another full sentence and you two continue the game.

With no idea what either of you are going to say and no clue about what the end result will be, no thought or reasoning at all, it won’t be long before you’re sounding super coherent while explaining thermodynamics. But one of the rounds someone’s going to mess it up, like “gluons” -> “weigh” -> “…more?…” -> “…than…(damnit Gary)…” but you must continue the game and finish the sentence, then sit back and think about how you just hallucinated an answer without thinking, reasoning, understanding, or even knowing what you were saying until it finished.

  • that's not how llms work. study the transformer architecture. every token is conditioned not just on the previous token, but each layer's activation generates a query over the kv cache of the previous activations, which means that each token's generation has access to any higher order analytical conclusions and observations generated in the past. information is not lost between the tokens like your thought exercise implies.
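A minimal sketch of the mechanism this reply describes, assuming a single attention head in a single layer and hand-picked toy vectors (in a real transformer the queries, keys, and values are learned projections of each layer's activations). The names `KVCache` and `attend` are made up for illustration. The point it shows is only that the newest position's query scores every cached position, not just the most recent one.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class KVCache:
    """Keys and values for every position generated so far (hypothetical helper)."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

def attend(query, cache):
    """Return (weights, output): one query attends over ALL cached positions."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in cache.keys])
    dim = len(cache.values[0])
    output = [sum(w * v[i] for w, v in zip(weights, cache.values))
              for i in range(dim)]
    return weights, output

# Toy (key, value) pairs for three earlier positions; assumed, not learned.
cache = KVCache()
for key, value in [([1.0, 0.0], [1.0, 0.0]),
                   ([0.0, 1.0], [0.0, 1.0]),
                   ([1.0, 1.0], [0.5, 0.5])]:
    cache.append(key, value)

# The query for the next token gets a nonzero weight on every earlier position.
weights, output = attend([1.0, 0.0], cache)
```

Here position 0 receives as much weight as position 2, even though it is the oldest entry, because the weighting depends on query/key similarity rather than recency.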

    • “The cow goes ‘mooooo’”

      “that’s not how cow work. study bovine theory. contraction of expiratory musculature elevates abdominal pressure and reduces thoracic volume, generating positive subglottal pressure…”

Obviously not. In actual thinking, we can generate an idea, evaluate it for internal consistency and consistency with our (generally much more than linguistic, i.e. may include visual imagery and other sensory representations) world models, decide this idea is bad / good, and then explore similar / different ideas. I.e. we can backtrack and form a branching tree of ideas. LLMs cannot backtrack, do not have a world model (or, to the extent they do, this world model is solely based on token patterns), and cannot evaluate consistency beyond (linguistic) semantic similarity.
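The generate, evaluate, backtrack loop described above can be sketched as an ordinary depth-first search. This is only an illustration of the control flow, not a claim about how brains or LLMs implement it; `backtracking_search` and the toy puzzle (pick three strictly increasing numbers from 1 to 4 that sum to 9) are assumptions made up for the example.

```python
def backtracking_search(partial, propose, is_consistent, is_complete):
    """Extend `partial` one idea at a time, abandoning inconsistent branches."""
    if is_complete(partial):
        return partial
    for idea in propose(partial):
        candidate = partial + [idea]
        if not is_consistent(candidate):
            continue  # evaluate: reject this idea, try a different one
        result = backtracking_search(candidate, propose, is_consistent, is_complete)
        if result is not None:
            return result
    return None  # every extension failed: backtrack to the caller

# Toy problem standing in for "ideas" and a "consistency check".
def propose(partial):
    return range(1, 5) if len(partial) < 3 else []

def is_consistent(candidate):
    increasing = all(a < b for a, b in zip(candidate, candidate[1:]))
    return increasing and sum(candidate) <= 9

def is_complete(candidate):
    return len(candidate) == 3 and sum(candidate) == 9

solution = backtracking_search([], propose, is_consistent, is_complete)
```

The search first commits to 1, explores every branch under it, finds none that works, and only then backs out and tries 2, which is the branching-tree behaviour the comment refers to.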

  • There's no such thing as a "world model". That is metaphor-driven development from GOFAI, where they'd just make up a concept and assume it existed because they made it up. LLMs are capable of approximating such a thing because they are capable of approximating anything if you train them to do it.

    > or, to the extent they do, this world model is solely based on token patterns

    Obviously not true because of RL environments.

    • > There's no such thing as a "world model"

      There obviously is in humans. When you visually simulate things or e.g. simulate how food will taste in your mind as you add different seasonings, you are modeling (part of) the world. This is presumably done by having associations in our brain between all the different qualia sequences and other kinds of representations in our mind. I.e. we know we do some visuospatial reasoning tasks using sequences of (imagined) images. Imagery is one aspect of our world model(s).

      We know LLMs can't be doing visuospatial reasoning using imagery, because they only work with text tokens. A VLM or other multimodal might be able to do so, but an LLM can't, and so an LLM can't have a visual world model. They might in special cases be able to construct a linguistic model that lets them do some computer vision tasks, but the model will itself still only be using tokenized words.

      There are all sorts of other sensory modalities and things that humans use when thinking (i.e. actual logic and reasoning, which goes beyond mere semantics and might include things like logical or other forms of consistency, e.g. consistency with a relevant mental image), and the "world model" concept is supposed, in part, to point to these things that are more than just language and tokens.

      > Obviously not true because of RL environments.

      Right, AI generally can have much more complex world models than LLMs. An LLM can't even handle e.g. sensor data without significant architectural and training modification (https://news.ycombinator.com/item?id=46948266), at which point, it is no longer an LLM.


  • "LLMs cannot backtrack". This is exactly wrong. LLMs always see everything in the past. In this sense they are more efficient than Turing machines, because (assuming sufficiently large context length) every token sees ALL previous tokens. So, in principle, an LLM could write a bunch of exploratory shit, and then add a "tombstone" "token" that can selectively devalue things within a certain timeframe -- aka just the exploratory things (as judged by RoPE time), and thus "backtrack".

    I put "token" in quotes because this would obviously not necessarily be an explicit token, but it would have to be a learned group of tokens, for example. But who knows, if the thinking models have some weird pseudo-XML delimiters for thinking, it's not crazy to think that an LLM could shove this information in, say, the closing tag.
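A rough sketch of what such a "tombstone" would have to accomplish, assuming it acts like an additive bias on the attention scores of a span of earlier positions. This is a hard-coded illustration of a hypothetical mechanism: in the idea above the effect would be learned rather than applied as an explicit mask, and `attention_weights`, `tombstoned`, and `penalty` are names invented here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(scores, tombstoned=(), penalty=-1e9):
    """Softmax over raw scores, with tombstoned positions pushed to ~zero weight."""
    biased = [s + penalty if i in tombstoned else s
              for i, s in enumerate(scores)]
    return softmax(biased)

# Raw scores of the current token against four earlier positions (toy numbers).
# Positions 1 and 2 stand in for the exploratory span.
scores = [0.2, 1.5, 1.4, 0.3]

plain = attention_weights(scores)                      # exploratory span dominates
pruned = attention_weights(scores, tombstoned={1, 2})  # span is effectively ignored
```

The earlier tokens are still in the context, but after the bias they contribute essentially nothing to later tokens, which is the sense of "backtrack" used here.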

    • > "LLMs cannot backtrack". This is exactly wrong.

      If it wasn't clear, I am talking about LLMs in use today, not ultimate capabilities. All commercial models are known (or believed) to be recursively applied transformers without e.g. backspace or "tombstone" tokens, like you are mentioning here.

      But yes, absolutely LLMs might someday be able to backtrack, either literally during token generation if we allow e.g. backspace tokens (there was at least one paper that did this) or more broadly at the chain of thought level, with methods like you are mentioning.
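The literal backspace variant can be sketched as a post-processing step over the generated stream, assuming a special token (called `<bksp>` here, a made-up name) that deletes the most recent surviving token. This is a toy illustration of the idea, not the method of any particular paper or model.

```python
BACKSPACE = "<bksp>"  # hypothetical special token

def apply_backspaces(stream):
    """Return the tokens that survive after each BACKSPACE removes its predecessor."""
    visible = []
    for token in stream:
        if token == BACKSPACE:
            if visible:          # a backspace on empty output is a no-op
                visible.pop()
        else:
            visible.append(token)
    return visible

# The generator starts down a bad path, then erases two tokens and recovers.
stream = ["gluons", "weigh", "more", BACKSPACE, BACKSPACE, "are", "massless"]
visible = apply_backspaces(stream)
```

In this sketch the erased tokens disappear only from the visible output; whether the model would still attend to them internally is a separate design choice.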
