
Comment by menaerus

8 days ago

> It’s honestly why I gave up trying to get folks to look at these things rationally as knowable objects (“here’s how LLMs actually work”)

Here's the fallacy you fell into - and this is important to understand. Neither you nor I understand "how LLMs actually work", because nobody really does. Not even the scientists who built the models (or the math around them). So you can't really use that argument - it would be silly to think you know something the rest of the scientific community doesn't. In fact, a whole new field has grown up around understanding how models actually arrive at the answers they give us. The thing is, we are only observers of the results of the experiments we run by training these models, and it just so happens that the result of the experiment is something we find plausible - but that doesn't mean we understand it. It's like a physics experiment: we can see that something behaves a certain way, but we can't explain how or why.

Even if interpretability of specific models or features within them is an open area of research, the mechanics of how LLMs work to produce results are observable and well-understood, and methods to understand their fundamental limitations are pretty solid these days as well.

Is there anything to be gained from following a line of reasoning that basically says LLMs are incomprehensible, full stop?

  • >Even if interpretability of specific models or features within them is an open area of research, the mechanics of how LLMs work to produce results are observable and well-understood, and methods to understand their fundamental limitations are pretty solid these days as well.

    If you train a transformer on (only) lots and lots of addition pairs, e.g. '38393 + 79628 = 118021', and nothing else, the transformer will, during training, discover an algorithm for addition and employ it in service of predicting the next token, which in this case is the sum of the two numbers.

    We know this because of tedious interpretability research, the very limited problem space, and the fact that we knew exactly what to look for. (A rough sketch of that kind of training setup is at the end of this comment.)

    Alright, let's leave addition aside (SOTA LLMs are after all trained on much more) and think about another question. Any other question at all. How about something like:

    "Take a capital letter J and a right parenthesis, ). Take the parenthesis, rotate it counterclockwise 90 degrees, and put it on top of the J. What everyday object does that resemble?"

    What algorithm does GPT or Gemini or whatever employ to answer this and similar questions correctly? It's certainly not the one it learnt for addition. Do you know? No. Do the creators at OpenAI or Google know? Not at all. Can you or they find out right now? Also no.

    Let's revisit your statement.

    "the mechanics of how LLMs work to produce results are observable and well-understood".

    Observable, I'll give you that, but how on earth can you look at the above and sincerely call it 'well-understood'?
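
    For concreteness, here is a rough sketch of the kind of setup described above. This is my own toy version, not the recipe from any particular interpretability paper; the vocabulary, model size, learning rate and step count are all arbitrary choices. The idea is simply: generate nothing but addition strings and train a small decoder-style transformer to predict the next character.

```python
# Toy sketch: a tiny transformer trained on nothing but addition strings.
# All choices here (vocabulary, sizes, learning rate, step count) are
# arbitrary illustrations, not the setup of any specific paper.
import random
import torch
import torch.nn as nn

VOCAB = "0123456789+= "                       # trailing space doubles as padding
stoi = {c: i for i, c in enumerate(VOCAB)}
MAXLEN = 20                                   # "99999+99999=199998" is 18 chars

def sample():
    a, b = random.randint(0, 99999), random.randint(0, 99999)
    s = f"{a}+{b}={a + b}".ljust(MAXLEN)
    return torch.tensor([stoi[c] for c in s])

class TinyTransformer(nn.Module):
    def __init__(self, d=128, heads=4, layers=2):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), d)
        self.pos = nn.Parameter(torch.zeros(MAXLEN, d))
        block = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(d, len(VOCAB))

    def forward(self, x):                     # x: (batch, seq) of token ids
        seq = x.size(1)
        causal = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
        h = self.emb(x) + self.pos[:seq]
        return self.head(self.blocks(h, mask=causal))

model = TinyTransformer()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()               # loss on padding kept for brevity

for step in range(10_000):
    batch = torch.stack([sample() for _ in range(64)])
    logits = model(batch[:, :-1])             # predict the next character everywhere
    loss = loss_fn(logits.reshape(-1, len(VOCAB)), batch[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        print(step, round(loss.item(), 4))
```

    It is precisely because the task, the data and the "right answer" are this small and this well known in advance that interpretability work can afford to reverse-engineer what such a model learned.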

    • It's pattern matching, likely from typography texts and descriptions of umbrellas. My understanding is that the model can attempt some permutations in its thinking, and eventually one permutation's tokens attract enough attention for it to commit to an answer; once it is attending to "everyday object", "arc", and "hook", it will reply with "umbrella".

      Why am I confident that it's not actually doing spatial reasoning? At least in the case of Claude Opus 4.6, it also confidently replies "umbrella" even when you tell it to put the parenthesis under the J, with a handy diagram clearly proving itself wrong: https://claude.ai/share/497ad081-c73f-44d7-96db-cec33e6c0ae3 . Here's me specifically asking for the three key points above: https://claude.ai/share/b529f15b-0dfe-4662-9f18-97363f7971d1

      I feel like I have a pretty good intuition of what's happening here based on my understanding of the underlying mathematical mechanics (a toy sketch of those mechanics is at the end of this comment).

      Edit: I poked at it a little longer and I was able to get some more specific matches to source material binding the concept of umbrellas being drawn using the letter J: https://claude.ai/share/f8bb90c3-b1a6-4d82-a8ba-2b8da769241e
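
      To make "mathematical mechanics" concrete, here is a toy illustration of scaled dot-product attention with entirely made-up vectors and token labels; it has nothing to do with Claude's actual weights. The point is only the shape of the mechanism: when the query at the answer position lines up with the keys for "arc", "hook", and "everyday object", those tokens dominate the weighted mixture that drives the prediction.

```python
# Toy illustration of scaled dot-product attention with made-up vectors.
# Nothing here comes from a real model; the point is only the mechanism:
# when a query aligns with a few keys, softmax concentrates the weight
# on those tokens and their values dominate what gets passed forward.
import numpy as np

def attention(q, K, V):
    scores = K @ q / np.sqrt(q.size)          # one score per token
    w = np.exp(scores - scores.max())
    w /= w.sum()                               # softmax over the tokens
    return w, w @ V                            # weights and the mixed value

rng = np.random.default_rng(0)
tokens = ["J", "rotate", "parenthesis", "arc", "hook", "everyday object"]
K = rng.normal(size=(6, 8))                    # fake key vectors, one per token
V = rng.normal(size=(6, 8))                    # fake value vectors
q = K[3] + K[4] + K[5]                         # a query aligned with arc/hook/object

weights, mixed = attention(q, K, V)
for t, w in zip(tokens, weights):
    print(f"{t:>15s}  {w:.2f}")                # arc, hook, everyday object dominate
```

      The real model does this across thousands of learned dimensions and many layers; the toy only shows the shape of the mechanism.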


    • From Gemini: "When you take those two shapes and combine them, the resulting image looks like an umbrella."

  • The concept “understand” is rooted in utility. It means “I have built a much simpler model which produces usefully accurate predictions of the thing or behaviour I seek to ‘understand’”. This utility is “explanatory power”. The model may be in your head; it may be math, an algorithm, a narrative, or a methodology with a history of utility. “Greater understanding” is associated with models that are simpler, more essential, more accurate, more useful, cheaper, more decomposed, more composable, more easily communicated or replicated, or more widely applicable.

    “Pattern matching”, “next token prediction”, “tensor math” and “gradient descent”, or the understanding and application of these by specialists, are not useful models of what LLMs do, any more than “have sex, feed and talk to the resulting artifact for 18 years” is a useful model of human physiology or psychology.

    My understanding, and I'm not a specialist, is that there are huge and consequential utility gaps in our models of LLMs. So much so that it is reasonable to say we don't yet understand how they work.

  • You can't keep pushing the AI hype train if you consider it just a new type of software / fancy statistical database.

Pro tip: call it a "law of nature" and people will somehow stop pestering you about the why.

I think in a couple decades people will call this the Law of Emergent Intelligence or whatever -- shove sufficient data into a plausible neural network with sufficient compute and things will work out somehow.

On a more serious note, I think the GP fell into an even greater fallacy of believing reductionism is sufficient to dissuade people from ... believing in other things. Sure, we now know how to reduce apparent intelligence into relatively simple matrices (and a huge amount of training data), but that doesn't imply anything about social dynamics or how we should live at all! It's almost like we're asking particle physicists how we should fix the economy or something like that. (Yes, I know we're almost doing that.)

  • In science these days, the term "Law" is almost never used anymore; the term "Theory" has replaced it, e.g. the theory of special relativity rather than a law of special relativity.

Agree. I think people just have their own simplified mental models of how it works. However, there is no reason to believe these simplified mental models are accurate (otherwise we would have been here 20 years earlier with HMMs).

The simplest way to stop people from thinking is for them to have a semi-plausible / "made-me-smart" but incorrect mental model of how things work.