
Comment by ACCount37

3 days ago

The more you look into LLM internals, the more similarities you find: humanlike concepts, language-invariant circuits, abstract thinking, world models.

Mechanistic interpretability is struggling, of course. But what it has found in the last five years is still enough to dispel a lot of the "LLMs are merely X" and "LLMs can't Y" myths - if you're up to date on the relevant research.
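For a concrete sense of what "looking into the internals" means in practice, here's a minimal sketch of a linear probe, one of the standard interpretability tools for checking whether a concept is linearly readable from a model's hidden activations. Everything in it is synthetic - the activations, the concept direction, the labels are all made up for illustration; in real work you'd extract activations from an actual model with a forward hook.

```python
# Minimal linear-probe sketch on SYNTHETIC activations.
# In real interpretability work, `acts` would come from a model's
# residual stream; here we fabricate activations that encode a binary
# concept along one hypothetical direction.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

d_model, n = 256, 2000
labels = rng.integers(0, 2, size=n)        # e.g. "is this sentence French?"
concept_dir = rng.normal(size=d_model)     # hypothetical concept direction
# Activations = noise + a shift along the concept direction per label.
acts = rng.normal(size=(n, d_model)) + np.outer(labels - 0.5, concept_dir)

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# High held-out accuracy is evidence the concept is linearly encoded in
# the activations - the kind of finding behind claims like
# "language-invariant circuits".
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```

On this toy data the probe scores near 1.0 by construction; on real activations, whether it does or not is exactly the empirical question.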

It's not just the outputs. The process is somewhat similar too. LLMs and humans both implement abstract thinking of some kind - much like calculators and arithmometers both implement addition.

Without a direct comparison to human internals (grounded in neurobiology rather than intuition), it's hard to say how deep these similarities run, or whether they're simply a product of the transparency illusion (as Sydney Lamb defines it).

However, if you can point us to some specific reading on mechanistic interpretability that you think is relevant here, I would definitely appreciate it.

  • That's what I'm saying: there is no "direct comparison grounded in neurobiology" for most things, and for many, there simply can't be one - for the same reason you can't compare gears and springs to silicon circuits 1:1. The low-level components diverge too much.

    Despite all that, the calculator and the arithmometer do the same things. If you can't go up an abstraction level and look past the low-level implementation details, you'll remain blind to that fact forever.

    Which papers to read depends on what you're interested in. There's a lot of research, ranging from weird LLM capabilities to the exact operation of reverse-engineered circuits.

    • There is no level of abstraction to go up to sans context. Let me repeat myself as well: the calculator and the arithmometer do the same things -- from the point of view of the clerk who needs to add and subtract quickly. Otherwise they are simply two completely different objects, and we will have a hard time making correct inferences about how one works based only on what we know about the other, or, say, about calculating machines in general.

      What I'm interested in is evidence supporting the claim that "the more you look into LLM internals, the more similarities you find". Pointers to specific books and papers would be very helpful.
