
Comment by adamzwasserman

5 days ago

LLMs don't use 'overall probability' in any meaningful sense. During training, gradient descent carves highly concentrated 'gravity wells' of correlated token relationships: the probability distribution is extremely non-uniform, heavily weighted toward patterns seen in the training data. The model isn't selecting from 'astronomically many possible sequences' with equal probability; it's navigating pre-carved channels in high-dimensional space. That's fundamentally different from novel discovery.
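To make that concrete, here's a toy sketch (the logit values are made up; only the vocabulary size is real, borrowed from GPT-2) of just how far from uniform a trained next-token distribution is:

```python
# Toy illustration: a trained LM's next-token distribution vs. a uniform one.
# The logits below are hypothetical; the vocabulary size is GPT-2's.
import torch

vocab_size = 50_257
uniform = torch.full((vocab_size,), 1.0 / vocab_size)

# Hypothetical logits: a handful of plausible continuations dominate,
# everything else sits in a flat low-probability tail.
logits = torch.full((vocab_size,), -5.0)
logits[:5] = torch.tensor([9.0, 7.5, 6.0, 4.0, 3.0])
peaked = torch.softmax(logits, dim=-1)

def entropy(p):
    # Shannon entropy in nats; the clamp avoids log(0).
    return -(p * p.clamp_min(1e-12).log()).sum().item()

print(f"uniform entropy: {entropy(uniform):.2f} nats")        # ln(50257) ≈ 10.82
print(f"peaked entropy:  {entropy(peaked):.2f} nats")         # ≈ 1.1
print(f"mass on top 5 tokens: {peaked[:5].sum().item():.3f}") # ≈ 0.97
```

Nearly all the probability mass lands on a few tokens; the 'astronomically many' alternative sequences are effectively unreachable under greedy or low-temperature sampling.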

That's exactly the same for humans in the real world.

You're focusing too closely; abstract up a level. Your point concerns the "micro" functioning of the system, not the wider "macro" result (think emergent capabilities).

  • I'm afraid I'd need to see evidence before accepting that humans navigate 'pre-carved channels' in the same way LLMs do. Human learning involves direct interaction with physical reality, not just pattern matching on symbolic representations. Show me the equivalence or concede the point.

    • Language and math together form a world model of physical reality. You could not read a book and make sense of it if that were not true.

      An apple falls to the ground because of? Gravity.

      In real life this is the answer, and I'm very sure the pre-carved channel will also lead to gravity.
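      If you want to check, here's a quick sketch (assuming GPT-2 via the Hugging Face transformers library; any small causal LM would do) that inspects the top next-token candidates for exactly this prompt:

      ```python
      # Inspect the next-token distribution for the apple/gravity prompt.
      # Assumes `pip install torch transformers`; the model choice (GPT-2) is arbitrary.
      import torch
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tok = AutoTokenizer.from_pretrained("gpt2")
      model = AutoModelForCausalLM.from_pretrained("gpt2")
      model.eval()

      prompt = "An apple falls to the ground because of"
      inputs = tok(prompt, return_tensors="pt")
      with torch.no_grad():
          logits = model(**inputs).logits[0, -1]  # logits for the next token
      probs = torch.softmax(logits, dim=-1)

      top = torch.topk(probs, 5)
      for p, i in zip(top.values, top.indices):
          print(f"{tok.decode([int(i)])!r}: {p.item():.3f}")
      # Expectation (not a guarantee): ' gravity' ranks at or near the top,
      # because training text has carved exactly this channel.
      ```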
