← Back to context

Comment by KoolKat23

5 days ago

Given a random prompt, the overall probability of seeing a specific output string is almost zero, since there are astronomically many possible token sequences.

The same goes for humans. Most awards are built on novel research built on pre-existing works. This a LLM is capable of doing.

LLMs don't use 'overall probability' in any meaningful sense. During training, gradient descent creates highly concentrated 'gravity wells' of correlated token relationships - the probability distribution is extremely non-uniform, heavily weighted toward patterns seen in training data. The model isn't selecting from 'astronomically many possible sequences' with equal probability; it's navigating pre-carved channels in high-dimensional space. That's fundamentally different from novel discovery.

  • That's exactly the same for humans in the real world.

    You're focusing too close, abstract up a level. Your point relates to the "micro" system functioning, not the wider "macro" result (think emergent capabilities).

    • I'm afraid I'd need to see evidence before accepting that humans navigate 'pre-carved channels' in the same way LLMs do. Human learning involves direct interaction with physical reality, not just pattern matching on symbolic representations. Show me the equivalence or concede the point.

      5 replies →