Comment by red75prime

9 hours ago

The specific sequence of tokens that comprises Knuth's problem together with an answer to it is not in the training data. A naive probability distribution built by counting the token sequences present in the training data would assign it probability 0. The trained network represents an extremely non-naive approach to estimating the ground-truth distribution (the distribution corresponding to what a human brain might have produced).
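To make the "naive counting" point concrete, here is a minimal sketch (with a hypothetical toy corpus; the function name `naive_prob` is made up for illustration) of a count-based bigram model. Any continuation never observed in the corpus gets exactly zero probability, which is the failure mode being contrasted with a trained network's generalization:

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

# counts[w1][w2] = number of times w2 immediately followed w1.
counts = defaultdict(Counter)
for sentence in corpus:
    for w1, w2 in zip(sentence, sentence[1:]):
        counts[w1][w2] += 1

def naive_prob(w1, w2):
    """P(w2 | w1) estimated purely by counting; 0 for unseen pairs."""
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

print(naive_prob("the", "cat"))   # 0.25 — "the cat" occurs once out of 4 "the ..." bigrams
print(naive_prob("the", "moon"))  # 0.0  — never observed, so the naive model rules it out
```

A novel but perfectly sensible sequence is indistinguishable, to this model, from an impossible one; both get probability 0.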

>the distribution that corresponds to what a human brain might have produced

But the human brain (or any other intelligent brain) does not work by generating a probability distribution over the next word. Even beings that do not have language can think and act intelligently.

  • LLMs also don't work by generating probability distributions of the next word. Your explanation isn't able to explain why they can generate words, let alone sentences.

  • [Citation needed] Neuroscience isn't yet at a point where it can say this with any certainty.

    Anyway, it's not a theorem that you can be intelligent only if you fully imitate biological processes. Just as flight can be achieved by means other than flapping wings.

    • >you can be intelligent only if you fully imitate biological processes

      It is not that. It is about having an understanding of how it is trained. For example, if it was trained on ideas, instead of words, then it would be closer to intelligent behavior.

      Someone will say that during training it builds ideas and concepts, but that is just a name we give to the internal representation that results from training; it is not actual ideas and concepts. When it learns about the word "car", it does not actually understand it as a concept, only as a word and how it relates to other words. This enables it to generate consistent text involving "car", projecting an appearance of intelligence.

      It is hard to propose a test for this, because it will become the next target for the AI companies to optimize for, and maybe the next model will pass it.
