Comment by fmbb

1 year ago

> that has nothing to do with their intelligence.

Of course. Because these models have no intelligence.

Everyone who believes they do seems to believe that intelligence derives from being able to use language, however, and not being able to tell how many times the letter r appears in the word strawberry is a very low bar to fail to clear.

An LLM trained on single-letter tokens would be able to; it would just be much more laborious to train.
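
To make that concrete (a minimal sketch; the single-character "tokenizer" here is hypothetical, not any real model's vocabulary):

```python
# With single-character tokens, every letter of the word is directly
# present in the input sequence, so counting the r's reduces to a
# simple aggregation over the tokens the model actually sees.
tokens = list("strawberry")   # hypothetical char-level tokenization
print(tokens)                 # ['s', 't', 'r', 'a', 'w', 'b', 'e', 'r', 'r', 'y']
print(tokens.count("r"))      # 3
```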

  • Why would it be able to?

    • If you give LLMs the letters one at a time, they often count them just fine, though Claude at least seems to need to keep a running count to get it right:

      "How many R letters are in the following? Keep a running count. s t r a w b e r r y"

      They are terrible at counting letters in words because they rarely see them spelled out. An LLM trained one byte at a time would always see every character of every word and would have a much easier time of it. An LLM is essentially learning a new language without a dictionary, so of course it's pretty bad at spelling. The tokenization obscures the spelling, not entirely unlike the way spoken language doesn't always reveal spelling (the tokenizer sketch below illustrates this).

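      As a rough sketch of what the model actually sees, using OpenAI's tiktoken library; the exact token boundaries depend on the vocabulary, so treat the splits as illustrative:

      ```python
      import tiktoken  # pip install tiktoken

      enc = tiktoken.get_encoding("cl100k_base")

      # A normal word is typically a handful of multi-character chunks,
      # so the individual letters never appear in the input as such.
      word = [enc.decode([t]) for t in enc.encode("strawberry")]
      print(word)

      # Spelled out with spaces, each letter lands in (roughly) its own
      # token, which is why the running-count prompt above works better.
      spelled = [enc.decode([t]) for t in enc.encode("s t r a w b e r r y")]
      print(spelled)
      ```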