Comment by roywiggins

1 year ago

An LLM trained on single letter tokens would be able to, it just would be much more laborious to train.

Why would it be able to?

  • If you give LLMs the letters one at a time they often count them just fine, though Claude at least seems to need to keep a running count to get it right:

    "How many R letters are in the following? Keep a running count. s t r a w b e r r y"

    They are terrible at counting letters in words because they rarely see words spelled out. An LLM trained one byte at a time would always see every character of every word and would have a much easier time of it. An LLM is essentially learning a new language without a dictionary, so of course it's pretty bad at spelling. The tokenization obscures the spelling, not entirely unlike how spoken language doesn't always illuminate spelling.
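
    The "keep a running count" procedure the prompt asks for can be sketched in a few lines (a minimal Python illustration of the counting task itself, not anything the model executes):

    ```python
    # Minimal sketch: tally a target letter one character at a time,
    # printing the running count the way the prompt asks the model to.
    word = "strawberry"
    target = "r"

    count = 0
    for position, letter in enumerate(word, start=1):
        if letter == target:
            count += 1
        print(f"{position}: {letter} -> running count = {count}")

    print(f"Total '{target}' letters in '{word}': {count}")  # -> 3
    ```

    Spelled out character by character like this, the task is trivial; the point above is that a tokenized model rarely sees words in this form.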

    • Could the effect you see when you spell it out be not a result of "seeing" tokens, but a result of the model having learned, at a higher level, how lists in text can be summarized, summed up, filtered, and counted?

      In other words, what makes you think that it's exactly letter-tokens that help it, and not the high-level concept of spelling things out itself?
