Comment by roywiggins

1 year ago

An LLM trained on single letter tokens would be able to, it just would be much more laborious to train.

Why would it be able to?

  • If you give LLMs the letters one at a time they often count them just fine, though Claude at least seems to need to keep a running count to get it right:

    "How many R letters are in the following? Keep a running count. s t r a w b e r r y"

    They are terrible at counting letters in words because they rarely see words spelled out. An LLM trained one byte at a time would always see every character of every word and would have a much easier time of it. An LLM is essentially learning a new language without a dictionary, so of course it's pretty bad at spelling. The tokenization obscures the spelling, not entirely unlike how spoken language doesn't always illuminate spelling.
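
    The "keep a running count" procedure the prompt asks for can be sketched in a few lines (a minimal Python illustration of the counting task itself, not anything the model executes):

    ```python
    # Minimal sketch: tally a target letter one character at a time,
    # printing the running count the way the prompt asks the model to.
    word = "strawberry"
    target = "r"

    count = 0
    for position, letter in enumerate(word, start=1):
        if letter == target:
            count += 1
        print(f"{position}: {letter} -> running count = {count}")

    print(f"Total '{target}' letters in '{word}': {count}")  # -> 3
    ```

    Spelled out character by character like this, the task is trivial; the point above is that a tokenized model rarely sees words in this form.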

    • Could the effect you see when you spell it out be not a result of "seeing" tokens, but a result of the model having learned, at a higher level, how lists in text can be summarized, summed up, filtered, and counted?

      In other words, what makes you think that it's exactly letter-tokens that help it, and not the high-level concept of spelling things out itself?
