Comment by roywiggins
1 year ago
An LLM trained on single-letter tokens would be able to; it would just be much more laborious to train.
1 year ago
Why would it be able to?
If you give LLMs the letters one at a time, they often count them just fine, though Claude at least seems to need to keep a running count to get it right:
"How many R letters are in the following? Keep a running count. s t r a w b e r r y"
They are terrible at counting letters in words because they rarely see words spelled out. An LLM trained one byte at a time would always see every character of every word and would have a much easier time of it. An LLM is essentially learning a new language without a dictionary, so of course it's pretty bad at spelling. Tokenization obscures the spelling, not entirely unlike how spoken language doesn't always reveal spelling.
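To make the tokenization point concrete, here is a minimal sketch (my illustration, not the commenter's; it assumes the tiktoken package, OpenAI's BPE tokenizer library) comparing how a word is tokenized normally versus spelled out with spaces:

```python
# Rough illustration of how BPE tokenization hides individual letters.
# Assumes `pip install tiktoken`; the exact splits depend on the encoding,
# so the behavior described in the comments is typical, not guaranteed.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

word = "strawberry"
spelled_out = " ".join(word)  # "s t r a w b e r r y"

# Normally the word is a few multi-character chunks, so the model never
# "sees" the individual letters it is asked to count.
print([enc.decode([t]) for t in enc.encode(word)])

# Spelled out with spaces, each letter typically lands in its own token,
# which is much closer to what a character-level model would see.
print([enc.decode([t]) for t in enc.encode(spelled_out)])
```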
Could the effect you see when you spell it out be not a result of “seeing” tokens, but of the model having learned, at a higher level, how lists in text can be summarized, summed up, filtered, and counted?
In other words, what makes you think it's specifically the letter tokens that help it, and not the high-level concept of spelling things out itself?