Comment by aithrowawaycomm
1 year ago
I am aware of errors in computations that can be fixed by better tokenization (e.g. long addition works better tokenizing right-left rather than L-R). But I am talking about counting, and talking about counting words, not characters. I don’t think tokenization explains why LLMs tend to fail at this without CoT prompting. I really think the answer is computational complexity: counting is simply too hard for transformers unless you use CoT. https://arxiv.org/abs/2310.07923
Words vs characters is a similar problem, since tokens can be less one word, multiple words, or multiple words and a partial word, or words with non-word punctuation like a sentence ending period.