Comment by aithrowawaycomm
1 year ago
I think the more obvious explanation has to do with computational complexity: counting is an O(n) problem, but transformer LLMs can’t solve O(n) problems unless you use CoT prompting: https://arxiv.org/abs/2310.07923
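For concreteness, the counting in question is just a single left-to-right pass with a running tally, i.e. O(n) sequential work; a minimal sketch (the count_occurrences helper is purely illustrative):

```python
# Counting is one linear pass with a running tally: O(n) sequential updates,
# i.e. the kind of step-by-step work a CoT trace writes out explicitly.
def count_occurrences(seq, target):
    count = 0
    for item in seq:          # one update per element
        if item == target:
            count += 1
    return count

print(count_occurrences("strawberry", "r"))   # 3
print(count_occurrences("6532871339", "3"))   # 3
```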
This paper does not support your position any more than it supports the position that the problem is tokenization.
The paper posits that if the authors' intuition were true, they would find certain empirical results, i.e. "If A then B." They then run the tests and find those empirical results. But that does not imply their intuition was correct, just as "If A then B" does not imply "If B then A."
If the empirical results were due to tokenization, absolutely nothing about this paper would change.
What you're saying is an explanation of what I said, so I agree with you ;)
No, it's a rebuttal of what you said: CoT is not making up for a deficiency in tokenization; it's making up for a deficiency in transformers themselves. These complexity results have nothing to do with tokenization, or even with LLMs; they are about the complexity class of problems that can be solved by transformers.
There's a really obvious way to test whether the strawberry issue is tokenization: replace each letter with a number, then ask ChatGPT to count the number of 3s.
Count the number of 3s, only output a single number: 6 5 3 2 8 7 1 3 3 9.
ChatGPT: 3.
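For what it's worth, you can also check how the two prompts tokenize. A rough sketch, assuming the tiktoken package and the cl100k_base encoding (whether that matches the exact ChatGPT model used above is an assumption):

```python
# Compare how "strawberry" and the spaced digit string tokenize.
# Assumes: pip install tiktoken; cl100k_base is the encoding used by
# recent OpenAI chat models (an assumption, not a guarantee).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["strawberry", "6 5 3 2 8 7 1 3 3 9"]:
    pieces = [enc.decode([tok]) for tok in enc.encode(text)]
    print(f"{text!r} -> {pieces}")

# "strawberry" typically comes out as a few multi-letter chunks, so the model
# never sees individual r's; the spaced digits come through one per token,
# which is what makes the digit version a cleaner test of counting itself.
```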