Comment by nl
17 hours ago
I don't see why that follows.
The “Rs in strawberry problem” is literally "count the token R" in the word "strawberry".
One could argue that the learnt tokenization model where it is tokenized into 3 tokens (see https://platform.openai.com/tokenizer) is problematic, but one solution to that is to double-down on it and learn tokenization as part of the end-to-end training instead of separately.
If you mean that the idea of the current idea of the tokenization model being entirely fixed then I agree.
(I'm not entirely sure how multi-modal models function in this regard - they must have a idea of the bytestream, but not familiar enough with that to comment intelligently.)
No comments yet
Contribute on Hacker News ↗