Comment by vanviegen
11 hours ago
> When an LLM started to count the numbers of "r"s in "strawberry", OpenAI was taking a victory lap.
Were they? Or did they feel icky about spending way too much post-training time on such a specific and uninteresting skill?
It's not as specific a skill as you might think. Being both aware of tokenizer limitations and capable of working around them is occasionally useful for real tasks.
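To make the tokenizer point concrete, here's a small illustration (using the tiktoken library as one example; the exact token splits vary by tokenizer): the model operates on multi-character tokens rather than letters, so answering a character-level question means recalling how each opaque token is spelled.

```python
import tiktoken

# Encode "strawberry" with an OpenAI tokenizer and show the pieces the
# model actually sees. The split shown is illustrative; it differs
# between tokenizers.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
print([enc.decode([t]) for t in tokens])  # e.g. ['str', 'aw', 'berry']

# The three "r"s are spread across token IDs, so counting them requires
# the model to know each token's spelling rather than just reading letters.
```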
What tasks would those be, that wouldn't be better served by using e.g. a Python script as a tool, possibly just as a component of the complete solution?
Off the top of my head: the user wants an LLM to help them solve a word puzzle. Think something a bit like Wordle, but less well represented in its training data.
For that, the LLM needs to compare words character by character, reliably. And to do that, it needs at least one of: the ability to fully resolve tokens to characters internally within one pass; knowing to emit the candidate words in a "1 character = 1 token" fashion and then compare those; or knowing that it should defer to a tool call and do the comparison there (a sketch of such a tool is below).
An LLM trained for better tokenization awareness can do that. One that wasn't can fall into weird, non-humanlike failure modes.
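For concreteness, a minimal sketch (entirely hypothetical, not from the thread) of the kind of character-by-character comparison tool an LLM could defer to for a Wordle-like puzzle:

```python
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> str:
    """Compare two equal-length words character by character, returning
    Wordle-style feedback: G = right letter, right spot; Y = right letter,
    wrong spot; . = letter not in the answer (or already used up)."""
    assert len(guess) == len(answer)
    feedback = ["."] * len(guess)
    # Count answer letters that are not exact positional matches,
    # so duplicate letters are credited at most once each.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)
    # First pass: exact positional matches.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
    # Second pass: right letter, wrong position, respecting duplicates.
    for i, g in enumerate(guess):
        if feedback[i] == "." and remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1
    return "".join(feedback)

print(wordle_feedback("crane", "creak"))  # -> "GGY.Y"
```

The two-pass loop is what handles duplicate letters correctly, and that per-character bookkeeping is exactly the kind of thing that's awkward to do over multi-character tokens. An LLM that knows its own tokenization limits can hand this off instead of guessing.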