Comment by Buttons840
2 years ago
LLMs can count characters, but they need to dedicate a lot of tokens to the task. That is, they need a lot of tokens describing the task of counting, and in my experience that allows them to accurately count.
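Roughly what I mean, as a sketch (ask_llm here is just a placeholder for whichever chat API you call, not a real library function):

    def count_letters_verbose(word: str, letter: str) -> str:
        # Spend tokens walking through the task instead of asking for a bare number.
        prompt = (
            f"Spell the word '{word}' one character per line, numbering each line. "
            f"After the list, say for each line whether that character is '{letter}'. "
            "Finally give the total on its own line as 'COUNT: <n>'."
        )
        return ask_llm(prompt)  # hypothetical helper, stands in for a chat API call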
Source? LLMs have no “hidden tokens” they dedicate.
Or do you mean if the tokenizer had been trained differently…
Not hidden tokens, actual tokens. Ask an LLM to guess the letter count 20 or so times and often it will converge on the correct count. I suppose all those guesses provide enough "resolution" (for lack of a better term) for it to count the letters.
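One way to read "converge" is a majority vote over independent guesses. A rough sketch, assuming a hypothetical ask_llm placeholder for whatever chat call you actually use:

    import re
    from collections import Counter

    def parse_count(text: str) -> int:
        # pull the first integer out of the model's reply
        return int(re.search(r"\d+", text).group())

    def count_by_repeated_guessing(word: str, letter: str, tries: int = 20) -> int:
        prompt = (f"How many times does the letter '{letter}' appear in '{word}'? "
                  "Answer with a single number.")
        # ask_llm() is hypothetical; swap in your real API call
        guesses = [parse_count(ask_llm(prompt)) for _ in range(tries)]
        # most common answer wins; in my experience this usually lands on the right count
        return Counter(guesses).most_common(1)[0][0]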
> often it will converge on the correct count
That's a pretty low bar for something like counting words.
That reminds me of something I've wondered about for months: can you improve an LLM's performance by including a large number of spaces at the end of your prompt?
Would the LLM "recognize" that these spaces are essentially a blank slate and use them to "store" extra semantic information and stuff?
But then it will either overfit, or you'd need to train it on 20 times the amount of data...