Comment by ascorbic
3 months ago
I've noticed that R1 says "Wait," a lot in its reasoning. I wonder if there's something inherently special in that token.
Semantically, wait is a bit of a stop-and-breathe point.
Consider the text:
I think I'll go swimming today. Wait, ___
What comes next? Probably not something that would usually follow without the word "wait", but something orthogonal that changes the earlier sentence in some fundamental way, like:
Wait, I need to help my dad.
Yes, R1 seems to mostly use it like that. It's either to signal a problem with its previous reasoning, or to signal that it's thought of a better approach. In coding it's often something like "this API won't work here" or "there's a simpler way to do this".
I guess it goes to show how important reiteration is for general logic problems. And tbf, when finding a solution to something myself, I'll consider each part, consider parts in relation to each other, and/or consider all parts together (on a higher level) before coming to a final solution.
It's weird because I feel like we should've known that from work in general logic/problem solving studies, surely?
I bet a token like "sht!", "f*" or "damn!" would have the same or an even stronger effect, but the LLM creators would not like to have the users read them.
It's literally in the article: they measured it, and "Wait" was the best token.
Maybe, but it doesn't just use it to signify that it's made a mistake. It also uses it in a positive way, such as when it's had a lightbulb moment. Of course some people use expletives in the same way, but that would be less common than for mistakes.
I think you're onto something. However, as the training is done on text and not actual thoughts, it may take some experimentation to find these stronger words.