Comment by ascorbic

5 months ago

I've noticed that R1 says "Wait," a lot in its reasoning. I wonder if there's something inherently special in that token.

7 comments

lionkor 5 months ago

Semantically, wait is a bit of a stop-and-breathe point.

Consider the text:

I think I'll go swimming today. Wait, ___

What comes next? Well, not something that would usually follow without the word "wait". Probably something entirely orthogonal that affects the earlier sentence in some fundamental way, like:

Wait, I need to help my dad.

ascorbic 5 months ago

Yes, R1 seems to mostly use it like that. It's either to signal a problem with its previous reasoning, or to signal that it's thought of a better approach. In coding it's often something like "this API won't work here" or "there's a simpler way to do this".
- fennecfoxy 5 months ago
  
  I guess it goes to show how important reiteration is for general logic problems. And tbf when finding a solution to something myself I'll consider each part, and/or consider parts in relation to each other and/or consider all parts in relation to each other (on a higher level) before coming to a final solution.
  It's weird because I feel like we should've known that from work in general logic/problem solving studies, surely?

katzenversteher 5 months ago

I bet a token like "sht!", "f*" or "damn!" would have the same or an even stronger effect, but the LLM creators would not want users to read them.

raducu 5 months ago

It's literally in the article: they measured it, and "Wait" was the best token.

ascorbic 5 months ago

Maybe, but it doesn't just use it to signify that it's made a mistake. It also uses it in a positive way, such as when it's had a lightbulb moment. Of course some people use expletives in the same way, but that would be less common than for mistakes.

lodovic 5 months ago

I think you're onto something. However, as the training is done on text and not actual thoughts, it may take some experimentation to find these stronger words.
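
For anyone who wants to try that kind of experimentation, here is a rough sketch of one way it could be set up. It is not the article's method. The function name `compare_nudge_words`, the `generate` callable (a stand-in for whatever model call you have), and the substring-based scoring are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def compare_nudge_words(
    generate: Callable[[str], str],      # hypothetical: prompt in, model text out
    problems: List[Tuple[str, str]],     # (question, expected answer) pairs
    candidates: List[str],               # words to try, e.g. ["Wait", "Hmm"]
) -> Dict[str, float]:
    """Return, for each candidate word, the fraction of problems answered correctly."""
    if not problems:
        raise ValueError("need at least one problem to score against")

    scores: Dict[str, float] = {}
    for word in candidates:
        correct = 0
        for question, expected in problems:
            # First pass: let the model reason as it normally would.
            draft = generate(question)
            # Second pass: append the candidate word after the draft so the
            # model continues from its own reasoning and may revise it.
            final = generate(f"{question}\n{draft}\n{word},")
            # Assumption: crude scoring by checking the expected answer
            # appears somewhere in the continuation.
            if expected in final:
                correct += 1
        scores[word] = correct / len(problems)
    return scores
```

Pass in your own model call as `generate` and a handful of problems with known answers. The substring check is deliberately crude, so a real comparison would want a proper answer extractor and enough problems for the differences to mean something.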