Comment by pas

3 months ago

can you please elaborate on the wait tokens? what's that? how do they work? is that also from the R1 paper?

The same idea is in both the R1 and S1 papers (`<think>` tokens are used similarly). Basically, they use special tokens to mark where in the prompt the LLM should think more or revise its previous response. This can be repeated many times until some stopping criterion is met. S1 inserts these manually with heuristics (e.g. appending "Wait" when the model tries to stop thinking early, so it keeps reasoning); R1 learns the placement through RL, I think.
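
To make the S1-style mechanic concrete, here's a minimal sketch of "budget forcing": when the model emits its end-of-thinking marker before the budget is spent, strip it and append "Wait" so generation continues. The `generate` function here is a hypothetical stand-in for a real LLM call, not any actual API:

```python
def generate(prompt: str) -> str:
    # Toy stand-in for an LLM call: a real model would continue the prompt.
    # Here we pretend the model always tries to close its thinking block.
    return prompt + " ...some reasoning... </think>"

def budget_forced_think(prompt: str, max_waits: int = 2) -> str:
    """Force extra rounds of thinking by replacing </think> with 'Wait,'."""
    text = prompt + " <think>"
    for _ in range(max_waits):
        text = generate(text)
        if text.endswith("</think>"):
            # Suppress the end-of-thinking token and append a wait token,
            # prompting the model to revise/extend its reasoning.
            text = text[: -len("</think>")] + "Wait,"
    return generate(text)  # final pass, allowed to close the think block
```

Each pass through the loop injects one more "Wait," continuation, so the model spends more tokens reasoning before it's finally allowed to stop.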