Comment by pas

3 months ago

can you please elaborate on the wait tokens? what's that? how do they work? is that also from the R1 paper?

The same idea is in both the R1 and S1 papers (`<think>` tokens are used similarly). Basically, they use special tokens to mark where in the prompt the LLM should think more or revise its previous response. This can be repeated many times until some stopping criterion is met. S1 inserts these manually with heuristics (e.g. appending "Wait" when the model tries to stop thinking early, so it keeps reasoning); R1 learns the placement through RL, I think.
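
To make the S1-style mechanic concrete, here's a minimal sketch of "budget forcing": when the model emits its end-of-thinking marker before the budget is spent, strip it and append "Wait" so generation continues. The `generate` function here is a hypothetical stand-in for a real LLM call, not any actual API:

```python
def generate(prompt: str) -> str:
    # Toy stand-in for an LLM call: a real model would continue the prompt.
    # Here we pretend the model always tries to close its thinking block.
    return prompt + " ...some reasoning... </think>"

def budget_forced_think(prompt: str, max_waits: int = 2) -> str:
    """Force extra rounds of thinking by replacing </think> with 'Wait,'."""
    text = prompt + " <think>"
    for _ in range(max_waits):
        text = generate(text)
        if text.endswith("</think>"):
            # Suppress the end-of-thinking token and append a wait token,
            # prompting the model to revise/extend its reasoning.
            text = text[: -len("</think>")] + "Wait,"
    return generate(text)  # final pass, allowed to close the think block
```

Each pass through the loop injects one more "Wait," continuation, so the model spends more tokens reasoning before it's finally allowed to stop.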