
Comment by codeulike

2 months ago

Wait, so the trick is they reach into the context and basically switch '</think>' with 'wait' and that makes it carry on thinking?

Not sure if your pun was intended, but 'wait' probably works so well because the models were trained on text structured like your comment, where "wait" is followed by a deeper understanding.

Yes, that's explicitly mentioned in the blog post:

> In s1, when the LLM tries to stop thinking with "</think>", they force it to keep going by replacing it with "Wait".
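
For anyone who wants to see the mechanics, here's a toy sketch of that loop. The `sample_next_chunk` callable is a stand-in for whatever decoding call you actually use (e.g. `model.generate` with "</think>" as a stop string), and the budget value and helper names are mine; only the "</think>" → "Wait" substitution comes from the s1 description.

```python
from typing import Callable, Tuple

def budget_forced_generation(
    prompt: str,
    sample_next_chunk: Callable[[str], Tuple[str, int]],
    min_thinking_tokens: int = 1000,
) -> str:
    """Keep decoding; if the model emits '</think>' before the thinking budget
    is used up, strip the tag and append 'Wait' so it keeps reasoning."""
    text = prompt
    thinking_tokens = 0
    while True:
        # sample_next_chunk decodes until '</think>' or EOS and reports how
        # many tokens it produced (hypothetical helper, not from the paper).
        chunk, n_tokens = sample_next_chunk(text)
        thinking_tokens += n_tokens
        if chunk.endswith("</think>") and thinking_tokens < min_thinking_tokens:
            # Model tried to stop thinking too early: drop '</think>' and
            # splice in 'Wait' so the next call continues from there.
            text = text + chunk[: -len("</think>")] + "Wait"
            continue
        return text + chunk
```

So nothing is edited retroactively inside the existing context; the stop tag just never makes it into the transcript while the budget isn't spent, and "Wait" is what the model sees as its own last word.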