Comment by bberenberg

3 months ago

In case you’re not sure what S1 is, here is the original paper: https://arxiv.org/html/2501.19393v1

It's linked in the blog post, too. In the first sentence, actually, but for some reason the author never bothered to attach the name to it. As if keeping track of o1, 4o, r1, r2d2 wasn't exhausting enough already.

  • > for some reason the author never bothered to attach the name to it

    Respect for his readers’ intelligence, maybe.

  To enforce a minimum, we suppress the generation of the end-of-thinking token delimiter and optionally append the string “Wait” to the model’s current reasoning trace to encourage the model to reflect on its current generation.

Does this mean that the end-of-thinking delimiter is a single token? Presumably </think> or similar wasn't a single token for the base model. Did they just pick a pair of uncommon single-token symbols to use as delimiters?

EDIT: Never mind, end of thinking is represented with <|im_start|> followed by the word 'answer', so the code dynamically adds/removes <|im_start|> from the list of stop tokens.
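For anyone who wants to see the mechanics, here's a rough sketch of what that budget forcing looks like at decode time with vLLM. This is my own reconstruction, not the repo's code: the model id, the chat template, and the FORCED_CONTINUATIONS knob are all assumptions on my part.

```python
# Rough sketch of budget forcing: keep <|im_start|> on the stop list while
# "thinking", and each time the model tries to end its reasoning, drop the
# delimiter and splice in "Wait" so it keeps going.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_ID = "simplescaling/s1-32B"  # assumed model id; check the actual repo

tok = AutoTokenizer.from_pretrained(MODEL_ID)
im_start_id = tok.convert_tokens_to_ids("<|im_start|>")

llm = LLM(model=MODEL_ID)

# Hypothetical prompt; the exact chat template in the repo may differ.
trace = (
    "<|im_start|>user\nHow many primes are there below 100?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<|im_start|>think\n"
)

FORCED_CONTINUATIONS = 2  # hypothetical knob: how many times to append "Wait"

for _ in range(FORCED_CONTINUATIONS):
    # <|im_start|> is a stop token here, so generation halts the moment the
    # model tries to open the "<|im_start|>answer" turn.
    out = llm.generate(
        [trace],
        SamplingParams(max_tokens=4096, stop_token_ids=[im_start_id]),
    )[0].outputs[0].text
    # Suppress the end-of-thinking delimiter and nudge the model to reflect.
    trace = trace + out + "Wait"

# Final pass: take <|im_start|> off the stop list, hand the model the answer
# delimiter, and let it finish normally.
trace = trace + "<|im_start|>answer\n"
answer = llm.generate([trace], SamplingParams(max_tokens=4096))[0].outputs[0].text
print(answer)
```

The notable part is that the minimum is enforced purely at inference time: nothing about the model changes, you just refuse to let it emit the delimiter and splice in "Wait" instead.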