Comment by bberenberg

5 months ago

In case you’re not sure what S1 is, here is the original paper: https://arxiv.org/html/2501.19393v1

8 comments

bberenberg

It's linked in the blog post, too. In the first sentence, actually, but for some reason the author never bothered to attach the name to it. As if keeping track of o1, 4o, r1, and r2d2 wasn't exhausting enough already.

kgwgk 5 months ago

> for some reason the author never bothered to attach the name to it
Respect for his readers’ intelligence, maybe.

mi_lk 5 months ago

It's also the first link in the article's first sentence.

bberenberg 5 months ago

Good call, I must have missed it. I read the whole blog post, then went searching for what S1 was.

rahimnathwani 5 months ago

> To enforce a minimum, we suppress the generation of the end-of-thinking token delimiter and optionally append the string “Wait” to the model’s current reasoning trace to encourage the model to reflect on its current generation.

Does this mean that the end-of-thinking delimiter is a single token? Presumably </think> or similar wasn't a single token for the base model. Did they just pick a pair of uncommon single-token symbols to use as delimiters?

EDIT: Never mind, end of thinking is represented with <|im_start|> followed by the word 'answer', so the code dynamically adds/removes <|im_start|> from the list of stop tokens.
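For anyone who wants to see the idea in miniature, here is a rough sketch of the minimum-budget trick as I understand it from the quote above. It is not the paper's code: `budget_force`, `toy_step`, and the `max_tokens` safety cap are hypothetical names I made up, and the "model" is a toy function rather than a real LLM. The only things taken from the thread are the `<|im_start|>` delimiter followed by `answer`, and appending "Wait" when the model tries to stop early.

```python
from typing import Callable, List

# Per the comment above, end of thinking is <|im_start|> followed by "answer".
END_OF_THINKING = "<|im_start|>"


def budget_force(
    step: Callable[[List[str]], str],  # hypothetical: returns the next token
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,  # assumption: a hard cap so the loop always terminates
) -> List[str]:
    """Generate a reasoning trace that is at least min_tokens long."""
    trace: List[str] = []
    while len(trace) < max_tokens:
        tok = step(prompt + trace)
        if tok == END_OF_THINKING:
            if len(trace) >= min_tokens:
                break  # budget met: let the model stop thinking
            # Budget not met: suppress the delimiter and nudge the model on.
            trace.append("Wait")
            continue
        trace.append(tok)
    # Close the thinking section and hand over to the answer phase.
    return trace + [END_OF_THINKING, "answer"]


def toy_step(tokens: List[str]) -> str:
    """Toy stand-in for a model: tries to stop after two 'step' tokens."""
    if tokens[-2:] == ["step", "step"]:
        return END_OF_THINKING
    return "step"


if __name__ == "__main__":
    print(budget_force(toy_step, ["question"], min_tokens=5, max_tokens=20))
```

A real implementation would presumably work at the level of the inference library's stop-token list rather than a per-token loop, which is what the add/remove behavior described in the EDIT suggests.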

dagurp 5 months ago

I don't know what R1 is either

latexr 5 months ago

It’s the DeepSeek reasoning model.