Comment by bberenberg
3 months ago
In case you’re not sure what S1 is, here is the original paper: https://arxiv.org/html/2501.19393v1
It's linked in the blog post, too. In the first sentence, actually, but for some reason the author never bothered to attach the name to it. As if keeping track of o1, 4o, r1, and r2d2 wasn't exhausting enough already.
> for some reason the author never bothered to attach the name to it
Respect for his readers’ intelligence, maybe.
it's also the first link in the article's first sentence
Good call, I must have missed it. I read the whole blog then went searching for what S1 was.
Does this mean that the end-of-thinking delimiter is a single token? Presumably </think> or similar wasn't a single token for the base model. Did they just pick a pair of uncommon single-token symbols to use as delimiters?
EDIT: Never mind, end of thinking is represented with <|im_start|> followed by the word 'answer', so the code dynamically adds/removes <|im_start|> from the list of stop tokens.
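For anyone who wants to see the stop-token toggling concretely, here is a minimal sketch of the idea as I read it. It is not the actual s1 code. The decoding loop, the `toy_model` stand-in, the `think_then_answer` helper, and the `<|im_end|>` end-of-sequence token are all assumptions made up for illustration. Only `<|im_start|>` followed by the word 'answer' comes from the comment above.

```python
from typing import Callable, List, Optional, Set, Tuple

END_OF_THINKING = "<|im_start|>"  # delimiter named in the comment above
EOS = "<|im_end|>"                # assumed end-of-sequence token (not from the thread)

NextToken = Callable[[List[str]], str]


def generate(next_token: NextToken, context: List[str],
             stop_tokens: Set[str], max_tokens: int) -> Tuple[List[str], Optional[str]]:
    """Toy decoding loop: returns the new tokens and the stop token hit, if any."""
    out: List[str] = []
    for _ in range(max_tokens):
        tok = next_token(context + out)
        if tok in stop_tokens:
            return out, tok  # the stop token itself is not appended
        out.append(tok)
    return out, None  # budget exhausted without hitting a stop token


def think_then_answer(next_token: NextToken, prompt: List[str],
                      think_budget: int, answer_budget: int) -> Tuple[List[str], List[str]]:
    # Phase 1: <|im_start|> is IN the stop list, so the caller regains control
    # when the model tries to end its thinking (or when the budget runs out).
    thinking, _ = generate(next_token, prompt, {END_OF_THINKING, EOS}, think_budget)
    # Phase 2: insert the delimiter plus 'answer' ourselves, then REMOVE
    # <|im_start|> from the stop list so only EOS ends generation.
    context = prompt + thinking + [END_OF_THINKING, "answer"]
    answer, _ = generate(next_token, context, {EOS}, answer_budget)
    return thinking, answer


def toy_model(context: List[str]) -> str:
    """Hypothetical scripted 'model' for a one-token prompt; stands in for a real LLM."""
    if END_OF_THINKING in context:
        produced = len(context) - context.index(END_OF_THINKING) - 2
        script = ["42", EOS]
    else:
        produced = len(context) - 1
        script = ["let", "me", "think", END_OF_THINKING]
    return script[min(produced, len(script) - 1)]


if __name__ == "__main__":
    print(think_then_answer(toy_model, ["question"], think_budget=10, answer_budget=10))
    print(think_then_answer(toy_model, ["question"], think_budget=2, answer_budget=10))
```

The second call shows why the toggle is handy: with a small thinking budget the loop cuts the thinking short and still forces the answer phase by appending the delimiter itself.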
I don't know what R1 is either
It’s the DeepSeek reasoning model.