Comment by rahimnathwani
2 months ago
To enforce a minimum, we suppress the generation of the end-of-thinking token delimiter and optionally append the string “Wait” to the model’s current reasoning trace to encourage the model to reflect on its current generation.
Does this mean that the end-of-thinking delimiter is a single token? Presumably </think> or similar wasn't a single token for the base model. Did they just pick a pair of uncommon single-token symbols to use as delimiters?
EDIT: Never mind, end of thinking is represented with <|im_start|> followed by the word 'answer', so the code dynamically adds/removes <|im_start|> from the list of stop tokens.
No comments yet
Contribute on Hacker News ↗