Comment by aabdi
1 day ago
Different models implement this in slightly different ways.
Usually it's done in post-training to enforce behavior based on the prompt, i.e. a system prompt with thinking:max or low or whatever.
Enforcement then goes via constrained decoding: checking for the think start and end tokens with max lengths, or other variations.
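To make the enforcement side concrete, here is a minimal sketch of a decode loop that caps the thinking span, assuming a lot: the marker strings `<think>` / `</think>`, the `BUDGETS` table, and the `next_token` callback are all hypothetical stand-ins, not any particular model's or library's API. Real systems typically do this at the logits level and the details vary per model.

```python
from typing import Callable, List

# Hypothetical marker tokens; real models use their own special tokens.
THINK_START = "<think>"
THINK_END = "</think>"
EOS = "<eos>"

# Hypothetical mapping from a prompt-level effort setting to a token budget.
BUDGETS = {"low": 4, "max": 64}


def decode_with_think_budget(
    next_token: Callable[[List[str]], str],
    effort: str,
    max_tokens: int = 256,
) -> List[str]:
    """Greedy decode loop that force-closes the thinking span at the budget."""
    budget = BUDGETS[effort]
    out: List[str] = []
    in_think = False
    used = 0  # thinking tokens emitted since the last THINK_START
    while len(out) < max_tokens:
        if in_think and used >= budget:
            # Constrained step: ignore the model and emit the end marker.
            tok = THINK_END
        else:
            tok = next_token(out)
        out.append(tok)
        if tok == THINK_START:
            in_think, used = True, 0
        elif tok == THINK_END:
            in_think = False
        elif tok == EOS:
            break
        elif in_think:
            used += 1
    return out


def toy_model(history: List[str]) -> str:
    """Stand-in model that would keep thinking forever if left alone."""
    if not history:
        return THINK_START
    if THINK_END not in history:
        return "step"
    if history[-1] == THINK_END:
        return "answer"
    return EOS


if __name__ == "__main__":
    print(decode_with_think_budget(toy_model, "low"))
```

The toy model never closes its own thinking span, so the only thing that ends it is the forced `</think>` once the budget is used up. Post-training is what would make a real model close the span on its own most of the time; the hard cap is just the backstop.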