Comment by Daniel_Van_Zant

5 months ago

Being able to control how many tokens are spent on thinking is a game-changer. I've been building fairly complex, efficient systems with many LLMs. Despite their advantages, reasoning models have been a no-go because the cost is so variable, which makes it hard to quote a final per-query price to the customer. Being able to say "I know this model can always solve this problem within this many thinking tokens," and thereby cap the cost of that component, is huge.
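To make the point concrete, here's a minimal sketch of how a fixed thinking budget turns a variable cost into a worst-case bound you can quote. The function name, the prices, and the assumption that thinking tokens bill at the output-token rate are all illustrative, not from any specific provider's pricing:

```python
def max_query_cost(input_tokens, thinking_budget, max_output_tokens,
                   in_price_per_mtok, out_price_per_mtok):
    """Upper bound on per-query cost when thinking tokens are capped.

    Assumes (for illustration) that thinking tokens are billed at the
    output-token rate, so capping the thinking budget fixes the worst case.
    Prices are dollars per million tokens.
    """
    billed_output = thinking_budget + max_output_tokens
    return (input_tokens * in_price_per_mtok
            + billed_output * out_price_per_mtok) / 1_000_000

# Worst case for one component: 2k-token prompt, 4k thinking cap,
# 1k answer, at illustrative prices of $3 in / $15 out per Mtok.
cap = max_query_cost(2_000, 4_000, 1_000, 3.0, 15.0)
print(f"${cap:.3f}")  # → $0.081 per query, at most
```

Without the cap, the `thinking_budget` term is unbounded in practice, and so is the quote you'd have to give the customer.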

Yup, it's just what we wanted for our coding agent. Codebuff can enter a "Deep thinking" mode, and we can tell it to burn a lot of tokens hahaha.