Comment by nhmllms
6 months ago
This makes me think we should be introducing a 'tokens required to answer questions correctly' dimension to each metric, since letting the model think more verbosely is essentially giving it more compute and memory to answer the question correctly. (Not that this is a bad thing, but I would be curious whether other models get the answer right within the first couple of tokens, or only after hundreds of reasoning tokens.)
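A minimal sketch of what that dimension might look like in an eval harness, assuming a hypothetical `run_model` that returns the answer plus the number of reasoning tokens it emitted; the dataset format and all names here are illustrative, not any particular benchmark's API.

```python
# Hypothetical sketch: track a "reasoning tokens spent" dimension alongside accuracy.
# `run_model(question) -> (answer, reasoning_tokens)` is an assumed interface.
from statistics import median

def evaluate(dataset, run_model):
    """For each (question, gold) pair, record correctness and how many
    reasoning tokens the model emitted before answering."""
    records = []
    for question, gold in dataset:
        answer, reasoning_tokens = run_model(question)
        records.append({
            "correct": answer == gold,
            "reasoning_tokens": reasoning_tokens,
        })
    correct = [r for r in records if r["correct"]]
    return {
        "accuracy": len(correct) / len(records) if records else None,
        # Tokens spent on questions that were actually answered correctly,
        # i.e. roughly "how much thinking did a right answer cost".
        "median_tokens_when_correct": (
            median(r["reasoning_tokens"] for r in correct) if correct else None
        ),
    }
```

Reporting the token distribution next to accuracy would make it visible whether a model gets there in a couple of tokens or only after hundreds of reasoning tokens.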