Comment by nhmllms

6 months ago

This makes me think we should introduce a 'tokens required to answer correctly' dimension to each metric, since letting the model think more verbosely essentially gives it more compute and memory with which to answer the question. (Not that this is a bad thing, but I'd be curious whether other models get the answer right within the first couple of tokens, or only after hundreds of reasoning tokens.)
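As a rough illustration of the proposed dimension, here is a minimal sketch that reports accuracy alongside the mean reasoning tokens spent on correctly answered questions. The `results` records and their fields are hypothetical, standing in for whatever per-question data an eval harness actually logs:

```python
from statistics import mean

# Hypothetical per-question eval records: (reasoning_tokens_used, answered_correctly)
results = [
    (12, True),
    (480, True),
    (95, False),
    (33, True),
]

def tokens_per_correct(records):
    """Mean reasoning tokens spent on questions the model answered correctly."""
    correct = [tok for tok, ok in records if ok]
    return mean(correct) if correct else None

accuracy = sum(ok for _, ok in results) / len(results)
print(f"accuracy={accuracy:.2f}, mean tokens when correct={tokens_per_correct(results):.1f}")
```

Reporting the two numbers side by side would distinguish a model that answers in its first few tokens from one that needs hundreds of reasoning tokens to reach the same accuracy.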