Comment by kristjansson
1 month ago
Each generated token requires roughly the same amount of compute. To a very crude approximation, model performance scales with the total compute applied to the task. So it's not absurd that producing more tokens before an answer improves performance: it's akin to giving the model more time (i.e., more compute) to think.
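A back-of-envelope sketch of that intuition, using the common ~2 × N-parameters FLOPs-per-token rule of thumb for a transformer forward pass (which ignores attention's context-length term); the model size and token counts below are made up for illustration:

```python
# Back-of-envelope: inference compute grows linearly with tokens generated,
# so emitting "thinking" tokens before the answer is extra compute on the task.

def inference_flops(n_params: float, n_tokens: int) -> float:
    """Approximate FLOPs to generate n_tokens with an n_params-parameter model,
    using the ~2 * N_params FLOPs-per-token rule of thumb."""
    return 2 * n_params * n_tokens

model = 70e9  # hypothetical 70B-parameter model

direct = inference_flops(model, 50)            # answer straight away
with_reasoning = inference_flops(model, 2050)  # ~2000 reasoning tokens first

print(f"direct answer:  {direct:.2e} FLOPs")
print(f"with reasoning: {with_reasoning:.2e} FLOPs")
print(f"extra compute:  {with_reasoning / direct:.0f}x")
```

Under these assumptions, ~2000 reasoning tokens buy a ~41x increase in compute spent on the question, which is the sense in which more tokens means more "time to think".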