Comment by nl
2 days ago
> Instead of running the model once (flash) or multiple times (thinking/pro) in its entirety
I'm not sure what you mean here, but there isn't a difference in the number of times a model runs during inference.
I meant producing the single likeliest output (flash) versus iteratively generating multiple candidate outputs and choosing the best one (thinking/pro).
That's not how these models work.
Thinking models produce thinking tokens to reason out the answer.
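A toy sketch of the point above, under the assumption that the model wraps its reasoning in hypothetical `<think>…</think>` delimiters (as some open models do): inference runs once, and the thinking tokens are simply part of that single generated sequence, separated from the visible answer afterwards.

```python
# Illustration only, not any real API: a "thinking" model does not run
# inference multiple times.  It emits reasoning tokens inline during one
# autoregressive pass; the <think>...</think> markers are hypothetical
# delimiters used here to separate reasoning from the final answer.

def split_thinking(output_tokens):
    """Separate hidden reasoning tokens from the visible answer."""
    text = " ".join(output_tokens)
    if "<think>" in text and "</think>" in text:
        thinking, _, answer = text.partition("</think>")
        thinking = thinking.replace("<think>", "").strip()
        return thinking, answer.strip()
    return "", text.strip()

# One generated sequence: reasoning tokens first, then the answer.
tokens = ["<think>", "12*7:", "10*7=70,", "2*7=14,", "70+14=84", "</think>",
          "The", "answer", "is", "84."]
thinking, answer = split_thinking(tokens)
print(answer)  # -> The answer is 84.
```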