Comment by jxmorris12
2 days ago
> In the previous paragraph, the author makes the case for why LeCun was wrong, using the example of reasoning models. Yet in the next paragraph this assertion is made, which is just a paraphrasing of LeCun's original assertion, which the author himself says is wrong.
This is a subtle point that may not have come across clearly enough in my original writing. A lot of folks were saying that the DeepSeek finding that longer chains of thought can produce higher-quality outputs contradicts Yann's thesis overall. But I don't think so.
It's true that models like R1 can correct small mistakes. But in the limit of tokens generated, the chance that they generate the correct answer still decays to zero.
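To spell out what I mean a bit more formally (a toy model with an assumed fixed per-token error rate, not a measurement of any actual model): suppose each generated token independently carries some probability ε > 0 of introducing a mistake that later tokens cannot fully undo. Then

```latex
\Pr[\text{answer still correct after } n \text{ tokens}]
  \;\le\; (1 - \varepsilon)^{n}
  \;\longrightarrow\; 0
  \qquad \text{as } n \to \infty .
```

Self-correction can shrink ε, but as long as it stays bounded away from zero, the product still goes to zero.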
I think this is an excellent way to think about LLMs and any other software-augmented task. Appreciate you putting the time into the article. I do think the points you support with the graph of training steps vs. response length would be stronger if you also included a graph of response length vs. loss, or response length vs. task performance, etc. Though # of steps correlates with model performance, that relationship weakens as # of steps goes to infinity.
There was a paper not too long ago showing that reasoning models will keep increasing their response length more or less indefinitely while working on a problem, but that the returns from doing so asymptote toward zero. Apologies for not having the link handy.
Thanks for replying; I hope my comment wasn't too critical.
> But in the limit of tokens generated, the chance that they generate the correct answer still decays to zero.
I don't understand this assertion though.
LeCun's thesis was that errors just accumulate.
Reasoning models accumulate errors, backtrack, and are able to reduce them back down.
Hence the hypothesis that errors accumulate (at least asymptotically) is false.
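A minimal sketch of that backtracking intuition (a toy simulation of my own, with made-up rates, not numbers from any real model): each reasoning step may introduce a mistake, and a correction step may catch one. When corrections are more frequent than mistakes, the count of outstanding errors stays roughly flat instead of growing with the length of the chain.

```python
import random

def simulate(steps, p_err=0.05, p_fix=0.5, seed=0):
    """Toy model: each step may introduce an error with probability p_err;
    a backtracking/correction step may remove one outstanding error with
    probability p_fix. Returns the number of uncorrected errors at the end."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(steps):
        if rng.random() < p_err:              # a new mistake slips in
            errors += 1
        if errors and rng.random() < p_fix:   # backtracking catches one
            errors -= 1
    return errors

# With p_fix > p_err the outstanding-error count hovers near a small constant
# rather than growing linearly with the number of steps.
for n in (100, 1_000, 10_000):
    print(n, simulate(n))
```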
What is the difference between "the probability of a correct answer decaying to zero" and "errors keep accumulating"?