Comment by yathaid

2 days ago

>> But the limiting behavior remains the same: eventually, if we continue generating from a language model, the probability that we get the answer we want still goes to zero

In the previous paragraph, the author makes the case for why Lecun was wrong, using reasoning models as the example. Yet in the next paragraph this assertion appears, which is just a paraphrase of Lecun's original claim, the one the author himself says is wrong.

>> Instead of waiting for FAA (fully-autonomous agents) we should understand that this is a continuum, and we’re consistently increasing the amount of useful work AIs

Yes! But this work is already well underway. There is no magic threshold for AGI; instead, the characterization is based on what percentile of the human population the AI can beat. One way to characterize AGI in this manner is "99.99th percentile at every (digital?) activity".

> In the previous paragraph, the author makes the case for why Lecun was wrong, using reasoning models as the example. Yet in the next paragraph this assertion appears, which is just a paraphrase of Lecun's original claim, the one the author himself says is wrong.

This is a subtle point that may not have come across clearly enough in my original writing. A lot of folks were saying that the DeepSeek finding, that longer chains of thought can produce higher-quality outputs, contradicts Yann's thesis overall. But I don't think it does.

It's true that models like R1 can correct small mistakes. But in the limit of tokens generated, the chance that they generate the correct answer still decays to zero.
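
To make the limiting claim concrete, here's a toy back-of-the-envelope sketch (the per-token error rate is made up, and this isn't meant as a model of any particular architecture): suppose every generated token carries some tiny probability of an unrecoverable derailment that no amount of backtracking can fix. Even if every recoverable mistake gets corrected, the chance that the whole trajectory stays on track shrinks geometrically with its length.

```python
# Toy sketch only: EPS is a made-up per-token probability of an
# unrecoverable derailment that backtracking cannot fix. Recoverable
# mistakes are assumed to always be corrected, so they are ignored here.
EPS = 1e-4

for n in (1_000, 10_000, 100_000, 1_000_000):
    p_ok = (1 - EPS) ** n  # chance the whole n-token trajectory stays on track
    print(f"n = {n:>9,} tokens -> P(still on track) ~ {p_ok:.3g}")
```

Better correction ability shrinks that per-token rate, sometimes dramatically, but as long as it stays nonzero the product still decays to zero as the number of tokens grows.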

  • I think this is an excellent way to think about LLMs and any other software-augmented task. Appreciate you putting the time into this article. I do think the points supported by your graph of training steps vs. response length could be strengthened by also including a graph of response length vs. loss, or response length vs. task performance, etc. Though the number of steps correlates with model performance, that relationship weakens as the number of steps goes to infinity.

    There was a paper not too long ago showing that reasoning models will keep increasing their response length more or less indefinitely while working toward a solution, but the return from doing so asymptotes toward zero. My apologies for not having the link on hand.

  • Thanks for replying; I hope my comment wasn't too critical.

    >> But in the limit of tokens generated, the chance that they generate the correct answer still decays to zero.

    I don't understand this assertion though.

    Lecun's thesis was that errors just accumulate.

    Reasoning models accumulate errors, backtrack, and are able to reduce them back down.

    Hence the hypothesis that errors accumulate (at least asymptotically) is false.

    What is the difference between "the probability of the correct answer decaying to zero" and "errors keep accumulating"?

Lecun's argument is fundamentally flawed. When I work on a nontrivial problem, I might make mistakes along the way too. That doesn't mean large multi-step problems are effectively unsolvable: I simply do sanity checks along the way to catch errors and correct them.
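
To make the sanity-check point concrete, here's a toy simulation (every number is invented, and it isn't meant to model any real workflow): one process never rechecks its work, the other pauses every few steps and catches most of its outstanding errors.

```python
import random

# Toy simulation of the sanity-check idea above. Every number here is
# invented for illustration; this is not a model of any real system.
# Each step introduces an error with probability P_ERR. The "checked"
# process runs a sanity check every CHECK_EVERY steps and catches each
# outstanding error with probability P_CATCH.
P_ERR, P_CATCH, CHECK_EVERY, STEPS, TRIALS = 0.02, 0.95, 10, 500, 5_000

def finish_clean_rate(with_checks: bool) -> float:
    """Fraction of trials that end with zero outstanding errors."""
    clean = 0
    for _ in range(TRIALS):
        errors = 0
        for step in range(1, STEPS + 1):
            if random.random() < P_ERR:
                errors += 1
            if with_checks and step % CHECK_EVERY == 0:
                # Each outstanding error is caught (and fixed) independently.
                errors = sum(random.random() >= P_CATCH for _ in range(errors))
        clean += errors == 0
    return clean / TRIALS

print(f"no checks:   P(finish clean) ~ {finish_clean_rate(False):.3f}")
print(f"with checks: P(finish clean) ~ {finish_clean_rate(True):.3f}")
```

In this toy setup the checked process finishes clean almost every time over a horizon where the unchecked one essentially never does, which is the intuition behind the sanity-check objection.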