Comment by nis0s
2 days ago
> The finding that language models can get better by generating longer outputs directly contradicts Yann’s hypothesis. I think the flaw in his logic comes from the idea that errors must compound per-token. Somehow, even if the model makes a mistake, it is able to correct itself and decrease the sequence-level error rate
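For context on that claim: the compounding argument, as I understand it, assumes an independent per-token error probability ε, so the chance of an error-free sequence of length n is (1 − ε)^n, which shrinks exponentially. Self-correction breaks that independence. Here's a minimal simulation sketch of the intuition (the per-token error rate, recovery probability, and sequence lengths are made-up illustrative numbers, not anything measured):

```python
import random

def sequence_ok(n, eps, p_recover):
    """Simulate one generated sequence of n tokens.

    eps       -- per-token probability of introducing an error (assumed, illustrative)
    p_recover -- per-token probability of fixing an existing error (0 = pure compounding)
    Returns True if the sequence ends in an error-free state.
    """
    in_error = False
    for _ in range(n):
        if in_error:
            if random.random() < p_recover:
                in_error = False   # model notices the mistake and corrects it
        elif random.random() < eps:
            in_error = True        # model introduces a mistake
    return not in_error

def success_rate(n, eps, p_recover, trials=20_000):
    return sum(sequence_ok(n, eps, p_recover) for _ in range(trials)) / trials

for n in (10, 100, 1000):
    no_corr = success_rate(n, eps=0.01, p_recover=0.0)   # pure compounding: ~(1 - eps)^n
    corr    = success_rate(n, eps=0.01, p_recover=0.2)   # with some self-correction
    print(f"n={n:4d}  no correction: {no_corr:.3f}  with correction: {corr:.3f}")
```

With no recovery the success rate collapses as n grows; with even a modest recovery probability it plateaus instead of vanishing, which is the behavior the quoted finding points at.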
I don’t think current LLM behavior is necessarily due to self-correction so much as to the availability of internet-scale data, though I know reasoning models are building toward self-correction. The problem, I think, is that even reasoning models are rote because they lack information synthesis, which in biological organisms comes from the interplay between short-term and long-term memory. I am looking forward to LLMs that move beyond rote, mechanical answering and reasoning.
I absolutely agree that information synthesis is a big missing piece in the quest for AGI. It's probably something that will eventually be conquered one way or another, or simply discovered by accident. Still, we need to stop and think about the implications of this technology becoming a reality.