
Comment by emmanueloga_

1 day ago

LeCun's thesis: "if we generate outputs that are too long, the per-token error will compound to inevitable failure".
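For context, that argument is just exponential decay under a simplifying independence assumption: if each token is wrong independently with probability e, then an n-token output is fully correct with probability (1 − e)^n. A minimal sketch (the per-token error rate here is made up purely for illustration):

```python
# Sketch of the compounding-error argument, assuming each token is wrong
# independently with the same probability (a deliberate simplification).
def p_sequence_correct(per_token_error: float, length: int) -> float:
    """Probability that every token in a length-n output is correct."""
    return (1 - per_token_error) ** length

# Even a small per-token error rate drives long outputs toward failure:
for n in (10, 100, 1000):
    print(n, p_sequence_correct(0.01, n))
# 10    ~0.904
# 100   ~0.366
# 1000  ~0.00004
```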

> The finding that language models can get better by generating longer outputs directly contradicts Yann’s hypothesis.

The author's examples only show that the error can be kept small for a handful of outputs at a particular length; they don't say anything about how error behaves as outputs keep getting longer. That doesn't contradict LeCun, afaict.