Comment by trhway

2 days ago

According to LeCun's model, a human walking step by step would have errors compounding with each step and would thus never reach the intended target. Yet, as toddlers, we somehow manage to learn to walk to our targets. (And I have an MS in Math, Control Systems :)

A more apt analogy would be a human trying to walk somewhere with their eyes closed, i.e., what you may know as open-loop control.
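
A quick toy simulation of that distinction (my own sketch, with made-up numbers, not anyone's actual model): without feedback the per-step noise accumulates, while with feedback each step is planned from the observed position and the final error stays on the order of a single step's noise.

    # Toy 1-D "walk to a target" simulation of the open-loop vs closed-loop contrast.
    # All numbers here are made up purely for illustration.
    import random

    def walk(steps=100, step_size=0.1, noise=0.05, feedback=True):
        target = steps * step_size            # where we are trying to end up
        pos = 0.0
        for _ in range(steps):
            if feedback:
                # Eyes open: plan each step from the observed current position.
                intended = max(min(target - pos, step_size), -step_size)
            else:
                # Eyes closed: execute the pre-planned step regardless of where we are.
                intended = step_size
            pos += intended + random.gauss(0.0, noise)   # per-step execution noise
        return abs(target - pos)

    random.seed(0)
    for mode in (False, True):
        errors = [walk(feedback=mode) for _ in range(1000)]
        print(f"feedback={mode}: mean final error = {sum(errors) / len(errors):.3f}")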

A toddler can learn by trial and error mid-process. An LLM using autoregressive inference can only compound errors. The LLDM paper was posted elsewhere, but here it is: https://arxiv.org/pdf/2502.09992

It basically takes the image-generation approach of progressively refining the entire output at once and applies it to text, so it can self-correct mid-process.
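
To make the contrast with left-to-right decoding concrete, here is a minimal toy sketch of that kind of parallel iterative refinement (my own illustration, not code from the paper; the denoiser is just a random stand-in for a trained model):

    # Toy sketch of iterative parallel refinement for text. Idea: start fully masked,
    # let the model guess every position at once, commit only the highest-confidence
    # guesses, re-mask the rest, and repeat. A token committed early can still be
    # revised on a later pass, unlike left-to-right autoregressive decoding.
    import random

    VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
    MASK = "<mask>"

    def denoiser(tokens):
        # Stand-in for the model: a (token, confidence) guess for every position.
        return [(random.choice(VOCAB), random.random()) for _ in tokens]

    def generate(length=8, steps=4):
        seq = [MASK] * length
        for step in range(1, steps + 1):
            guesses = denoiser(seq)
            # Commit a growing fraction of positions, highest confidence first.
            ranked = sorted(range(length), key=lambda i: -guesses[i][1])
            keep = set(ranked[: length * step // steps])
            seq = [guesses[i][0] if i in keep else MASK for i in range(length)]
        return " ".join(seq)

    print(generate())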

The blog post where I originally found it, which goes into more detail and raises some issues with it: https://timkellogg.me/blog/2025/02/17/diffusion

  • Autoregressive vs. non-autoregressive is a red herring. The non-autoregressive model is still susceptible to exponential blow-up of the failure rate as the output dimension increases (sequence length, number of pixels, etc.). The final generation step in, e.g., diffusion models is independent Gaussian sampling per pixel. These models can be interpreted, like autoregressive models, as assigning log-likelihoods to the data. The average log-likelihood per token/pixel/etc. can still be computed, and the same "raise the per-unit error to the power of the number of units" argument for exponential failure rates still holds (see the toy calculation at the end of this comment).

    One potential difference between autoregressive and non-autoregressive models is the types of failures that occur. E.g., typical failures in autoregressive models might look like spiralling off into nonsense once the first "error" is made, while non-autoregressive models might produce failures that tend to remain relatively "close" to the true data.
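
    To make the exponent argument concrete, here is a toy calculation (the error rates and lengths are made-up numbers, not measurements of any model):

        # Toy numbers only: if each of n units (tokens, pixels, ...) is independently
        # "correct" with probability 1 - e, the chance the whole output is correct is
        # (1 - e) ** n, which decays exponentially in n.
        for e in (0.001, 0.01, 0.05):
            for n in (10, 100, 1000):
                print(f"e={e:<5} n={n:<5} P(all correct) = {(1 - e) ** n:.4f}")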

  • >A toddler can learn by trial and error mid-process.

    As a result of the whole learning process, the toddler in particular learns how to self-correct, i.e., as a grown-up they know, without much trial and error anymore, how to continue in a straight line if the previous step went sideways for whatever reason.

    >An LLM using autoregressive inference can only compound errors.

    That is a pretty strong statement, completely dismissing the possibility that some self-correction may be emerging there.

We don't learn to walk via statistics, so it's not like his model at all.