Comment by kadushka
9 days ago
There are LLMs that do not generate one token at a time: https://arxiv.org/abs/2502.09992
They do not reason significantly better than autoregressive LLMs, which makes me question "one token at a time" as the bottleneck.
Also, LeCun has been pushing his JEPA idea for years now - with not much to show for it. With his resources, one would hope we'd have seen its benefits over current state-of-the-art models by now.
From the article: LeCun has been working in some way on V-JEPA for two decades. At least it's bold, and everyone says it won't work until one day it might.