Comment by in-silico
1 month ago
> nobody has tried to generalize it for example by combining the recurrence concept with next token prediction
Here you go: https://arxiv.org/abs/2502.05171
1 month ago
Thanks! This seems to work incredibly well.