Comment by in-silico
2 days ago
> nobody has tried to generalize it for example by combining the recurrence concept with next token prediction
Here you go: https://arxiv.org/abs/2502.05171
2 days ago
Thanks! This seems to work incredibly well.