Comment by cubefox
19 days ago
Sounds interesting, but...
> these models dominate both exponential attention and linear attention at long-context training
There is no exponential attention; standard attention is quadratic. Strange mistake.
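To make the complexity point concrete, here is a minimal sketch in plain Python that counts the query-key scores standard (full, non-causal) self-attention computes. The function name is mine and purely illustrative; it assumes a single head and ignores batch size and head dimension.

```python
def attention_score_count(seq_len: int) -> int:
    """Number of query-key dot products in full (non-causal) self-attention.

    Hypothetical helper for illustration: one head, no batching.
    """
    # Each of the seq_len queries is scored against each of the seq_len keys,
    # so the score matrix has seq_len * seq_len entries.
    return seq_len * seq_len


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000):
        print(n, attention_score_count(n))
```

Doubling the sequence length quadruples the count, which is quadratic growth, not exponential.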