Comment by whimsicalism
1 month ago
Even subpar attention quality is typically better than human memory - we can imagine models that do some sort of triaging from shorter high-quality attention context and extremely long linear (or something else) context.
No comments yet
Contribute on Hacker News ↗