Comment by JackYoustra
1 year ago
Standard Transformer KV caches are empirically quite sparse. I wonder if they've made some fix along those lines.