Comment by JackYoustra
3 months ago
Standard Transformer KV caches are empirically quite sparse. I wonder if they've made some fix along those lines.