Comment by JackYoustra
14 days ago
Standard Transformer KV caches are empirically quite sparse. I wonder if they've made some change that exploits that.