Comment by JackYoustra
13 days ago
Standard Transformer KV caches are empirically quite sparse. I wonder if they've made some fix along those lines.