Comment by JackYoustra
10 months ago
Standard Transformer KV caches are empirically quite sparse. I wonder if they've made some fix along those lines.