Comment by spott
6 months ago
KV cache for dense models is order 50% of parameters. For sparse moe models it can be significantly smaller I believe, but I don’t think it is measured in kb.
6 months ago
KV cache for dense models is order 50% of parameters. For sparse moe models it can be significantly smaller I believe, but I don’t think it is measured in kb.
No comments yet
Contribute on Hacker News ↗