Comment by zozbot234
8 hours ago
The thing about the context/KV cache is that you can swap it out efficiently, which you can't do with the activations, because they're rewritten for every token. Generation will slow down as the context grows (decode is often compute-limited when the context is large), but it will run.
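A toy sketch of why the cache can be swapped: once a token's key/value pair is written, it is only ever read again. That means older entries can be moved to slower storage and loaded back when attention needs them. The current token's activations, by contrast, are recomputed at every step. Everything here is a hypothetical illustration in plain Python, not any inference engine's actual mechanism: `SwappableKVCache`, `decode_step`, the chunk size, and the use of `pickle` bytes as stand-in "slow storage" are all assumptions.

```python
import math
import pickle


class SwappableKVCache:
    """Hypothetical append-only KV cache. Full chunks are swapped to 'slow storage'
    (simulated here as pickled bytes) and deserialized on demand."""

    def __init__(self, chunk_size=4, resident_chunks=1):
        assert resident_chunks >= 1  # the chunk being written must stay resident
        self.chunk_size = chunk_size
        self.resident_chunks = resident_chunks
        self.chunks = []   # each chunk: list of (key, value), or bytes if swapped out
        self.swap_ins = 0  # number of chunk reads from "slow storage"

    def append(self, key, value):
        if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
            self.chunks.append([])
            self._evict()
        self.chunks[-1].append((key, value))

    def _evict(self):
        # Swap out every chunk except the most recent `resident_chunks`.
        for i in range(len(self.chunks) - self.resident_chunks):
            if isinstance(self.chunks[i], list):
                self.chunks[i] = pickle.dumps(self.chunks[i])

    def entries(self):
        # Cached entries are never modified, so reading them back is all we need.
        for chunk in self.chunks:
            if isinstance(chunk, bytes):
                self.swap_ins += 1
                chunk = pickle.loads(chunk)
            yield from chunk


def decode_step(cache, query, key, value):
    """Toy single-head decode step: cache this token's K/V, then attend over
    the whole cache. Returns (output, number of dot products computed)."""
    cache.append(key, value)
    scores, values = [], []
    for k, v in cache.entries():
        scores.append(sum(q * kk for q, kk in zip(query, k)) / math.sqrt(len(query)))
        values.append(v)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total
           for d in range(len(query))]
    return out, len(scores)


cache = SwappableKVCache(chunk_size=4, resident_chunks=1)
work = []
for t in range(10):
    # Stand-in for this token's activations: recomputed each step, never cached.
    x = [math.sin(t + d) for d in range(3)]
    _, n = decode_step(cache, x, x, x)
    work.append(n)
print(work)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The attention work per token grows linearly with context length, which is the slowdown the comment describes. Swapping chunks out changes only where the entries live, not the result: running the same sequence with every chunk resident produces identical outputs.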