Comment by remexre
4 days ago
For each token generated, you only send one token’s worth of activations between layers; the keys and values for the previous tokens are already in the KV cache.
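To make that concrete, here is a rough sketch of a toy decoder in plain Python. Everything in it is an assumption for illustration, not any real model's code: the hidden size `D`, the `ToyLayer` class, the random weights, and the `decode` helper are all made up. The inputs are random stand-ins for token embeddings. It only counts how many floats cross each layer boundary per generated token, and how long each layer's cache gets.

```python
import math
import random

D = 4  # toy hidden size (illustrative assumption)


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng, n):
    return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]


class ToyLayer:
    """Hypothetical single-head attention layer that owns its KV cache."""

    def __init__(self, rng):
        self.wq = rand_matrix(rng, D)
        self.wk = rand_matrix(rng, D)
        self.wv = rand_matrix(rng, D)
        self.keys = []    # one cached key per token seen so far
        self.values = []  # one cached value per token seen so far

    def step(self, x):
        """Take ONE token's hidden vector, return ONE hidden vector."""
        q = matvec(self.wq, x)
        self.keys.append(matvec(self.wk, x))
        self.values.append(matvec(self.wv, x))
        # Attend over every cached position, including the new one.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D)
                  for k in self.keys]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attn = [sum(e / total * v[i] for e, v in zip(exps, self.values))
                for i in range(D)]
        return [a + b for a, b in zip(x, attn)]  # residual connection


def decode(layers, token_vectors):
    """Run tokens one at a time; record floats passed between layers."""
    floats_sent = []
    for x in token_vectors:
        count = 0
        for layer in layers:
            x = layer.step(x)
            count += len(x)  # only this vector moves on to the next layer
        floats_sent.append(count)
    return floats_sent


rng = random.Random(0)
layers = [ToyLayer(rng) for _ in range(3)]
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]
sent_per_step = decode(layers, tokens)
cache_lengths = [len(layer.keys) for layer in layers]
print(sent_per_step)   # [12, 12, 12, 12, 12]
print(cache_lengths)   # [5, 5, 5]
```

The per-step traffic stays at 3 layers × 4 floats no matter how many tokens came before, while each layer's cache grows by one entry per token and never leaves that layer.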