Comment by remexre
2 months ago
For each token generated, you only send one token's worth of activations between layers; the previous tokens' keys and values are already in the KV cache.
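A minimal sketch of the idea under a toy model. Each hypothetical `ToyLayer` is single-head causal self-attention plus a residual connection, with no MLP, normalization, or trained weights. The random "token embeddings" are stand-ins for the embeddings of sampled tokens. `decode_step` and `full_forward` are illustrative names. The sketch shows that one decode step moves a single length-`D` vector from layer to layer, while earlier tokens survive only as cached keys and values. It also checks that this gives the same result as recomputing the whole prefix from scratch.

```python
import math
import random

D = 4          # hidden size (toy value, assumed)
N_LAYERS = 2   # number of layers (toy value, assumed)


def matvec(w, x):
    """Multiply a D x D matrix (list of rows) by a length-D vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rand_matrix(rng, n):
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]


class ToyLayer:
    """Hypothetical layer: single-head causal self-attention + residual (no MLP, no norm)."""

    def __init__(self, rng):
        self.wq = rand_matrix(rng, D)
        self.wk = rand_matrix(rng, D)
        self.wv = rand_matrix(rng, D)
        self.k_cache = []  # keys of every token seen so far at this layer
        self.v_cache = []  # values of every token seen so far at this layer

    def attend(self, q, keys, values):
        """Softmax-weighted sum of values, scored by scaled dot product with q."""
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]

    def step(self, x):
        """Process ONE new token's hidden state x, reusing cached K/V for older tokens."""
        q = matvec(self.wq, x)
        self.k_cache.append(matvec(self.wk, x))
        self.v_cache.append(matvec(self.wv, x))
        out = self.attend(q, self.k_cache, self.v_cache)
        return [a + b for a, b in zip(x, out)]


def decode_step(layers, x):
    """Run one token through all layers; only a single length-D vector moves between them."""
    for layer in layers:
        x = layer.step(x)
        assert len(x) == D  # one token's worth of activations
    return x


def full_forward(layers, xs):
    """Reference without a cache: recompute every position, return the last one's output."""
    h = [list(x) for x in xs]
    for layer in layers:
        ks = [matvec(layer.wk, x) for x in h]
        vs = [matvec(layer.wv, x) for x in h]
        h = [
            [a + b for a, b in zip(x, layer.attend(matvec(layer.wq, x), ks[: t + 1], vs[: t + 1]))]
            for t, x in enumerate(h)
        ]
    return h[-1]


rng = random.Random(0)
layers = [ToyLayer(rng) for _ in range(N_LAYERS)]
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]  # stand-in embeddings

outs = [decode_step(layers, x) for x in tokens]  # incremental decoding with the KV cache
ref = full_forward(layers, tokens)               # recompute the whole prefix from scratch
print(max(abs(a - b) for a, b in zip(outs[-1], ref)))  # expected to be ~0
```

The cost of caching shows up in `k_cache`/`v_cache`, which grow by one entry per layer per token. The traffic between layers stays at one vector per step no matter how long the prefix gets.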