Comment by krackers
2 months ago
Yeah this part was confusing, because it's only mentioned halfway through the article that the attention step can only be batched across matching context-window sizes.