Comment by Centigonal
13 days ago
Gemini likely uses something based on RingAttention to achieve its long context window. That approach requires massive inference clusters, so it can't be what Llama 4 is doing. Very curious how Llama 4 achieves its context length.
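For reference, the core idea of RingAttention is that each device keeps its own query block while K/V shards rotate around a ring of devices, with partial results merged via a streaming (online) softmax so no device ever holds the full attention matrix. A minimal single-process NumPy sketch (the ring is simulated by indexing shards in rotation; all shapes and names here are illustrative, not from any actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_dev, blk = 8, 4, 16          # head dim, simulated "devices", tokens per device
q = rng.normal(size=(n_dev, blk, d))
k = rng.normal(size=(n_dev, blk, d))
v = rng.normal(size=(n_dev, blk, d))

def ring_attention(q, k, v):
    """Each 'device' keeps its query block; K/V shards rotate around the
    ring. Partials merge with an online softmax, so the full attention
    matrix is never materialized on one device."""
    n_dev, blk, d = q.shape
    out = np.zeros_like(q)
    for dev in range(n_dev):
        m = np.full((blk, 1), -np.inf)    # running max of logits
        l = np.zeros((blk, 1))            # running softmax denominator
        acc = np.zeros((blk, d))          # running weighted-value sum
        for step in range(n_dev):         # one ring rotation per step
            src = (dev + step) % n_dev    # which K/V shard arrived this step
            s = q[dev] @ k[src].T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=1, keepdims=True))
            scale = np.exp(m - m_new)     # rescale old partials to new max
            p = np.exp(s - m_new)
            l = l * scale + p.sum(axis=1, keepdims=True)
            acc = acc * scale + p @ v[src]
            m = m_new
        out[dev] = acc / l
    return out

def full_attention(q, k, v):
    # Reference: plain softmax attention over the concatenated sequence.
    Q, K, V = (x.reshape(-1, x.shape[2]) for x in (q, k, v))
    s = Q @ K.T / np.sqrt(Q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ V

# The blockwise ring result matches full attention up to float error.
assert np.allclose(ring_attention(q, k, v),
                   full_attention(q, k, v).reshape(q.shape), atol=1e-8)
```

The communication cost is why this implies big inference clusters: every attention layer needs the K/V shards to make a full trip around the ring, so the context length you can serve scales with the number of devices holding the KV cache.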