Comment by Centigonal
13 days ago
Gemini likely uses something based on RingAttention to achieve its long context window. That approach requires massive inference clusters, so it can't be what Llama 4 is doing. Very curious how Llama 4 achieves its context length.
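For reference, the core idea of RingAttention is that each device keeps its own query block while K/V shards rotate around a ring of devices, with partial results merged via a streaming (online) softmax so no device ever holds the full attention matrix. A minimal single-process NumPy sketch (the ring is simulated by indexing shards in rotation; all shapes and names here are illustrative, not from any actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_dev, blk = 8, 4, 16          # head dim, simulated "devices", tokens per device
q = rng.normal(size=(n_dev, blk, d))
k = rng.normal(size=(n_dev, blk, d))
v = rng.normal(size=(n_dev, blk, d))

def ring_attention(q, k, v):
    """Each 'device' keeps its query block; K/V shards rotate around the
    ring. Partials merge with an online softmax, so the full attention
    matrix is never materialized on one device."""
    n_dev, blk, d = q.shape
    out = np.zeros_like(q)
    for dev in range(n_dev):
        m = np.full((blk, 1), -np.inf)    # running max of logits
        l = np.zeros((blk, 1))            # running softmax denominator
        acc = np.zeros((blk, d))          # running weighted-value sum
        for step in range(n_dev):         # one ring rotation per step
            src = (dev + step) % n_dev    # which K/V shard arrived this step
            s = q[dev] @ k[src].T / np.sqrt(d)
            m_new = np.maximum(m, s.max(axis=1, keepdims=True))
            scale = np.exp(m - m_new)     # rescale old partials to new max
            p = np.exp(s - m_new)
            l = l * scale + p.sum(axis=1, keepdims=True)
            acc = acc * scale + p @ v[src]
            m = m_new
        out[dev] = acc / l
    return out

def full_attention(q, k, v):
    # Reference: plain softmax attention over the concatenated sequence.
    Q, K, V = (x.reshape(-1, x.shape[2]) for x in (q, k, v))
    s = Q @ K.T / np.sqrt(Q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ V

# The blockwise ring result matches full attention up to float error.
assert np.allclose(ring_attention(q, k, v),
                   full_attention(q, k, v).reshape(q.shape), atol=1e-8)
```

The communication cost is why this implies big inference clusters: every attention layer needs the K/V shards to make a full trip around the ring, so the context length you can serve scales with the number of devices holding the KV cache.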