Comment by Topfi

18 days ago

We heard this before when 100k and 200k context windows were first being normalized by Anthropic, and I tend to be skeptical of such predictions in general, but in this case I have to agree.

Having used the previews for the last few weeks on different tasks and personally designed challenges, what I found is that these models are not only capable of processing larger context windows on paper, but are also far better at actually handling long, dense, complex documents in full: referencing back to something on specific request, doing extensive rewrites in full while keeping previous context intact, etc. These models have also handled my private needle-in-a-haystack-type challenges without issues so far, though in fairness those have been limited to roughly 200k. Neither Anthropic's, OpenAI's, DeepSeek's, nor previous Google models handled even 75k+ in any comparable manner.
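
For anyone curious, the rough shape of that kind of test is easy to reproduce. The sketch below is only a generic illustration under my own assumptions, not my actual private challenge; `call_model`, the filler sentence, and the "magic number" phrasing are all placeholders you would swap for your own client and wording:

```python
import random

def call_model(prompt: str) -> str:
    # Placeholder: replace with whichever model client/SDK you prefer.
    raise NotImplementedError("plug in your model call here")

def build_haystack(filler: str, needle: str, total_chars: int, depth: float) -> str:
    """Repeat filler text to roughly total_chars and bury the needle at depth (0.0-1.0)."""
    reps = total_chars // len(filler) + 1
    sentences = [filler] * reps
    sentences.insert(int(len(sentences) * depth), needle)
    return " ".join(sentences)

def run_trial(total_chars: int = 800_000) -> bool:
    # ~800k characters is roughly 200k tokens at ~4 chars/token.
    secret = str(random.randint(100_000, 999_999))
    needle = f"The magic number for the audit is {secret}."
    haystack = build_haystack(
        "The quick brown fox jumps over the lazy dog near the riverbank.",
        needle,
        total_chars,
        depth=random.random(),  # vary where the needle is buried between trials
    )
    prompt = f"{haystack}\n\nWhat is the magic number for the audit? Answer with the number only."
    return secret in call_model(prompt)
```

Running a handful of trials at varying depths and context sizes gives a crude but surprisingly telling picture of how reliably a model retrieves details from deep inside a long prompt.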

Cost will of course remain a factor and will keep RAG a viable choice for a while, but for the first time I am tempted to agree that someone has delivered a solution showing that a larger context window can, in many cases, work reliably and far more seamlessly.

It is also the first time a Google model has actually surprised me (positively); neither Bard, nor AI answers, nor any previous Gemini model held any appeal for me, even when testing specifically for what others claimed to be strengths (such as Gemini 1.5's alleged Flutter expertise, which was beaten by both OpenAI's and Anthropic's equivalents at the time).