Comment by timr
6 days ago
Yeah, I'm aware of the benchmarks. Thomas (author of TFA) is also using Gemini 2.5, and his comments are much closer to what I experience:
> For the last month or so, Gemini 2.5 has been my go-to (because it can hold 50-70kloc in its context window). Almost nothing it spits out for me merges without edits.
I realize this isn't the same thing you're claiming, but it's been consistently true for me that the model hallucinates details of my own code, which shouldn't be possible given that the code I give it fits well within the context window.
(I'm also using it for other, harder problems unrelated to code, and I can tell you from direct experience that the practical context window is much smaller than 2M tokens. Also, of course, a "token" is not a word -- it's more like 1/3 of a word.)
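For what it's worth, you can sanity-check the tokens-per-word ratio yourself. Gemini's tokenizer isn't available locally, so the sketch below uses OpenAI's cl100k_base encoding via tiktoken as a stand-in (an assumption on my part; exact ratios vary by tokenizer, but the order of magnitude is similar), comparing prose against code:

```python
# Rough sanity check of the tokens-vs-words ratio.
# Assumption: Gemini's tokenizer isn't public, so cl100k_base
# (via tiktoken) stands in here; exact numbers will differ.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "prose": "The quick brown fox jumps over the lazy dog.",
    "code": "def compute_total(line_items): return sum(i.price * i.qty for i in line_items)",
}

for label, text in samples.items():
    n_tokens = len(enc.encode(text))
    n_words = len(text.split())
    print(f"{label}: {n_tokens} tokens / {n_words} words "
          f"= {n_tokens / n_words:.2f} tokens per word")
```

Code tends to tokenize much denser than prose (identifiers and punctuation get split into sub-tokens), which is why a "2M token" window holds far fewer lines of code than the headline number suggests.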
That's why I said 1M tokens, not 2M: I don't trust it at the full 2M yet.