Comment by arnaudsm

6 months ago

I feel the same, but cannot measure the effect in any context benchmark like fiction.livebench.

Are they aggressively quantizing, or are our expectations silently increasing?

Yeah, it's hard to measure. Not sure about our expectations, though I recall way better output when I first started using Gemini 2.5 vs now. It seems to be stupider and more headstrong somehow?