Comment by arnaudsm
6 months ago
I feel the same, but cannot measure the effect in any context benchmark like fiction.livebench.
Are they aggressively quantizing, or are our expectations silently increasing?
Yeah, it's hard to measure. Not sure about our expectations, though I recall way better output when I first started using Gemini 2.5 vs now. It seems to be stupider and more headstrong somehow?