Comment by nusl

1 day ago

I've been using Gemini for a few months, and somehow it's gotten much, much worse in that time. Hallucinations are very common, and it will argue with you when you point them out. So I don't have much confidence in it.

In my experience with chat, Flash has gotten much, much better. It's my go-to model even though I'm paying for Pro.

Pro is frustrating because it too often won't search to find current information, and just gives stale results from before its training cutoff. Flash doesn't do this much anymore.

For coding I use Pro in Gemini CLI. It is amazing at coding, but I'm actually using it more to write design docs, decompose multi-week assignments into daily and hourly tasks, and then feed those docs back to Gemini CLI to have it work through each task sequentially.

With a little structure like this, it can basically write its own context.
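A rough sketch of what that loop could look like, in case it helps. Everything here is an assumption for illustration: the checklist format (`- [ ]` lines), the function names, and the `run_model` callback are all hypothetical, and `run_model` stands in for however you actually send a prompt to the model. It is not Gemini CLI's API.

```python
import re
from typing import Callable


def parse_tasks(plan: str) -> list[str]:
    """Return the unchecked tasks from checklist lines like '- [ ] do X'."""
    pattern = r"^\s*- \[ \] (.+)$"
    return [m.group(1).strip() for m in re.finditer(pattern, plan, flags=re.MULTILINE)]


def build_prompt(design_doc: str, task: str, done: list[str]) -> str:
    """Combine the design doc, finished tasks, and the next task into one prompt."""
    done_block = "\n".join(f"- {t}" for t in done) or "- (nothing yet)"
    return (
        "Design doc:\n" + design_doc.strip() + "\n\n"
        "Already completed:\n" + done_block + "\n\n"
        "Next task (do only this one):\n" + task
    )


def run_plan(design_doc: str, plan: str, run_model: Callable[[str], str]) -> list[str]:
    """Work through the plan one task at a time, in order.

    run_model is a hypothetical hook: any function that takes a prompt
    string and returns the model's reply.
    """
    done: list[str] = []
    outputs: list[str] = []
    for task in parse_tasks(plan):
        outputs.append(run_model(build_prompt(design_doc, task, done)))
        done.append(task)
    return outputs
```

The point is just that each prompt carries the design doc plus what's already been finished, so the model gets its context from the docs it helped write rather than from a long chat history.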

  • I like Flash because when it's wrong, it's wrong very quickly. You can either change the prompt or just solve the problem yourself. It works well for people who can spot when an answer is wrong.

  • > Flash has gotten much, much better. It's my go-to model even though I'm paying for Pro.

    Same. I also think Pro got worse...

  • Interesting. Out of all the "thinking models," I struggle with Gemini the most for coding. I just can't make it perform. I feel like they silently nerfed it over the last few months.

I feel the same, but I can't measure the effect on any context benchmark like fiction.livebench.

Are they aggressively quantizing, or are our expectations silently increasing?

Same here. I stopped using Gemini Pro because, on top of its hard-to-follow verbosity, it was giving contradictory answers to things that Claude Sonnet 4 could answer.

Speaking of Sonnet, I feel like it's closing the gap with Opus. After the new quotas I started trying it before Opus, and now it gets complex things right more often than not. This wasn't my experience just a couple of months ago.

Is the problem mainly with tool use? And are you using it through AI Studio or through the API?

I've found that it hallucinates tool use for tools that aren't available and then gets very confident about the results.