Comment by GeorgeOldfield

11 hours ago

Gemini isn't even that good. Just tested 3.5 on my usual complex prompts against Opus/chat 5.5. Meh.

Are you really comparing flash to opus? Shouldn't you be comparing pro?

  • The benchmark tables in the Google announcement include Opus 4.7, and the numbers are very impressive. Caveat emptor, but it's not unreasonable to compare a new Flash to a current-gen Opus, even if some of the results only confirm expectations.

Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks?

Well, my first impression is that Gemini still goes off the instruction rails more easily than other models, but I noticed that it tends to return to the initial goal without hand-holding, which is a real improvement. It's really interesting that these models behave so differently.