Comment by GeorgeOldfield

11 hours ago

Gemini isn't even that good. I just tested 3.5 on the usual complex prompts I give to opus/chat 5.5. Meh.

k8sToGo 11 hours ago

Are you really comparing flash to opus? Shouldn't you be comparing pro?

CognitiveLens 10 hours ago

The benchmark tables in the Google announcement include Opus 4.7, and the numbers are very impressive. Caveat emptor, but it's not unreasonable to compare a new Flash to a current-gen Opus, even if some of the results confirm expectations.

bachmeier 10 hours ago

Who would have guessed that something costing roughly a third as much wouldn't do as well at certain tasks?

kmac_ 11 hours ago

Well, my first impression is that Gemini still goes off the instruction rails more easily than other models, but I noticed that it tends to return to the initial goal without hand-holding, which is a real improvement. It's really interesting that these models behave so differently.