Comment by achierius

1 year ago

How is this a notable release? It's strictly worse than Gemini 2.5 on coding etc., and only an iterative improvement over their own models. The only thing that struck me as particularly interesting was the native visual reasoning.

famouswaffles 1 year ago

It's not worse on coding. SWE-bench, Aider, and LiveBench coding all show noticeably better results.