Comment by achierius
4 days ago
How is this a notable release? It's strictly worse than Gemini 2.5 on coding &c, and only an iterative improvement over their own models. The only thing that struck me as particularly interesting was the native visual reasoning.
It's not worse on coding. SWE-bench, Aider, and LiveBench coding all show noticeably better results.