Comment by mgrunwald_

3 hours ago

I ran this multiple times through GPT-4, and every single time it arrived at the same conclusion. The data was readily available and pretty clear. GPT-5 insisted that the objectively inferior option was better until I gave it my own benchmark data, and then it was like "Oh okay, never mind".

Gemini's answer was very opinionated and factually correct, whereas Claude gave a more nuanced answer, which was also very good.

aspenmartin 3 hours ago

This sounds perfectly reasonable and consistent with our current understanding of these models.