← Back to context

Comment by causal

2 days ago

And the example given was specific to OpenAI models, yet the title is a blanket statement.

I agree with the author that GPT-5 models are much more fixated on solving exactly the problem given and not as good at taking a step back and thinking about the big picture. The author also needs to take a step back and realize other providers still do this just fine.

He tests several Claude versions as well

  • Ah you're right, scrolled past that - the most salient contrast in the chart is still just GPT-5 vs GPT-4, and it feels easy to contrive such results by pinning one model's response as "ideal" and making that a benchmark for everything else.