Comment by andai

5 hours ago

Interesting that Opus 4.7 does better than 4.8. Too bad they didn't test 4.6 as well. Just yesterday I saw a man here get mocked for insisting it was better than its successors!

Although the benchies are always tricksy ... On DeepSWE, GPT-5.5 beats Opus 4.8 by a fair margin, but on FrontierCode it's the other way around.

The only benchmark you can trust is your actual workload!