Comment by andai

6 hours ago

Interesting that Opus 4.7 does better than 4.8. Too bad they didn't test 4.6, too. I saw a man mocked here yesterday for insisting 4.6 was better than its successors!

Although the benchies are always tricksy... On DeepSWE, GPT-5.5 beats Opus 4.8 by a fair margin, but on FrontierCode it's the other way around.