Comment by andai
6 hours ago
Interesting that Opus 4.7 does better than 4.8. Too bad they didn't test 4.6 too. I witnessed a man here being mocked yesterday for insisting it was better than its successors!
Although, the benchies are always tricksy ... On DeepSWE, GPT-5.5 beats Opus 4.8 by a fair margin, but on FrontierCode it's the other way around.
The only benchmark you can trust is your actual workload!