Comment by XCSme
2 hours ago
It's 8.3 vs 8.1, I wouldn't call that significantly better.
I think GLM got a bit in front because, on some tests that both models usually got wrong, GLM did sometimes (inconsistently) respond with the correct answer.
That being said, yes, in this case gpt-5.4 would probably edge in front as more and more tests are added, especially if coding tests were added (there are no coding tests yet).