Comment by XCSme

4 months ago

It's 8.3 vs 8.1, I wouldn't call that significantly better.

I think GLM got a bit in front because, on some tests that both got wrong, GLM sometimes (inconsistently) responded with the correct answer.

That being said, yes, in this case gpt-5.4 would probably edge in front as more tests are added, especially if coding tests were added (there are none yet).
