Comment by XCSme

12 hours ago

I think the problem, as can also be seen on other benchmarks, is that most models nowadays are focused more and more narrowly on tool calling and coding.

This means that models are losing more and more general and domain-specific knowledge.

Look at these graphs from Artificial Analysis, where GLM-5.1 still performs similarly or better:

AA-Omniscience Accuracy: https://i.snipboard.io/5DYmpx.jpg

IFBench: https://i.snipboard.io/74kg0R.jpg

I still feel like models have not been getting any smarter for a few months now. They have just changed their training to focus more on some areas than others, shifting the intelligence from one place to another without necessarily increasing the overall intelligence or "AGI" score.

1 comment

HDBaseT 1 hour ago

Well, in that example, it still seems the big players are increasing overall "intelligence", as Fable tops the list.

OpenAI has big incentives to improve general intelligence, as a large percentage of users use ChatGPT for support, finances, questions, etc., not just coding.