Comment by XCSme

12 hours ago

In my tests[0] GLM-5.2 is not much better than GLM-5, and overall DeepSeek V4 Flash seems to be the better/more cost-effective choice:

[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...

3 comments

XCSme 12 hours ago

I think the problem, as can also be seen on other benchmarks, is that most models nowadays are focused more and more purely on tool calling and coding.

This means that models are losing more and more general and domain-specific knowledge.

Look at those graphs on Artificial Analysis, GLM-5.1 still performs similarly or better:

AA-Omniscience Accuracy: https://i.snipboard.io/5DYmpx.jpg

IFBench: https://i.snipboard.io/74kg0R.jpg

I still feel like models have not been getting any smarter for a few months now. Training has just shifted to focus more on some areas than others, moving the intelligence from one place to another without necessarily increasing the overall intelligence or "AGI" score.

HDBaseT 1 hour ago

Well, in that example it still seems the big players are increasing overall "intelligence" as Fable tops the list.
OpenAI has big incentives to improve general intelligence, as a large percentage of users use ChatGPT for support, finances, questions, etc., not just coding.

sourcecodeplz 12 hours ago

man, i love dsv4-flash but i found its weaknesses in complex projects with multiple moving parts. tried kimi 2.6 on the same task and it understood it and could work on it. bigger is better..