Comment by XCSme
12 hours ago
In my tests[0] GLM-5.2 is not much better than GLM-5, and overall DeepSeek V4 Flash seems to be the better/more cost-effective choice:
[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...
I think the problem, as can also be seen on other benchmarks, is that most models nowadays focus more and more on tool calling and coding.
This means that models are losing general and domain-specific knowledge.
Look at these graphs on Artificial Analysis; GLM-5.1 still performs similarly or better:
AA-Omniscience Accuracy: https://i.snipboard.io/5DYmpx.jpg
IFBench: https://i.snipboard.io/74kg0R.jpg
I still feel like models haven't gotten any smarter for a few months now. They have just shifted their training to focus more on some areas than others, moving the intelligence from one place to another without necessarily increasing the overall intelligence or "AGI" score.
Well, in that example it still seems the big players are increasing overall "intelligence" as Fable tops the list.
OpenAI has a big incentive to improve general intelligence, as a large percentage of users use ChatGPT for support, finances, questions, etc., not just coding.
man, i love dsv4-flash but i found its weaknesses in complex projects with multiple moving parts. tried kimi 2.6 and it understood and could work on the task. bigger is better..