Comment by CuriouslyC

16 hours ago

This was mostly a problem with older Qwen/MiMo/Kimi models. GLM has always been on the more robust side, and newer iterations from all those labs have improved as well. The only lab I've seen regress this way is DeepSeek: 3.2 was fairly robust, but 4.0 feels more benchmaxxed.
