Comment by sinatra
7 hours ago
I've tried Chinese open models a few times before. They were fine, but they didn't come close to the benchmark scores they claimed.
Now, maybe GLM 5.2 is close to Opus 4.7, but I don't want to keep checking them only to find that they're still benchmaxing and aren't at GPT (my choice) or Opus level. The boy who cried wolf, I guess.
Yes, my experience has been the same as yours. I find the performance of open models quite acceptable, even good, on one-off questions or small tasks. But they are quite unreliable on long-horizon goals.