Comment by ehzb2827

7 days ago

GLM 4.7 scores 41.0% on Terminal Bench 2.0 [1] compared to 58.4% for GPT-5.3-Codex-Spark [2].

[1] https://z.ai/blog/glm-4.7
[2] https://openai.com/index/introducing-gpt-5-3-codex-spark/

1 comment

ehzb2827

And that 58.4% is itself bad compared to the full 5.3 Codex. People don't seem to realize that Codex-Spark is not 5.3 Codex quality. It gives up a lot on the benchmarks in exchange for lower latency.