Comment by Oras

17 hours ago

I find it odd that none of OpenAI models was used in comparison, but used Z GLM 5.1. Is Z (GLM 5.1) really that good? It is crushing Opus 4.5 in these benchmarks, if that is true, I would have expected to read many articles on HN on how people flocked CC and Codex to use it.

25 comments

Oras

ac29 17 hours ago

GLM 5.1 is pretty good, probably the best non-US agentic coding model currently available. But both GLM 5.0 and 5.1 have had issues with availability and performance that makes them frustrating to use. Recently GLM 5.1 was also outputting garbage thinking traces for me, but that appears to be fixed now.

cmrdporcupine 16 hours ago
Use them via DeepInfra instead of z.ai. No reliability issues.
https://deepinfra.com/zai-org/GLM-5.1
Looks like fp4 quantization now though? Last week was showing fp8. Hm..
- wolttam 16 hours ago
  
  Deepinfra's implementation of it is not correct. Thinking is not preserved, and they're not responding to my submitted issue about it.
  I also regularly experience Deepinfra slow to an absolute crawl - I've actually gotten more consistent performance from Z.ai.
  I really liked Deepinfra but something doesn't seem right over there at the moment.
  
  1 reply →

coder68 16 hours ago

In fact it is appreciated that Qwen is comparing to a peer. I myself and several eng I know are trying GLM. It's legit. Definitely not the same as Codex or Opus, but cheaper and "good enough". I basically ask GLM to solve a program, walk away 10-15 minutes, and the problem is solved.

Oras 16 hours ago

cheaper is quite subjective, I just went to their pricing page [0] and cost saving compared to performance does not sell it well (again, personal opinion).
CC has a limited capacity for Opus, but fairly good for Sonnet. For Codex, never had issues about hitting my limits and I'm only a pro user.
https://z.ai/subscribe

kardianos 17 hours ago

Yes. GLM 5.1 is that good. I don't think it is as good as Claude was in January or February of this year, but it is similar to how Claude runs now, perhaps better because I feel like it's performance is more consistent.

vidarh 16 hours ago

GLM 5.1 is the first model I've found good enough to spring for a subscription for other than Claude and Codex.

It's not crushing Opus 4.5 in real-life use for me, but it's close enough to be near interchangeable with Sonnet for me for a lot of tasks, though some of the "savings" are eaten up by seemingly using more tokens for similar complexity tasks (I don't have enough data yet, but I've pushed ~500m tokens through it so far.

pros 17 hours ago

I'm using GLM 5.1 for the last two weeks as a cheaper alternative to Sonnet, and it's great - probably somewhere between Sonnet and Opus. It's pretty slow though.

bensyverson 15 hours ago

This is what kills it for me… The long thinking blocks can make a simple task take 30 minutes.

Alifatisk 16 hours ago

GLM-5 is good, like really good. Especially if you take pricing into consideration. I paid 7$ for 3 months. And I get more usage than CC.

They have difficulty supplying their users with capacity, but in an email they pointed out that they are aware of it. During peak hours, I experience degraded performance. But I am on their lowest tier subscription, so I understand if my demand is not prioritized during those hours.

ekuck 16 hours ago
Where are you getting 3 months for $7?
- Alifatisk 14 hours ago
  
  They had a Christmas deal that ended January 31.

culi 15 hours ago

If you only look at open models, GLM 5.1 is the best performance you can get on on the Pareto distribution

https://arena.ai/leaderboard/text?viewBy=plot&license=open-s...

c0n5pir4cy 17 hours ago

I've been using it through OpenCode Go and it does seem decent in my limited experience. I haven't done anything which I could directly compare to Opus yet though.

I did give it one task which was more complex and I was quite impressed by. I had a local setup with Tiltdev, K3S and a pnpm monorepo which was failing to run the web application dev server; GLM correctly figured out that it was a container image build cache issue after inspecting the containers etc and corrected the Tiltfile and build setup.

cleaning 16 hours ago

Most HN commenters seem to be a step behind the latest developments, and sometimes miss them entirely (Kimi K2.5 is one example). Not surprising as most people don't want to put in the effort to sift through the bullshit on Twitter to figure out the latest opinions. Many people here will still prefer the output of Opus 4.5/4.6/4.7, nowadays this mostly comes down to the aesthetic choices Anthropic has made.

Oras 16 hours ago

Not just aesthetics though, from time to time I implement the same feature with CC and Codex just to compare results, and I yet to find Codex making better decisions or even the completeness of the feature.
For more complicated stuff, like queries or data comparison, Codex seems always behind for me.

throwaw12 17 hours ago

maybe they decided OpenAI has different market, hence comparing only with companies who are focusing in dev tooling: Claude, GLM

edwinjm 16 hours ago
Haven’t you heard about Codex?
- throwaw12 16 hours ago
  
  its an SKU from OpenAI's perspective, broader goal and vision is (was) different. Look at the Claude and GLM, both were 95% committed to dev tooling: best coding models, coding harness, even their cowork is built on top of claude code
  
  3 replies →

__blockcipher__ 17 hours ago

Yeah GLM’s great for coding, code review, and tool use. Not amazing at other domains.

esafak 17 hours ago

I use it and think its intelligence compares favorably with OpenAI and Anthropic workhorses. Its biggest weakness is its speed.