Comment by Ms-J
1 month ago
Z.ai and their GLM models are pretty low quality.
I've been testing it for a while now since it seemed to have potential as a local model.
With this new update it still cannot parse simple test PDFs correctly. It inconsistently tells me that the value in the name field of the document is incorrect, or reverses the name to put the last name first. Or it claims a date is wrong because it's in the past/future, when it is not. Tons of fundamental errors like that.
Even when looking at the thinking process there are issues:
I gave it a test website to analyze, and it said the site's copyright year states 2026, which is in the future, and that I should investigate as it could be an attack. Then right after, it printed today's correct date.
I'm in the process of trying to get it uncensored. Hopefully that will make z.ai useful for something.
Edit: by the way, which is the best uncensored model at the moment?
I've been using their models pretty much daily for the past 2 months to work on the codebase of a very complex B2B2C platform written in an unusual functional language (F#) with an Angular frontend.
I also use Claude premium daily for another client, and I use Codex. And I can tell you that GLM5 is at this point much more capable than Claude and Codex for complex backend work, complex feature planning, and long-horizon tasks. One thing I've noticed is that it is particularly good at following instructions and guidelines, even deep into the execution of a plan.
To me the only problem is that z.ai have had trouble with inference: the performance of their API has been pretty poor at times. It looks like this is a hardware issue related to the Huawei chips they use rather than an issue with the model itself. The situation has been improving substantially over the past few weeks.
GLM5.1, GLM5-Turbo and GLM5v are at this point better than Opus, Codex, Gemini and other closed-source models. We have reached a major turning point. To me, the only closed-source model still in the game is Codex, as it is much faster at executing simple tasks and implementing already-created plans.
Try GLM5v for your PDF work; it's their latest-generation vision model, released a couple of days ago.
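If you want to try that route, here's a rough sketch of what the request could look like, assuming an OpenAI-compatible chat endpoint. The model id `glm-5v` and the image data-URL convention are my assumptions; check z.ai's API docs for the real values:

```python
import base64
import json

def build_vision_request(png_bytes: bytes, question: str) -> dict:
    """Build an OpenAI-style chat payload from one PDF page rendered as PNG."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "model": "glm-5v",  # assumed model id, verify against provider docs
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# Placeholder bytes stand in for a real rendered PDF page.
req = build_vision_request(b"\x89PNG placeholder",
                           "What name appears in the name field?")
payload = json.dumps(req)
```

You'd POST `payload` to the provider's chat/completions endpoint; rendering the PDF page to PNG first (with whatever PDF library you prefer) sidesteps any question of whether the model accepts raw PDFs.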
Does anyone have inside info on what these Huawei chips look like? I know Google has a torus architecture, unlike Nvidia's fully connected one. Maybe it's a similar architectural decision on the Huawei chips that leads to bottlenecks in serving?
https://www.huawei.com/en/news/2026/3/mwc-superpod-ai
>For AI computing, the Atlas 950 SuperPoD, powered by UnifiedBus, integrates 64 NPUs per cabinet and can scale up to 8,192 NPUs, delivering superior performance for large-scale AI training and high-concurrency inference.
Plenty of other providers offer much faster inference on GLM-5.1: Friendli, GMICloud, Venice, Fireworks, etc. It can be deployed through Bedrock already as well, and will probably be generally available in Bedrock soon, I would guess.
Better than Opus? Not even close. After struggling through server overload for the past couple of hours I finally put 5.1 through its paces and it's... okay. It failed some simple stuff that Sonnet/Opus/Gemini didn't. Failed it badly and repeatedly, actually. This was in TypeScript, btw. Not sure if I'll keep the subscription or not.
[flagged]
I appreciate that it's not working for your use case, but it's unfortunate that you dismiss the experience of others. And I am not Chinese, I am European. Thanks for your feedback anyway.
I tried Gemini 3.1 Pro once to implement a previously designed 7-phase plan. It only implemented a quarter of the plan before stopping; the code didn't even compile because half of the scaffolding was missing. It then confidently said everything was done.
Codex and GLM didn't have any issue following the exact same plan and producing a working app. So I would argue Gemini is the failure here.
Sounds like you two are talking past each other. PDF work is a specific niche that, according to you, it fails at; the other person says it's good at coding.
“GLM5…better than Opus, Codex, Gemini…”
What a wild claim to make. Unsupported by benchmarks, unsupported by the consensus of the community, and no evidence provided.
Sounds like in another comment here even the GLM5 team concedes they are behind the frontier wrt tool calling, do you know something they don’t?
I know my use case and my personal experience :) I am not trying to pretend that it is the best in benchmarks, just sharing my experience so people know that some folks are having a very good experience with GLM models compared to the competition.
My only goal is to encourage people to try it out so they can see if it moves the needle for them, because there's a fair chance that it will. I am not trying to start a flamewar or anything.
I do not know if it is good, because I have not tested it yet, but the most recent uncensored model is:
https://huggingface.co/trohrbaugh/gemma-4-31b-it-heretic-ara...
which was produced immediately after Google released their new Gemma 4 model.
Completely agree with the statement "Z.ai and their GLM models are pretty low quality." I have been trying it out and it's kind of useless compared to SOTA models.
I do not doubt your experience, but such statements should always be qualified by specifying the kind of tasks for which you have tried the models.
For all existing models, including for all SOTA models, you can find contradictory statements, that they suck and that they are great.
It is very likely that all these statements are true simultaneously, because each model may succeed for some tasks and fail for others, so without specifying the tested tasks any claim that a model was good or bad is worthless.
[flagged]
I still use GLM 4.7 for well-defined coding tasks. I never got 5.0 to work satisfactorily; it felt like a hosting problem (z.ai) where it would work for a while and then, for whatever reason, couldn't respond to the context any more. But that's just a hunch.
I had no such trouble with 4.7 and find it fast and productive. Haven't tried 5.1; I'm using OpenAI models for coding most of the time.
Same here.
Z.ai seem to promote 4.7 for smaller tasks, 5.1 for larger tasks (similar to Anthropic's recommendation for usage of Haiku and Sonnet/Opus models).
5.1 works for me already in the most economical basic paid tier ("lite coding plan"), unlike first release of v5 (5.0 ?)
I hit this as well. It just seems to hang and process for ages.
Try lowering the thinking level with GLM-5.1; to me that seems to help mitigate the blocking behaviour.
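For anyone wondering what that looks like in practice, here's a minimal sketch of a request with thinking dialed down. The `thinking` field name, its accepted values, and the model id are all assumptions on my part, not documented fact; check z.ai's API reference for the actual vendor extension:

```python
import json

# Sketch of an OpenAI-compatible chat request that disables/reduces thinking.
# Every vendor-specific field below is an assumption, verify against the docs.
request = {
    "model": "glm-5.1",  # assumed model id
    "messages": [{"role": "user", "content": "Implement step 3 of the plan."}],
    "thinking": {"type": "disabled"},  # or a lower effort level, if supported
}
payload = json.dumps(request)
```

If the hangs come from long reasoning traces, cutting thinking like this should shorten time-to-first-token noticeably.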
I don't agree. I think their models are pretty good. The company's infrastructure, though, seems to be so-so.
>by the way, which is the best uncensored model at the moment?
There are no such models, depending on your definition of censorship. If you're referring to abliteration and similar automated techniques, they're snake oil.
That is absolutely not the case. Try HauHauCS's Qwen 3.5 models. They don't refuse anything, and they don't lose a noticeable amount of capability.
Refusal training is only one part of censorship (hence "depending on your definition"). Most permanent biases baked in by the devs are impossible to correct automatically.
Thanks for the tip.
That user has produced other interesting models, including aggressively uncensored variants which claim 0 refusals.
I will definitely try it as there can never be too many uncensored AI implementations.
From what I gather, Qwen is currently the undisputed local LLM king.
So you're saying it's pretty low quality because it failed specifically to parse PDFs?