
Comment by rednb

1 month ago

I've been using their models pretty much daily for the past 2 months to work on the codebase of a very complex B2B2C platform written in an unusual functional language (F#) with an Angular frontend.

I also use Claude premium daily for another client, and I use Codex, and I can tell you that GLM5 is at this point much more capable than Claude and Codex for complex backend work, complex feature planning, and long-horizon tasks. One thing I've noticed is that it is particularly good at following instructions and guidelines, even deep into the execution of a plan.

To me the only problem is that z.ai have had trouble with inference: the performance of their API has been pretty poor at times. It looks like this is a hardware issue related to the Huawei chips they use rather than an issue with the model itself. The situation has been substantially improving over the past few weeks.

GLM5.1, GLM5-Turbo and GLM5v are at this point better than Opus, Codex, Gemini and other closed source models. We have reached a major turning point. To me, the only closed source model still in the game is Codex, as it is much faster at executing simple tasks and implementing already created plans.

Try GLM5v for your PDF work; it's their latest-generation vision model, released a couple of days ago.

Does anyone have inside info on what these Huawei chips look like? I know Google has a torus architecture, unlike Nvidia's fully connected one. Maybe it's a similar architectural decision on the Huawei chips that leads to bottlenecks in serving?
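No inside info here, but for intuition on why topology alone could matter: here's a rough back-of-the-envelope sketch (my own illustration, not a claim about the actual Huawei interconnect). On a 2D torus the average hop count between chips grows with grid size, while a fully connected fabric is always one hop, so all-to-all collectives (like those in tensor-parallel serving) pay more latency per step:

    # Average shortest-path hops on an n x n 2D torus vs. a fully
    # connected fabric (always 1 hop). More hops per all-to-all
    # collective = more latency when serving a sharded model.
    def torus_avg_hops(n):
        """Average hops between distinct nodes on an n x n 2D torus."""
        total, pairs = 0, 0
        for dx in range(n):
            for dy in range(n):
                if dx == 0 and dy == 0:
                    continue  # skip self
                # wrap-around links: shortest path in each dimension
                total += min(dx, n - dx) + min(dy, n - dy)
                pairs += 1
        return total / pairs

    for n in (4, 8, 16):
        print(f"{n}x{n} torus: avg {torus_avg_hops(n):.2f} hops (fully connected: 1)")

Whether that's what is actually biting z.ai's serving stack is pure speculation on my part.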

Plenty of other providers offer much faster inference on GLM-5.1: Friendli, GMICloud, Venice, Fireworks, etc. It can be deployed through Bedrock already as well, and will probably be generally available in Bedrock soon, I would guess.
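If anyone wants to kick the tires on one of those providers, here's a minimal sketch, assuming an OpenAI-compatible endpoint (which several of them expose). The base_url and model id below are placeholders, not real values; check your provider's docs:

    # Minimal sketch for trying GLM-5.1 via a third-party provider.
    # base_url and model id are placeholders -- consult provider docs.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.example-provider.com/v1",  # placeholder
        api_key="YOUR_API_KEY",                          # placeholder
    )

    resp = client.chat.completions.create(
        model="glm-5.1",  # placeholder model id
        messages=[{"role": "user", "content": "Explain this F# function: ..."}],
    )
    print(resp.choices[0].message.content)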

Better than Opus? Not even close. After struggling through server overload for the past couple of hours I finally put 5.1 through its paces and it's... okay. It failed some simple stuff that Sonnet/Opus/Gemini didn't. Failed it badly and repeatedly, actually. This was in TypeScript, btw. Not sure if I'll keep the subscription or not.

[flagged]

  • I appreciate that it's not working for your use case, but it's unfortunate that you dismiss the experience of others. And I am not Chinese, I am European. Thanks for your feedback anyway.

  • I tried Gemini 3.1 Pro once to implement a previously designed 7-phase plan. It only implemented a quarter of the plan before stopping; the code didn't even compile because half of the scaffolding was missing. It then confidently said everything was done.

    Codex and GLM didn't have any issue following the exact same plan and getting a working app. So I would argue Gemini is the failure here.

  • Sounds like you two are talking past each other. PDF work is a specific niche that, according to you, it fails at; the other person says it's good at coding.

    • Scroll down to my other comment; I've used it specifically for coding as well.

      "It couldn't even debug some moderately complicated python scripts reliably."

“GLM5…better than Opus, Codex, Gemini…”

What a wild claim to make. Unsupported by benchmarks, unsupported by the consensus of the community, no evidence provided.

It sounds like, in another comment here, even the GLM5 team concedes they are behind the frontier wrt tool calling. Do you know something they don't?

  • I know my use case and my personal experience :) I am not trying to pretend that it is the best in benchmarks, just sharing my experience so people know that some folks are having a very good experience with GLM models compared to the competition.

    My only goal is to encourage people to try it out so they can see if it moves the needle for them, because there's a fair chance that it will. I am not trying to start a flamewar or anything.

    • It’s not a flame war, and you’re not just sharing your experience and encouraging others to try it out.

      You’re making a claim, and I’m pointing out that it’s unsubstantiated and not consistent with any other source of data, including that internal to the company that makes the model.

      I hope you can see that that's different from saying "it's worked well for me."
