← Back to context

Comment by owentbrown

1 day ago

From your experience, is it comparable to Claude Code with Opus 4.8? How does it feel? How do the two differ?

From my experience (~10M tokens on each): Opus remains better at more or less everything, and seems to hold its own better in large context work. It’s also more thorough. That said, GLM is worlds cheaper and a great little planner if you do a couple of rounds covering edge cases or things it might have forgotten.

It's comparable, but not the same.

For some tasks, it's better. Opus refuses tasks for me pretty regularly. GLM 5.2 has never refused a task. So for anything security-related or that touches on topics that trigger Opus's safety guardrails, I use GLM 5.2.

OTOH, for anything related to UI design, I use Opus 4.8. It's much better at taking relatively vague descriptions of user interfaces and a mockup of a related UI and combining them into an immaculate design.

For anything else, I tend to run tasks in Opus and then have GLM review them and write a Markdown file with anything it finds. Then I have Opus review the markdown file and fix the issues it agrees with. The reason I usually go with Opus 4.8 first is mainly that it's faster. Opus 4.8 is, on average, about twice as fast as GLM 5.2 running on z'ai's infrastructure for the same task. There's a large variance (sometimes GLM 5.2 is pretty fast and Opus 4.8 is pretty slow), but on average it's a very noticeable difference.

When I run into Anthropic's Quota, I switch to GLM 5.2 rather than Sonnet. I don't think there's much reason to ever use Sonnet for anything if you can use GLM 5.2 instead.

This is all pretty subjective, of course. On average, I think Opus 4.8 is still a better, more reliable, and faster model, but if it went away tomorrow and I only had GLM 5.2, I wouldn't be too sad about it; I'd get things done with GLM 5.2 just fine.

  • What kinds of tasks does Opus refuse? I’m a light daily user for the past 3 months and Opus has never refused a task for me.

    • The later Opus models (4.7/4.8), Sonnet 5, and particularly Fable 5 will refuse to do tasks related to offensive security.

      One example I've hit is working on a benchmark of how well LLMs handle Kubernetes security tasks, there's a section on them exploiting security misconfigurations. Opus 4.6 was fine with that section, 4.7 and 4.8 saw some refusals and Fable point blank refused to do any of it.

      The only other model I've seen refuse is OpenAI GPT-5.5, all the open weight models seem fine with it.

      Ofc if you need to do that kind of work a lot you might be able to get on OpenAI/Anthropics allow-list for cyber work.

    • One project I have deals with countries, and any time it touches code related to countries, it stops.

      I've also had it refuse security-related tasks, and occasionally it stops without any discernible reason.

    • I’ve never had a refusal coding, and in some areas (AI red teaming specifically) I’ve found it quite good at recognizing and discussing “white hat” stuff that in the past I think would have got refusals.

      But when there was the Hantavirus thing a while back, I asked it if there was a vaccine under development and got a refusal immediately. I’ve had a few like that. It seems they’ve implemented really poor guardrails on certain topics (CBRN and cyber) that have lots of false positives. But if you actually chat with the model itself it’s quite lucid about what is legitimately dangerous and what is just performative “AI Safety” style refusal.

      3 replies →

  • Are you micromanaging your GLM costs? It seems the best bang for buck strategy right now is a Opencode Go subscription to get the subsidized rate and then switch to Openrouter's model above and beyond that + make use of a dual model strategy by having GLM 5.2 do planning and Deepseek V4 Flash for implementation.

  • Do you guys use it through open router? Do you have any concerns about how the data you send is being intercepted? Not that I trust Anthropic but it’s widely agreed that it’s kosher to use them for commercial work, I can’t see comfortably sending any customer data to openrouter.

    Edit- I see down-thread you use z.ai directly. Same concern, aren’t you worried about using it for professional stuff.

    • I'm worried, but I'm worried about all of these providers. There's a good chance Anthropic and OpenAI will go bankrupt in the next five to ten years, and all of their data will go to the highest bidder.

      There's no customer data sent to anyone, though. I run OpenCode and Claude Code in a Docker container that only has access to a subset of my code base. There are no secrets in there, and I'm vaguely ok with z.ai using this to train their models.