Comment by zschallz

1 day ago

Curious what people's experience is with these models. Anecdotally, I tried them out earlier in the year and found they struggled with pretty basic full-stack coding I was doing, when Sonnet 4.6 and Haiku 4.5 didn't break a sweat. I was hoping to use them while my Claude usage was resetting but was disappointed.

I've been using GLM-5/5.1 for about 6 months and it has been a fairly capable model. I've seen a lot of mixed opinions that tend to align with which harness is used, so it is worth trying out a couple of harnesses with a model before writing it off. For example, I'm using crush and have had a good experience, while others using CC have had a much more mixed experience. For task complexity, I treat it as I would Sonnet, with the same care in building out plans/prompts before firing it off and letting it go.

I use IntelliJ for much of my development and also set the built-in AI tools to use my GLM sub (BYOK), and it has worked out well, albeit a bit slow.

Overall, it's my main model and has been getting better with each release.

I've got a GLM subscription (mostly because I like supporting open model makers, pretty sure my monthly usage is so low that pay-per-token would be more cost effective), so I generally use GLM-5.1 for any personal projects and I use Opus at work.

To be entirely honest I haven't noticed much of a capability gap between the two for the sorts of things I ask of an AI agent. Maybe Opus is _slightly_ smarter or slightly better at long-running tasks but the difference is slim enough it could just be a placebo from the Claude branding / hype.

I'm looking forward to giving GLM-5.2 a spin sometime soon and seeing how it stacks up. If nothing else, 1M context is a great improvement. It feels like, with DeepSeek v4, then MiniMax M3, and now GLM-5.2 adding it, 1M is rapidly becoming "table stakes" for agentic models.

Which specific models were you using?

In March I switched to OpenCode + Kimi K2.5 and found it was a step behind. I switched to GLM 5.1 and it has felt like a step above. It's probably some combination of me forgetting the baseline, model improvements, and OpenCode improvements.

$20 a month has been good enough for my coding use cases. I wouldn't call myself a vibe coder. Stuff I do is create graphs/visualizations, review, polish code, generate toy examples for learning.

They're pretty good for casual use. I mostly use GLM and occasionally sprinkle in some Opus via API when I think it'll help.

In my experience these models (GLM-5.1) struggle after 100K tokens.

  • GLM-5.1 had a coherency bug at launch, so it might be worth retrying it if you haven't in a while. It can now use the full 256k context as intended.