GLM 5.2 is in an uncanny valley where it's too big to run at home, and too expensive and slow in comparison to similarly capable models (a good chart here - https://deepswe.datacurve.ai/), and that's just comparing API prices.
When looking at the subscription offerings from Anthropic and OpenAI, it's not even comparable, as a Codex $200 subscription can easily use a billion tokens per week on GPT 5.5 high/xhigh.
It's an interesting model from the perspective of being the most capable open weight model. But it doesn't have a solid place in this marketplace right now.
I tried it for a couple of hours this morning and yeah, it's a bit slow, and I needed it in peak hours so it also often can't reach the server so that makes it even slower. And I'm not even sure it's just the model, it could very much be the harness. Stalled for 40 minutes on trivial tool calls like `find`, two times...
It shows potential, answer/code quality was solid, but I would need more time with it.
no ACP support it seems :( Of all the AI buzzwords I love ACP because of the separation of concerns. Let the editor be an editor, the harness be the ai code agent, and the llm be the llm
I would very much agree. Even the hand icon, the usage in the text field, and the sidebar style are 1:1 identical to Codex. It's a misleading title - it's not close to Claude Code.
Does anyone use an agnostic TUI or harness for development tasks that can fairly seamlessly switch between providers?
I'm wanting local context in the spirit of "here are 3 AI providers available, for coding tasks use this one... and for writing prose use this one... and for generating images use this one..." etc.
OpenCode was the first agent harness I used, and I have always liked it. You can configure a wide variety of providers, and it's open source and has a number of core contributors.
The other opinionated option is Pi (the Pi agent harness). This is a great lightweight option and also supports a number of providers. You can also use local model servers.
have used both pi and opencode for the last 6 months, haven't opened a proprietary harness (cc, codex, cursor) in that same amount of time. right now i'm on pi and i can switch seamlessly between any model across any provider i want, even mid session. can even point them at locally running models.
i think people don't realize how much better life is over on this side, cc and codex rely entirely on vendor lock in imo.
You can do this with role-model, the model router I've built. It routes based on roles and tasks among other things. It has an extension for Pi that lets your coding agent specify request metadata for roles and capabilities etc.
If you haven't yet, you should give https://pi.dev a chance.
I've been using it exclusively (and extending it, see https://a.l3x.in/ai) for months with mainly GLM-4.7 then 5.1 and now 5.2 and I could hardly be any happier.
I'm still working on a "Github/Forgejo first" based workflow but also quite happy with it already, basically most of my sessions run as a ci/cd job (triggered by "/pi" comments) and generate PRs or push commits to PRs, see https://github.com/shaftoe/pi-coding-agent-action
I’ve written a skill for codex and Claude code that designates an orchestrator on the primary worktree and is agnostic about what type of AI workers are on the N supporting worktrees.
The orchestrator knows which AI client is running in any given worktree, so it would be fairly easy to designate which AI should receive what kind of tasks.
You run either Claude or Codex in tabs, one for each worktree.
I do have some AI TUI specific instructions, for instance codex is primitive at monitoring compared to CC. So, there are additional notes for Codex workers on how to properly monitor for new "mail."
You work with the orchestrator on the primary worktree and allow it to delegate tasks to the workers and answer their smaller questions.
It surfaces results and assists them with context clearing when needed.
The orchestrator and workers communicate using a simple shared file system under tmp/* and together they can handle a big and varied workload.
I use iterm2, so I’ve also added iterm2 specific python that allows the orchestrator to “kick” a worker or perform tasks otherwise veto'd by the TUIs (ie /clear) by modifying the input and submitting it.
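A minimal sketch of what file-based "mail" between an orchestrator and its workers could look like. This is not the skill described above: the directory layout, the file naming, and the function names `send_mail` and `read_mail` are all assumptions made up for illustration.

```python
import itertools
import json
import tempfile
import time
from pathlib import Path

# Process-local counter so two messages sent in the same nanosecond
# still get distinct, ordered file names.
_sequence = itertools.count()


def send_mail(root: Path, worker: str, body: str) -> Path:
    """Drop one message file into the worker's inbox directory."""
    inbox = root / worker / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    name = f"{time.time_ns():020d}-{next(_sequence):06d}.json"
    path = inbox / name
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"to": worker, "body": body}))
    # Rename into place so a reader never sees a half-written file.
    tmp.rename(path)
    return path


def read_mail(root: Path, worker: str) -> list[str]:
    """Return and delete all pending messages for a worker, oldest first."""
    inbox = root / worker / "inbox"
    if not inbox.is_dir():
        return []
    messages = []
    for path in sorted(inbox.glob("*.json")):
        messages.append(json.loads(path.read_text())["body"])
        path.unlink()
    return messages


if __name__ == "__main__":
    # A throwaway directory stands in for the shared tmp/ folder.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        send_mail(root, "worker-1", "run the test suite")
        send_mail(root, "worker-1", "report back when done")
        print(read_mail(root, "worker-1"))
```

Each worker would poll its own inbox, which is presumably where the per-TUI monitoring notes come in. The write-then-rename step is one common way to avoid readers picking up partial files; whether the original setup does this is not stated.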
I use the one that I've been developing since 2023. It's intended to be used in exactly this spirit! Written in Go, has image support (which has yet to be fleshed out).
It supports MCP (unlike Pi), sandboxing (with user-mode networking), and runs efficiently at huge contexts.
i like Chinese open-weight models that offer cheap tokens, but i only use them for my personal projects.
China has a history of stealing IP/trade secrets, and Chinese courts favor local companies, while the US has a robust court system that can enforce IP rights. if you want to risk your company's IP/trade secrets/data for some cheap tokens, go ahead and use Z.ai's services.
Thank you. It doesn't make sense to me how people trust our companies so much more than Chinese ones for no reason. This country has an abysmal track record when it comes to respecting its citizens' rights or privacy. Propaganda working as intended I suppose.
At least the model weights are open, I’m not American, so to me this is much more trustworthy in every possible way.
You’re talking as if US intelligence are the good guys, and to me at least, they are not to any extent.
What can you gain by looking at the weights, whether open source or not? Are they not what determines the model's output, but in an oblique way? We can't really fix the weights ourselves, weight by weight, or can we?
There's no way to safely use SOTA LLMs if privacy and IP protection are your concerns, unless you want to spend 100k+ to host a 1T param model. Even if you use OpenCode you're sending all that information to random data centers you know nothing about.
But yes, US intelligence has killed and ruined the lives of far more people than China has. Not sure how so many people buy into the narrative that they're protecting freedom and democracy.. They're protecting their freedom to kill and crush all their enemies and control every "democracy" on earth.
That's why I like to use Reasonix with Deepseek. Hitting cache makes requests basically free and that's through unsubsidized American providers like Digital Ocean or cloudflare.
In a sense it's a clean reminder that all these, especially non-local, llm tools should NEVER run outside a container. I'm currently looking at z-jail specifically for these scenarios; VMs are too heavy & expose too many sec issues of their own for continual integrated use in my case.
Run it in a container under Opencode. It works great, and I even upgraded to their pro plan (~$60/month). If you want it in a container, there's info in my profile under my projects. That code is entirely open source, and it's there simply because I built what I needed for my own work. I'm sure there are a zillion other ways to do it. However, I highly advise against running any agent on bare metal, regardless of the company's country of origin. My thesis addresses this directly and repeatedly.
By the way, some pedant recently asked why anyone would run software with only a few stars. My thoughts on that are minimal: people can practice whatever slop logic they want. I've architected and built systems that handled tens of thousands of users. I'm not fucking around. The way I build isn't typical, and I don't suggest anyone try to mimic my approach, but it works for me and the way my mind processes complex systems.
To the peanut gallery: use it or don't, but don't give me a hard time unless you're ready to get one back. I've made plenty of mistakes in my career, and accountability is a crucial part of growth. I'm more than willing to work with anyone using my code, provided they bring valid, substantial criticism to the table.
I think it’s a real concern. Chinese companies are much more closely tied to the state, as in if you decide to go to China one day they might already have all the data on how you have interacted with their models.
The US is certainly inching in that direction but it’s not like someone from the US government sits at Anthropic’s HQ reading chats from state people of interest.
As someone who loves using OpenCode w/ local Chinese open source models, this is basically my take on this as well. There's no way I would ever put a piece of proprietary Chinese software that gets full system control on anything important. This is definitely something I would only ever run sandboxed in a lab environment for toy projects, not for serious work. I feel only marginally better about Codex/Claude Code, hence my strong preference for local LLMs w/ OpenCode, but a proprietary approach to Chinese models is a hard no from me dawg.
For GLM Coding Plan subscribers, quota consumed via Coding Plan for GLM-5.2 in ZCode is discounted by the coefficients below — the same usage draws down less quota, roughly 1.5x the effective allowance.
Peak hours (14:00–18:00 daily) 3x -> 2x
Off-peak (remaining 20 hours) 1x -> 0.67x
I wonder whether that is referring to local time, or CST (UTC+8)?
Thanks. Those are some odd hours though, why would evening time be peak hours? Usually (in the western world anyway), 9AM - 12PM would be peak hours. Things normally slow down post-lunch, and are at their slowest at close-of-business.
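On the "roughly 1.5x the effective allowance" figure quoted with the coefficients above (3x -> 2x at peak, 1x -> 0.67x off-peak): it appears to be just the ratio of the old coefficient to the new one. A quick sketch of that arithmetic, assuming quota is drawn down linearly by the coefficient; the function name is made up here and is not anything from ZCode.

```python
def effective_allowance_multiplier(old_coeff: float, new_coeff: float) -> float:
    """How much further the same quota stretches when the per-token
    coefficient drops from old_coeff to new_coeff (linear draw-down assumed)."""
    return old_coeff / new_coeff


peak = effective_allowance_multiplier(3.0, 2.0)       # 14:00-18:00
off_peak = effective_allowance_multiplier(1.0, 0.67)  # remaining 20 hours
print(f"peak: {peak:.2f}x, off-peak: {off_peak:.2f}x")
# prints "peak: 1.50x, off-peak: 1.49x"
```

So both windows come out at about 1.5x, which matches the quoted claim regardless of which timezone the peak window turns out to be in.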
if you're going to try this one out, don't be surprised to get this message repeatedly, like 4 out of 5 prompts you're trying to send, 24/7, this is gonna be your new friend, then you'll learn to write the only prompt that matters: "retry", "retry", "retry"
Here's the message: "Cannot connect to API: write EPIPE"
When the harnesses commoditize, it will be the dynamic things like skills that will be the most valuable, useful thing you can bring to a harness. That seems like a long ways away though. There are still meaningful performance differences between agent harnesses.
Can anyone tell me if Z.AI's cheapest plan is more or less generous than Claude's cheapest plan? If it is more or less generous, could you describe the extent of the difference?
(If this comment is too formal, I'm sorry. I used Google Translate to write it [this line was NOT translated])
The plans may have comparable prices, but the API rates are much cheaper. Especially because it is open weights, so there is competition on places like OpenRouter.
"Quality" of the harness matters a lot to the user experience, and the construction of the harness will depend on the behavior/quirks of the underlying model. So, if you're using Claude Code, you can expect it to work best with Anthropic models, and expect other model-makers to want you to use the harness they've developed.
There are different grades of vendor lock-in. There's mechanical lock-in (which is a thing, like .claude folders) and economic lock-in but then we don't pay enough attention to behavioral lock-in. Habit is powerful, and if you can habituate users into a certain flow, change feels bad and they are more likely to stay.
separation of model and tooling is as important as the separation of legislative and judicial powers, so just ignore any tooling or harness that is not truly open source. they will all slowly creep into your life and choke you while trying to lock you in.
It's sad to see that the teams that have the most resources that can contribute to development of next-gen harnesses are essentially copying the same exact thing from each other, with no meaningful changes.
And most of the advancement and experimentation happens in some random 0-star github repos.
I've been working on my own private harness for the past 8 months, and I've been collecting ideas from such repos I've stumbled upon.
pi-tmux is one such example (seems to be archived now) which inspired me to use tmux as communication layer and provide visibility of subagents of multiple models in their native harnesses [1].
There's also herdr, which is not 0-stars, but is a super interesting, lesser-known project [2]. It also has interesting substrates to allow agent coordination.
None of these are harnesses per se, but they're pointing towards clear gaps in existing harnesses. For example, we've known for a while now that compounding the knowledge of different classes of models achieves better performance. Why is there no harness where this is native functionality? And there's no harness where subagents are first-class citizens both in terms of capabilities and UX.
Because a harness can more easily stop backdoors of a model. A packaged app on the other hand ... let's say I'll skip this until I can compile and package it.
Very interesting work! I put some effort into getting it to work with models my hardware can actually run well and they just fall over immediately. gemma4 12b runs like molasses on my 2080 super but it was the only model able to, with your harness, actually do anything useful. It was the only useful thing I've gotten any model runnable with my hardware with any harness I've tried, very impressive!
I suspect smaller models need more work than is practical to fit harnesses around. The smaller the model, the more work, and it doesn't carry over to other small models.
Deepseek r1 7b could not emit tool calls to save its life, gemma4 e4b couldn't get the names of files right, qwen3.5 4b gets stuck in dumb rabbit holes, I pointed it at a ruby script and asked it to run it, it tried running it with bash then got caught in a loop investigating.
Noble effort though! I guess I'll keep working on my barebones ruby_llm harness, with very tempered expectations. Each of these failure modes can be worked around, but there's too many of them to work around in the general sense.
I could use them as a provider if they showed a concrete price per token, or a concrete number of tokens in each plan. As it stands, I don't know what I would be renting from them. If I were to buy hell knows what, I would go to Anthropic.
Yea not touching this with an any-foot pole. They are just keeping up with the Joneses now. There is no reason for this to exist but there IS a reason it is not open source. ;)
There are now more and more harness clients. I hope we can have the best open-source client and the best open-source models, as this would greatly facilitate our work and operations. However, this seems unlikely in the short term.
I'm somewhat surprised that this is not open source (from what I can tell). Compare to Mimo Code https://github.com/XiaomiMiMo/MiMo-Code (which is a CLI, while this is a desktop app).
I don't even know what I would do with a desktop app. I'm running these things in headless VMs, so I can run them with `--dangerously-skip-permissions` or whatever. I don't trust them, even without that flag, on my desktop/laptop.
Good desktop apps in this category can manage agents across any number of remote SSH hosts.
I've contributed to https://github.com/0xferrous/agent-box which allows you to bind-mount git repositories into containers that agents operate in, preventing the agents from accessing files that aren't bind-mounted. Your usual .gitignore can then be used to also ignore files within the repo to be bind-mounted, which prevents agents from accessing them at all, essentially working as a sandbox.
I also maintain https://github.com/nothingnesses/agent-images which allows you to use Nix to reproducibly spin up OCI container images containing agents and any other tools you need for development and use these with agent-box.
I use both at the moment to work on some personal projects with agents, where I set up multiple separate git worktrees for the agents to work in, preventing them from accessing anything outside of the worktrees and from trampling over each other's work.
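A rough sketch of the general idea (one worktree per agent, with only that worktree bind-mounted into the container). This is not how agent-box or agent-images are implemented; it only builds plain `git worktree add` and `docker run --mount` command lines, and the image name, the `/work` mount target, and the function names are placeholders made up for illustration.

```python
from pathlib import Path


def worktree_cmd(repo: Path, worktree: Path, branch: str) -> list[str]:
    """Command that creates a separate worktree on a new branch."""
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(worktree)]


def sandbox_cmd(worktree: Path, image: str, agent_argv: list[str]) -> list[str]:
    """Command that runs an agent with only its worktree bind-mounted."""
    return [
        "docker", "run", "--rm", "-it",
        # Only this directory is visible inside the container.
        "--mount", f"type=bind,source={worktree},target=/work",
        "--workdir", "/work",
        image, *agent_argv,
    ]


if __name__ == "__main__":
    repo = Path("/home/me/project")            # hypothetical paths
    wt = Path("/home/me/worktrees/agent-a")
    print(" ".join(worktree_cmd(repo, wt, "agent-a")))
    print(" ".join(sandbox_cmd(wt, "agent-image:dev", ["my-agent"])))
```

One caveat with this naive version: a linked worktree's `.git` is a file pointing back into the main repository, so git commands inside the container will fail unless that metadata is also made available. How the tools above deal with that, and with .gitignore'd files, is not shown here.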
a well-designed IDE should abstract that away, i.e. run the agent in the headless VMs while giving you an abstraction that feels like you are running the agent locally with all the benefits (editor, browser, diffs, debugger, etc)
I shared your fear some weeks/months ago so I was always using my harness in the cloud. However, latency started to become an issue when I traveled to other countries where I needed a VPN... so I ended up cooking skynot to be able to trust running my harness on my own computer: https://github.com/tarsgate/skynot (PRs welcome if you want to add support for a harness other than Pi)
> I'm running these things in headless VMs
What's your setup like and what do you use it for?
I have an M2 Max MBP with plenty of RAM and I use VSCode + Zoo Code plugin with Qwen3-Coder-Next-GGUF:UD-Q4_K_XL to run local agentic coding sessions, but I'm intrigued by being able to run headless as I could probably run multiple instances in parallel to do stuff?
Like are you using UTM with some pre-built VM and a local LLM?
Curious.
Might wanna check out https://github.com/LuD1161/agentjail - policy guardrails for coding agents.
shameless self-plug. I've been dogfooding it for the last 3 weeks now.
Zcode allows you to connect to a Docker container, or to a VM using ssh.
I finally repurposed an old server just for that and for anyone reading who has not had a chance to use --dangerously-etc. it's awesome, do it :)
I just back up my entire home folder to another device, then let it rip
It's only a cli because they yanked out the opencode desktop code. (As well as the opencode go/zen model provider)
Edit: my theory is they wanted to mimic being the primary provider in a quick way with a lot of string replace. Though they could have added opencode back as a regular provider.
MiMo Code adds a lot of cool orchestration features to OpenCode! It definitely is NOT a quick find-replace job, it's genuinely someone's research project to create a better agent harness building on top of free software, and that's awesome. See https://mimo.xiaomi.com/blog/mimo-code-long-horizon
You're surprised? I think harnesses are almost as important as the underlying model. Folks have been able to improve benchmark results by nearly 2x based on harness alone.
Harnesses are quickly becoming critical components of the "model" itself imo. Not shocking to me at all that a company that spots a revenue opportunity is keeping its harness closed source.
I'm a neophyte. What makes a harness special or all that unique from another? I've had a reasonable experience with Zed and local models, but could be persuaded to put something else in the mix if there is a measurable benefit to be had.
Source? On the most trusted benchmark right now (deepSWE), models score better or just as well with its minimal harness as they do with CC or Codex.
They might be sending some user requests to Anthropic to gather training data for their own models. If they do so, perhaps they need to add some tracer to the requests that they'd prefer to hide.
I wonder if you're as cynical and distrustful of American companies as well or is it more of a racism kinda thing
Wireshark would catch that easy-peasy.
Source? Or is it "trust me bro"?
or more likely, sending it to the CCP
Given that there's such severe concern being expressed by Anthropic about Claude being distilled, and the idea that the harness is part of the moat, it doesn't seem super surprising that the other side of that would try to also make it harder for them to tell how well they're doing and what their approach is.
Unlikely considering they’re publishing the Crown Jewels (GLM 5.2) as open weights.
> and the idea that the harness is part of the moat,
That idea is wrong, though. These same people thinking harnesses are part of a moat are also boasting that s/ware is easily writable now.
There's no secret sauce in a harness that you can't vibe-code into your own harness.
I don't find a closed-source Chinese agent system trustworthy.
It is essentially a black box with full user permissions, meaning you are just handing over your entire system to a Chinese-owned server. With OpenCode and its GLM provider, at least I can monitor which files were read, which were edited, and what commands were executed.
Not to mention that Chinese national security laws legally obligate companies to cooperate with state intelligence and counter-espionage efforts [0]. If you have this installed on a corporate workstation, and your company is large enough, the possibility of them spying on you is not just a risk—it's almost a certainty.
[0]: https://en.wikipedia.org/wiki/National_Intelligence_Law_of_t...
You shouldn’t find American ones trustworthy either.
I am not surprised it is not open source. These harnesses are hard to build - they are not just wrappers - and often they contain business logic that is not suitable for public distribution for all kinds of reasons.
hard? wut lol....
no. they. are. not.
Some people are just terrible at it.
I'd prefer a CLI over a desktop. But then why don't I just use OpenCode?
That looks to be a copy of OpenCode
A fork, yes.
Interesting to see how their harness will show up here. So far, https://cursor.com/evals still shows a big gap in performance, and almost no real win in terms of money vs gpt5.5 and sonnet 5.
Which raises a question: why would I install a closed-source black box that will send data to a country where you can't hold anyone legally liable for even the craziest misdeeds?
The market for a hosted commercial version of GLM is very weird. Yeah, you can deploy an open source version or run it locally, sure. This.... hm, I don't know why any company would take any risks to use GLM
How is this different from installing Claude Code or Codex? Maybe in the US someone has some hopes about having those makers "legally liable", but in Europe both US and China products seem equally far away and equally closed in all senses.
Yeah, that's why Google lost 4 billion to the EU over some elusive anti-competitive practices. Every big tech US company is constantly in court in the EU. Good luck bringing any Chinese company to court over distillation. They steal your IP and your data, and you have 0 protection.
Z.ai documents integrations with nearly all the popular CLI-based agents: https://docs.z.ai/devpack/tool/others
If you're already used to your TUI coding agent, you don't need the desktop agent. Although it is nice that it is there for folks who prefer the Codex App/Claude App UI approach.
Yeah, I use GLM 5.2 in OpenCode, running in a Docker container with CodeNomad as the web-based GUI. It works perfectly; I can access it from anywhere, and it runs all models (except for Anthropic's subscriptions).
From your experience, is it comparable to Claude Code with Opus 4.8? How does it feel? How do the two differ?
Also, kudos to the Z.ai team for adding Linux support from day one.
I believe the incentive here is more tokens. I recall limits being more generous with their in-house harness
Looks quite pretty! Not sure if I want to try that instead of OpenCode, maybe. OpenCode also has a desktop app. I will admit that I like their TUI one better (and honestly more than the Claude Code TUI), but while the desktop version is kinda more basic, it's nice enough: https://opencode.ai/download
That said, it's interesting that they're releasing a bunch of stuff: ZCode, OCR.z.ai, Image.z.ai, Audio.z.ai, AutoClaw and some other stuff that https://chat.z.ai/ links to. That's a lot of stuff for one org to pull off.
Figured I'd try out their Pro coding plan. It doesn't seem to give me that much more quota than Opus (at least given how many tokens are needed to accomplish a certain task), but GLM 5.2 in and of itself seems like a beefier Sonnet model, pretty good.
Their TUI is quite heavy and crashes quite often compared to Claude Code.
Which are you talking about? OpenCode or ZCode?
Have not used claude code but have used opencode tui a LOT and it does seem to crash quite a bit. Not like it breaks every session but enough that I have come to expect it but still not bothered enough to change. I don't like switching setups mostly
It's impressive all these companies are getting away with "base usage allowance included" [1] or "standard limits" [2], layering the higher plans as a multiplier of that "base" but never disclosing what it is.
I guess the base is whatever the profit margin needs to be this month.
[1]: https://zcode.z.ai/en#:~:text=Base%20usage%20allowance%20inc...
[2]: https://support.google.com/gemini/answer/16275805?hl=en#:~:t...
When running the app, it actually tells you what the base usages are, but the names of the plans are different from those on the page. It reads:
Start plan: 5 Million tokens a day (GLM-5.2 3M, GLM-5 Turbo 2M)
For individuals: (+150% quota) $18.00USD+ For individual developers with a dedicated Coding Plan quota.
Heh, so it sounds like you have to pay them first to find out what you're going to get for the money? That's an interesting business model..
Now, if only we can figure out what all the others are providing as part of their subscriptions we can compare. (Though 3 million tokens of the top model per day seems kinda low. But, I guess that's what the 5x plan is for. I'd still like to be able to compare against all the big providers.)
Agreed this sucks. We publish ours here and try to be as transparent as possible: https://synthetic.new/rate-limits
Love both the approach and the transparency. Kudos.
Yeah, this is why I like the ACCC in Australia. They wouldn't allow this sort of thing to fly if this was an Australian company.
A strategy that can backfire. An unpredictable tool is worse than a bad tool.
For anyone who uses GPT-5.5/Codex as their daily driver, how does GLM-5.2/ZCode compare, esp in a codebase already set up for agentic coding?
GLM 5.2 is in an uncanny valley where it's too big to run at home, and too expensive and slow in comparison to similarly capable models (a good chart here - https://deepswe.datacurve.ai/), and that's just comparing API prices.
When looking at the subscription offerings from Anthropic and OpenAI, it's not even comparable, as a Codex $200 subscription can easily use a billion tokens per week on GPT 5.5 high/xhigh.
It's an interesting model from the perspective of being the most capable open weight model. But it doesn't have a solid place in this marketplace right now.
Thanks, that DeepSWE comparison is really useful. Yeah the Codex $100 plan with xHigh reasoning is very practical and cost efficient.
I tried it for a couple of hours this morning and yeah, it's a bit slow, and I needed it in peak hours so it also often can't reach the server so that makes it even slower. And I'm not even sure it's just the model, it could very much be the harness. Stalled for 40 minutes on trivial tool calls like `find`, two times...
It shows potential, answer/code quality was solid, but I would need more time with it.
TLDR GLM will take a lot longer to do a task, and maybe spend more tokens depending how complex it is
It's a hell of a lot cheaper though, so for me it's worth it. I have more Claude experience though, and I would say it's almost on par with Opus 4.1.
It's cheaper if you pay API prices. If you pay a gpt sub then codex is much much cheaper.
No ACP support, it seems :( Of all the AI buzzwords I love ACP because of the separation of concerns. Let the editor be an editor, the harness be the AI code agent, and the LLM be the LLM.
UI-wise this looks a lot closer to Codex than Claude Code. It's basically an exact copy of Codex.
I would very much agree. Even the hand icon, the usage in the text field, and the sidebar style are 1:1 identical to Codex. It's a misleading title - it's not close to Claude Code.
Which makes keeping Codex closed source look even sillier. Software is no longer anyone's moat. Just let it go.
I thought codex was open source https://github.com/openai/codex
Does anyone use an agnostic TUI or harness for development tasks that can fairly seamlessly switch between providers?
I'm wanting local context in the spirit of "here are 3 AI providers available, for coding tasks use this one... and for writing prose use this one... and for generating images use this one..." etc.
https://opencode.ai/
OpenCode was the first agent harness I used, and I have always liked it. You can configure a wide variety of providers, and it's open source and has a number of core contributors.
The other opinionated option is Pi (the Pi agent harness). This is a great lightweight option and also supports a number of providers. You can also use local model servers.
have used both pi and opencode for the last 6 months, haven't opened a proprietary harness (cc, codex, cursor) in that same amount of time. right now i'm on pi and i can switch seamlessly between any model across any provider i want, even mid session. can even point them at locally running models.
i think people don't realize how much better life is over on this side, cc and codex rely entirely on vendor lock in imo.
Try the role-model Pi extension I built, to let Pi determine when to switch to a different model in your pool.
https://github.com/try-works/role-model
Does a mid-session provider switch result in loading the entire context into the new model, inflating session cost?
I don't think I understand the token/cost implications of this feature
Haha I pretty much commented the same thing one minute apart.
You can use Claude Code with a self hosted model no problem. I don't believe you can switch during a session though.
Are you using openrouter or something else?
codex is open source https://github.com/openai/codex/ it's definitely geared towards openai but it is completely open source
why did you switch from oc to pi?
You can do this with role-model, the model router I've built. It routes based on roles and tasks among other things. It has an extension for Pi that lets your coding agent specify request metadata for roles and capabilities etc.
https://github.com/try-works/role-model
If you haven't yet you should give a chance to https://pi.dev
I've been using it exclusively (and extending it, see https://a.l3x.in/ai) for months with mainly GLM-4.7 then 5.1 and now 5.2 and I could hardly be any happier.
I'm still working on a "Github/Forgejo first" based workflow but also quite happy with it already, basically most of my sessions run as a ci/cd job (triggered by "/pi" comments) and generate PRs or push commits to PRs, see https://github.com/shaftoe/pi-coding-agent-action
I’ve written a skill for codex and Claude code that designates an orchestrator on the primary worktree and is agnostic about what type of AI workers are on the N supporting worktrees.
The orchestrator knows which AI client is running in any given worktree, so it would be fairly easy to designate which AI should receive what kind of tasks.
You run either Claude or Codex in tabs for each work tree. I do have some AI TUI specific instructions, for instance codex is primitive at monitoring compared to CC. So, there are additional notes for Codex workers on how to properly monitor for new "mail."
You work with the orchestrator on the primary worktree and allow it to delegate tasks to the workers and answer their smaller questions.
It surfaces results and assists the workers with context clearing when needed.
The orchestrator and workers communicate using a simple shared file system under tmp/* and together they can handle a big and varied workload.
I use iterm2, so I’ve also added iterm2 specific python that allows the orchestrator to “kick” a worker or perform tasks otherwise veto'd by the TUIs (ie /clear) by modifying the input and submitting it.
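The shared-file-system "mail" idea described above can be sketched roughly as follows. This is only an illustration under assumptions: the directory layout (`<root>/<worker>/inbox`), the JSON message format, and the names `send_mail` and `read_mail` are all hypothetical and not taken from the skill itself.

```python
import itertools
import json
import tempfile
import time
from pathlib import Path

# Process-local counter so two messages sent in the same clock tick
# still get distinct, correctly ordered file names.
_SEQ = itertools.count()


def send_mail(root: Path, worker: str, body: dict) -> Path:
    """Drop one message into a worker's inbox as a JSON file."""
    inbox = root / worker / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    final = inbox / f"{time.time_ns():020d}-{next(_SEQ):06d}.json"
    # Write under a temporary name, then rename, so a polling reader
    # never picks up a half-written message.
    partial = final.with_suffix(".tmp")
    partial.write_text(json.dumps(body))
    partial.rename(final)
    return final


def read_mail(root: Path, worker: str) -> list[dict]:
    """Return and delete all pending messages for a worker, oldest first."""
    inbox = root / worker / "inbox"
    if not inbox.is_dir():
        return []
    messages = []
    for path in sorted(inbox.glob("*.json")):
        messages.append(json.loads(path.read_text()))
        path.unlink()
    return messages


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())  # stand-in for the shared tmp/ directory
    send_mail(root, "worker-1", {"task": "run the test suite"})
    print(read_mail(root, "worker-1"))
```

A worker would call `read_mail` on a timer; the orchestrator only ever writes. The write-then-rename step is the one design choice that matters here, since plain files offer no other signal that a message is complete.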
Is this open source?
Circus Chief allows you to do this: https://github.com/ferrislucas/Circus-Chief
(Full disclosure: it’s my project)
I’ve been using Crush with Openrouter and have good success lately
https://github.com/charmbracelet/crush
I stumbled upon https://omp.sh and haven't really felt the need to ever use anything different.
"omp is a fork of Pi by Mario Zechner, rewritten as a coding-first surface: sessions, subagents, slash commands, extensions — all TypeScript..."
I use Kilo Code for that; it's based on OpenCode and it's open source.
I prefer having a GUI for diffs and session history, but if you prefer a TUI you can just use OpenCode.
I use the one that I've been developing since 2023. It's intended to be used in exactly this spirit! Written in Go, has image support (which has yet to be fleshed out).
It supports MCP (unlike Pi), sandboxing (with user-mode networking), and runs efficiently at huge contexts.
https://codeberg.org/mlow/lmcli
(The screenshot in the folder is a little bit out of date, but is still representative of the overall look)
Also Goose from the Agentic AI Foundation (AAIF), part of the Linux Foundation.
https://goose-docs.ai/
I like Chinese open weight models that offer cheap tokens, but I only use them for my personal projects.
China has a history of stealing IP and trade secrets, and Chinese courts favor local companies, while the US has robust courts that can enforce IP rights. If you want to risk your company's IP, trade secrets, and data for some cheap tokens, go ahead and use Z.ai's services.
FYI you can use Z.AI models on infra not in China...
But this harness app is Chinese?
The US of A ditto
What's your top secret project?
I don't find a closed-source Chinese agent system trustworthy.
It is essentially a black box with full user permissions, meaning you are just handing over your entire system to a Chinese-owned server. With OpenCode and its GLM provider, at least I can monitor which files were read, which were edited, and what commands were executed.
Not to mention that Chinese national security laws legally obligate companies to cooperate with state intelligence and counter-espionage efforts [0]. If you have this installed on a corporate workstation, and your company is large enough, the possibility of them spying on you is not just a risk—it's almost a certainty.
[0]: https://en.wikipedia.org/wiki/National_Intelligence_Law_of_t...
I agree. I don't find the US competitors trustworthy either. I think open source is the way here.
Thank you. It doesn't make sense to me that people trust our companies so much more than Chinese ones for no reason. This country has an abysmal track record when it comes to respecting its citizens' rights or privacy. Propaganda working as intended, I suppose.
At least the model weights are open, I’m not American, so to me this is much more trustworthy in every possible way. You’re talking as if US intelligence are the good guys, and to me at least, they are not to any extent.
We are talking about an agent harness here, not a model.
Nevertheless, Americans thinking they are morally superior to China is always quite funny.
This administration is corrupt, cruel and doesn’t care about human rights.
And the worst is… Americans have voted for that administration…. twice!
I digress…
What can you gain by looking at the weights, whether open source or not? Are they not what determines the model's output, but in an oblique way? We can't really fix the weights ourselves, weight by weight, or can we?
There's no way to safely use SOTA LLMs if privacy, and IP protection are your concern. Unless you want to spend 100k+ to host a 1T param model. Even if you use OpenCode you're sending all that information to random data centers you know nothing about.
But yes, US intelligence has killed and ruined the lives of far more people than China has. Not sure how so many people buy into the narrative that they're protecting freedom and democracy.. They're protecting their freedom to kill and crush all their enemies and control every "democracy" on earth.
This is exactly the same with providers from the USA.
That's why I like to use Reasonix with Deepseek. Hitting cache makes requests basically free and that's through unsubsidized American providers like Digital Ocean or cloudflare.
You can always run it in bwrap or rootless podman.
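For the bwrap route, a rough sketch of what the invocation might look like, built as an argument list in Python. The flags are standard bubblewrap options, but the bind layout assumes a merged-`/usr` Linux distribution, and `my-agent` is a hypothetical command name. A real agent would likely need more read-only binds (TLS certificates, its own config directory), so treat this as a starting point rather than a vetted sandbox.

```python
from pathlib import Path


def bwrap_argv(project: Path, command: list[str]) -> list[str]:
    """Build a bwrap command line that exposes only one project directory."""
    project_str = str(project)
    return [
        "bwrap",
        # Read-only system files; assumes /bin and /lib live under /usr.
        "--ro-bind", "/usr", "/usr",
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        # The only writable path from the host is the project itself.
        "--bind", project_str, project_str,
        "--chdir", project_str,
        # Drop all namespaces, then re-share the network for API calls.
        "--unshare-all",
        "--share-net",
        "--die-with-parent",
        *command,
    ]


if __name__ == "__main__":
    # "my-agent" is a placeholder; pass the list to subprocess.run to launch it.
    print(" ".join(bwrap_argv(Path("/home/me/project"), ["my-agent"])))
```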
nono, the sandboxing tool, has been working great for me
In a sense it's a clean reminder that all these, especially non-local, llm tools should NEVER run outside a container. I'm currently looking at z-jail specifically for these scenarios; VMs are too heavy & expose too many sec issues of their own for continual integrated use in my case.
Run it in a container under Opencode. It works great, and I even upgraded to their pro plan (~$60/month). If you want it in a container, there's info in my profile under my projects. That code is entirely open source, and it's there simply because I built what I needed for my own work. I'm sure there are a zillion other ways to do it. However, I highly advise against running any agent on bare metal, regardless of the company's country of origin. My thesis addresses this directly and repeatedly.
By the way, some pedant recently asked why anyone would run software with only a few stars. My thoughts on that are minimal: people can practice whatever slop logic they want. I've architected and built systems that handled tens of thousands of users. I'm not fucking around. The way I build isn't typical, and I don't suggest anyone try to mimic my approach, but it works for me and the way my mind processes complex systems.
To the peanut gallery: use it or don't, but don't give me a hard time unless you're ready to get one back. I've made plenty of mistakes in my career, and accountability is a crucial part of growth. I'm more than willing to work with anyone using my code, provided they bring valid, substantial criticism to the table.
How's that different from Codex (gui app) or Claude?
Codex is open source: https://github.com/openai/codex
Well, it's different from OpenCode
The codex cli too is open source, afaik.
If you are not US based that’s not really a big concern.
I think it’s a real concern. Chinese companies are much more closely tied to the state, as in if you decide to go to China one day they might already have all the data on how you have interacted with their models.
The US is certainly inching in that direction but it’s not like someone from the US government sits at Anthropic’s HQ reading chats from state people of interest.
yes but the americans are also doing it, and i don’t really work on anything worth spying on
NSA can also legally force companies to spy. Secret spy courts and gag orders are a thing.
Actually there are more such cases against the USA than China in public.
I'm in the US. The benefit of the Chinese spying on me vs a US company is the Chinese can't come to my door and take me to jail.
As someone who loves using OpenCode w/ local Chinese open source models, this is basically my take on this as well. There's no way I would ever put a piece of proprietary Chinese software that gets full system control on anything important. This is definitely something I would only ever run sandboxed in a lab environment for toy projects, not for serious work. I feel only marginally better about Codex/Claude Code, hence my strong preference for local LLMs w/ OpenCode, but a proprietary approach to Chinese models is a hard no from me dawg.
so basically no worse than europe or usa, but they are just more open about it
> It is essentially a black box with full user permissions,
You mean, like Windows and Android?
I wonder whether that is referring to local time, or CST (UTC+8)?
From https://z.ai/subscribe#code-plans-container:
> Explanation and Recommendations Regarding Usage for Plan-Supported Models
> Note: Peak hours are from 14:00 to 18:00 daily (UTC+8).
Peak hours are 14:00–18:00 (UTC+8)
https://docs.z.ai/devpack/overview
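To check a given moment against that window, a small sketch using only the standard library. The 14:00 to 18:00 (UTC+8) range comes from the note quoted above; treating 18:00 itself as outside the window is an assumption, and `is_peak` is just an illustrative name. In UTC the same window is 06:00 to 10:00.

```python
from datetime import datetime, time, timedelta, timezone

ZAI_TZ = timezone(timedelta(hours=8))  # UTC+8, as stated in the quoted note
PEAK_START = time(14, 0)
PEAK_END = time(18, 0)  # assumed exclusive


def is_peak(moment: datetime) -> bool:
    """True if a timezone-aware moment falls in the 14:00-18:00 UTC+8 window."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(ZAI_TZ).time()
    return PEAK_START <= local < PEAK_END


if __name__ == "__main__":
    print(is_peak(datetime.now(timezone.utc)))
```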
Thanks. Those are some odd hours though, why would evening time be peak hours? Usually (in the western world anyway), 9AM - 12PM would be peak hours. Things normally slow down post-lunch, and be its slowest at close-of-business.
if you're going to try this one out, don't be surprised to get this message repeatedly, like 4 out of 5 prompts you're trying to send, 24/7, this is gonna be your new friend, then you'll learn to write the only prompt that matters: "retry", "retry", "retry"
Here's the message: "Cannot connect to API: write EPIPE"
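What typing "retry" by hand amounts to is retrying with backoff, which a client could do automatically. A generic sketch, not ZCode's actual behavior: in Python an `EPIPE` write failure surfaces as `BrokenPipeError`, a subclass of `ConnectionError`; the `retry` helper and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying connection errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:  # includes BrokenPipeError (EPIPE)
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists only so the delay can be swapped out in tests; a caller would normally pass just the function to retry.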
When the harnesses commoditize, it will be the dynamic things like skills that will be the most valuable, useful thing you can bring to a harness. That seems like a long ways away though. There are still meaningful performance differences between agent harnesses.
Can anyone tell me if Z.AI's cheapest plan is more or less generous than Claude's cheapest plan? If it is more or less generous, could you describe the extent of the difference?
(If this comment is too formal, I'm sorry. I used Google Translate to it [this line was NOT translated])
I got around 17m tokens on glm 5.2 then blocked for 4 days on the weekly limit on that plan.
17M tokens... I think it is a lot. What were you working on?
Closed source? No Thanks
This isn't a CLI, so not really like Claude Code. Looks more like Cursor or Conductor.
The plans at first glance are the same as Anthropic’s. I thought GLM was supposed to be cheaper. Am I missing something?
I haven't tried Z.ai, but both Ollama ($20) and OpencodeGo ($10) seem to give me more generous limits than the Claude $20
They give you much better quotas, on the 20USD plan using opus you will quickly run into limits.
The plans may have comparable prices, but the API rates are much cheaper. Especially because it is open weights, so there is competition on places like OpenRouter.
Is this GUI only?
Yes.
Is there a CLI version available for this harness?
As someone who doesn't use these tools, why does every AI company need their own version of Claude Code? Is there more to it than vendor lock-in?
"Quality" of the harness matters a lot to the user experience, and the construction of the harness will depend on the behavior/quirks of the underlying model. So, if you're using Claude Code, you can expect it to work best with Anthropic models, and expect other model-makers to want you to use the harness they've developed.
But mostly vendor lock-in, I imagine.
There are different grades of vendor lock-in. There's mechanical lock-in (which is a thing, like .claude folders) and economic lock-in but then we don't pay enough attention to behavioral lock-in. Habit is powerful, and if you can habituate users into a certain flow, change feels bad and they are more likely to stay.
Why not? They are relatively easy to make so why not. Even I made one: https://github.com/computerex/z
implementing their own version of steganographic monitoring lol
A joke but also not a joke.
sweet! i'm heavily using glm 5.2 in mouse.dev which is great for mobile. the ui looks really good, similar to the cursor agents window etc.
separation of model and tooling is as important as the separation of legislative and judicial powers, so just ignore any tooling or harness that is not truly open source. they will all slowly creep into your life and choke you trying to lock you in.
It's sad to see that the teams that have the most resources that can contribute to development of next-gen harnesses are essentially copying the same exact thing from each other, with no meaningful changes.
And most of the advancement and experimentation happens in some random 0-star github repos.
Could you share some of these 0-star github repos?
I've been working on my own private harness for the past 8 months, and I've been collecting ideas from such repos I've stumbled upon.
pi-tmux is one such example (seems to be archived now) which inspired me to use tmux as communication layer and provide visibility of subagents of multiple models in their native harnesses [1].
There's also herdr, which is not 0-stars, but is a super interesting, lesser-known project [2]. It also has interesting substrates to allow agent coordination.
None of these are harnesses per se, but they're pointing towards clear gaps in existing harnesses. For example, we've known for a while now that compounding the knowledge of different classes of models achieves better performance. Why is there no harness where this is native functionality? And there's no harness where subagents are first-class citizens both in terms of capabilities and UX.
[1] https://github.com/offline-ant/pi-tmux
[2] https://github.com/ogulcancelik/herdr
They're the ones with the most to prove.
What’s with the 3 subscription plans that are suggestive of being mapped to plans from Anthropic and Open AI?
Do they really correspond roughly? Seems like they’re trying to suggest a discount while still being worth a significant amount of monthly spend.
I don't get why not open source it? You are already open-sourcing your weights!
Because a harness can more easily stop backdoors of a model. A packaged app on the other hand ... let's say I'll skip this until I can compile and package it.
One of these is not like the other.
I've been using this for a few weeks and it's a real workhorse.
First-party harnesses are great, but I really wish this was a CLI and not a GUI.
Has anyone come up with a decent harness for small local models, say, gemma4 e4b? I'm trying to roll my own but man, the capability gap is real.
This is precisely what I've been working on targeting with https://dirge-code.github.io/
I've written up an explanation of what trips small models up and how the harness can address that here: https://yogthos.net/posts/2026-06-08-dirge-code.html
Very interesting work! I put some effort into getting it to work with models my hardware can actually run well and they just fall over immediately. gemma4 12b runs like molasses on my 2080 super but it was the only model able to, with your harness, actually do anything useful. It was the only useful thing I've gotten any model runnable with my hardware with any harness I've tried, very impressive!
I suspect smaller models need more work than is practical to fit harnesses around. The smaller the model, the more work, and it doesn't carry over to other small models.
Deepseek r1 7b could not emit tool calls to save its life, gemma4 e4b couldn't get the names of files right, qwen3.5 4b gets stuck in dumb rabbit holes, I pointed it at a ruby script and asked it to run it, it tried running it with bash then got caught in a loop investigating.
Noble effort though! I guess I'll keep working on my barebones ruby_llm harness, with very tempered expectations. Each of these failure modes can be worked around, but there's too many of them to work around in the general sense.
Do you have benchmarks comparing against Pi? The blog post doesn't include any hard numbers.
For example, so far I haven't seen any evidence that LSP integration improves performance for small models vs using grep via a bash tool.
This really resonates. Thanks for mentioning.
This is very impressive!
I could use them as a provider if they showed a concrete price per token, or a concrete number of tokens in each plan. As it is, I don't know what I would be renting from them. If I were going to buy hell knows what, I would go to Anthropic.
literally I paid in the morning for the pro plan and then they launched this. currently are my fav lab after Anthropic.
Lucky you! I'm considering switching from Kimi 2.7. What's your experience so far?
Trying to understand the token usage/cost of the subscription plan compared with Claude Pro. Is there a benchmark somewhere with such info?
I think they market it as 3x the usage for the same price. Although, the prices are not the same, and Anthropic's usage constantly changes, so...
it's an electron app, it highlights wrong spelling but doesn't suggest corrections. how does someone exhibit so much incompetence?
Welcome to using v1.0.0 of any product
v3.2.2 as of today
Does it support Azure openai and aws bedrock models as well?
how is this cheaper?
Coding plans are often out of stock, it's miraculous
Is there any desktop coding app that can be used with local LLM?
OpenCode (TUI and desktop app) can use Qwen local
I built vibn.dev for this purpose, it’s very rough around the edges tho
Yea not touching this with an any-foot pole. They are just keeping up with the Joneses now. There is no reason for this to exist but there IS a reason it is not open source. ;)
Isn't competition and open markets a reason for this to exist?
Funny, I think the same about Claude.
Didn't Claude Code pioneer this style of agent?
Is it possible to use their subscription pricing with Opencode?
I use the coding subscription in both Pi and OpenCode without issue.
This comes with a little bit of free credits. (after login)
It did last week. Wow. That didn't last long.
I couldn’t find whether it is SOC 2 certified, etc.
> Those are some odd hours though, why would evening time be peak hours? Usually (in the western world anyway), 9AM - 12PM would be peak hours.
Z.ai is based in China and serves out of Singapore, that's surely why.
eager for zcode-cli. and their coding plan is always sold out.
cool to see how fast they are catching up
Is there a CLI version of it?
I tried it but went back to OC, which feels smarter.
It does have a 1.5x usage promotion for GLM 5.2 on the coding plan so now is a good time to test it...
GLM-5.2 seems capable. It’s just much slower than Opus.
OpenRouter + current IDE for me. Can't be buying a new plan and changing IDEs every time a new model drops, beyond testing out of curiosity.
There are now more and more Harness clients. I hope we can have the best open-source client and the best open-source models, as this would greatly facilitate our work and operations. However, this seems unlikely in the short term.
What is VS Code with GitHub Copilot, then? It does mostly similar things.
With Musk buying Cursor, it is good to have more alternatives on the market.
For those that want something based on Pi Mono:
- https://igorwarzocha.github.io/howcode/
- https://github.com/ruuxi/stella
- https://www.pi-gui.com/
Not using Pi, but based on Pi (no extensions possible)
- https://twotimespi.dev/
GLM-5.2 is a great model!
But it already works really well with existing harnesses, I'm not sure why a dedicated one is needed?
I use it with https://swival.dev and everything works perfectly, no tool calling issues and it works fine with long sessions.
Telemetry enabled?
How about no? I'd rather use something open source and local. We have enough of 3rd party controlled AI tools.
The original submission was to [0] which I feel must be mentioned.
[0]: https://zcode.z.ai/cn
You're referring to https://zcode.z.ai/en, so I took that one and re-upped it in order to have a place to merge the thread. Seemed fairest!