Comment by joshribakoff
16 hours ago
This is just subagents, built into Claude. You don't need 300,000-line tmux abstractions written in Go. You just tell Claude to do work in parallel with background subagents. It helps to have a file for handing off the prompt, tracking progress, and reporting back. I also recommend constraining agents to their own worktrees. I'm writing down the pattern here: https://workforest.space. While nearly everyone is building orchestrators, I've noticed Claude is already the best orchestrator for Claude.
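A minimal sketch of the pattern, with made-up branch and file names:

    git worktree add ../wt-session-store -b agent/session-store
    cat > ../wt-session-store/HANDOFF.md <<'EOF'
    ## Task
    Move session handling to the new store.
    ## Progress
    (agent appends status lines here and reports back when done)
    EOF

Each agent gets its own worktree so parallel edits never collide, and the handoff file doubles as the prompt and the progress log.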
It isn't subagents. The gap with existing tooling is that the abstraction is over a task rather than a conversation (because of the issue with third-party apps, Claude Code has been inherently limited to conversations, which is why it has been lacking in this area; Claude Code Web was the first move in this direction), and the AI is actually coordinating the work (as opposed to being constantly prompted by the user).
One of the issues that motivated this feature: you have a task, you tell Claude to work on it, and Claude keeps checking back in for various (usually trivial) things. This workflow allows more effective independent work without context-management issues (with subagents there is also the problem of how task progress is communicated; by introducing things like a task board, that state can be managed outside of context, as in the sketch below). The flow is quite complex and requires a lot of additional context that a chat-based flow doesn't, but it is a much better way to do things.
The way to think about this pattern, which many people began building concurrently over the past few months, is as an AI that manages other AIs.
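To make "manage this state outside of context" concrete, a sketch of what that can look like on disk (file name and fields are illustrative, not what Anthropic ships):

    cat > tasks.json <<'EOF'
    [
      {"id": 1, "title": "extract session store", "status": "done",        "owner": "agent-a"},
      {"id": 2, "title": "update middleware",     "status": "in_progress", "owner": "agent-b"}
    ]
    EOF
    jq '.[] | select(.status != "done")' tasks.json   # coordinator re-reads this on each wake-up

The coordinator only pays context for the current snapshot of the board, never for the full history of the task list.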
It isn't "just" subagents, but you can achieve most of this with a few agents that take on generic roles, a skill or command that tells Claude to orchestrate those agents, and a CLAUDE.md that tells it how to maintain plans and task lists and how to let the agents communicate their progress.
It isn't all that hard to bootstrap. It is, however, something most people don't think about and shouldn't have to learn to cobble together themselves, and I'm sure there will be advantages to more sophisticated implementations.
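To illustrate the bootstrap: Claude Code picks up subagent definitions from .claude/agents/. The role below is a made-up example, not a shipped one:

    mkdir -p .claude/agents
    cat > .claude/agents/implementer.md <<'EOF'
    ---
    name: implementer
    description: Executes one task from the plan, then reports back in PROGRESS.md
    tools: Read, Edit, Bash
    ---
    Work only on the task you are given. When done, append a short
    status entry to PROGRESS.md and stop.
    EOF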
Right, but the model is still: you tell the AI what to do. This is the AI telling other AIs what to do. The context makes a huge difference because it has to be able to run autonomously. It is possible to do this with the SDK, but the workflow is completely different.
It is very difficult to manage task lists in context. Have you actually tried to do this, i.e. not within a Claude Code chat instance but by one-shot prompting? It is possible they have worked out some way to do it, but when you have tens of tasks, merge conflicts, and you are running that prompt over months, at best it doesn't work; at worst you are burning a lot of tokens for nothing.
It is hard to bootstrap because this isn't how Claude Code works. If you are just using OpenRouter, it is also not easy: after setting up tools and rebuilding Claude Code, it is very challenging to set up an environment so the AI can work effectively, errors can be returned, questions can be returned, and so on. AFAIK this is basically what Aider does... it is not easy, and it is especially not easy in Claude Code, which has a lot of binding choices from the business strategy Anthropic picked.
> Claude Code has been inherently limited to conversations
How so? I’ve been using “claude -p” for a while now.
But even within an interactive session, an agent call-out is non-interactive. It operates entirely autonomously and then reports the end result back to the top-level agent.
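For reference, a headless call looks like this (--output-format json is documented, but check your CLI version):

    claude -p "Summarize the failing tests in this repo" --output-format json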
Because of OAuth. If they gave people API keys, no one would buy their ludicrously priced API product (I assume their strategy is to subsidise the consumer product with the business product).
You can use the Claude Code SDK, but it requires a token from Claude Code. If you use this token anywhere else, your account gets shut down.
claude -p still hits Claude Code with all the tools and all the Claude Code wrapping.
It’s even less of a feature: Claude Code already has subagents; this new feature just ensures Claude Code actually uses them for implementation.
IMHO the plans Claude Code produces are not detailed enough to pull this off; they’re doing it to preserve context, but the level of detail in the plans is nowhere near enough for it to be reliable.
I agree with this. Any time I make a plan I have to go back and fill it in, fill it in again, ask what we missed, TDD, yada yada. And yes, I have all this stuff in CLAUDE.md.
You start to get a sense for what size plan (in kilobytes) corresponds to what level of effort. Verification adds effort, and there's a sort of... rocket equation? In that the more infrastructure you put in to handle the logistics of the plan, the less you have for actual plan content, which puts a cap on the size of an actionable plan. If you can hit the sweet spot though... GTFO.
I also like to iterate several times in plan mode with Claude before just handing the whole plan to Codex to melt with a superlaser. Claude is a lot more ... fun/personable to work with? Codex is a force of nature.
Another important thing I do, now that launching a plan clears context: get out of plan mode early, hit an underspecified bit, then go back into plan mode and say something like "As you can see, the plan was underspecified; what will the next agent actually need to succeed?" and iterate that way before we actually start making moves. This is made possible by lots of explicit instructions in CLAUDE.md telling Claude to say what it's planning and thinking before it acts. Suppressing the tool-call reflex and getting actual thought out helps so much.
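The CLAUDE.md instructions I mean are just prose. One wording (mine, adapt to taste):

    cat >> CLAUDE.md <<'EOF'
    ## Before acting
    Before any tool call, state in a sentence or two what you are about
    to do and why. If the plan turns out to be underspecified, say so
    and return to plan mode instead of guessing.
    EOF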
It’s moving fast. Just today I noticed Claude Code now ends plans with a reference to the entire prior conversation (as a .jsonl file on disk) with instructions to check that for more details.
Not sure how well it’s working though (my agents haven’t used it yet)
Interesting about the level of detail. I’ve noticed that myself but I haven’t done much to address it yet.
I can imagine some ideas (ask it for more detail, ask it to make a smaller plan and add detail to that) but I’m curious if you have any experience improving those plans.
I’m trying to solve this myself by implementing a whole planner workflow at https://github.com/solatis/claude-config
Effectively it tries to resolve all ambiguities by making every decision explicit: if a decision can't be resolved from documentation or the source, the user is asked.
It also tries to capture all "invisible knowledge" by documenting everything, so these decisions and the business context end up captured in the codebase again.
Which, in theory, should make long-term coding with LLMs more sane.
The downside is that it takes 30-60 minutes to write a plan, but the plan is much less likely to make silly choices.
I iterate around issues. I have a skill that launches a new tmux window for a worktree, with Claude in one pane and Codex in another, plus instructions on which issue to work on: Claude is instructed to create a plan, while Codex is instructed to gather the background needed for the issue. By the time they're both done, I can feed Claude's plan into Codex, which is ready to analyze it. Codex feeds the plan back to Claude, and they ping-pong like that a couple of times; after a few iterations there's enough refinement that things usually work. Then Claude clears context and executes the plan.

Then Codex reviews the commit. It still has all the original context, so it knows what we were planning and what the research said about the infrastructure, and it does a really good job reviewing. Again they ping-pong back and forth a couple of times, and the end product is pretty decent. Codex's strength is that it really goes in-depth; I usually run it at high reasoning effort. But Codex has zero EQ or communication skills, so it works really well as a pedantic reviewer. Claude is much more pleasant to interact with; there's just no comparison. That's why I much prefer planning with Claude, because we can iterate.

I am just a hobbyist, though. I do this to run my Ansible/Terraform infrastructure for a good-size homelab with 10 hosts, so we actually touch real hardware a lot and there are always gotchas to deal with. But together this is a pretty fun way to work. I like automating stuff, so it really scratches that itch.
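The skill boils down to something like this (pane commands and prompts are simplified stand-ins for the real instructions):

    ISSUE="$1"
    git worktree add "../wt-$ISSUE" -b "issue/$ISSUE"
    tmux new-window -n "$ISSUE" -c "../wt-$ISSUE"
    tmux send-keys "claude 'Create a plan for issue $ISSUE'" C-m
    tmux split-window -h -c "../wt-$ISSUE"
    tmux send-keys "codex 'Research the background needed for issue $ISSUE'" C-m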
I have had good success with the plans generated by https://github.com/obra/superpowers and I really like the Socratic method it uses to create them.
Claude already had subagents. This is a new mode for the main agent to be in (a bespoke context oriented to delegation), combined with a team-oriented task system and a mailbox system so subagents can communicate with each other, all integrated into the harness in a way that plugins can't achieve.
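I'd guess the mailbox is morally equivalent to something file-based like this (purely illustrative; the real mechanism is internal to the harness):

    mkdir -p .team/inbox/agent-b
    echo "task 2 unblocked: session store landed" > .team/inbox/agent-b/msg-001.txt
    ls .team/inbox/agent-b   # agent-b drains its inbox whenever it wakes up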
Wow, there go a lot of harnesses out the window. The main limitation of subagents was that they couldn't communicate back and forth with the main agent. How do we invoke swarm mode in Claude Code?
OT: Your visual on "stacked PRs" instantly made me understand what a stacked PR is. Thank you!
I had read about them before but for whatever reason it never clicked.
Turns out I already work like this, but I use commits as "PRs in the stack" and I constantly try to keep them up to date and ordered by rebasing, which is a pain.
Given the new insight from the way you displayed it, I had a chat with ChatGPT and feel good about giving it a try.
If you’re interested in exploring tooling around stacked PRs, I wrote git-spice (https://abhinav.github.io/git-spice/) a while ago. It’s free and open-source, no strings attached.
If you're rebasing a lot, definitely set up rerere (reuse recorded resolution) - it improves things enormously.
Do make sure you know how to reset the cache in case you record a bad conflict resolution, because it will keep biting you. That caveat aside, it's a must.
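The relevant commands, all standard git:

    git config rerere.enabled true      # record and replay conflict resolutions
    git config rerere.autoUpdate true   # also stage the files rerere resolved
    git rerere forget path/to/file      # drop one bad recorded resolution
    rm -rf .git/rr-cache                # nuclear option: clear the whole cache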
Isn’t this just “Gitflow”?
https://www.atlassian.com/git/tutorials/comparing-workflows/...
After a quick read it seems like gitflow is intended to model a release cycle. It uses branches to coordinate and log releases.
Stacking is meant to make development of non-trivial features more manageable and to get them into main faster and more safely.
It's specific to each developer's workflow and wouldn't necessarily leave artifacts once merged into main (whereas gitflow seems to intentionally take a stance on that).
Please don’t use git-flow. Every time I see it, it looks like an over-engineer’s wet dream.
Yeah, since they introduced (possibly async) subagents, I've had my main Claude instance act as a manager overseeing implementation agents, keeping its context clean and ensuring everything goes to plan at the highest quality.
Yep, this is exactly how I use the main agent too; I explicitly instruct it to only ever use background async subagents. Not enough people understand that the Claude Code harness is event-driven now and will wake up whenever these subagent-completion events happen.
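The instruction is just prose in CLAUDE.md. One phrasing (mine, not canonical):

    cat >> CLAUDE.md <<'EOF'
    Always delegate implementation work to background subagents and keep
    coordinating; never block the main conversation waiting on one.
    EOF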
Any recommendations on sandboxing agents? Last time I asked, folks recommended Docker.
https://github.com/finbarr/yolobox/
I like running remotely using exe.dev with Syncthing to sync files to my laptop.
I use Shelley (their web-based agent) but they have Claude Code installed too.