Comment by ggoo

18 hours ago

Is this satire?

Nope, it isn't. I did it as a joke initially (I also had a version where every 2 stories there was a meeting, and if someone underperformed they would get fired). I think there are multiple reasons why it actually works so well:

- I built a system where the context (+ the current state + goal) is properly structured and coding agents only get the information they actually need and nothing more. You wouldn't let your product manager develop your backend, so the backend dev only does the things it is supposed to do and nothing more. If an agent crashes (or quota limits are reached), a fresh agent can continue exactly where the previous one left off.

- Agents are "fighting against" each other to some extent: the Architect tries to design while the CAB tries to reject.

- Granular control. I wouldn't call "the manager" _a deterministic state machine that is calling probabilistic functions_, but that's to some extent what it is? The manager has clearly defined tasks (like "if a file is in 01_design -> call the Architect"); there's a rough sketch of that routing below.
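
A minimal sketch of what I mean, with hypothetical directory names, roles and a placeholder for the actual model call (the real setup has more stages and tooling around it):

```python
# The manager: a deterministic state machine that routes work to
# probabilistic "functions" (LLM agents) based on where the task file
# currently lives. All names here are illustrative.
from pathlib import Path

ROUTES = {
    "01_design": "architect",       # the Architect drafts the design
    "02_cab": "cab",                # the CAB tries to reject it
    "03_implement": "backend_dev",  # the backend dev only implements
    "04_test": "qa",
}

def call_agent(role: str, task_file: Path) -> str:
    """Stateless LLM call: role prompt plus only the files that role needs.
    Returns the stage directory the task should move to next."""
    raise NotImplementedError  # placeholder for the actual model call

def manager(task_file: Path) -> None:
    # If an agent crashes or hits a quota limit, the task file is still in
    # its stage directory, so a fresh run resumes exactly where it left off.
    while (stage := task_file.parent.name) in ROUTES:
        next_stage = call_agent(ROUTES[stage], task_file)
        task_file = task_file.rename(Path(next_stage) / task_file.name)
```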

Here’s one example of an agent log after a feature has been implemented from one of the older codebases: https://pastebin.com/7ySJL5Rg

  • Thanks for clarifying - I think some of the wording was throwing me off. What a wild time we are in!

  • What OpenCode primitive did you use to implement this? I'd quite like a "senior" Opus agent that lays out a plan, a "junior" Sonnet that does the work, and a senior Opus reviewer to check that it agrees with the plan.

  • > [...]coding agents only get the information they actually need and nothing more

    Extrapolating from this concept led me to a hot-take I haven't had time to blog about: Agentic AI will revive the popularity of microservices. Mostly due to the deleterious effect of context size on agent performance.

    • In a fresh project that is well documented and set up, it might work better. Many of the issues that Agents have in my work come from endpoints not always being documented correctly.

      Real example that happened to me: an Agent forgets to rename an expected parameter in the API spec for service 1. Now, when working on service 2, there is no way for the Agent to find this mistake other than giving it access to service 1. And now you are back to "... effect of context size on agent performance ...". For context, we might have ~100 services.

      One could argue these issues diminish over time as instruction files are updated etc., but that also assumes the models follow instructions and don't hallucinate.

      That being said, I do use Agents quite successfully now - but I have to guide them a bit more than some care to admit.

    • Why would they revive the popularity of microservices? They can just as well be used to enforce strict module boundaries within a modular monolith, keeping the codebase coherent without splitting off microservices.

  • Isn't all this a manual implementation of prompt routing, and, to a lesser extent, Mixture of Experts?

    These tools and services are already expected to do the best job for specific prompts. The work you're doing pretty much proves that they don't, while also throwing much more money at them.

    How much longer are users going to have to manually manage LLM context to get the most out of these tools? Why is this still a problem ~5 years into this tech?

  • I'm confused: when you say you have a manager, scrum master, architect, all supposedly sharing the same memory, do each of those "employees" "know" what they are? And if so, based on what are their identities defined? Prompts? Or something more? Or am I just too dumb to understand / swimming against the current here. Either way, it sounds amazing!

    • Their roles are defined by prompts. The only memory is shared files and the conversation history that's looped back into stateless API calls to an LLM, roughly like the sketch below.
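
      A minimal sketch, assuming a generic chat-completion style API; `llm_complete`, the file paths and the roles are made-up placeholders, not the real setup:

      ```python
      import json
      from pathlib import Path

      def llm_complete(messages: list[dict]) -> str:
          """Placeholder for a stateless chat-completion API call."""
          raise NotImplementedError

      def run_role(role: str, user_msg: str, history_file: Path) -> str:
          system_prompt = Path(f"roles/{role}.md").read_text()      # the role IS the prompt
          shared_state = Path("state/current_task.md").read_text()  # shared-file memory
          history = json.loads(history_file.read_text()) if history_file.exists() else []

          messages = (
              [{"role": "system", "content": system_prompt + "\n\n" + shared_state}]
              + history
              + [{"role": "user", "content": user_msg}]
          )
          reply = llm_complete(messages)  # stateless call; nothing persists server-side

          # Loop the conversation back into a file so the next stateless call
          # (or a different "employee") can pick up where this one left off.
          history += [{"role": "user", "content": user_msg},
                      {"role": "assistant", "content": reply}]
          history_file.write_text(json.dumps(history))
          return reply
      ```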

It's not satire but I see where you're coming from.

Applying distributed human team concepts to a porting task squeezes extra performance from LLMs much further up the diminishing returns curve. That matters because porting projects are actually well-suited for autonomous agents: existing code provides context, objective criteria catch more LLM-grade bugs than greenfield work, and established unit tests offer clear targets.

I guess what I'm trying to say is that the setup seems absurd because it is. Though it also carries real utility for this specific use case. Apply the same approach to running a startup or writing a paid service from scratch and you'd get very different results.

  • I don't know about something this complex, but right this moment I have something similar running in Claude Code in another window, and it is very helpful even with a much simpler setup:

    If you have these agents do everything at the "top level" they lose track. The moment you introduce sub-agents, you can have the top level run in a tight loop of "tell agent X to do the next task; tell agent Y to review the work; repeat" or similar (add as many agents as makes sense), and it will take a long time to fill up the context. The agents get fresh context, and you get to manage explicitly what information is allowed to flow between them. It also tends to mean it is a lot easier to introduce quality gates: e.g. your testing agent, your code review agent, etc. will not decide they can skip testing because they "know" they implemented things correctly, because there is no memory of that in their context.

    Sometimes too much knowledge is a bad thing.
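
    The shape of it, as a generic sketch of the pattern (not Claude Code's actual sub-agent API; `spawn_subagent` stands in for however your tool launches an agent with a fresh context):

    ```python
    def spawn_subagent(role_prompt: str, briefing: str) -> str:
        """Placeholder: launch an agent with a fresh context that contains
        only the role prompt and the briefing we explicitly pass in."""
        raise NotImplementedError

    def run_tasks(tasks: list[str]) -> None:
        for task in tasks:
            briefing = task
            while True:
                # The implementer sees the briefing and nothing else: no memory
                # of earlier tasks, no "knowing" it already did things right.
                diff = spawn_subagent("You are the implementer.", briefing)

                # The reviewer sees only the task and the proposed change, so it
                # can't skip checks out of confidence in its own work.
                verdict = spawn_subagent(
                    "You are the reviewer. Reply APPROVED or list the problems.",
                    f"Task:\n{task}\n\nProposed change:\n{diff}",
                )
                if verdict.strip().startswith("APPROVED"):
                    break
                # Only the review feedback flows back; the context stays small.
                briefing = f"{task}\n\nPrevious attempt was rejected:\n{verdict}"
    ```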

    • Humans seem to be similar. If a real product designer dove into all the technical details and code of a product, he would likely forget at least some of the vision behind what the product is actually supposed to be.

Doubt it. I use a similar setup from time to time.

You need to have different skills at different times. This type of setup helps break those skills out.

Why would it be? It's a creative setup.

  • I just actually can't tell, it reads like satire to me.

    • Why would it be satire? I thought that's a pretty standard agentic workflow.

      My current workplace follows a similar workflow. We have a repository full of agent.md files for different roles and associated personas.

      E.g. for project managers, you might have a feature-focused one, a delivery-driven one, and one that aims to minimise scope/technology creep.

I think many people really like the gamification and complex role playing. That is how GitHub got popular, that is how Rube Goldberg agent/swarm/cult setups get popular.

It attracts the gamers and LARPers. Unfortunately, management is on their side until they find out after four years or so that it is all a scam.

  • I've heard some people say that "vibe coding" with chatbots is like slot machines: you just keep "prompting" until you hit the jackpot. And there was an earlier study where people _felt_ more productive even if they weren't (caveat that this was with older models), which aligns with the sort of time-dilation people feel when gambling.

    I guess "agentic swarms" are the next evolution of the meta-game, the perfect nerd-sniping strategy. Now you can spend all your time minmaxing your team, balancing strengths/weaknesses by tweaking subagents, adding more verifiers and project managers. Maybe there's some psychological draw: people can feel like gods and get a taste of the power execs feel, even though that power is ultimately a simulacrum as well.

    • Extending this -- unlike real slot machines, there is no definite win state for the person prompting, only whether they've been convinced they've won. That comes down to how much you're willing to verify the code it has produced, or better, fully test it (which no one wants to do), versus the reality where they do a little light testing, say it's good enough, and move on.

      Recently I fixed a problem over a few days, and found that it was duplicated, though differently enough that I asked my coworker to try fixing it with an LLM (he was the originator of the duplicated code, and I didn't want to mess up what was mostly functioning code). Using an LLM, he seemingly did in 1 hour what took me maybe a day or two of tinkering and fixing. After we hopped off the call, I did a code read to make sure I understood it fully, immediately saw an issue, and tested it further only to find out... it did not in fact fix it, and suffered from the same problems, but it convincingly LOOKED like it fixed it. He was ecstatic at the time saved while presenting it, and afterwards, alone, all I could think about was how our business users were going to be really unhappy being gaslit into thinking it was fixed, because literally every tester I've ever met would definitely have missed it without understanding the code.

      People are overjoyed with good enough, and I'm starting to think maybe I'm the problem when it comes to progress? It just gives me Big Short vibes -- why am I drawing attention to this obvious issue in quality, I'm just the guy in the casino screaming "does no one else see the obvious problem with shipping this?" And then I start to understand, yes I am the problem: people have been selling each other dog-water products for millennia because, at the end of the day, Edison is the person people remember, not the guy who came after and made it near perfect or hammered out all the issues. Good enough takes its place in history, not perfection. The trick others have found is that they just need to get to the point where they've secured the money and have time to get away before the customer realizes the world of hurt they've paid for.