
Comment by mafriese

14 hours ago

Nope, it isn't. I did it as a joke initially (I also had a version where every 2 stories there was a meeting, and if someone underperformed they would get fired). I think there are multiple reasons why it actually works so well:

- I built a system where the context (plus the current state and the goal) is properly structured, and coding agents only get the information they actually need and nothing more. You wouldn't let your product manager develop your backend, and likewise the backend dev only does the things it is supposed to do and nothing more. If an agent crashes (or quota limits are reached), a fresh agent can continue exactly where the previous one left off.

- Agents are "fighting against" each other to some extent? The Architect tries to design while the CAB tries to reject.

- Granular control. I wouldn't call "the manager" _a deterministic state machine that is calling probabilistic functions_, but that's to some extent what it is? The manager has clearly defined tasks (like "if a file is in 01_design -> call the Architect").
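As a caricature in code (the real manager is a prompt plus file conventions, not a script, and everything below other than 01_design and the Architect is made up for illustration), the control flow is roughly:

```python
from pathlib import Path

# Deterministic routing table: which stage folder is handled by which agent.
# Only "01_design" -> Architect comes from the actual setup; the rest is illustrative.
ROUTES = {
    "01_design": "architect",
    "02_review": "cab",            # the CAB tries to reject what the Architect designed
    "03_implement": "backend-dev",
}

def call_agent(agent: str, task_file: Path) -> str:
    """The probabilistic part: an LLM agent handles the file and names the next stage."""
    raise NotImplementedError("stand-in for a real LLM call")

def manager_tick(workdir: Path) -> None:
    # The deterministic part: look at where each task file currently sits and dispatch it.
    for stage, agent in ROUTES.items():
        for task_file in (workdir / stage).glob("*.md"):
            next_stage = call_agent(agent, task_file)
            (workdir / next_stage).mkdir(exist_ok=True)
            task_file.rename(workdir / next_stage / task_file.name)
```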

Here’s one example of an agent log after a feature has been implemented from one of the older codebases: https://pastebin.com/7ySJL5Rg

Thanks for clarifying - I think some of the wording was throwing me off. What a wild time we are in!

What OpenCode primitive did you use to implement this? I'd quite like a "senior" Opus agent that lays out a plan, a "junior" Sonnet that does the work, and a senior Opus reviewer to check that it agrees with the plan.

  • You can define the tools that agents are allowed to use in the opencode.json (also works for MCP tools I think). Here’s my config: https://pastebin.com/PkaYAfsn

    The models can call each other if you reference them using @username.

    This is the .md file for the manager: https://pastebin.com/vcf5sVfz
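    Stripped down, the agent section of that config looks roughly like this (an illustrative sketch, not my exact file; the model IDs are placeholders and field names can vary between OpenCode versions, so check the config schema):

    ```json
    {
      "$schema": "https://opencode.ai/config.json",
      "agent": {
        "architect": {
          "description": "Senior planner: writes the design, never touches code",
          "model": "anthropic/claude-opus-4",
          "prompt": "{file:./agents/architect.md}",
          "tools": { "write": false, "edit": false, "bash": false }
        },
        "backend-dev": {
          "description": "Junior implementer: only works on backend stories",
          "model": "anthropic/claude-sonnet-4",
          "prompt": "{file:./agents/backend-dev.md}",
          "tools": { "write": true, "edit": true, "bash": true }
        },
        "cab": {
          "description": "Reviewer: checks the result against the design and may reject it",
          "model": "anthropic/claude-opus-4",
          "prompt": "{file:./agents/cab.md}",
          "tools": { "write": false, "edit": false }
        }
      }
    }
    ```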

    I hope that helped!

    • This is excellent, thank you. I came up with half of this while waiting for this reply, but the extra pointers about mentioning with @ and the {file} syntax really help, thanks again!

> [...]coding agents only get the information they actually need and nothing more

Extrapolating from this concept led me to a hot take I haven't had time to blog about: agentic AI will revive the popularity of microservices, mostly due to the deleterious effect of context size on agent performance.

  • In a fresh project that is well documented and well set up, it might work better. Many of the issues that agents have in my work come from endpoints not always being documented correctly.

    A real example that happened to me: the agent forgets to rename an expected parameter in the API spec for service 1. Now, when working on service 2, there is no way for the agent to find this mistake other than giving it access to service 1 as well. And now you are back to "... effect of context size on agent performance ...". For context, we have ~100 services.

    One could argue these issues diminish over time as instruction files are updated, etc., but that also assumes the models follow instructions and don't hallucinate.

    That being said, I do use agents quite successfully now - but I have to guide them a bit more than some care to admit.

  • Why would they revive the popularity of microservices? They can just as well be used to enforce strict module boundaries within a modular monolith, keeping the codebase coherent without splitting off microservices.

    • And that's why they call it a hot take. No, it isn't going to give rise to microservices. You absolutely can have your agent perform high-level decomposition while maintaining a monolith. A well-written, composable spec is awesome; this has been true for human and AI coders for a very, very long time. The trick has always been getting a well-written, composable spec. AI can help with that bit, and I find that is probably the best part of this whole tooling cycle: I can actually interact with an AI to build that spec iteratively. Have it be nice and mean. Have it iterate among many instances and other models, all that fun stuff. It still won't make your idea awesome or make anyone want to spend money on it, though.

Isn't all this a manual implementation of prompt routing, and, to a lesser extent, Mixture of Experts?

These tools and services are already expected to do the best job for specific prompts. The work you're doing pretty much proves that they don't, while also throwing much more money at them.

How much longer are users going to have to manually manage LLM context to get the most out of these tools? Why is this still a problem ~5 years into this tech?

I'm confused: when you say you have a manager, a scrum master, and an architect, all supposedly sharing the same memory, does each of those "employees" "know" what it is? And if so, based on what are their identities defined? Prompts? Or something more? Or am I just too dumb to understand / swimming against the current here. Either way, it sounds amazing!

  • Their roles are defined by prompts. The only memory is shared files and the conversation history that's looped back into stateless API calls to an LLM.
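    In other words, each turn is roughly this loop (a schematic, not the actual framework code; call_llm stands in for whatever stateless chat-completion API gets called):

    ```python
    from pathlib import Path

    def call_llm(messages: list[dict]) -> str:
        """Stateless: the model only ever sees what is passed in `messages` on this call."""
        raise NotImplementedError("stand-in for the real LLM API")

    def run_turn(role_prompt: Path, shared_dir: Path, history: list[dict], task: str) -> str:
        # The "identity" of an employee is nothing more than its system prompt.
        messages = [{"role": "system", "content": role_prompt.read_text()}]
        # Shared memory, part 1: files on disk, pasted into the context.
        for f in sorted(shared_dir.glob("*.md")):
            messages.append({"role": "user", "content": f"{f.name}:\n{f.read_text()}"})
        # Shared memory, part 2: the running conversation, replayed on every call.
        messages.extend(history)
        messages.append({"role": "user", "content": task})
        reply = call_llm(messages)
        history += [{"role": "user", "content": task}, {"role": "assistant", "content": reply}]
        return reply
    ```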