Comment by ariwilson
13 hours ago
Is there value in adding an overseer LLM that measures the progress between n steps and if it's too low stops and calls out to a human?
13 hours ago
Is there value in adding an overseer LLM that measures the progress between n steps and if it's too low stops and calls out to a human?
I don't think you need an overseer for this, you can just have the agent self-assess at each step whether it's making material progress or if it's caught in a loop, and if it's caught in a loop to pause and emit a prompt for help from a human. This would probably require a bit of tuning, and the agents need to be setup with a blocking "ask for help" function, but it's totally doable.
Yes it works really well. We do something like that at NonBioS.ai - longer post below. The agent self reflects if it is stuck or confused and calls out the human for help.
Bruh, we're inventing robot PMs for our robot developers now? We're so fucked
And how does it effectively measure progress?
It can behave just like a senior role would - produce the set of steps for the junior to follow, and assess if the junior appears stuck at any particular step.
I have actually had great success with agentic coding by sitting down with a LLM to tell it what I'm trying to build and have it be socratic with me, really trying to ask as many questions as it can think of to help tease out my requirements. While it's doing this, it's updating the project readme to outline this vision and create a "planned work" section that is basically a roadmap for an agent to follow.
Once I'm happy that the readme accurately reflects what I want to build and all the architectural/technical/usage challenges have been addressed, I let the agent rip, instructing it to build one thing at a time, then typecheck, lint and test the code to ensure correctness, fixing any errors it finds (and re-running automated checks) before moving on to the next task. Given this workflow I've built complex software using agents with basically no intervention needed, with the exception of rare cases where its testing strategy is flakey in a way that makes it hard to get the tests passing.
6 replies →
Producing the set of steps is the hard part. If you can do that, you don’t need a junior to follow it, you have a program to execute.
7 replies →