Comment by CuriouslyC
8 hours ago
The main problem with agents is that they aren't reflecting on their own performance and pausing their own execution to ask a human for help aggressively enough. Agents can often run successfully for 20+ iterations, but in other cases they need hand-holding after every single iteration.
They're a lot like a human in that regard, but we haven't been building that reflection and self-awareness into them so far, so it's like a junior who doesn't realize when they're out of their depth and should get help.
I think they are capable of doing it, but it requires prompting.
I constantly have to instruct them:
- Go step by step; don't skip ahead until we're done with a step
- Don't make assumptions; if you're unsure, ask questions to clarify
And they mostly do this.
But this needs to be default behavior!
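A rough sketch of what "default behavior" could look like: bake those two instructions into every agent's system prompt instead of typing them each time. The names (`DEFAULT_GUARDRAILS`, `build_system_prompt`) and the surrounding framework are assumptions for illustration, not any particular tool's API.

```python
# Hypothetical sketch: prepend the two guardrail instructions to every
# task-specific prompt so they become default behavior for the agent.

DEFAULT_GUARDRAILS = """\
Work step by step. Do not skip ahead until the current step is done.
Do not make assumptions. If you are unsure about anything, stop and ask a
clarifying question before continuing.
"""

def build_system_prompt(task_prompt: str) -> str:
    """Combine the default guardrails with whatever task-specific prompt is used."""
    return DEFAULT_GUARDRAILS + "\n" + task_prompt

if __name__ == "__main__":
    print(build_system_prompt("You are a coding agent working on the billing service."))
```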
I'm surprised that, unless prompted, LLMs never seem to ask follow-up questions as a smart coworker might.
Is there value in adding an overseer LLM that measures the progress made over the last n steps and, if it's too low, stops and calls out to a human?
Yes, it works really well. We do something like that at NonBioS.ai - longer post below. The agent self-reflects on whether it is stuck or confused and calls out to the human for help.
I don't think you need an overseer for this; you can just have the agent self-assess at each step whether it's making material progress or caught in a loop, and if it's caught in a loop, pause and emit a prompt for help from a human. This would probably require a bit of tuning, and the agents need to be set up with a blocking "ask for help" function, but it's totally doable.
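Something like the sketch below. `run_step` and `assess_progress` are hypothetical stand-ins for real LLM calls, and `ask_human` is the blocking "ask for help" tool; none of this is any product's actual implementation, just the shape of the loop.

```python
# Minimal sketch of "self-assess each step, block on a human when stuck".
# run_step and assess_progress are placeholders for real LLM calls.

def run_step(task: str, history: list[str]) -> str:
    """Placeholder for one agent iteration (tool use, code edit, etc.)."""
    return f"tried to make progress on '{task}'"

def assess_progress(task: str, history: list[str]) -> str:
    """Placeholder self-assessment: in practice, ask the model whether the last
    few steps show material progress or a loop, and return 'progress' or 'stuck'."""
    last = history[-3:]
    return "stuck" if len(last) == 3 and len(set(last)) == 1 else "progress"

def ask_human(question: str) -> str:
    """Blocking 'ask for help' tool: execution pauses until a human answers."""
    print(f"\n[agent paused] {question}")
    return input("your guidance> ")

def run_agent(task: str, max_steps: int = 20) -> None:
    history: list[str] = []
    for step in range(1, max_steps + 1):
        history.append(run_step(task, history))
        if assess_progress(task, history) == "stuck":
            guidance = ask_human(
                f"After step {step} I don't seem to be making progress on "
                f"'{task}'. What should I try next?"
            )
            history.append(f"human guidance: {guidance}")

if __name__ == "__main__":
    run_agent("migrate the auth module to OAuth")
```

The key design point is that `ask_human` blocks: the agent can't keep burning iterations once it has decided it's stuck.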
Bruh, we're inventing robot PMs for our robot developers now? We're so fucked
And how does it effectively measure progress?
It can behave just like someone in a senior role would: produce the set of steps for the junior to follow, and assess whether the junior appears stuck at any particular step.
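As a toy sketch of that "stuck at a step" check, suppose the worker agent reports which plan step each iteration advanced; the overseer flags it when too many consecutive iterations land on the same step. `overseer_check` and the `patience` threshold are made-up names for illustration.

```python
# Illustrative overseer check: the worker reports the plan-step index it worked
# on each iteration; the overseer escalates if it stops moving through the plan.

def overseer_check(step_history: list[int], patience: int = 3) -> bool:
    """Return True if the last `patience` iterations were all on the same step."""
    if len(step_history) < patience:
        return False
    return len(set(step_history[-patience:])) == 1

# Example: iterations 4-7 all sat on plan step 2 -> looks stuck.
iterations = [0, 0, 1, 2, 2, 2, 2]
if overseer_check(iterations):
    print("Overseer: worker looks stuck on step", iterations[-1], "- escalate to a human.")
```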