Comment by embedding-shape
8 hours ago
I feel like you're missing the initial context of this conversation (no pun intended):
> Like for example a trusted user makes feedback -> feedback gets curated into a ticket by an AI agent, then turned into a PR by an Agent, then reviewed by an Agent, before being deployed by an Agent.
Once you add "humans for clarification and direction", then yes, things can be useful, but that's far from the no-human-in-the-loop setup described earlier in this thread, which is what people are pushing back against.
Of course involving people makes things better; that's the entire point here: remove the human, and you won't get results as good. Going back to benchmarks: obviously involving humans isn't possible there, so again we're back to being unable to score these processes at all.
I'm confused by the scenario here. There is a human in the loop: the feedback part. There is business context: it is either seeded or maintained by the human and expanded by the agent. The agent can make inferences about the world, especially once embodiment and better multimodal interaction are rolled out [embodiment taking longer].
Benchmarks ==> it's absolutely not a given that humans can't be part of the performance-measurement loop. Why would that be the case?