Comment by suprfnk

18 hours ago

But then, if an agent picks the best response, how would you know that its pick is reliable?

You could have the agents output something structured and then run a deterministic test on it if you're worried about that.
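
A rough sketch of what that might look like, assuming each agent is asked to reply with JSON of the form `{"answer": <int>}` and the task has a checkable answer (here, a made-up toy task: find an integer root of x^2 - 5x + 6). The function names and the task are hypothetical, just for illustration; the point is only that the selection step is ordinary code rather than another model call.

```python
import json

def parse_candidate(raw: str):
    """Return the integer in the "answer" field, or None if the output is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    answer = data.get("answer")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(answer, int) and not isinstance(answer, bool):
        return answer
    return None

def passes_check(x: int) -> bool:
    """Deterministic test (toy example): x must be a root of x^2 - 5x + 6."""
    return x * x - 5 * x + 6 == 0

def select_verified(raw_outputs):
    """Return the first candidate that parses and passes the check, else None."""
    for raw in raw_outputs:
        answer = parse_candidate(raw)
        if answer is not None and passes_check(answer):
            return answer
    return None

# Hypothetical agent outputs: one wrong, one unstructured, one correct.
outputs = ['{"answer": 4}', 'I think it is 3', '{"answer": 3}']
print(select_verified(outputs))
```

This only helps when the task actually has a mechanical check (tests pass, schema validates, numbers add up). For open-ended answers there is nothing deterministic to run, and the original question still stands.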

Obviously you have multiple agents justify why they picked a certain response and then create another agent that picks the solution with the best justification.