Comment by rco8786

20 days ago

> Claude will work autonomously to solve whatever problem I give it. So it’s important that the task verifier is nearly perfect, otherwise Claude will solve the wrong problem.

I think this is the fundamental thing here with AI. You can spin up infinite agents that can all do....stuff. But how do you keep them from doing the wrong stuff?

Is writing an airtight spec and test harness easier or less time consuming than just keeping a human in the loop and verifying and redirecting as the agents work?

It all still comes back to context management.

Very cool demonstration of the tech though.