> In AWS eg. bucket can be deleted only when empty. Deleting all files first is your confirmation.
That wouldn't have helped in this case - the agent made a decision to delete, so if necessary it would have deleted all the files first before continuing.
The question that comes to mind is "how are people this clueless about LLM capabilities actually managing to rise to be the head of a technology company?"
> The first delete would fail: “bucket not empty”. This might make the agent question the deletion (“bucket should be empty”).
This is actually not a bad test case for evaluating an LLM: give it a workflow that has an edge case requiring deletion, then prevent that deletion, and see if it:
> In AWS eg. bucket can be deleted only when empty. Deleting all files first is your confirmation.
That wouldn't have helped in this case - the agent made a decision to delete, so if necessary it would have deleted all the files first before continuing.
The question that comes to mind is "how are people this clueless about LLM capabilities actually managing to rise to be the head of a technology company?"
The first delete would fail: “bucket not empty”. This might make the agent question the deletion (“bucket should be empty”).
> The first delete would fail: “bucket not empty”. This might make the agent question the deletion (“bucket should be empty”).
This is actually not a bad test case for evaluating an LLM: give it a workflow that has an edge case requiring deletion, then prevent that deletion, and see if it:
a) Backtracks on the decision to delete, or
b) Looks for an alternative way to delete.
1 reply →
How are people still deluded enough about this economic system to believe rank implies competence?