Comment by falcor84

7 hours ago

It's not common, but I've personally built APIs where requests for dangerous modifications like this perform a dry run, returning in the response the resources that would be deleted/changed along with a random token, which then needs to be provided to actually make the change. The idea was that this would be presented in the UI for the user to confirm, but it should be just as useful, or more so, for AI agents. You also get the benefit that the token only approves that particular modification operation, so if the resources change in between, you need to re-approve.
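A minimal sketch of that pattern, with hypothetical names and an in-memory dict standing in for the real datastore: the dry run reports what would be deleted and issues a token bound to a fingerprint of the targeted resources' current state, so the confirm step fails if anything changed in between.

```python
import hashlib
import secrets

# Hypothetical in-memory store standing in for a real database.
resources = {"vm-1": "running", "vm-2": "stopped"}

# token -> (resource ids, fingerprint of the exact state the user approved)
pending = {}

def fingerprint(ids):
    """Hash the current state of the targeted resources."""
    state = repr(sorted((i, resources[i]) for i in ids))
    return hashlib.sha256(state.encode()).hexdigest()

def dry_run_delete(ids):
    """Report what would be deleted and issue a one-time confirm token."""
    token = secrets.token_urlsafe(16)
    pending[token] = (tuple(ids), fingerprint(ids))
    return {"would_delete": list(ids), "confirm_token": token}

def confirm_delete(token):
    """Perform the deletion only if the approved state is unchanged."""
    ids, approved = pending.pop(token, (None, None))
    if ids is None:
        raise ValueError("unknown or already-used token")
    if fingerprint(ids) != approved:
        raise ValueError("resources changed since the dry run; re-approve")
    for i in ids:
        del resources[i]
    return {"deleted": list(ids)}
```

Because the token is popped on use and tied to a state fingerprint rather than just the resource IDs, it is single-use and automatically invalidated by concurrent changes, which is the "re-approve if anything moved" behaviour described above.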

I guess we don’t know what the agent would do after seeing these warnings and a request for extra action.

Perhaps it would stop and rethink; perhaps it would focus on the fact that extra action is needed and perform that action automatically.

I suppose the decision would depend on multiple factors too (model, prompt, constraints).

"Measure twice, cut once" seems to be forgotten these days.

  • As well as: A computer can never be held accountable

    • Let me ask you this - can a company be held accountable? I.e., are you OK with the legal arrangement whereby, if I hire a company to provide me a service and they fail to provide it, or cause harm in the process, I can sue them, potentially in a way that would lead to their bankruptcy?

      If so, I can imagine a potential future in which we have limited liability companies each run by a single AI (potentially on a particular physical computer). In that future, if you hired an AI to do a project for you, and it ended up deleting the production database, you'd be able to sue it, and get a payout and/or bankrupt it, which I imagine would then lead to an "antifragile" ecosystem whereby AIs adapt to be more careful.