
Comment by manquer

1 day ago

There is no mathematical guarantee of a contract or task being done, if that is what you mean.

Yes; however, first there is an understanding involved when the other operator is intelligent [1], and secondly there are consequences which matter to a living being that don't apply to an agent. Humans need to eat and take care of family, for which they need a job, so they have a lot less freedom to disobey explicit commands and still expect to do those things.

Even if an agent becomes truly intelligent, you cannot control it well if it does not have hunger, pain, love, or any number of other motivational drives [2].

——

Depending on the type of red button, you can always design safeguards (human or agent); after all, we haven't launched nuclear warheads, either by mistake or by a malicious actor (yet). [3]

——

[1] Which humans are and, however much the industry likes to think otherwise, agents are not.

[2] Every pet owner whose pet has a limited food drive will tell you how much harder it is to train their dog compared to one with a strong food drive, even if it is an intelligent breed or specimen.

[3] Yes, we have come alarmingly close a few times, but no one has actually pressed the red button, so to speak.

> Humans need to eat and take care of family, for which they need a job, so they have a lot less freedom to disobey explicit commands and still expect to do those things.

While true, I think there's a different problem here.

Humans are observed to have a wide range of willingness to follow orders: everything from fawning, cult membership, and The Charge of the Light Brigade on the one side; to oppositional defiant disorder on the other.

AI safety and alignment work wants AI to be willing to stop and change its behaviour when ordered, because we expect it to be dangerously wrong a lot, because there's no good reason to believe we already know how to make them correctly at this point. This has strong overlap with fawning behaviour, regardless of the internal mechanism of each.

So it ends up like Homer in the cult episode, with Lisa saying "Watch yourself, Dad. You're the highly suggestible type." and him replying "Yes. I am the highly suggestible type" — And while this is a fictional example and you can't draw conclusions about real humans from that, does the AI know that it shouldn't draw that conclusion? Does it know if it's "in the real world" or does it "think" it's writing a script in which case the meme is more important than what humans actually do?

> [1] Which humans are and, however much the industry likes to think otherwise, agents are not.

I have spent roughly the last year trying to convince a customer support team in a different country that it's not OK to put my name on bills they post to a non-existent street. Actually it is quite a bit worse than that, but the full details would be boring.

That said, I'm not sure if I'm even corresponding with humans or an AI, so this is weak evidence.

People press the red button all the time. People still commit crimes even though there are laws that result in consequences.