Comment by ben_w
1 day ago
> Humans needs to eat and take care of family, for which they need a job so have lot less freedom to disobey explicit commands and expect to do those things.
While true, I think there's a different problem here.
Humans are observed to have a wide range of willingness to follow orders: everything from fawning, cult membership, and The Charge of the Light Brigade on the one side; to oppositional defiant disorder on the other.
AI safety and alignment work wants AI to be willing to stop and change its behaviour when ordered, because we expect it to be dangerously wrong a lot, because there's no good reason to believe we already know how to make them correctly at this point. This has strong overlap with fawning behaviour, regardless of the internal mechanism of each.
So it ends up like Homer in the cult episode, with Lisa saying "Watch yourself, Dad. You're the highly suggestible type." and him replying "Yes. I am the highly suggestible type." And while this is a fictional example and you can't draw conclusions about real humans from it, does the AI know that it shouldn't draw that conclusion? Does it know it's "in the real world", or does it "think" it's writing a script, in which case the meme matters more than what humans actually do?
> [1] which humans are and however much the industry likes to think otherwise agents are not
I have spent roughly the last year trying to convince a customer support team in a different country that it's not OK to put my name on bills they post to a non-existent street. Actually it is quite a bit worse than that, but the full details would be boring.
That said, I'm not sure if I'm even corresponding with humans or an AI, so this is weak evidence.