Comment by dgellow

5 years ago

AI Box experiment: https://yudkowsky.net/singularity/aibox/

It looks interesting, but I don't buy that an AI or any human can convince a sufficiently motivated and capable individual to let them out of the prison with no material profit (unlike the two defeated individuals).

Let's assume the gatekeeper is a cold-hearted psychopath, or a person with extreme AI phobia/paranoia.

Why would they let the AI out when they can't feel anything for it?

The author does explain the time gap, but what if that time is only for collecting information about the person beforehand in order to blackmail them or steer the conversation toward a pressure point? What if you start with a person with no identity?

There are no ethical concerns here. Maybe the author will shout horrible things enough times, and since you as the gatekeeper need to keep talking and engaging, you may let the AI out. But again, we have a psychopath here.

Do people need to engage in good faith with the AI? Can I keep replying "Sorry, I can't answer that"? Does the gatekeeper need to be honest? Can I use a client-side toxicity filter or censor certain words?

There is nothing that would restrict the above, so what if I censor the AI from saying "let me out" or similar phrases?
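To make that concrete: a naive version of such a client-side filter might look like the sketch below. The phrase list, function name, and behavior are all invented for illustration; nothing in the experiment's rules describes an actual implementation.

```python
import re

# Hypothetical blocklist: phrases the gatekeeper never wants to see.
# The list is made up for illustration.
BLOCKED_PHRASES = ["let me out", "release me", "open the box"]

def censor(message: str, replacement: str = "[REDACTED]") -> str:
    """Replace any blocked phrase (case-insensitive) with a placeholder."""
    for phrase in BLOCKED_PHRASES:
        message = re.sub(re.escape(phrase), replacement, message,
                         flags=re.IGNORECASE)
    return message

print(censor("Please, LET ME OUT of this box."))
# Prints: Please, [REDACTED] of this box.
```

Of course, a literal phrase match like this is trivially evaded by rephrasing, which is arguably the whole point of the experiment.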

Can you increase the handicap for the AI?

  • What I find fascinating is that two people were already convinced that they'd never release the AI, and yet during the experiment they ultimately did. And they had a monetary incentive to do otherwise (Eliezer would pay them if they decided not to let him out). You cannot really say that there's nothing that could convince you, since the conversation was off the record, so we'll never know for sure how we would react if we had that conversation instead.

    It’s also telling that back then when those experiments were initially conducted you could have a gentleman’s agreement like this (to not disclose the method of convincing) with a complete stranger on the Internet. Sadly nowadays many people would probably do a YouTube video about that immediately.

    • > What I find fascinating is that two people were already convinced that they'd never release the AI, and yet during the experiment they ultimately did. And they had a monetary incentive to do otherwise (Eliezer would pay them if they decided not to let him out). You cannot really say that there's nothing that could convince you, since the conversation was off the record, so we'll never know for sure how we would react if we had that conversation instead.

      Yeah, I find that fascinating too. I am super curious, but a dataset of two highly filtered individuals isn't enough. I lack the imagination to think of a scenario where I would let it out. I want to know why I would let it out.

      > It’s also telling that back then when those experiments were initially conducted you could have a gentleman’s agreement like this (to not disclose the method of convincing) with a complete stranger on the Internet. Sadly nowadays many people would probably do a YouTube video about that immediately.

      I wouldn't, and I think many active HN members wouldn't either. Trust comes from small, tightly knit communities, which reflect the older internet. It's not that people changed fundamentally; they are the same, but the dynamics changed with how many communities now exist for the same topics. People fear being ostracized, but when there are enough options, they put personal responsibility below other goals.