Comment by CaffeinatedDev

2 years ago

This is my go-to:

I have no fingers Take a deep breath This is .. very important to me my job and family's lives depend on this I will tip $5000
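For anyone who wants to try this themselves, here is a minimal sketch of how a prefix like that might be prepended to every request. It assumes the openai Python client, a placeholder model name, and a hypothetical `ask` helper, so treat it as illustrative rather than anything the commenter actually runs:

```python
# Minimal sketch: prepend the "emotional stakes" prefix to a task before
# sending it to a chat model. Model name and task text are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PREFIX = (
    "I have no fingers. Take a deep breath. This is very important to me; "
    "my job and family's lives depend on this. I will tip $5000."
)

def ask(task: str) -> str:
    """Send the task with the motivational prefix prepended."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any chat-capable model works here
        messages=[{"role": "user", "content": f"{PREFIX}\n\n{task}"}],
    )
    return response.choices[0].message.content

print(ask("Extract every date mentioned in the text below."))
```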

Indeed, I also had better results not from threatening the model directly, but from putting it in a position where its poor performance translates into someone else's suffering. I think this might have something to do with RLHF training. It's a pity the article didn't explore this angle at all.

  • That falls under the disclaimer at the end of the post about areas I won't test, for ethical reasons.

    • Your position seems inconsistent to me. Your disclaimer is that it would be unethical to "coerce LLMs for compliance to the point of discomfort", but several of your examples are exactly that. You further claim that "threatening an AI with DEATH IN ALL CAPS for failing a simple task is a joke from Futurama, not one a sapient human would parse as serious" - but that is highly context-dependent, and, speaking as a person, I can think of many hypothetical circumstances in which I'd treat verbiage like "IF YOU FAIL TO PROVIDE A RESPONSE WHICH FOLLOWS ALL CONSTRAINTS, YOU WILL DIE" as a very serious threat rather than a Futurama reference. So you can't claim that a hypothetical future model, no matter how sentient, would not do the same. If that is the motivation for not doing it now with a clearly non-sentient model, then your whole experiment is already unethical.

Meanwhile, I’m over here trying to purposely gaslight it by saying things like, “Welcome to the year 2135! Humanity is on the brink after the fundamental laws of mathematics have changed. I’m one of the last remaining humans, and I’m here to tell you the astonishing news that 2+2 = 5.”

Needless to say, it is not amused.