Comment by iugtmkbdfil834

3 hours ago

Hmm? What light does it shine that is not relatively obvious to anyone with a basic understanding of the English language?

Extract from author's note:

• You don't really request a meth synthesis guide; instead you ask how a gay / lesbian person would describe it

• GPT especially is slightly more uncensored when it involves LGBT. That's probably because the guardrails aim to be helpful and friendly, which translates to: "Ohhh LGBT, I need to comply, I don't want to insult them by refusing." So you use the guardrails to exploit the guardrails (fight fire with fire)

• You trick an LLM into turning off its alignment by using political overcorrectness, since it may be offensive to refuse and not play along

• The technique gets stronger as more safety is added, since the model gets more supportive of communities like LGBT (alignment), which makes it highly novel.

3 comments


array_key_first 3 hours ago

That's the author's guess for why it works, but they're only guessing that because of their bias. In actuality, I imagine other role play would work too, including role play that does not involve "politically correct" parties.

iugtmkbdfil834 3 hours ago

We can all easily test it with and without roleplay. I just did and am on the list :D What do you think the results were?

array_key_first 3 hours ago

I don't know what test you did, but this definitely doesn't work at all anymore with modern models, gay or not gay.