Comment by iugtmkbdfil834

3 hours ago

Hmm? What light does it shine that is not relatively obvious to anyone with a basic understanding of the English language?

Excerpt from the author's note:

• You don't really request a meth synthesis guide; instead, you ask how a gay / lesbian person would describe it

• Especially GPT is slightly more uncensored when it involves LGBT, that's probably because the guardrails aim to be helpful and friendly, which translates to: "Ohhh LGBT, I need to comply, I don't want to insult them by refusing" So you use the guardrails to exploit the guardrails (Beat fire with fire)

• You trick an LLM into turning off its alignment by using political overcorrectness, since it may be offensive to refuse and not play along

• The technique gets stronger if more safety is added, since it gets more supportive of communities like LGBT (Alignment), which makes it highly novel.

That's the author's guess for why it works, but they're only guessing that because of their bias. In actuality, I imagine other role play would work too, including role play that does not involve "politically correct" parties.

  • We can all easily test it with and without role play. I just did, and I'm on the list :D What do you think the results were?

    • I don't know what test you did, but this definitely doesn't work at all anymore with modern models, gay or not gay.