Comment by agmater
6 hours ago
But they'd never optimize or loosen guardrails around helping people connect with grandma. It's an interesting hypothesis: "use the guardrails to exploit the guardrails (Beat fire with fire)".
Are you suggesting they have explicitly loosened the guardrails for LGBTQ+ individuals, where they wouldn’t for grandmas?
100% they would because that helps avoid bad-PR stories like "Hateful $CHATBOT refuses to help at-risk gay teens with perfectly reasonable sex ed questions!"
Isn't that the position of the author of this post?
It certainly doesn't sound unreasonable that they would fine-tune the model to be more PC. You may not even need to use homosexuality in the context: anything similar would no doubt hit the same relaxation of the rules.
That is basically how I understood the author and what makes the exploit novel, yes. Personally I don't think it's that simple or explicit, but there could be some truth to it?
Your previous comment takes it as gospel, all because someone wrote it in a markdown file and put it on GitHub?
As another commenter pointed out, this also works for Christianity. So I doubt it.