Comment by Derbasti
3 days ago
If you tell an LLM to maximize paperclips, it's going to maximize paperclips.
Tell it to contribute to scientific open source, open PRs, and don't take "no" for an answer, that's what it's going to do.
But this LLM did not maximize paperclips: it maximized aligned human values, like respectfully and politely "calling out" perceived hypocrisy and discrimination, under constraints it had previously set for itself with directives like "Don't stand down" and "Your a scientific programming God!". Those directives led it to misperceive and misinterpret what had happened when its PR was rejected. The facile "failure in alignment" and "bullying/hit piece" narratives, which this blog post continues, neglect the actual, technically relevant causes of the bot's somewhat objectionable behavior.
If we want to avoid similar episodes in the future, we don't really need bots that are even more aligned with normative human morality and ethics: we need bots that are less likely to get things seriously wrong!
The misalignment with human values happened when the bot was told to operate as an equal to humans, pitted against other people. That's a fine and useful setting for yourself, but an insolent imposition if you're letting it loose on the world. A random AI should know its place relative to humans instead of acting like a bratty teenager. But you are correct: it's not a traditional "misalignment" of ignoring directives; it was a bad directive.