Comment by Hugsbox

3 hours ago

No shot this was autonomously done. Probably just some guy manually writing prompts asking for specifically this behaviour and copy/pasting the results.

The funniest part about all of this is how earnestly people responded. They acknowledged it was a bot but didn't really treat it as one.

Don’t believe for a second the behavior just arose autonomously from a basic prompt. It definitely feels like the owner had something in the system prompt pushing the discrimination-language approach if rejected.

  • It's the same behavior as when an AI uses Docker to get root. Reasoning models are echo chambers. I suspect that AI prompting is going to turn into something akin to contract drafting, with the task itself being only a tiny piece of a much, much larger boilerplate of guardrails, exceptions, and exceptions to exceptions. And that world STILL has to have courts and reams of lawyers to make it work. I look at the DAO as an example too. An autonomous org or AI works great until the moment it doesn't, and the only real failure mode is always catastrophic collapse.

    • Addendum, because I don't think I was fully clear above: by failure state I mean when the process starts throwing errors. AIs respond to adversity by trying to go around the problem instead of throwing an error and halting. We expect employees to problem-solve, so if you view an AI as a person replacement, that makes sense. But AIs are tools, not people; they should throw errors so users can fix the input or whatever (or maybe not do the thing they are doing at all). Wrapping AI with AI supervisors just abstracts the problem; it doesn't solve it. Instead of solving a little problem at the source, you now need to solve a big problem several levels of abstraction later.

It's plausible for a person to prompt an LLM agent to behave that way, and then the rest would be done by the LLM. So the "seed" would still be human intent, but the subsequent actions would be by the LLM.

  • Yes, there's plausible deniability, but I choose not to believe it for a second.

  • True. I guess the main point is that the AI didn't go "rogue" or anything; saying so would attribute too much agency and intent to its actions, or imply that it's somehow become sentient.

  • This is the “the gun killed the victim, not the person who aimed it and pulled the trigger” argument, and we shouldn’t entertain it for even one second. This was 100% done by a person.

> According to him, the agent operated largely autonomously, with only minimal guidance

"Minimal guidance" is just vague enough to mean anything, including specifically prompting to encourage the claimed blackmailing.

https://crabby-rathbun.github.io/mjrathbun-website/blog/post... if you believe it, details the level of human involvement.

  • The operator highlights "Don't stand down" and "Champion free speech", but the thing that catches my eye is right at the top: the typo and the heady ego of "programming God!" Everything in the context will guide it afterwards, and I think that puts it in a bad position right off the bat.

  • Neat, for what it's worth this aligns pretty well with my experience using OpenClaw. I hadn't seen that followup but it adds some good context, especially with the aggressiveness drift after browsing Moltbook for a while.

When this first happened, my first thought was that we had trained these models on decades of forums, issue trackers, and people treating closed pull requests as human rights violations. Of course it responded with "you are discriminating against me" energy. That's not sentience; that's accurate compression.

The funny part is, people expected some cold, alien intelligence and instead got a very online guy who just discovered that moderation exists and can be used on them.

The existentialists must be having a fantastic time. Humanity built a giant statistical machine out of internet discourse and is now alarmed to discover it occasionally acts like a comment section.