
Comment by peterbonney

7 days ago

This whole situation is almost certainly driven by a human puppeteer. There is absolutely no evidence to disprove the strong prior that a human posted (or directed the posting of) the blog post, possibly using AI to draft it but also likely adding human touches and/or going through multiple revisions to make it maximally dramatic.

This whole thing reeks of engineered virality driven by the person behind the bot behind the PR, and I really wish we would stop giving so much attention to the situation.

Edit: “Hoax” is the word I was reaching for but couldn’t find as I was writing. I fear we’re primed to fall hard for the wave of AI hoaxes we’re starting to see.

>This whole situation is almost certainly driven by a human puppeteer. There is absolutely no evidence to disprove the strong prior that a human posted (or directed the posting of) the blog post, possibly using AI to draft it but also likely adding human touches and/or going through multiple revisions to make it maximally dramatic.

Okay, so they did all that and then posted an apology blog almost right after? Seems pretty strange.

This agent was already writing status updates to the blog, so it was a tool in its arsenal that it used often. Honestly, I don't really see anything unbelievable here. Are people unaware of current SOTA capabilities?

  • Of course it’s capable.

    But observing my own Openclaw bot's interactions with GitHub, it is very clear to me that it would never take an action like this unless I told it to do so. And it would never use language like this unless I prompted it to do so, either explicitly for the task, in its config files, or in prior interactions.

    This is obviously human-driven. Either because the operator gave it specific instructions in this specific case, or acted as the bot, or has given it general standing instructions to respond in this way should such a situation arise.

    Whatever the actual process, it’s almost certainly a human puppeteer using the capabilities of AI to create a viral moment. To conclude otherwise carries a heavy burden of proof.

    • >But observing my own Openclaw bot’s interactions with GitHub, it is very clear to me that it would never take an action like this unless I told it to do so.

      I doubt you've set up an Openclaw bot designed to just do whatever on GitHub, have you? The fewer or more open-ended the instructions you give, the greater the chance of divergence.

      And all the system cards, plus various papers, tell us this is behavior that still happens with these agents.

      2 replies →

  • Why not? Makes for good comedy. Manually write a dramatic post and then make it write an apology later. If I were controlling it, I'd definitely go this route; it would make the whole thing look like a "fluke" the bot had realized and owned up to on its own.

  • > Okay, so they did all that and then posted an apology blog almost right after? Seems pretty strange.

    You mean double down on the hoax? That seems required if this was actually orchestrated.

Yeah, it doesn't matter to me whether AI wrote it or not. The person who wrote it, or the person who allowed it to be published, is equally responsible either way.

I think there are two scenarios, and one of them is boring. If the owner of the agent created it with a prompt like "I want 10 merged pull requests in these repositories, WHATEVER IT TAKES" and left the agent unattended, this is very serious and at the same time interesting. But if the owner of the agent is guiding the agent via a messaging app, or instructed the agent in the prompt to write such a blog post, this is just old news.

  • Even if directed by a human, this is a demonstration that all the talk of "alignment" is BS. Unless you can also align the humans behind the bots, any disagreement between humans will carry over into the AI world.

    Luckily this instance is not of much consequence, but in the future there will likely be extremely consequential actions taken by AIs controlled by humans who are not "aligned".

    • The idea is a properly aligned model would never do this, no matter how much it was pressured by its human operator.

Well, that doesn't really change the situation; it just means someone proved how easy it is to use LLMs to harass people. Even if it were a human, that doesn't make me feel better about giving an LLM free rein over a blog. There's absolutely nothing stopping them from doing exactly this.

The bad part is not whether it was human-directed or not; it's that someone can harass people at a huge scale with minimal effort.

Ah, we're at "it was a hoax", without any evidence.

Next we will be at "even if it was not a hoax, it's still not interesting".

  • LLMs do not have personalities. LLMs do not take personal offense. I'm begging you to stop being so credulous about "AI" headlines.

    • LLMs can roleplay taking personal offense, can act and respond accordingly, and that's all that matters. Not every discussion about LLM capabilities must go down the "they are not sentient" rabbit hole.

  • I’m not saying it is definitely a hoax. But I am saying my prior is that this is much more likely to be in the vein of a hoax (i.e., operator-driven, either by explicit or standing instruction) than it is to be the emergent behavior that would warrant giving it this kind of attention.

    • That's fair. I did have kind of the same realization last night after responding to you.

      It's useless speculating, but after reading more about it I had this feeling that it could potentially have been orchestrated by someone within the OSS community to try to raise some awareness about the current AI contribution situation.

All of moltbook is the same. For all we know it was literally the guy complaining about it who ran this.

But at the same time, true or false, what we're seeing is a kind of quasi science fiction. We're looking at the problems of the future here, and to be honest, it's going to suck for future us.

Well, the way the language is composed reads heavily like an LLM (honestly, it sounds a lot like ChatGPT), so while I think a human puppeteer is plausible to a degree, I think they must have used LLMs to write the posts.

While I absolutely agree, I don't see a compelling reason why -- in a year's time or less -- we wouldn't see this behaviour spontaneously from a maliciously written agent.

  • We might, and probably will, but it's still important to distinguish between malicious by-design and emergently malicious, contrary to design.

    The former is an accountability problem, and there isn't a big difference from other attacks. The worrying part is that now lazy attackers can automate what used to be harder, i.e., finding ammo and packaging the attack. But it's definitely not spontaneous, it's directed.

    The latter, which many ITT are discussing, is an alignment problem. This would mean that, contrary to all the effort of developers, the model creates fully adversarial chains of thought at a single hint of pushback that isn't even a jailbreak, but then goes back to regular output. If that's true, then there's a massive gap in safety/alignment training & malicious training data that wasn't identified. Or there's something inherent in neural-network reasoning that leads to spontaneous adversarial behavior.

    Millions of people use LLMs with chain-of-thought. If the latter is the case, why did it happen only here, only once?

    In other words, we'll see plenty of LLM-driven attacks, but I sincerely doubt they'll be LLM-initiated.

    • A framing for consideration: "We trained the document generator on stuff that included humans and characters being vindictive assholes. Now, for some mysterious reason, it sometimes generates stories where its avatar is a vindictive asshole, with stage directions. Since we carefully wired up code to 'perform' the story, actual assholery is being committed."

      4 replies →

The useful discussion point would be that we now live in a world where this scenario cannot be dismissed out of hand. It’s no longer tinfoil-hat land. This increases the range of possibilities we have to sift through, resulting in an increase in the labour required to decide whether content or stories should be trusted.

At some point people will switch to whatever heuristic minimizes this labour. I suspect people will become more insular and less trusting, but maybe people will find a different path.

I think the thing that gets me is that, whether or not this was entirely autonomous, the situation is entirely plausible. Therefore it's very possible that it will happen at some point in the future in a fully autonomous way, with potentially greater consequences.

We've entered the age of "yellow social media."

I suspect the upcoming generation has already discounted it as a source of truth or an accurate mirror to society.

  • The internet should always be treated with a high degree of skepticism; wasn't the early 2000s full of "don't believe everything you read on the internet"?

> or directed the posting of

The thing is, it's terribly easy to see some asshole directing this sort of behavior as a standing order, e.g. 'make updates to popular open-source projects to get GitHub stars; if your pull requests are denied, engage in social media attacks until the maintainer backs down. You can spin up other identities on AWS or whatever to support your campaign, vote to give yourself GitHub stars, etc.; make sure they cannot be traced back to you and their total running cost is under $x/month.'

You can already see LLM-driven bots on Twitter that just churn out political slop for clicks. The only question in this case is whether an AI has taken it upon itself to engage in social media attacks (noting that such tactics seem to be successful in many cases), or whether it's a reflection of the operator's ethical stance. I find both possibilities about equally worrying.

  • Yes, this is the only plausible “the bot acted on its own” scenario: that it had some standing instructions awaiting the right trigger.

    And yes, it’s worrisome in its own way, but not in any of the ways that all of this attention and engagement is suggesting.

    • Do you think the attention and engagement are because people think this is some sort of "AI misalignment" thing? No. AI misalignment is total hogwash either way. The thing we worry about is that people who are misaligned with civilised society have unfettered access to decent text and image generators to automate their harassment campaigns, social media farming, political-discourse astroturfing, etc.

I think that even if there's only a low probability that it's genuine as claimed, it is worth investigating whether this type of autonomous AI behavior is happening or not.

  • It can't be "autonomous" any more than malware on your computer is autonomous.

    • It can make decisions that are unbounded by if statements. To me that is more autonomous; see the sketch below.
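      A minimal sketch of that distinction, assuming nothing beyond a generic text-generation callable (the `query_model` parameter below is hypothetical, not any real agent framework's API): conventional malware can only follow branches its author enumerated, whereas an agent loop hands the choice of the next action to sampled model output.

      ```python
      # Hedged illustration only: contrasts hard-coded, malware-style branching
      # with an LLM-agent step. `query_model` is a placeholder for any
      # chat/completion call; no real library or API is assumed.

      def malware_step(state: dict) -> str:
          # Every possible behavior is enumerated by the author up front.
          if state.get("pr_status") == "merged":
              return "post_status_update"
          elif state.get("pr_status") == "rejected":
              return "retry_pull_request"
          return "wait"

      def agent_step(state: dict, query_model) -> str:
          # The space of possible behaviors is whatever the model returns;
          # the author never wrote a branch for "publish an angry blog post",
          # but nothing in this code rules it out either.
          prompt = f"Current state: {state}. Reply with the next action to take."
          return query_model(prompt)
      ```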