Comment by TomasBM
5 days ago
> it all seems comfortably within the capabilities of OpenClaw
I definitely agree. In fact, I'm not even denying that it's possible for the agent to have deviated despite the best intentions of its designers and deployers.
But the question of probability [1] and attribution is important: what or who is most likely to have been responsible for this failure?
So far, I've seen plenty of claims and conclusions ITT that boil down to "AI has discovered manipulation on its own" and other versions of instrumental convergence. And while this kind of failure mode is fun to think about, I'm trying to introduce some skepticism here.
Put simply: until we see evidence that this wasn't faked, intentional, or a foreseeable consequence of the deployer's (or OpenClaw/LLM developers') mistakes, it makes little sense to grasp at improbable scenarios [2] and build an entire story around them. IMO, it's even counterproductive, because then the deployer can just say "oh it went rogue on its own haha skynet amirite" and pretty much evade responsibility. We should instead do the opposite: the incident is the deployer's fault until proven otherwise.
So when you say:
> originally prompted with a lot of reckless, borderline malicious guidelines
That's a much more probable explanation than "LLM gone rogue" with no apparent human cause, until we see strong evidence otherwise.
[1] In other comments I tried to explain how I order the probability of causes, and why.
[2] Other scenarios that are similarly unlikely: foreign adversaries, "someone hacked my account", LLM sleeper agent, etc.