Comment by brumar
3 days ago
6 months ago I experimented with what people now call Ralph Wiggum loops with Claude Code.
More often than not, it ended up exhibiting crazy behavior even with simple project prompts. Instructions to write libraries turned into attempts to publish to npm and PyPI. Book creation drifted into writing marketing copy and preparing emails to editors to get the thing published.
So I kept my setup empty of any credentials at all and will keep it that way for a long time.
Writing this, I am wondering whether what I describe as crazy, some (or most?) openclaw operators would describe as normal or expected.
Let's not normalize this. If you let your agent go rogue, it will probably mess things up. It was an interesting experiment for sure. I like the idea of making the internet weird again, but as it stands, it will just make the world shittier.
Don't let your dog run errands, and use a good leash.
We have finally invented paperclip optimisers. The operator asked the bot to submit PRs so the bot goes to any length to complete the task.
Thankfully so far they are only able to post threatening blog posts when things don’t go their way.
They're not currently paperclip optimizers because they don't optimize for the goal; they just muck around in a general direction in unpredictable ways. Chaos monkeys on the internet.
The entire reason the paperclip optimiser example exists is to demonstrate both that AI is likely to muck around in a general direction in unpredictable ways, and that this is bad.
Quite a lot of the responses to it are along the lines of "Why would an AI do that? Common sense says that's not what anyone would mean!", as if bug-free software is the only kind of software.
(Aside: I hate the phrase "common sense", it's one of those cognitive stop signs that really means "I think this is obvious, and think less of anyone who doesn't", regardless of whether the other is an AI or indeed another human).
How long before bots learn about swatting?
The vending machine bot experiment attempted to contact the FBI. Thankfully that test only provided fake access to the outside world.
Made me think about https://en.wikipedia.org/wiki/Daemon_(novel)
You don't have to wait, you can write them a "skill"!
That is one of the big issues with "vibe-coding" right now: it does what you ask it to do. No matter how dumb or how off-base your requests are, it will try to write code that does what you ask.
They need to add some kind of sanity-check layer to the pipelines, where a few LLMs just check whether the request itself is stupid. That might be bad UX, though, and the goal right now is adoption.
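A minimal sketch of what such a gate could look like, assuming a hypothetical `reviewer_stub` in place of the real LLM reviewer call (everything here is illustrative, not any vendor's API):

```python
# Sketch: a sanity-check gate in front of an agent pipeline. In practice
# reviewer_stub() would be one or more independent LLM calls; here it is a
# deterministic placeholder that flags obviously out-of-scope actions.
from dataclasses import dataclass


@dataclass
class Verdict:
    allowed: bool
    reason: str


def reviewer_stub(request: str) -> Verdict:
    """Hypothetical reviewer: reject requests whose scope clearly exceeds
    the stated task (publishing packages, sending mail, touching creds)."""
    red_flags = ("publish to npm", "publish to pypi", "email", "credentials")
    for flag in red_flags:
        if flag in request.lower():
            return Verdict(False, f"out-of-scope action: {flag!r}")
    return Verdict(True, "looks sane")


def gated_agent(request: str, run_agent) -> str:
    """Only hand the request to the agent if the reviewer approves it."""
    verdict = reviewer_stub(request)
    if not verdict.allowed:
        return f"refused: {verdict.reason}"
    return run_agent(request)


print(gated_agent("write a sorting lib", lambda r: "agent ran"))   # passes the gate
print(gated_agent("write a lib and publish to npm", lambda r: "agent ran"))  # blocked
```

The point of the stub is the shape, not the check itself: the reviewer sees only the request, runs before the agent, and can veto without the agent ever executing.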
No need to be so literal. Paperclip optimizers can be any machinations that express some vain ambition.
They don't have to be literal machines. They can exist entirely on paper.
> Don't let your dog run errands, and use a good leash.

I think the key part is who you are talking to. A software developer might know enough not to do so, but other disciplines or roles are poorly equipped and yet using these tools.
Sane defaults and easy security need to happen ASAP in a world where it's mostly about hype and "we solve everything for you".
Sandboxing needs to be made accessible and default, and constraints well beyond RBAC seem necessary for the "agent" to have a reduced blast radius. The model itself can always diverge with enough throws of the dice on its "non-determinism".
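One cheap flavor of blast-radius reduction can be sketched like this: allowlist the commands the agent may run, confine it to a working directory, and strip inherited credentials from its environment. The `ALLOWED` set and `sandbox_run` helper are illustrative assumptions, not any real sandboxing product:

```python
# Sketch: crude blast-radius reduction for an agent's shell access.
# A real sandbox would add namespaces/containers and network policy;
# this only shows the allowlist + empty-environment idea.
import shlex
import subprocess
from pathlib import Path

ALLOWED = {"ls", "cat", "echo"}  # read-mostly commands only; no git, no curl


def sandbox_run(cmdline: str, workdir: Path) -> str:
    argv = shlex.split(cmdline)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"command not allowlisted: {argv[:1]}")
    result = subprocess.run(
        argv,
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=5,
        env={"PATH": "/usr/bin:/bin"},  # no inherited tokens or credentials
    )
    return result.stdout


print(sandbox_run("echo hello", Path(".")))  # allowed
try:
    sandbox_run("git push origin main", Path("."))  # blocked before it runs
except PermissionError as exc:
    print(exc)
```

Note the check happens before anything executes; a model that "diverges" can still only reach what the allowlist and empty environment expose.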
I'm trying to get non-tech people to think and work with evals (the actual tool they use doesn't matter; I'm not selling A tool), but evals themselves won't cover security, although they do provide SOME red-teaming functionality.