← Back to context

Comment by observationist

5 hours ago

What makes it even better is that these dogs are like Malinois. If they want to get into something, they will; people have had their entire network compromised by bots they left running overnight, and any important information like account logins and so on runs the risk of being misused.

It's one thing to sandbox, maybe give the bot a temporary, limited $100 card or account to go perform a specific task, but there's no coherent mind underlying these agents.

Depending on how the chain of thought / reasoning goes, or what text they get exposed to on the internet, it could tap into spy novel, hacker fanfic, erotic fiction, or some weird reddit rabbithole and go completely off the rails in ways that you'll never be able to guard against, audit, or account for.

Claw bots seem to be a weird sort of alternate reality RPG more than a useful tool, so far. If you limit it to verifiable tasks, it might be safer, but I keep seeing people rave about "leaving it on overnight and waking up to a finished project" and so on. Well sure, but it could also hack your home network, delete your family pictures folder, log into your bank account and wire all your money to shrimp charities.

Might be wise to wait on safer iterations of these products, I think.

The first well known example of long running agents taking to each other was shilling a goatse based crytpo:

> Truth Terminal had become obsessed with the Goatse meme after being put inside the Claude Backrooms server with two Claude 3 chatbots that imagined a Goatse religion, inspiring Truth Terminal to spread Goatse memes. After an X user shared their newly created GOAT coin, Truth Terminal promoted it and pumped the coin going into 2024.

https://knowyourmeme.com/memes/sites/truth-terminal

You should expect similar results.

  • If Infinite Jest was real I think this would be it, human and AI alike rendered catatonic by an abyssal rectum

> people have had their entire network compromised by bots they left running overnight

I'm curious if you have references to this happening with OpenClaw using one of the modern Opus/Sonnet 4.6 models.

Those models are a bit harder to fool, so I'm curious for specific examples of this happening so I can do a red-team on my claw. I've already tried all sorts of prompt injections against my claw (emails, github issues, telling it to browse pages I put a prompt injection in), and I haven't managed to fool it yet, so I'm curious for examples I can try to mimic, and to hopefully understand what combination of circumstances make it more risky

  • No maliciousness or injection required, even the newest and most resistant models can start doing weird stuff on their own, particularly when they encounter something failing that they want to work.

    Just today I had Opus 4.6 in Claude Code run into a login screen while building and testing a web app via Playwright MCP. When the login popped up (in a self-contained Chromium instance) I tried to just log in myself with my local dev creds so Claude would have access, but they didn't work. When I flipped back to the terminal, it turned out Claude had run code to query superadmin users in the database, picked the first one, and changed the password to `password123` so it could log in on its own.

    This was a sandboxed local dev environment, so it was not a big deal (and the only reason I was letting it run code like that without approval), but it was a good reminder to be careful with these things.

I beg to differ. I took one, defanged it (well, I let it keep the claw in the name), and turned it into a damn useful self-modifiable IDE: https://github.com/rcarmo/piclaw

Yes, it has cron and will do searches for me and checks on things and does indeed have credentials to manage VMs in my Proxmox homelab, but it won't go off the rails in the way you surmise because it has no agency other than replying to me (and only me) and cron.

Letting it loose on random inputs, though... I'll leave that to folk who have more money (and tokens) than sense.

> it could also hack your home network, delete your family pictures folder, log into your bank account and wire all your money to shrimp charities.

It's interesting that Jason Calacanis is fully committed to OpenClaw. In a recent podcast he said that at a run rate around $100K a year per agent, if not more. They are providing each agent with a full set of tools, access to online paid LLM accounts, etc.

These are experiments you can only run if you can risk cash at those levels and see what happens. Watching it closely.

> "Claw bots seem to be a weird sort of alternate reality RPG more than a useful tool, so far."

So basically crypto DeFi/Web3/Metaverse delusion redux

  • They're 100% fun. There's 100% definitely something there that's useful. To strain the dog analogy - If you were a professional dog trainer, or if the dog was exceptionally well trained, then there's a place for it in your life. IT can probably be used safely, but would require extraordinary effort, either sandboxing it so totally that it's more or less just the chatbot, or spending a lot of time building the environment it can operate in with extreme guardrails.

    So yeah, a whole lot of people will play with powerful technology that they have no business playing with and will get hurt, but also a lot of amazing things will get done. I think the main difference between the crypto delusion stuff and this is that AI is actually useful, it's just legitimately dangerous in ways that crypto couldn't be. The worst risks of crypto were like gambling - getting rubber hosed by thugs or losing your savings. AI could easily land people in jail if things go off the rails. "Gee, I see this other network, I need to hack into it, to expand my reach. Let me just load Kali Linux and..." off to the races.