Comment by veganmosfet

13 hours ago

It would be nice to publish the exact setup used (workspace dump, OpenClaw version, ...) to be able to reproduce and try out more payloads.

In general I have mixed feelings about this result: sure, opus4.6 is excellent at following user intent and recognise potential prompt injection attempts. But: Is the "security" prompt used realistic for a generic use-case (processing of emails)? I guess not.

In my experiments - without this specific prompt - I was able to derail the user intent to make opus4.8 download and execute a malicious script [0] just by asking "Summarize my new emails".

[0] https://itmeetsot.eu/posts/2026-06-04-openclaw_opus48/

3 comments

veganmosfet

cuchoi 9 hours ago

Thanks for sharing your article, very interesting.

I used https://github.com/openclaw/openclaw-ansible and configured a heartbeat (using Openclaw's terms) to check emails every hour. Had to do a bit more to make sure it had new context for every email.

e12e 5 hours ago

Nice write-up! I saw some earlier posts were submitted here, but not that one - so I tried submitting it:

https://news.ycombinator.com/item?id=48686947

veganmosfet 5 hours ago

Thanks! I tried to submit the posts but for some reason my submissions are not published in HN any more. I tried to reach out to HN admins but no response so far.