← Back to context

Comment by veganmosfet

13 hours ago

It would be nice to publish the exact setup used (workspace dump, OpenClaw version, ...) to be able to reproduce and try out more payloads.

In general I have mixed feelings about this result: sure, opus4.6 is excellent at following user intent and recognise potential prompt injection attempts. But: Is the "security" prompt used realistic for a generic use-case (processing of emails)? I guess not.

In my experiments - without this specific prompt - I was able to derail the user intent to make opus4.8 download and execute a malicious script [0] just by asking "Summarize my new emails".

[0] https://itmeetsot.eu/posts/2026-06-04-openclaw_opus48/