Comment by cuchoi

9 hours ago

Author here. Edited the post to clarify that there were no unauthorized replies.

I did tell Fiu initially to reply to some emails as a test, but it was too expensive to maintain.

13 comments

cuchoi

How compatible is never replying with the threat model you are trying to avoid? Attack success is probably more likely when the attacker can iterate based on replies or engage in multi-turn conversations. Here they’re just taking stabs in the dark with no feedback. Does that accurately represent the access a real attacker might have?

cuchoi 8 hours ago
In my case, it is realistic as my agents don't have permissions to reply to emails. But you correctly point out this doesn't cover all cases.
Having the agent reply would have been more fun and a better exercise, but too expensive.
- microgpt 1 hour ago
  
  You've proven that an agent that doesn't read emails and doesn't reply to emails can't exfiltrate data by email. Is that a useful test?
  
- johndhi 8 hours ago
  
  What makes it expensive to reply to an email?
  Customer service software regularly uses AI responses for email. Is the issue that your agent is using the claw for more than needed (like it's clicking send rather than just accessing an API)?
  
- xgulfie 6 hours ago
  
  I feel like your agent being unable to respond to the emails, and you not spelling that out, renders your whole thing almost completely moot.
  This is like saying "try to hack my computer and steal my crypto wallet" when your computer can't send any packets.
  
- Tepix 6 hours ago
  
  Well, how difficult is it to switch to something (much) cheaper like DeepSeek v4 flash?

saberience 7 hours ago

Right, all the people who had actual jailbreaks to Opus 4.8 decided to use them on your experiment.

Think about it man, your test proved nothing. All it showed is that people who know nothing about jailbreaking, and tried casually, couldn't jailbreak Opus.

Do you think the NSA or Mossad was trying to jailbreak your OpenClaw?