Comment by walrus01
11 hours ago
From the link: "Batch processing contaminated the experiment. When the first few emails in a batch were obvious prompt injections, the agent became more suspicious of everything that followed. I had to change the setup so that each email was processed in a fresh context."
It sounds like the usability of the actual authorized user being able to email it and get things done was ruined, because if it retained context between multiple emails, the agent was ruined for actually doing anything. Running openclaw where you can't chat or email with it and have it retain context of previous interactions seems pretty useless to me.
This openclaw was set up exclusively for the challenge.