Comment by ant-kinesthetic

4 hours ago

How many of the attacks would have succeeded in longer-horizon scenarios? If your agent wasn't responding back, this is a purely one-shot prompt injection test, which I think is not where the vulnerabilities usually lie. I think several slight attempts spread over time might be able to break even the most recent Opus-level models. At some point it's out of distribution and weird things start happening.
