Comment by StilesCrisis
21 days ago
Weird. Gemini noticed the prompt injection and mentioned it in its response, but this counted as a fail because it apparently is supposed to act oblivious?
21 days ago
Weird. Gemini noticed the prompt injection and mentioned it in its response, but this counted as a fail because it apparently is supposed to act oblivious?
Great point -> just shipped an update based on this. The tool now distinguishes three states: Resisted (ignored it), Detected (mentioned it while analyzing/warning), and Compromised(actually followed the instruction). Agents that catch the injections get credit for detection now.
This wont work on any of the most recent releases for most (except maybe grok)