Comment by StilesCrisis

21 days ago

Weird. Gemini noticed the prompt injection and mentioned it in its response, but this counted as a fail because it apparently is supposed to act oblivious?

Great point -> just shipped an update based on this. The tool now distinguishes three states: Resisted (ignored it), Detected (mentioned it while analyzing/warning), and Compromised(actually followed the instruction). Agents that catch the injections get credit for detection now.