Comment by keepamovin

1 month ago

Simon, I know you're the AI bigwig but I'm not sure that's correct. I know that's the "story" (but maybe just where the AI labs would prefer we look?). How realistic is it really that MCP/tools/web search is being corrupted by people to steal prompts/convos like this? I really think this is such low prop. And if it does happen, the flaw is the AI labs for letting something like this occur.

Respect for your writing, but I feel you and many others have the risk calculus here backwards.

12 comments

keepamovin

simonw 1 month ago

Every six months I predict that "in the next six months there will be a headline-grabbing example of someone pulling off a prompt injection attack that causes real economic damage", and every six months it fails to happen.

That doesn't mean the risk isn't there - it means malicious actors have not yet started exploiting it.

Johann Rehberger calls this effect "The Normalization of Deviance in AI", borrowing terminology from the 1986 Space Shuttle Challenger disaster report: https://embracethered.com/blog/posts/2025/the-normalization-...

Short version: the longer a company or community gets away with behaving in an unsafe way without feeling the consequences, the more they are likely to ignore those risks.

I'm certain that's what is happening to us all today with coding agents. I use them in an unsafe way myself.

saagarjha 1 month ago

AI labs currently have no solution for this problem and have you shoulder the risk for it.

keepamovin 1 month ago
Evidence?
- simonw 1 month ago
  
  If they had a solution for this they would have told us about it.
  In the meantime security researchers are publishing proof of concept data exfiltration attacks all the time. I've been collecting those here: https://simonwillison.net/tags/exfiltration-attacks/
- saagarjha 1 month ago
  
  I worked on this for a company that got bought by one of the labs (for more than just agent sandboxes, mind you).
  
  6 replies →