Comment by marysol5
25 days ago
I was at an "AI Security" talk recently that centred around "We will blindly ingest inputs to and from AI, and that's a security issue. There's nothing we can do, so just deal with the aftermath".
It even included the claim that "If a threat actor updates your internal documentation, they can use that to influence the AI".
If a THREAT ACTOR IS UPDATING YOUR DOCUMENTATION, YOU'RE ALREADY COMPROMISED!
We're not talking about "Wikipedia vandals" here.
A "threat actor" can be a company employee who is intentionally permitted to update internal documentation, but not intentionally permitted to change the behavior of an LLM whose context window includes that documentation.
I think it's reasonable for a security conference to point out that putting internal documentation in the LLM context elevates the permissions of anyone who can edit that documentation: it transitively gives them the ability to instruct the LLM in its "actions" (outputs).
While it should be obvious that's what you're doing, most people I talk to about LLMs don't understand that every part of the context window shapes the LLM's output, and that there is no such thing as "only obey instructions from the system prompt".
My first thought was agreement: “do they not realize that docs are context, sometimes even prompts, for humans too?”
My second thought was “perhaps they’re just very forward-thinking”, and now I’m sad about the future again.