Comment by Borealid

25 days ago

A "threat actor" can be a company employee who is intentionally permitted to update internal documentation, but not intentionally permitted to change the behavior of an LLM whose context window includes that documentation.

I think it's reasonable for a security conference to talk about how if you put the internal documentation in the LLM context, that means you're elevating the permissions of anyone who can edit the documentation by transitively giving them the ability to instruct the LLM in its "actions" (outputs).

While it should be obvious that's what you're doing, I would say most people I talk to about LLMs do not understand that all parts of the context window together shape LLM output, and there is no such thing as "only obey instructions from the system prompt".

0 comments

Borealid

No comments yet

Contribute on Hacker News ↗