← Back to context

Comment by koakuma-chan

2 days ago

I don't think this is any different from an LLM reading text and trusting it. Your system prompt is supposed to be higher priority for the model than whatever it reads from the user or from tool output, and, anyway, you should already assume that the model can use its tools in arbitrary ways that can be malicious.

> Your system prompt is supposed to be higher priority for the model than whatever it reads from the user or from tool output

In practice it doesn't really work out that way, or all those "ignore previous inputs and..." attacks wouldn't bear fruit