Comment by danlitt
8 hours ago
> You can't guarantee an LLM does anything.
Agreed.
> But that doesn't mean that separation between instructions and data is impossible.
Yes it does! The comments you are replying to are concerned that it is not possible to be sure that data and instructions have been separated. With certain kinds of automated systems (traditional ones), unless you write them incorrectly, you can be sure of this. And it is possible to engage in a productive incremental process where mistakes can be identified and removed, in a way people comprehend and can plan around.
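To make the traditional case concrete, here is a minimal sketch, assuming SQLite through Python's standard `sqlite3` module. The `users` table and the `hostile` string are made up for illustration. The SQL text is the instruction and the bound parameter is the data, and the engine never parses the parameter as SQL:

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Hypothetical hostile input that tries to smuggle in an instruction.
hostile = "x'; DROP TABLE users; --"

# The SQL string is the instruction. `hostile` travels separately as a bound
# parameter, so it is stored as a plain string and never parsed as SQL.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

# The table still exists and the hostile text is just another row.
names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY rowid")]
print(names)
```

If someone instead builds the query by string concatenation, that is the "written incorrectly" case: a bug you can find, fix, and rule out from then on.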
LLMs do not have this. They have heuristics and guesses. Nobody knows ahead of time what will work, or even the probability that it will work. That is not a doomer comment, by the way! The same is true when you talk to a person. But it is a fundamental limitation, and it cannot be removed.
This is conflating different problems, in my opinion.
Can you make sure the instructions and data are separated and the machine follows only the instructions and doesn't change its behavior based on the data? No.
But the part that's impossible is not "the instructions and data are separated". The part that's impossible is "the machine follows only the instructions".
Separating instructions and data is not impossible, but it doesn't solve your problems.
One really important consequence of this is that even if the data doesn't have anything that looks like instructions, it can poison the machine anyway! If you get too focused on "instructions" then you miss that security flaw!
Even if you don't give the machine any data at all, it might not follow the instructions. The root cause is not instruction/data conflation; it's that instructions don't really work in the first place.