Comment by Dylan16807
2 hours ago
You can't guarantee an LLM does anything. Custom data can often subvert the machine whether or not it's instructions.
But that doesn't mean that separation between instructions and data is impossible. You can format them in different ways, and you can prevent the output tokens from ever using instruction formatting.
> You can't guarantee an LLM does anything.
Agreed.
> But that doesn't mean that separation between instructions and data is impossible.
Yes it does! The comments you are replying to are concerned that it is not possible to be sure that data and instructions have been separated. With certain kinds of automated systems (traditional ones), unless you write them incorrectly, you can be sure of this. And it is possible to engage in a productive incremental process where mistakes can be identified and removed, in a way people comprehend and can plan around.
LLMs do not have this. They have heuristics and guesses. Nobody knows what will work ahead of time, nor even a probability that it will work. That is not a doomer comment by the way! The same is true when you talk to a person. But it is a fundamental limitation, it cannot be removed.
What we have is a machine trained on many old documents that takes one new document and dreams up stuff to append. The LLM algorithm cannot specially recognize contents as "instructions" to itself-the-author.
Even if special tokens are used absolutely perfectly (somehow avoiding escapes or ambiguities or reflected attacks) they are ultimately the same as highlighting all the parts of the document in different colors. You've saved the signal, but there's no mind to receive the intended meaning.
This means that your markers--while far more exclusive--ultimately exist on the same data-level as punctuation and using ? to indicate a question.
> you can prevent the output tokens from ever using instruction formatting
The right words may still outweigh the formatting around them, the same way that they can already outweigh other words around them.