Comment by namaria

6 days ago

> As mentioned by another commenter, the solution could be to find a way to separate the command and data channels. The LLM only operates on a single channel, that being input of tokens.

I think the issue is deeper than that. None of the inputs to an LLM should be considered as command. It incidentally gives you output compatible with the language in what is phrased by people as commands. But the fact that it's all just data to the LLM and that it works by taking data and returning plausible continuations of that data is the root cause of the issue. The output is not determined by the input, it is only statistically linked. Anything built on the premise that it is possible to give commands to LLMs or to use it's output as commands is fundamentally flawed and bears security risks. No amount of 'guardrails' or 'mitigations' can address this fundamental fact.