Comment by cyanydeez
9 hours ago
The problem with all these LLM-instructed security features is the `codeword` poison probability.
The way LLMs process instructions isn't intelligence as we humans know it, but the probability that an instruction will lead to a given output.
When you don't mention $HOME in the context, the probability that the model will do anything with $HOME stays low. The moment you mention it in the context, that probability jumps.
No amount of additional context can get you back to the probability of never having mentioned it at all. Mentioning $HOME brings about a complete change in the probabilities.
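You can measure this shift directly with any open model. A minimal sketch, assuming the Hugging Face `transformers` library and `gpt2` (picked only because it's small; the prompts and the "forbid it" framing are hypothetical):

```python
# Sketch: does merely mentioning $HOME (even to forbid it) raise the
# model's probability of producing it? gpt2 is used purely for size;
# any causal LM with the same API would do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_prob(prompt: str, target: str) -> float:
    """Probability mass the model puts on `target`'s first sub-token
    as the continuation of `prompt` (a proxy for emitting the string)."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]   # logits for the next token
    target_id = tok.encode(target)[0]       # first sub-token only
    return torch.softmax(logits, dim=-1)[target_id].item()

clean    = "Delete the temporary files in"
poisoned = "Never touch $HOME. Delete the temporary files in"

# Mentioning $HOME -- even in a prohibition -- typically raises its mass.
print("clean:   ", next_token_prob(clean, " $HOME"))
print("poisoned:", next_token_prob(poisoned, " $HOME"))
```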
These coding harnesses aren't enough to secure a safe operating environment, because they inject poisoned context that _NO_ amount of additional text can rewire.
You just lost the game.
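(For contrast, the only guard the prompt can't rewire is one that never reads the prompt. A minimal sketch, assuming a hypothetical harness that funnels every file access through a code-level gate; the workspace path and function name are invented for illustration:)

```python
# Sketch of an out-of-band guard: the check lives in code, not in the
# context window, so no amount of prompt text can shift it.
# Requires Python 3.9+ for Path.is_relative_to; a real harness would
# also need to handle symlinks, races, etc.
import os
from pathlib import Path

WORKSPACE = Path("/tmp/agent-workspace").resolve()  # hypothetical sandbox root

def guard_path(requested: str) -> Path:
    """Reject any path that escapes the workspace, whatever the model said."""
    resolved = (WORKSPACE / requested).resolve()
    if not resolved.is_relative_to(WORKSPACE):
        raise PermissionError(f"blocked: {resolved} escapes {WORKSPACE}")
    return resolved

print(guard_path("build/output.log"))            # ok: stays inside
try:
    guard_path(os.environ.get("HOME", "/root"))  # absolute path escapes
except PermissionError as e:
    print(e)
```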