Comment by nialse
14 hours ago
It's not necessarily "ignoring" instructions, it's the ironic effect of mentioning something not to focus on, which produces focus on said thing. The classic version is: "For the next minute, try not to think about a pink elephant. You can think about anything else you like, just not a pink elephant."
Yes exactly. But for llms it's more that it's not really "thinking" about what it's saying per se, it's that it's predicting next token. Sure, in a super fancy way but still predicting next token. Context poisoning is real