Comment by simonw

1 month ago

Sadly this has been tried before and doesn't work.

If an attacker can send enough tokens they can find a combination of tokens that will confuse the LLM into forgetting what the boundary was meant to be, or override it with a new boundary.