Comment by simonw
1 month ago
Sadly this has been tried before and doesn't work.
If an attacker can send enough tokens they can find a combination of tokens that will confuse the LLM into forgetting what the boundary was meant to be, or override it with a new boundary.
No comments yet
Contribute on Hacker News ↗