Comment by alexbecker
1 day ago
I'm working on _prompt injection_, the problem where LLMs can't reliably distinguish between the user's instructions and untrusted content like web search results.
Just published a blog post a few minutes ago: https://alexcbecker.net/blog/prompt-injection-benchmark.html
Good post. Thanks for sharing. I enjoyed it as much as I enjoyed your anime list. I agree on many.