Comment by estearum
11 hours ago
One nice thing about humans in contexts like this is that they make a lot of random errors, whereas LLMs and other automated systems have systematic (and therefore discoverable and exploitable) flaws.
How many caught attempts will it take for someone to find the right prompt injection to systematically evade LLMs here?
With a random selection of sub-competent human reviewers, the answer is approximately infinity.
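To make that concrete, here is a minimal sketch with made-up numbers (the 30% per-reviewer miss rate and the function names are purely illustrative, not from any real system): a deterministic LLM filter with a discovered bypass fails every time, while independent, randomly erring human reviewers make repeated evasion decay exponentially with the number of attempts.

```python
# Illustrative only: compares an attacker's success against a deterministic
# LLM filter with a known bypass vs. a pool of independently erring humans.

def llm_evasion_prob(bypass_found: bool, attempts: int) -> float:
    """Deterministic filter: a discovered bypass succeeds on every attempt."""
    return 1.0 if bypass_found else 0.0

def human_evasion_prob(per_reviewer_miss: float, attempts: int) -> float:
    """Independent random reviewers: chance that *all* attempts slip through."""
    return per_reviewer_miss ** attempts

if __name__ == "__main__":
    miss = 0.3  # hypothetical: each reviewer misses a given attack 30% of the time
    for n in (1, 5, 20):
        print(f"{n:>2} attempts | LLM with known bypass: {llm_evasion_prob(True, n):.3f}"
              f" | random human reviewers: {human_evasion_prob(miss, n):.2e}")
```

Under these assumptions the humans' evasion probability at 20 attempts is on the order of 10^-11, which is the "approximately infinity" point: there is no single input an attacker can craft once and reuse forever.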