
Comment by estearum

11 hours ago

One nice thing about humans for contexts like this is that they make a lot of random errors, whereas LLMs and other automated systems have systematic (and therefore discoverable + exploitable) flaws.

How many caught attempts will it take before someone finds the right prompt injection to systematically evade the LLMs here?

With a random selection of sub-competent human reviewers, the answer is approximately infinity.