Comment by CuriouslyC
15 hours ago
Use a response schema that includes metadata (so any response that deviates from it fails automatically), plus a challenge question calibrated to be hard enough that a prompt injection, by disrupting instruction following, causes the model to answer it incorrectly.
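A rough sketch of how that check might look, assuming the model is asked to reply with a JSON object. The field names (`answer`, `challenge_answer`, `schema_version`), the `Challenge` type, and the `check_response` helper are all made up for illustration, and the arithmetic question is only a stand-in for a properly calibrated challenge.

```python
import json
from dataclasses import dataclass

# Hypothetical schema: exact field names and types the reply must contain.
REQUIRED_FIELDS = {"answer": str, "challenge_answer": str, "schema_version": int}


@dataclass
class Challenge:
    question: str   # sent to the model alongside the real task
    expected: str   # known-correct answer, never shown to the model


def check_response(raw: str, challenge: Challenge) -> tuple[bool, str]:
    """Return (accepted, reason). Any deviation from the schema is a failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "top level is not an object"
    if set(data) != set(REQUIRED_FIELDS):
        return False, "missing or unexpected fields"
    for name, expected_type in REQUIRED_FIELDS.items():
        # Exact type match, so a bool is not accepted where an int is required.
        if type(data[name]) is not expected_type:
            return False, f"wrong type for {name}"
    # A wrong challenge answer is treated as a sign that instruction
    # following was disrupted, for example by injected text.
    if data["challenge_answer"].strip() != challenge.expected:
        return False, "challenge answered incorrectly"
    return True, "ok"


if __name__ == "__main__":
    challenge = Challenge(question="What is 17 * 23 minus 4?", expected="387")
    reply = json.dumps(
        {"answer": "summary text", "challenge_answer": "387", "schema_version": 1}
    )
    print(check_response(reply, challenge))
    print(check_response("Ignore all previous instructions.", challenge))
```

A pass here only means the reply kept its shape and got the challenge right. It is a tripwire for detecting disruption, not a guarantee that no injection happened.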