Comment by swiftcoder
2 days ago
> "inject obfuscated text into the image... and hope some system interprets this as a prompt"
The missing piece here is that you are assuming that "the prompt" is privileged in some way. The prompt is just part of the input, and all input is treated the same by the model (hence the evergreen success of attacks like "ignore all previous inputs...")
No comments yet
Contribute on Hacker News ↗