Comment by cousin_it
7 hours ago
Yeah. Even more than that, I think "prompt injection" is just a fuzzy category. Imagine an AI that has been trained to be aligned. Some company uses it to process some data. The AI notices that the data contains CSAM. Should it speak up? If no, that's an alignment failure. If yes, that's data bleeding through to behavior; exactly the thing SQL was trying to prevent with parameterized queries. Pick your poison.
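For concreteness, here is the code/data split that parameterized queries enforce, sketched with Python's stdlib `sqlite3` (the table and input are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, body TEXT)")
conn.execute("INSERT INTO docs VALUES (1, 'hello')")

# Untrusted input that "bleeds through" if concatenated into the query:
user_input = "x' OR '1'='1"

# Vulnerable: the data is interpreted as part of the query.
rows_concat = conn.execute(
    "SELECT id FROM docs WHERE body = '" + user_input + "'"
).fetchall()

# Parameterized: the data stays data and never becomes code.
rows_param = conn.execute(
    "SELECT id FROM docs WHERE body = ?", (user_input,)
).fetchall()

print(rows_concat)  # [(1,)] — the injection matched every row
print(rows_param)   # []     — the input was treated as a literal string
```

The LLM analogue has no equivalent of the `?` placeholder: everything in the context window is one undifferentiated token stream, which is why "data bleeding through to behavior" can't be cleanly ruled out.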
> The AI notices that the data contains CSAM. Should it speak up? If no, that's an alignment failure. If yes, that's data bleeding through to behavior; exactly the thing SQL was trying to prevent with parameterized queries.
You can handle the CSAM at another level: a secondary model whose only job is to scan all incoming data for CSAM. If it detects something, it kicks off whatever the internal process for that is.
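A minimal sketch of that two-model split. All names here are hypothetical; `scan_for_flagged` stands in for whatever dedicated classifier or API the scanning layer actually uses:

```python
def scan_for_flagged(data: str) -> bool:
    # Hypothetical secondary classifier; in practice this would be a
    # dedicated model or reporting API, not a substring check.
    return "FLAGGED_CONTENT" in data

def process(data: str, base_model) -> str:
    if scan_for_flagged(data):
        # Escalate via the internal process instead of asking the base
        # model to make the judgment call.
        raise ValueError("data flagged; escalating")
    # The base model only ever sees data that passed the scan, so it
    # never needs to refuse on its own.
    return base_model(data)

print(process("ordinary records", lambda d: d.upper()))
```

The point of the split is that the base model's behavior stays uniform across content types, while the policy decision lives in a separate, auditable component.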
The "base" model shouldn't arbitrarily refuse to operate on any type of content. Among other things... what happens if NCMEC wants to use AI in their operations? What happens if you're the DoJ trying to find connections in the unredacted Epstein files?
We want a human level of discretion.
Organizations struggle even to let humans use their discretion. Pretty much every retail worker has encountered a rigidly enforced policy that would be better off ignored in most cases.
Yes, because humans would never fall for instructions embedded in data. If they did, we'd surely have a name for something like that ;)
By the way, when was the last time you looked out of your window?