Comment by Wowfunhappy

3 hours ago

> The AI notices that the data contains CSAM. Should it speak up? If no, that's an alignment failure. If yes, that's data bleeding through to behavior; exactly the thing SQL was trying to prevent with parameterized queries.

You can handle the CSAM at another level. There can be a secondary model whose only job is to scan all data for CSAM. If it detects something, it triggers whatever the internal process is for that.

The "base" model shouldn't arbitrarily refuse to operate on a particular type of content. Among other things... what happens if NCMEC wants to use AI in their operations? What happens if you're the DoJ trying to find connections in the unredacted Epstein files?
