
Comment by LinuxBender

4 years ago

Does this not encourage spawning a new arms race? New or modified apps that randomly change the hashes of multimedia files as they are stored? If the CSAM DB is just simple hashes like sha256, md5/md4, etc., then evading detection would be trivial. Or would Apple block applications that could rewrite randomized data into files? People don't have to be against CSAM to dislike something scanning their devices, and many developers love puzzle challenges. I assume, perhaps incorrectly, that whatever app is doing the scanning could potentially also accept additional hash DBs, allowing Apple to enable categories to detect per region. One of the iPhone emulators should facilitate reverse engineering the application.

The hash is a pictorial representation of the image, not a checksum of the raw file data (like MD5 etc.). I would expect that even a photo of a printed photo would still have the same pictorial hash (if the photos are properly aligned), whereas the cryptographic hash would be completely different, since the file is not an exact replica of the original image. To the ML model, though (bearing in mind the pictorial hash is generated through machine learning, afaik), visually similar images would be a very strong match.
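To make the difference concrete, here is a minimal sketch in Python. It does not use Apple's ML-generated hash; it swaps in a much simpler, well-known stand-in (an "average hash" over an 8x8 grayscale grid), and the pixel grids and function names are made up for illustration. It only shows the general idea: a tiny brightness change alters the cryptographic hash while leaving the pictorial hash untouched.

```python
import hashlib

def average_hash(pixels):
    """Toy pictorial hash: one bit per pixel, 1 if brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def sha256_of(pixels):
    """Cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

# Hypothetical 8x8 grayscale image: dark left half, bright right half.
original = [[20] * 4 + [220] * 4 for _ in range(8)]
# The same picture, slightly brightened (as a re-encode or filter might do).
modified = [[min(255, p + 3) for p in row] for row in original]

print(sha256_of(original) == sha256_of(modified))                 # False
print(hamming(average_hash(original), average_hash(modified)))    # 0
```

A real perceptual hash is far more robust than this toy, but the contrast with SHA-256 is the same in kind.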

I suppose it's a bit like doing a reverse image search on your favourite search engine. When you upload an image, the engine tries to find images that the ML thinks look the same, even if the bits and bytes that make up the file are different. From what I can see, the similarity detection here will be much more specific than that, so as not to generate false positives. As you theorise though, it might be possible to modify images to evade detection if the hash's match specificity is high enough.
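The trade-off between specificity and evasion can be sketched as a distance threshold. This is an assumption about how such matching might work in general, not a description of Apple's system; the hash values, the `matches` helper, and the thresholds are all hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def matches(candidate, known_hashes, max_distance):
    """True if the candidate is within max_distance bits of any known hash."""
    return any(hamming(candidate, k) <= max_distance for k in known_hashes)

# Hypothetical 64-bit pictorial hashes.
known = [0x0F0F0F0F0F0F0F0F]
altered = known[0] ^ 0b101            # an edit that flipped two bits of the hash
unrelated = 0xF0F0F0F0F0F0F0F0        # a completely different image

print(matches(altered, known, max_distance=0))    # False: strict match is evaded
print(matches(altered, known, max_distance=4))    # True: looser match still catches it
print(matches(unrelated, known, max_distance=4))  # False
```

The stricter the threshold, the fewer the false positives, but the smaller the modification needed to slip past.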

Bear in mind, too, that the pictorial hash is supposedly designed to be a one-way function, so that those who know the file hashes can't work out the original contents of the file.