Comment by bayindirh
3 days ago
From my understanding, CSAM scanning is always considered a separate, always-on, mandatory subsystem in any cloud storage system.
3 days ago
> From my understanding, CSAM scanning is always considered a separate, always-on, mandatory subsystem in any cloud storage system.
Yes, any non-E2EE cloud storage system has strict scanning for CSAM. And it's based on perceptual hashes, not AI (because AI systems can be tricked pretty easily with normal-looking adversarial images).
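For readers unfamiliar with how hash-based matching works, here is a minimal sketch, assuming 64-bit perceptual hashes compared by Hamming distance against a list of known hashes. The hash values and the 8-bit threshold are made up for illustration and are not taken from any real scanning system.

```python
# Hypothetical database of known 64-bit perceptual hashes (made-up values).
KNOWN_HASHES = {0xF0F0F0F0F0F0F0F0, 0x0123456789ABCDEF}


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known(h: int, known=KNOWN_HASHES, max_distance: int = 8) -> bool:
    """Flag h if it is within max_distance bits of any known hash.

    Unlike a cryptographic hash, a perceptual hash is compared by
    closeness, not equality, so slightly edited images still match.
    """
    return any(hamming(h, k) <= max_distance for k in known)


print(matches_known(0xF0F0F0F0F0F0F0F0 ^ 0b111))  # 3 bits off a known hash
print(matches_known(0x0F0F0F0F0F0F0F0F))          # far from both entries
```

The threshold is the knob that trades missed matches against false positives: a larger `max_distance` catches more edited copies but also flags more unrelated images.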
I built a similar photo ID system, not for this purpose or content, and the idea of platforms using perceptual hashes to potentially ruin people's lives is horrifying.
Depending on the algorithm and parameters, you can easily get a scary number of false positives, especially with algorithms that shrink images during hashing, which many of them do.
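To illustrate the shrinking problem, here is a sketch of a simple "average hash" (aHash), one well-known perceptual hash. It is chosen only as an example and is not claimed to be what any platform uses. Images are plain lists of grayscale rows so the example needs no imaging library. Two visibly different 32x32 images (one flat, one with a fine checkerboard texture) collapse to the same 8x8 thumbnail pattern and therefore to the same hash.

```python
def downscale(img, size=8):
    """Box-filter a square grayscale image (list of rows) to size x size."""
    step = len(img) // size
    return [
        [
            sum(img[y][x]
                for y in range(by * step, (by + 1) * step)
                for x in range(bx * step, (bx + 1) * step)) / (step * step)
            for bx in range(size)
        ]
        for by in range(size)
    ]


def average_hash(img, size=8):
    """One bit per thumbnail pixel: 1 if brighter than the thumbnail mean."""
    pixels = [p for row in downscale(img, size) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


N = 32
# Flat image: dark left half, bright right half.
smooth = [[0 if x < N // 2 else 255 for x in range(N)] for y in range(N)]
# Textured image: checkerboards whose detail is averaged away by downscaling.
textured = [[(0, 60)[(x + y) % 2] if x < N // 2 else (200, 255)[(x + y) % 2]
             for x in range(N)] for y in range(N)]

print(hex(average_hash(smooth)), hex(average_hash(textured)))
```

Both images hash to `0xf0f0f0f0f0f0f0f` because all the fine detail is discarded before any bit is computed. Real algorithms add more steps, but the same information loss is what makes collisions between unrelated images possible.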
Yeah, it’s not a great system, because perceptual hashes can be and have been tricked in the past. It is better than machine learning, though, because you can make any image trigger an ML model without it necessarily looking like a bad image. That is, perceptual hashes are much harder to adversarially fool.
I imagine you'd add more heuristics and various types of hashes? If the file is just sitting there, rarely accessed and unshared, or if the file only triggers on 2/10 hashes, it's probably a false alarm. If the file is on a public share, you can probably run an actual image comparison...
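The triage heuristic described in this comment could be sketched roughly as follows. This is a hypothetical illustration: the `FileSignals` fields, the 50% vote threshold, and the outcome labels are all invented here, and no real provider is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass
class FileSignals:
    hash_hits: int         # how many independent perceptual hashes matched
    hash_total: int        # how many perceptual hashes were computed
    publicly_shared: bool  # is the file on a public share?
    rarely_accessed: bool  # is it just sitting there untouched?


def triage(sig: FileSignals, min_fraction: float = 0.5) -> str:
    """Hypothetical triage rules; thresholds are illustrative only."""
    if sig.hash_hits / sig.hash_total < min_fraction:
        return "probably a false alarm"       # e.g. only 2/10 hashes fired
    if not sig.publicly_shared and sig.rarely_accessed:
        return "probably a false alarm"       # unshared and rarely opened
    if sig.publicly_shared:
        return "run a full image comparison"  # costlier check, worth it here
    return "queue for review"


print(triage(FileSignals(2, 10, publicly_shared=True, rarely_accessed=False)))
print(triage(FileSignals(9, 10, publicly_shared=True, rarely_accessed=False)))
```

Requiring agreement from several different hash families means a collision in one algorithm (such as the downscaling collision above) is not enough on its own to flag a file.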
I thought Apple’s approach was very promising. Unfortunately, instead of reading about how it actually worked, huge numbers of people just guessed incorrectly about how it worked, and the conversation was dominated by uninformed outrage about things that weren’t happening.
Perceptual hashes? An embedding in a vector space by a learned encoder.
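To make the point concrete: when the "perceptual hash" comes from a learned encoder, one common way to turn the embedding into a compact bit string is random-hyperplane locality-sensitive hashing, where each bit records which side of a random hyperplane the embedding falls on. The sketch below uses random vectors as stand-ins for encoder outputs (no real encoder is involved), and the dimensions are arbitrary assumptions.

```python
import random


def hyperplane_hash(embedding, planes):
    """One bit per hyperplane: 1 if the embedding is on its positive side."""
    bits = 0
    for plane in planes:
        dot = sum(e * p for e, p in zip(embedding, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


rng = random.Random(0)
DIM, N_BITS = 16, 32  # arbitrary sizes for the example
planes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BITS)]

# Stand-ins for encoder outputs.
a = [rng.gauss(0, 1) for _ in range(DIM)]
a_scaled = [2.5 * x for x in a]  # same direction, so the same bits
a_flipped = [-x for x in a]      # opposite direction, so every bit flips

print(f"{hyperplane_hash(a, planes):032b}")
print(f"{hyperplane_hash(a_flipped, planes):032b}")
```

The bits depend only on the embedding's direction, so "similar images" means "similar encoder outputs". Whether that counts as AI is exactly the question the reply below is poking at.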
Phew, not AI then…