Comment by bayindirh

3 days ago

From my understanding, CSAM scanning is always treated as a separate, always-on, mandatory subsystem in any cloud storage system.

odo1242 3 days ago

Yes, any non-E2EE cloud storage system has strict scanning for CSAM. It's based on perceptual hashes, not AI, because AI systems can be tricked pretty easily with normal-looking adversarial images.

heavyset_go 3 days ago
I built a similar photo ID system, not for this purpose or content, and the idea of platforms using perceptual hashes to potentially ruin people's lives is horrifying.
Depending on the algorithm and parameters, you can easily get a scary number of false positives, especially with algorithms that shrink images during hashing, which is a lot of them.
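To make the shrinking point concrete, here is a minimal sketch of an "average hash" in pure Python. It is only an illustration, not what any platform actually runs: the function names, the 8x8 hash size, the block-averaging downscale, and the synthetic 32x32 test images are all assumptions. It shows two images that differ in every pixel but hash identically, because the fine detail is averaged away when the image is shrunk.

```python
def shrink(pixels, size=8):
    """Downscale a square grayscale image (list of rows) by block averaging."""
    block = len(pixels) // size
    out = []
    for by in range(size):
        row = []
        for bx in range(size):
            total = 0
            for y in range(by * block, (by + 1) * block):
                for x in range(bx * block, (bx + 1) * block):
                    total += pixels[y][x]
            row.append(total / (block * block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit integer: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def make_image(texture):
    """32x32 image: dark left half, bright right half, optional checkerboard."""
    image = []
    for y in range(32):
        row = []
        for x in range(32):
            base = 50 if x < 16 else 200
            delta = (20 if (x + y) % 2 == 0 else -20) if texture else 0
            row.append(base + delta)
        image.append(row)
    return image


plain = make_image(texture=False)
textured = make_image(texture=True)  # every pixel differs from `plain`

hash_plain = average_hash(plain)
hash_textured = average_hash(textured)
print(hamming(hash_plain, hash_textured))  # 0: the checkerboard averages out
```

Real perceptual hashes are more elaborate than this, but any step that throws away detail before comparing has the same basic failure mode.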
- odo1242 2 days ago
  
  Yeah, it’s not a great system, since perceptual hashes can be, and have been, tricked in the past. It is still better than machine learning, though, because you can make any image trigger an ML model without it necessarily looking like a bad image. That is, perceptual hashes are much harder to fool adversarially.
  
- dotnet00 3 days ago
  
  I imagine you'd add more heuristics and various types of hashes? If the file is just sitting there, rarely accessed and unshared, or if the file only triggers on 2/10 hashes, it's probably a false alarm. If the file is on a public share, you can probably run an actual image comparison...
  
- JimDabell 3 days ago
  
  I thought Apple’s approach was very promising. Unfortunately, instead of reading about how it actually worked, huge numbers of people just guessed incorrectly about how it worked, and the conversation was dominated by uninformed outrage about things that weren’t happening.
  
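dotnet00's suggestion above (several hash types plus sharing heuristics) could be sketched roughly as follows. Everything here is hypothetical: the `HashResult` fields, the `triage` function, the outcome labels, and the 50% cutoff are assumptions chosen to mirror the "2/10 hashes" example, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass
class HashResult:
    name: str       # hypothetical algorithm label, e.g. "ahash"
    distance: int   # Hamming distance to the closest known hash
    threshold: int  # largest distance still counted as a match


def triage(results, publicly_shared, min_fraction=0.5):
    """Combine several hash comparisons into one coarse decision."""
    if not results:
        raise ValueError("need at least one hash result")
    hits = sum(1 for r in results if r.distance <= r.threshold)
    if hits / len(results) < min_fraction:
        return "likely_false_alarm"  # e.g. only 2 of 10 hashes matched
    if publicly_shared:
        return "compare_images"      # worth running an actual image comparison
    return "low_priority"            # matched, but the file is not shared


# Ten hypothetical algorithms: the first `n_hits` match, the rest do not.
def fake_results(n_hits):
    return [
        HashResult(f"hash{i}", distance=2 if i < n_hits else 30, threshold=5)
        for i in range(10)
    ]


print(triage(fake_results(2), publicly_shared=True))   # likely_false_alarm
print(triage(fake_results(8), publicly_shared=True))   # compare_images
print(triage(fake_results(8), publicly_shared=False))  # low_priority
```

The cutoff and the per-algorithm thresholds are the hard part in practice; the sketch only shows the shape of the voting logic.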
robotresearcher 3 days ago

Perceptual hashes? An embedding in a vector space by a learned encoder.
Phew, not AI then…?