
Comment by heavyset_go

3 days ago

I built a similar photo ID system, not for this purpose or content, and the idea of platforms using perceptual hashes to potentially ruin people's lives is horrifying.

Depending on the algorithm and parameters, you can easily get a scary number of false positives, especially with algorithms that shrink images during hashing, which describes a lot of them.
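A minimal sketch of that shrink-then-threshold idea (aHash-style; real systems such as pHash or PhotoDNA differ in detail, and the function here is purely illustrative):

```python
import numpy as np

def average_hash(gray: np.ndarray, hash_size: int = 8) -> int:
    """gray: 2D array of pixel intensities, at least hash_size on each side.
    Returns a 64-bit perceptual hash (for hash_size=8)."""
    h, w = gray.shape
    # Crude downscale by block averaging: this shrinking step throws away
    # most of the detail that distinguishes one image from another.
    ys = np.linspace(0, h, hash_size + 1, dtype=int)
    xs = np.linspace(0, w, hash_size + 1, dtype=int)
    small = np.array([[gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                       for j in range(hash_size)]
                      for i in range(hash_size)])
    # Quantize: one bit per cell, "brighter than the overall average or not".
    bits = (small > small.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)
```

Two quite different images that happen to share a coarse brightness layout can land within a few bits of each other, which is exactly where the false positives come from.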

Yeah, it's not a great system, because perceptual hashes can be and have been tricked in the past. It's still better than machine learning, though: you can make almost any image trigger an ML model without it necessarily looking like a bad image, whereas perceptual hashes are much harder to adversarially fool.

  • I agree, and maybe I'm wrong, but I see a similarity between pHash's DCT-and-quantization step and ML kernels (rough sketch of that step below). I think you could craft "invisible" adversarial images for phash systems much like you can for ML ones, and the results could be just as bad. They'd probably replicate better than adversarial ML images, too.

    I think the premise for either system is flawed and both are too error prone for critical applications.
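    A rough sketch of the DCT-plus-quantization step I'm comparing to an ML kernel (classic pHash-style; the 32x32 input and 8x8 low-frequency crop are conventional choices, not anything specific to PhotoDNA):

    ```python
    import numpy as np

    def dct_matrix(n: int) -> np.ndarray:
        # Orthonormal DCT-II basis: a fixed linear filter bank, loosely
        # analogous to a (non-learned) convolutional layer.
        k = np.arange(n).reshape(-1, 1)
        i = np.arange(n).reshape(1, -1)
        m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0] *= 1 / np.sqrt(2)
        return m * np.sqrt(2 / n)

    def dct_phash(gray32: np.ndarray) -> np.ndarray:
        """gray32: 32x32 grayscale array. Returns 64 hash bits."""
        d = dct_matrix(32)
        coeffs = d @ gray32 @ d.T                 # 2D DCT
        low = coeffs[:8, :8].flatten()            # keep only low frequencies
        return (low > np.median(low)).astype(np.uint8)  # quantize to bits
    ```

    The whole pipeline is linear filtering followed by a hard threshold, so it seems plausible that small, carefully placed perturbations could flip chosen bits without visible changes, which is the adversarial concern above.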

I imagine you'd add more heuristics and various types of hashes? If the file is just sitting there, rarely accessed and unshared, or if the file only triggers on 2/10 hashes, it's probably a false alarm. If the file is on a public share, you can probably run an actual image comparison...
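A hypothetical version of that triage logic (the names and thresholds are made up purely for illustration):

```python
from dataclasses import dataclass

@dataclass
class FileSignals:
    hash_hits: int          # how many of the hash algorithms matched
    hash_total: int         # how many hash algorithms were run
    publicly_shared: bool
    recently_accessed: bool

def triage(sig: FileSignals) -> str:
    ratio = sig.hash_hits / sig.hash_total
    if ratio < 0.5:
        return "likely false alarm"      # e.g. only 2/10 hashes triggered
    if sig.publicly_shared:
        return "escalate: run a full image comparison / human review"
    if ratio > 0.8 and sig.recently_accessed:
        return "escalate for review"
    return "keep monitoring"
```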

  • A lot of classic perceptual hash algorithms do "squinty" comparisons, where if an image kind of looks like one you've hashed against, you can get false positives.

    I'd imagine outside of egregious abuse and truly unique images, you could squint at a legal image and say it looks very much like another illegal image, and get a false positive.

    From what I'm reading about PhotoDNA, it's your standard phashing system from 15 years ago, which is terrifying.

    But yes, you can add heuristics, and you will still get false positives (sketch of the underlying threshold tradeoff below).
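    A sketch of that mechanism: a "match" is usually just a Hamming-distance cutoff between hash bits, and the cutoff is where the squinting happens (the threshold of 10 below is a common rule of thumb for 64-bit hashes, not anything taken from PhotoDNA):

    ```python
    def hamming(a: int, b: int) -> int:
        # Number of differing bits between two 64-bit hashes.
        return bin(a ^ b).count("1")

    def is_match(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
        # Loosen the threshold and unrelated-but-similar-looking images start
        # colliding; tighten it and crops/recompressions slip through.
        return hamming(hash_a, hash_b) <= threshold
    ```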

I thought Apple’s approach was very promising. Unfortunately, instead of reading about how it actually worked, huge amounts of people just guessed incorrectly about how it worked and the conversation was dominated by uninformed outrage about things that weren’t happening.

  • Among many, many issues: Apple used neural networks to compare images, which made the system very exploitable. You could send someone an image that had been invisibly altered to trip the filter while looking unchanged to the eye.

    Also, once the system is created, it's easy to envision governments putting whatever images they want to know people have into the phone, or changing the specificity of the filter so it starts sending many more images to the cloud, especially since the filter ran on locally stored images and not things that were already in the cloud.

    Their nudity filter in iMessage was fine, though (I don't think it ever sends anything to the internet? It just contacts your parents if you're a minor with Family Sharing enabled?)

    • > once the system is created it’s easy to envision governments putting whatever images they want to know people have into the phone

      A key point is that the system was designed to make sure the database was strongly cryptographically private against review; that's actually where 95% of the technical complexity in the proposal came from: making absolutely sure the public could never discover exactly what government organizations were or weren't scanning for.

  • > Unfortunately, instead of reading about how it actually worked, huge amounts of people just guessed incorrectly about how it worked

    Folks did read. They guessed that known hashes would be stored on devices and images would be scanned against that. Was this a wrong guess?

    > the conversation was dominated by uninformed outrage about things that weren’t happening.

    The thing that wasn't happening yet was mission creep beyond the original targets. Expanding beyond originally stated parameters is a thing that happens with far-reaching monitoring systems, and it happens with the kind of regularity usually reserved for physics.

    There were secondary concerns about how false positives would be handled, and about what the procedures were for any positive. Given governments' propensity to ruin lives now and ignore the harm (or craft a justification) later, those concerns seem valid.

    That's what I recall the concerned voices were on about. To me, they didn't seem outraged.

    • > Folks did read. They guessed that known hashes would be stored on devices and images would be scanned against that. Was this a wrong guess?

      Yes. Completely wrong. Not even close.

      Why don’t you just go and read about it instead of guessing? Seriously, the point of my comment was that discussion with people who are just guessing is worthless.


  • Sorry, but you're relaying a false memory. Conversation on the subject on HN and Reddit (for example) was extremely well informed and grounded in the specifics of the proposal.

    Just as an example, part of my response here was to develop and publish a second-preimage attack on their hash function, simply to make the point concrete that various bad scenarios would be facilitated by the existence of one.
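    Not the attack in question, but a generic sketch of why second preimages are cheap once the hash comes from a differentiable model; `model` (returning per-bit logits), `target_bits`, and the perturbation bound are all stand-ins:

    ```python
    import torch

    def second_preimage(model, source_img, target_bits,
                        steps=500, lr=0.01, eps=8 / 255):
        """Nudge source_img so the model's hash bits match target_bits,
        while keeping the perturbation small (L-infinity <= eps)."""
        delta = torch.zeros_like(source_img, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        target = target_bits.float() * 2 - 1          # bits {0,1} -> {-1,+1}
        for _ in range(steps):
            logits = model((source_img + delta).clamp(0, 1))
            # Hinge loss pushing each logit toward the sign of its target bit.
            loss = torch.nn.functional.relu(1 - logits * target).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)               # keep the change invisible
        return (source_img + delta).clamp(0, 1).detach()
    ```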

  • > instead of reading about how it actually worked, huge amounts of people just guessed incorrectly about how it worked and the conversation was dominated by uninformed outrage

    I would not care if it worked 100% accurately. My outrage is informed by people like you who think it is OK in any form whatsoever.