
Comment by diggan

1 year ago

Clearly better than nothing, but how does it work with perceptual hashes? I gave it five minutes to try to get pHash to run locally but didn't manage to get any useful results from it, I was probably holding it wrong.

I’ve been working with perceptual hashes a lot lately for a side project, and my experience is that they are extremely resilient to noise, re-encoding, resizing, and some changes in color (since most implementations desaturate the image). Mirroring and rotation can in theory defeat perceptual hashing, but the hashes are fast enough to compute that, if you care, you can hash horizontally and vertically mirrored versions at 1-degree increments of rotation to identify those cases. Affine transformations can easily defeat some perceptual hashing algorithms, but others are resistant to them.
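To illustrate the mirroring point, here is a minimal sketch in plain Python. It uses a toy average hash over a 2D list of grayscale values, which is an assumption for illustration and not pHash or any particular library's algorithm. The names average_hash, hamming, mirrored_variants and min_variant_distance are made up for this example. Rotation in small increments would need an image library to resample the pixels, so only the mirrored variants are covered here.

```python
def average_hash(pixels, hash_size=8):
    """Toy average hash of a 2D list of grayscale values (rows of ints).

    Assumes the image is at least hash_size pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Downsample to hash_size x hash_size by averaging blocks of pixels.
    for r in range(hash_size):
        for c in range(hash_size):
            block = [
                pixels[y][x]
                for y in range(r * height // hash_size, (r + 1) * height // hash_size)
                for x in range(c * width // hash_size, (c + 1) * width // hash_size)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: set when the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def mirrored_variants(pixels):
    """Yield the image as-is, flipped horizontally, vertically, and both."""
    yield pixels
    yield [row[::-1] for row in pixels]
    yield pixels[::-1]
    yield [row[::-1] for row in pixels[::-1]]


def min_variant_distance(query_pixels, reference_hash, hash_size=8):
    """Smallest Hamming distance over all mirrored variants of the query."""
    return min(
        hamming(average_hash(variant, hash_size), reference_hash)
        for variant in mirrored_variants(query_pixels)
    )


# A left-to-right gradient and its horizontal mirror.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
flipped = [row[::-1] for row in gradient]

reference = average_hash(gradient)
direct = hamming(reference, average_hash(flipped))   # mirror compared naively
best = min_variant_distance(flipped, reference)      # mirror-aware comparison
```

On this gradient the naive comparison differs in every bit, while the mirror-aware comparison finds an exact match. Real photos won't be this clean, so in practice you'd compare against a distance threshold instead of expecting zero.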

The big weakness is that most perceptual hashing algorithms aren’t content aware, so you can easily defeat them by adding or removing background objects that might not be noticed or considered meaningful by a human observer.

You could probably get one of the many repos up and running pretty quickly [1].

Potentially what you could do is generate smaller versions of the images, test their hash matching under different conditions against multiple algorithms, and then pick the parameters where you get the fewest hash collisions.
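A rough sketch of that kind of parameter sweep, in plain Python. Everything here is an assumption for illustration: block_mean_hash is a toy stand-in for whatever algorithms you'd actually compare, the two test images are synthetic, and count_collisions and pick_parameters are made-up helper names. It only counts exact hash collisions between images that are known to be different.

```python
from itertools import combinations


def block_mean_hash(pixels, size):
    """Toy hash: block-average down to size x size, threshold at the mean."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        for c in range(size):
            block = [
                pixels[y][x]
                for y in range(r * height // size, (r + 1) * height // size)
                for x in range(c * width // size, (c + 1) * width // size)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return tuple(value > mean for value in cells)


def count_collisions(images, hash_fn):
    """Count pairs of distinct images that end up with identical hashes."""
    hashes = [hash_fn(img) for img in images]
    return sum(1 for a, b in combinations(hashes, 2) if a == b)


def pick_parameters(images, candidates):
    """Return the candidate name with the fewest collisions, plus all scores."""
    scores = {name: count_collisions(images, fn) for name, fn in candidates.items()}
    return min(scores, key=scores.get), scores


# Two different synthetic images: a dark/bright split at x=8 and at x=10.
img_a = [[255 if x >= 8 else 0 for x in range(16)] for _ in range(16)]
img_b = [[255 if x >= 10 else 0 for x in range(16)] for _ in range(16)]

candidates = {
    "size=2": lambda img: block_mean_hash(img, 2),
    "size=8": lambda img: block_mean_hash(img, 8),
}
best, scores = pick_parameters([img_a, img_b], candidates)
```

Here the 2x2 hash is too coarse to tell the two images apart, while the 8x8 hash separates them. A fuller version would also score how often modified copies of the same image still match, since a larger hash tends to trade fewer collisions for less tolerance.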

[1] https://github.com/JohannesBuchner/imagehash