Comment by godelski

3 days ago

I'd assume they'd use something akin to a perceptual hash.

Btw, hashes aren't unique. I really do mean that an input doesn't have a unique output. If f(x)=y then there is some z such that f(z)=y.

Remember, a hash is a "one way function". It isn't invertible (that would defeat the purpose!). It is a surjective function. Meaning that reversing the function results in a non-unique output. In the hash style you're thinking of you try to make the output range so large that the likelihood of a collision is low (a salt making it even harder), but in a perceptual hash you want collisions, but only from certain subsets of the input.

In a typical hash your collision input should be in a random location (knowing x doesn't inform us about z). Knowledge of the input shouldn't give you knowledge of a valid collision. But in a perceptual hash you want collisions to be known. To exist in a localized region of the input (all z are near x. Perturbations of x).

https://en.wikipedia.org/wiki/Perceptual_hashing

> Remember, a hash is a "one way function". It isn't invertible (that would defeat the purpose!). It is a surjective function. Meaning that reversing the function results in a non-unique output.

This is a bit of a nitpick and not even relevant to the topic, but that's not the reason cryptographic hashes are (assumed to be) one-way functions. You could in principle have a function f: X -> Y that's not invertible but for which the set of every x that give a particular y could be tractably computed given y. In that case f would not be a one-way function in the computational sense.

Cryptographic hashes are practically treated as one-way functions because the inverse computation would take an intractable amount of time.

  • Yeah that's a good addition. I think often the words we use can really make things more confusing. Like I hate when people say invertible but in reference to a function that isn't bijective. Why not say reversible? (No complaints with the convention of image/preimage)

    Which it's very similar to the problem created by saying "one way". It just isn't one way. Going the other direction is perfectly possible but incredibly hard to find the origin. The visual metaphor I like to use for people is it's like you walk out of a room and into a hallway of doors that are all identical looking. Ignoring the fact that you could just physically turn around, it'd be very hard to figure out which one you actually came from.

    But maybe what I like least is that we end up having so many terms for the same general concept. It's one thing when they're discovered independently but I'm pretty confident the computer scientists that pioneered hashes were quite familiar with the mathematics and nomenclature.

      > inverse computation would take an intractable amount of time.
    

    On a real side note I really like this explanation of P vs NP as it explicitly talks about reversibility. https://m.youtube.com/watch?v=6OPsH8PK7xM