
Comment by always_good

7 years ago

Doesn't seem like a very useful measure of uniqueness.

What if you had one-day retention of IP addresses for per-day unique views? Unique views seem like too important a metric to eliminate completely, and one-day retention seems like a decent trade-off, at the cost of not being able to do unique analysis over longer time periods.

Don’t retain the IP address, retain a hash of the IP address.

  • A plain hash doesn't make a difference.

    For some purposes, though, one can use hashes with regularly changing salts that are destroyed after a while, which makes the older hashes unusable.

  • When you can trivially crawl the input space, as with IPv4 addresses, you'd have to expire a fresh per-day salt as well.

    But to my eyes, expiring salts isn't much different from deleting IP addresses after one day. Just more machinery. People have to trust that you're doing either, so why bother beyond being able to use the word "hashing" in marketing language?

    • You'd at least want per-record salts. But even then it's trivial to check whether a given IP is in the dataset. Better, but not great. (I.e., you have access to the dataset and want to check whether a given IP/time matches the log: read the salt, check the hash.)
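A rough sketch of the points made in this thread, not anyone's actual implementation. It assumes SHA-256 over the textual address, and the helper names (`salted_hash`, `crawl`, `in_log`) and the example address are made up for illustration. The crawl covers only a /24 so it finishes instantly; the same loop over all of IPv4 is about 4.3 billion hashes, which is why a plain hash protects very little.

```python
import hashlib
import ipaddress
import secrets
from typing import Optional


def salted_hash(ip: str, salt: bytes = b"") -> str:
    """SHA-256 of salt + textual IP. An empty salt gives a plain hash."""
    return hashlib.sha256(salt + ip.encode()).hexdigest()


def crawl(target: str, network: str, salt: bytes = b"") -> Optional[str]:
    """Hash every address in `network` and return the one matching `target`."""
    for addr in ipaddress.ip_network(network):
        if salted_hash(str(addr), salt) == target:
            return str(addr)
    return None


def in_log(ip: str, records) -> bool:
    """Per-record salts: read each stored salt, re-hash, compare."""
    return any(salted_hash(ip, salt) == digest for salt, digest in records)


visitor = "203.0.113.57"  # hypothetical address from a documentation range
subnet = "203.0.113.0/24"

# Plain hash: crawling the input space recovers the address.
recovered = crawl(salted_hash(visitor), subnet)

# Per-day salt: while the salt still exists, crawling works exactly as before.
day_salt = secrets.token_bytes(16)
stored = salted_hash(visitor, day_salt)
with_salt = crawl(stored, subnet, day_salt)

# Once the salt is destroyed, a guess at the salt finds nothing.
without_salt = crawl(stored, subnet, secrets.token_bytes(16))

# Per-record salt stored next to the hash: membership is still easy to test.
record_salt = secrets.token_bytes(16)
log = [(record_salt, salted_hash(visitor, record_salt))]
found = in_log(visitor, log)
```

The rotating salt only helps after it has really been deleted, which is the same trust problem as deleting the addresses themselves.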
