Comment by hedora

9 months ago

I’ve heard this claim, but they could use some sort of bloom filter pr cryptographic hashing to block profiles that contain previously-removed records.

There could also be a shared, trusted opt-out service that accepted information and returned a boolean saying “opt-out” or “opt-in”.

Ideally, it’d return “opt-out” in the no-information case.

Hash-based solutions aren't as easy as we might hope.

You store a hashed version of my SSN, or my phone number, to represent my opt-out? Someone can just hash every number from 000-00-0000 to 999-99-9999 and figure out mine from that.

You hash the entire contents of the profile - name+address+phone+e-mail+DOB+SSN - and the moment a data source provides them with a profile only containing name+address+email - the missing fields mean the hashes won't match.

A trusted third party will work a lot better IMHO.

And of course none of the data brokers have much reason to make opt-outs work well, in the absence of legislation and strict enforcement - it's in their commercial interests to say they "can't stop your data reappearing"

  • > Someone can just hash every number from 000-00-0000 to 999-99-9999 and figure out mine from that.

    That's what salts are for, right? It wouldn't be too hard to issue a very large, known, public salt alongside each SSN.

    > And of course none of the data brokers have much reason to make opt-outs work well, in the absence of legislation and strict enforcement - it's in their commercial interests to say they "can't stop your data reappearing"

    This is the actual reason, IMHO.

    • If the salt is public, what’s the point, then you can get all the salts, and combine them with every possible ssn, and you’re back where you were before.

      8 replies →

Yup, it would be trivial to create a one way hash of various attributes to perma ‘opt out’ someone.

But how would they keep making money that way?

So for a perfect match they'd need to have some sort of unique identifier that's present in the first set of data you ask them to remove, as well as being present in any subsequent "acquisitions" or "scrapes" of your data.

If these devs that scrape/dump/collate all this info are anything like the ones I've seen, and they're functioning in countries like the US and UK whereby you don't have individual identifiers that are pretty unique, then I'd say the chance of them being able to get such a "unique" key on you to remove you perpetually, is next to impossible. And if it's even close to being "hard", they'll not even bother. Doubley-so if this service/people/data is anything like the credit-score companies, which are notoriously bad at data de duplication and sanitation.

Likewise, if you want them to do some sort of removal using things other than a unique identifier, then you have to have some sort of function that determines closeness between the two records. From what I've heard, places like Interpol, countries' border-control and police agencies usually use name, surname and dob as a combination to match. Amazingly unique and unchanging combination, that one! /s