← Back to context

Comment by fullspectrumdev

9 months ago

A lot of these data brokers hold wildly inaccurate information.

You too can be a data broker!

    for (i = 0; i < 900000000; i++)
        insert(first: random_firstname(), last: random_lastname(), ssn: i);

Does anyone really really care if the name is accurate if the SSN is present? More than half of the SSNs in the above dataset are valid.

  • You probably are posting this as a joke, but without a clear technical solution to this problem, flooding the industry with bullshit data seems like a great avenue.

    • I have a silly standup joke along these lines, about how I'd Google things crazy things like "circus lawyer" or "giraffe mitigation tactics" to throw the algorithm off every now and then.

      4 replies →

    • that was the idea behind certain applications and add-ons that would browse around to popular websites and randomly click ads so that marketers couldn't tell your actual interests from fake ones.

      Unfortunately that strategy is deeply flawed and dangerous because nobody cares if the data they have on you is accurate or not. They still can, and still will, use it against you at every opportunity. Every scrap of data they have, accurate or not, can be used to hurt you.

      The only way to flood data brokers with garbage data that can't hurt anyone is to fill it with entirely fictitious people who somehow can't be mistaken for any actual people. Even that runs the risk of hurting real people though. For example, an insurance company might go to a data broker and ask for the number of people within a certain neighborhood or zip code who bought fast food more than once a week in the last year and how many have a gym membership. If the number of frequent fast food buyers is higher than it was last year and/or the number of gym members is lower the insurance company might decide to raise the rates of every single member within that neighborhood or zip code. Even fake people could skew those numbers if their fake data said they lived in those zip codes or neighborhood and ate out a lot or didn't have a gym membership. Indirectly, the fact person is mistaken for being a real one in that community.

      The best way to deal with data brokers is to regulate them with strong data protection laws. Anything you give them risks hurting someone and gives them another data point to sell.

      7 replies →

    • That has been my strategy for the last decade or so, Unless I have a solid reason to I never use my real name when placing orders and generally never the same fake name twice, always use a virtual credit card, if it's a non-physical product I don't even use my real address. I have some old phones I throw pre-paid sim cards into when I need to do number confirmation. The goal is to create a little consistent linkable data to me and at least generate some noise in all these data broker collection processes.

      1 reply →

  • In fact there are far fewer valid Socials. They follow a system where guessing a number of digits is fairly determined based on year and state of birth

    • This is not exactly true; the system _used_ to have a geographic component but SSNs issued since 2011 are random.

      (Granted, most people here with an SSN should be older than that.)

Yes, but they can also be pretty accurate.

While I have never dealt with one of the paid services someone ran one on me as an example of what is out there (nothing malicious about it) and just about everything on it was accurate or close to it. Only one thing on it wasn't at least pretty close to the truth--it had me living in a state I've never set foot in. And quite a few other people seemed to have the same address at one point or another.