Comment by foolfoolz
1 day ago
i have such a hard time reconciling stuff like this:
> The census bureau decided to adopt differential privacy for the 2020 Census
and:
> The consequences will be dire for utility or for privacy, and possibly both. It's hard to understate this point: future statistical releases will either be useless compared to past ones, or they will be incredibly unsafe
so we took the census for centuries before this point, and it was “ok.” and for the last census only we added some privacy items. but if we remove just one of those filters, we are in “dire” circumstances? but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
this makes it feel like an emotional overblown problem
Believe it or not, mathematical techniques and computational power have increased in the past hundreds of years, not to mention the digitization of everything.
Privacy issues that weren’t possible before due to cost are now pennies to exploit. Also keep in mind as it points out people were using census data to drive gerrymandering efforts, so these attacks are real and have been going on for a long time.
I don’t understand why gerrymandering would require privacy violation, or how differential privacy would stop it.
Gerrymandering is most effective when you know exact voting patterns of each household so you can draw the lines to get the result you want. Differential privacy blurs those boundaries and provides more room for the partisan hacks to make a fatal mistake.
> but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
One notable thing we have today that we didn't have 100 years ago is a computer. Before, you could reasonably assume that recreating individual records wasn't feasible, at least not on a large scale. You can't assume that now. A 4-digit password was safe for hundreds of years, but it would be a security liability today for the same reason.
Computers and improvements in data science/machine learning are basically the entire explanation. A LOT of the techniques that we use today to de-anonymize data require computation power not previously available. Even when doable, resources limited scale. Source: statistics degree
(Also, linkage. There are more data sources to cross reference now with the internet and social media and web tracking and hacks - the record footprint of Americans even as recently as the 70s and 80s was dramatically lower!)
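To make the record-recreation point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the block of three residents, the published statistics, and the function name `consistent_blocks` are assumptions, not real census tables or a real attack tool. It simply enumerates every combination of ages that agrees with the published aggregates, which is trivial for a computer and was impractical at scale by hand.

```python
from itertools import combinations_with_replacement
from statistics import median


def consistent_blocks(count, mean_age, median_age,
                      minors=None, mean_minor_age=None, max_age=99):
    """Return every sorted tuple of ages that matches the published aggregates.

    All parameters are hypothetical 'published statistics' for one tiny block.
    """
    matches = []
    # Sorted tuples only, so each household composition is counted once.
    for ages in combinations_with_replacement(range(max_age + 1), count):
        if sum(ages) != mean_age * count:
            continue
        if median(ages) != median_age:
            continue
        if minors is not None:
            under_18 = [a for a in ages if a < 18]
            if len(under_18) != minors:
                continue
            # Only check the minors' mean when there is at least one minor.
            if (mean_minor_age is not None and under_18
                    and sum(under_18) != mean_minor_age * len(under_18)):
                continue
        matches.append(ages)
    return matches


# Two statistics leave some ambiguity about who lives in the block.
loose = consistent_blocks(count=3, mean_age=44, median_age=30)
print(len(loose))  # 28 candidate age combinations

# Two more statistics about minors pin the block down exactly.
tight = consistent_blocks(count=3, mean_age=44, median_age=30,
                          minors=1, mean_minor_age=8)
print(tight)  # [(8, 30, 94)]
```

Each extra table shrinks the set of candidates, and once the ages are pinned down, the linkage step mentioned above is what attaches names to them.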
The concerns here, like most concerns about privacy, are hyperbolic hypothetical hypochondria, until they’re not.
> but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
If you are comparing against hundreds of years ago, when we had no computers or internet, I don't see how privacy then was worse than in today's surveillance world.
> so we took the census for centuries before this point, and it was “ok.”
Yes, because we didn't have computers to unearth patterns in the data in a millisecond, and politicians could have their careers ended for doing the wrong thing when it was revealed, instead of being rewarded for it.
For decades we were encrypting our communications with RSA, surely nothing is wrong with it?
There is nothing wrong with it, and RSA is still commonly used. In fact, RSA is better against quantum computers compared to ECC.
As the article clearly states, privacy features have been in the census since 1990. It is just that the previously used privacy feature was not very strong and could be defeated, so it was replaced by a stronger one in 2020. Before 1990 the census. 1990 was when personal computers were being popularized and the computing power available to individuals exploded, so it became possible to use computers to separate out individual information from the data the census publishes. So the issue came up then.
No it is not an overblown problem.
> so we took the census for centuries before this point, and it was “ok.”
It wasn't ok - it's been shown that the data released could individually identify people in releases before the 2010 Census.
As far as I recall they did have some measures in place. Differential privacy just made it a bit more robust.
Arguably the defaults for differential privacy are too robust but that is a different story.
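For anyone wondering what "too robust" trades off against, here is a minimal sketch of the textbook Laplace mechanism in Python. This is an illustration under stated assumptions, not necessarily the mechanism the Census Bureau uses; the function name `noisy_count` and the epsilon values are hypothetical. A smaller epsilon means stronger privacy and noisier published counts.

```python
import random


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return true_count plus Laplace noise with scale sensitivity / epsilon.

    The difference of two independent exponential draws with the same rate
    is Laplace-distributed, so no third-party library is needed.
    """
    rate = epsilon / sensitivity  # rate = 1 / scale
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


rng = random.Random(0)  # fixed seed only so the demo is repeatable
block_population = 37   # made-up count for one small block

for eps in (0.1, 1.0, 10.0):
    samples = [noisy_count(block_population, eps, rng=rng) for _ in range(5)]
    print(eps, [round(s, 1) for s in samples])
```

With epsilon at 0.1 the published numbers for a 37-person block can be off by tens of people, while at 10 they are nearly exact. That is the utility-versus-privacy dial the thread is arguing about.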