I "enumerated" for the last census. Trust in my community was already not high* and I had lots of interesting encounters. I really believed the rather invasive data I was collecting with a friendly face would be used and handled responsibly. I feel for the poor souls that'll sign up to go door to door for 2030 now that the firewalls against weaponizing and monetizing all of our sensitive government data have been torn down, and even more for those that will volunteer information that can hurt them.
The comments that this rather expensive endeavour should just be about getting a head count are also amusing to me. The data collected was such an important baseline of common understanding, and this will not be a good thing for its future quality. I've grown very jaded now seeing all the things taken for granted in this country and lost or degraded recently with a whimper.
*: To be fair, they sent me specifically to places that didn't respond, so I was naturally led to believe that everyone in my region hated the government, ignored bizarrely threatening fliers, or had recently moved and had no knowledge of the inhabitants (if any) during the census period.
> The comments that this rather expensive endeavour should just be about getting a head count are also amusing to me. The data collected was such an important baseline of common understanding, and this will not be a good thing for its future quality.
Even setting aside the Census data products themselves, Census demographic data underlies virtually all extrapolation from other survey research: everything from national opinion surveys based on tens of thousands of respondents to small community surveys. A Census product with the most diverse participation pays off almost infinitely for America. It benefits everyone from national newspapers to rural counties.
If the smallest communities lose what little trust remains in the privacy of the Census, they have the most to lose in all of these ways.
You're playing party politics. That's the risk you take: that the party has goals beyond your (dare I say naive) utopian ideals for civic engagement.
Parties are not universally evil. When I malign them in this way it is in full acknowledgment that organization is nearly the singular path to "effect on target" as regards society-scale politics. What I mean is that the party per se becomes a superorganism whose first priority is always self-preservation (a la homeostasis), and this is very worth remembering when subsuming oneself into its structure.
The real decline started after Edward Snowden and all the information that came out about the NSA. It really sparked distrust in the government. Trying to get people to respond to surveys was already hard; why would the general public believe the Census Bureau is actually keeping their data safe? It doesn’t matter what the laws and the Constitution say: if you work for an agency, you are the government. Response rates keep going down, and now we have attacks from the President on statistics about the economy. I’m a little cynical and I just assume they will continue to shrink the statistical agencies and make the statistics more useless (which is what this recent policy change does), and that the work will shift to private industry, even though private industry cannot do the work in the field that the government does.
I buy the argument that a functioning democracy requires the populace to believe that the government is honest, competent, and working in their interests. Watergate, Iran-Contra, and the Vietnam war (respectively) undermined those notions. As of ~2016, half of the US voting population had come of age after those events.
> The comments that this rather expensive endeavour should just be about getting a head count are also amusing to me
Countries conduct censuses so they can understand, in great detail, what is going on with the people who make up the country.
With this accurate information, improvement plans can be made, and life can be improved for everyone.
The comments about just making it a head count give a very interesting window into the mentality of many these days.
They don’t want to - or can’t fathom how to - make life better.
Pretty sad, in my opinion. In my ideal the state should have visibility into the shape of the people present so that we can make good decisions about our combined organization. I think we’re making a mistake we will come to regret by intentionally damaging our data collection infrastructure.
I think a large amount of the US’s success is the result of good institutions handling granular data. Policies can be adjusted to match outcomes more rapidly than otherwise.
I understand why people decide to diminish all state capacity - they feel that governments are populated by their opponents who will use state capacity against them. But as our relative strength wanes, our ability to overcome these forces of inertia does as well. And then our governments become less capable and eventually life starts getting worse.
We don’t need house-level data immediately (except perhaps in order to place census blocks within their appropriate congressional district etc). But there are aggregation units above which we should be using as good information as we possibly could be.
This does nothing to make government less powerful.
It just makes government stupider so even if they decide they want to do the right thing, now they can’t because they don’t have the information needed to make effective decisions.
No, it gives them data to attack specific groups of people that were previously anonymized. The two options are less granular data, or data that can be abused.
There is no question the end goal is data that can be abused, and anyone left who would protest their actions will be fired and replaced with more sycophants.
> In my ideal the state should have visibility into the shape of the people present so that we can make good decisions about our combined organization.
That ideal became tantamount to enabling genocide when the US government breached the confidentiality of the census in order to put the Japanese in prison camps on the basis of their race.
> I understand why people decide to diminish all state capacity
It's not even just a question of "all". The state should have the absolute minimum capacity to carry out its necessary tasks. Collecting race (just to give one example of many) of any form is not absolutely necessary and so it should not be done.
> they feel that governments are populated by their opponents who will use state capacity against them
Because they may be in the future. -- but even that is too strong, the greatest harm perpetrated by state actors has consistently come from trying to "help" rather than intentionally malicious acts.
Replying to a dead comment that demanded an example: Mao's killing of somewhere on the order of 30-40 million people by famine (in addition to the million straight up murdered in the Cultural Revolution), created as a result of "helping" through planned economy food distribution and the Eliminate Sparrows Campaign.
People only kill at a truly massive scale because they believe they are doing something good or at least necessary (even in war, but especially outside of war). This is why hoping states aren't evil isn't sufficient-- in fact it may induce mass murder, because what could be less evil than to Do the Right thing.
The universal cure is to distribute power and influence in as many ways as practicable, such that the damage from erroneous thinking is contained.
If they follow the rules, preserving privacy via cruder methods, the data will be much more damaged.
For any particular level of privacy, the banned methods can give you more accurate data. For any particular level of accuracy, the banned methods can give you better privacy.
The only way we're getting more accurate data is if the new rule causes them to largely abandon privacy. That would be bad. Harm for no benefit.
TFA lays out why things don't work that way. If you erode trust in the privacy of census responses, an awful lot of folks will have to start lying on their census
Whatever you do, there is a level of trust that is assumed when census takes place. The trust that this data is then not identified in a way that could be targeted for scams, frauds, and other such evils. But in NY, house sale records are made public but much to the detriment, many mortgage companies fake a bill for payment.
Differential privacy is absolutely necessary, and the social scientists being unable to reconstruct the data at an individual level is intended. A macroscopic description is rather enough for most purposes, and anything more is asking for a surveillance state.
In Ohio (or at least my county) the deed and mortgage are public record, as is a record when the mortgage is paid off. Interestingly, property tax charges and payments are, too.
> But in NY, house sale records are made public but much to the detriment, many mortgage companies fake a bill for payment.
That frankly sounds more like a failure of enforcement, on top of a failure of the construction of the financial system. Here in Germany, it is absolutely not a common thing that mortgages or the banks holding them get sold like a hot potato towards some other sucker, and thus such a letter would cause immediate suspicion.
Here in Germany, founding a company creates a public record. There are a number of companies that then send all newly formed companies a letter that looks like a legitimate invoice for expenses related to creating said company, but on closer inspection actually contains a dense paragraph of text that details that this is not an invoice at all, merely an offer you accept by paying. Quite possibly even a subscription.
It's a well-known trick; our notary warned us that these letters would come and that we should scrutinize any invoice for a while. But they manage to skirt the edge of legality.
An earlier article on the same blog had some very useful information on how easily the aggregated census data from 2010 (before differential privacy) could be used to reconstruct real data for individuals: https://desfontain.es/blog/us-census-reconstruction-attack.h...
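To make the reconstruction idea concrete, here is a toy sketch in Python. The block, its three residents, and every published number below are invented for illustration, and brute-force enumeration is my stand-in rather than the procedure described in the linked post. It simply lists every set of ages consistent with a few published aggregates and shows how each extra table shrinks the candidate list.

```python
from itertools import combinations_with_replacement


def consistent_blocks(count, mean, median, extra=lambda ages: True, max_age=115):
    """Enumerate every sorted age tuple that matches the published statistics.

    Assumes an odd `count` so the median is simply the middle element.
    `extra` is an optional predicate standing in for one more published table.
    """
    total = mean * count
    matches = []
    for ages in combinations_with_replacement(range(max_age + 1), count):
        if sum(ages) != total:
            continue
        if ages[count // 2] != median:
            continue
        if extra(ages):
            matches.append(ages)
    return matches


# Hypothetical published tables for one small block: 3 residents,
# mean age 40, median age 30.
base = consistent_blocks(3, 40, 30)

# A second table says the block has no minors.
no_minors = consistent_blocks(3, 40, 30, extra=lambda a: min(a) >= 18)

# A third table reveals the age of the oldest resident.
pinned = consistent_blocks(
    3, 40, 30, extra=lambda a: min(a) >= 18 and max(a) == 72
)

print(len(base), len(no_minors), pinned)
```

With count, mean and median alone there are 31 consistent age triples; adding the "no minors" table leaves 13, and adding the oldest resident's age leaves exactly one, (18, 30, 72).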
I am personally convinced that the reason noise infusion was banned was because powerful people were already reconstructing individual data from census for the purpose of gerrymandering, and they wanted to continue gerrymandering.
To optimally crack a district to bias it in one party's favour it is often required to literally run a boundary down a street to separate one side (close to a university, say) from the other.
Once you've tied voter preferences to actual street addresses you are no longer in the realm of "broad area cumulative averages and medians".
Or worse. It allows anyone to build really targeted data sets. Insurance companies would love such data, and many of those will use them without scruples.
Ban it from the dataset, add it to the analysis. You can choose your own flavor of noise.
I don't know what the political undertones are here, but at some level you need to have actual ground truth, including "this person/household declined".
Publishing raw data though? That seems like shooting yourself in the foot from a national security perspective, not to mention all the other reasons not to do it.
Sorry, I think you're reading more into this than I intended to say. My point was that the raw data itself doesn't need noise, but the published data necessarily does.
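A minimal sketch of that split, assuming the textbook Laplace mechanism on a single count. The function names and epsilon values are mine, and this is not the Bureau's actual 2020 procedure. The stored count is never modified; noise is drawn only when a figure is released.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_count(true_count, epsilon, rng):
    """Release a noisy copy of a count (sensitivity 1, so scale = 1/epsilon).

    The raw `true_count` stays untouched in the internal records.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count, epsilon, rng, releases=5000):
    """Average absolute error over many hypothetical releases."""
    errors = (
        abs(publish_count(true_count, epsilon, rng) - true_count)
        for _ in range(releases)
    )
    return sum(errors) / releases


rng = random.Random(0)
raw_count = 100  # hypothetical ground truth, kept internally

# Smaller epsilon means stronger privacy and a noisier published figure.
loose = mean_abs_error(raw_count, epsilon=1.0, rng=rng)
strict = mean_abs_error(raw_count, epsilon=0.1, rng=rng)
print(f"typical error at epsilon=1.0: {loose:.2f}")
print(f"typical error at epsilon=0.1: {strict:.2f}")
```

The expected absolute error of Laplace noise equals its scale, so the typical error should come out near 1 and near 10 respectively, which is the privacy versus accuracy dial in its simplest form.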
The replies here arguing we should publish it all are wild in the worst kind of first-order thinking way.
It’s a census: it just asks questions.
If you start publishing and weaponizing the data against people with various attributes, they’ll just lie or not answer. And then you are left with worse than nothing: bad data people try to act on.
You first gather the data while people don't know or care. Then you weaponize it later. It happened at least once not long ago in another country, so it doesn't seem an overreaction to be concerned about it.
It happened a year ago in this country, with IRS sharing data with ICE (breaking a longstanding policy of keeping taxpayer data private within the government).
If this is a Nazi reference, Census data was used to send people to concentration camps here during the same era. Less awful than death camps, at least.
The US Government is the entity that weaponizes the data. The most obvious example is the Census Bureau compiling lists of people of Japanese descent to imprison during WWII. That's just the most obvious one that I know of without looking up more.
The real push for this now is to form lists of people to disenfranchise.
There is a significant movement in conservative circles that "the census should literally only be a count". This could be a wedge to prevent detailed demographic data collection by the government.
and implicitly force them to sell the land they own for less than it's worth, which in combination with setting up very messed up tax related laws in some states (1) which highly benefit you if you bought land longer in the past effectively "killed" a budding, wealthy, land owning Asian community and made sure it can't really regrow in that form.
(1): I think it was mainly California, but I don't remember fully
The easy solution is to just reduce the resolution and scope of the data to the degree it is absolutely necessary. The census exists to inform representation decisions. All other concerns are addons. You can have all the data on the county or voting district level and strip data as you increase your resolution, to the point you only keep population number at the neighborhood, block level.
Knowing the racial, ethnic and socioeconomic background of the residents of a single building block is only useful to discriminate against them.
Demographic information is useful for medical, financial, educational, and so many other items.
The current admin doesn’t need it to discriminate, you can just access cameras and license plate readers and target easily that way.
The purpose is to scare people into misstating or obscuring data to reduce total House representation for an area. It’s to win votes. There are much better ways to do all these things than to use this data, but affecting the vote with limited impact is a huge money savings.
The real question is why anyone answers these questions in the first place? I just wait until a census worker shows up and tell them how many people live at my domicile. It's needed for proper electoral representation and absolutely nothing else.
Any use to identify where government resources are best used, will have people thinking they should have gotten more and would have if they'd answered differently. Ie, that their answers were "weaponized" against them.
I guess the way to optimize is to find an equilibrium between an extreme of specificity and an extreme of vagueness that's still actionable from a high-level policy perspective.
Something about this conversation is fundamentally broken if there's no space to iterate towards optimization and instead it's just swinging between maximalist extremes.
This might be the point. As long as they think the people who end up under-counted are not people this government would like to have voting power for the House of Representatives.
Imagine the weaponization possibilities when combining the census data with Amazon’s and Meta’s data, and possibly several other datasets readily available to this administration. Whatever is missing from one of them can be inferred or defined from the others. This might already be happening, it can’t be checked. Some (former) dictators would be salivating.
Extremists, or in general any faction willing to engage in systematic discrimination, harassment, terrorizing or similar, love highly detailed non-anonymized census data.
Why?
Because it gives them the perfect layout for which areas to harass (areas likely to yield), which to brutalize (areas unlikely to yield or from especially "hated" people), which best not to touch (areas with too much influence/money or likely to contain hidden sympathizers), which to systematically take apart through other means like building a highway through them (e.g. "hated" communities too strong/connected to brutalize), etc.
All of this has a lot of history, whether it's from right extremists like fascists or left extremists(1).
At which point the question is: if the data you collect is that abusable, should you even collect it? Is it even really needed?
(1): Like actual left extremists; a lot of US sources have the habit of labeling as left extremists people who by EU standards sometimes aren't even left (but centrist) and are very far away from extremism...
You can’t completely trust what people say anyway. There are stated preferences and observed preferences in economics but it applies to other areas of life.
There's a pretty good chance that Elon Musk, plus Russia and China, have had more-or-less unrestricted access to Americans' data since the DOGE dismantling of the US government. Plus, intentionally removing security and accountability mechanisms makes it impossible to accurately determine how bad the damage actually was.
The Harper government actively worked on destroying the efficacy of the Canadian census, to make it more difficult for subsequent governments to make data-driven decisions.
In addition to the obvious goal of making it easier to identify and target homosexuals, trans people, minorities, immigrants, it's quite possible that destroying future governments' ability to make good decisions is one of the objectives of the Republican party. Stop voting for the face-eating leopard party, already. They don't use the litterbox, shit everywhere, and actively try to eat your face.
For all the very clever people pointing out that this is nothing new, I have two responses.
1. Your cell company may track your location, and your credit rating agencies know how many nose hairs you have, but they don't always (or even usually) have the deeply personal information you're supposed to put down in a census.
2. Enough of a change in degree is a change in kind. If you disagree, remember that Imperial Russia had the Okhrana and sent over a million Sybiraks - prisoners and exiles - to Siberia, and then the fucking CHEKA and the NKVD and then the (kinder, softer, slightly less outright murderous) KGB went ahead to send 18 million people into the GULAG system, and outright murdered half a million to a million. This was all the same, right? No difference?
The entity most capable of weaponizing demographic data is the government itself. If people weren’t previously providing false information to the census, I’m skeptical that this change is what will push people over the edge.
Congress passed laws that blocked the federal government from fusing data across departments for this specific reason. The admin decided to ignore those, and a friendly Congress is deciding not to act on that.
You really, really don't want a government who can build a unified profile on you in that way.
i have such a hard time reconciling stuff like this:
> The census bureau decided to adopt differential privacy for the 2020 Census
and:
> The consequences will be dire for utility or for privacy, and possibly both. It's hard to understate this point: future statistical releases will either be useless compared to past ones, or they will be incredibly unsafe
so we took the census for centuries before this point, and it was “ok.” and for the last census only we added some privacy items. but if we remove just one of those filters, we are in “dire” circumstances? but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
this makes it feel like an emotional overblown problem
Believe it or not, mathematical techniques and computational power have increased in the past hundreds of years, not to mention the digitization of everything.
Privacy issues that weren’t possible before due to cost are now pennies to exploit. Also keep in mind as it points out people were using census data to drive gerrymandering efforts, so these attacks are real and have been going on for a long time.
> but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
One notable thing we have today that we didn't have 100 years ago is a computer. Before, you could reasonably assume that recreating individual records wasn't feasible, at least not on a large scale. You can't assume that now. A 4 digit password was safe for hundreds of years, but it would be a security liability today for the same reason.
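The password comparison is easy to put numbers on. A rough sketch, where `check` is a hypothetical stand-in for whatever system validates a guess: four digits give only 10^4 = 10,000 possibilities, which is tedious by hand and instant for a machine.

```python
import string
from itertools import product


def pin_space(length=4, alphabet=string.digits):
    """Number of distinct codes of the given length."""
    return len(alphabet) ** length


def brute_force(check, length=4):
    """Try every digit string in order; return (code, attempts used)."""
    for attempt, digits in enumerate(
        product(string.digits, repeat=length), start=1
    ):
        guess = "".join(digits)
        if check(guess):
            return guess, attempt
    return None, pin_space(length)


# Worst case for a 4 digit code: the very last candidate.
secret = "9999"
found, attempts = brute_force(lambda guess: guess == secret)
print(found, attempts, pin_space())
```

The same logic applies to reconstruction: a search space that was out of reach for a room full of clerks is a trivial loop today.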
Computers and improvements in data science/machine learning are basically the entire explanation. A LOT of the techniques that we use today to de-anonymize data require computation power not previously available. Even when doable, resources limited scale. Source: statistics degree
(Also, linkage. There are more data sources to cross reference now with the internet and social media and web tracking and hacks - the record footprint of Americans even as recently as the 70s and 80s was dramatically lower!)
> but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
If you are choosing hundreds of years ago, when we had no computers and internet, I wonder how we had worse privacy than the surveillance world today.
> so we took the census for centuries before this point, and it was “ok.”
Yes because we didn't have computers to unearth patterns in the data in a millisecond and politicians could have their career ended for doing the wrong thing, when revealed, instead of being rewarded for it.
As the article clearly states, privacy features have been in the census since 1990. It is just that the previously used privacy feature was not very strong and could be defeated. So it was replaced by a stronger feature in 2020. Before 1990 the census. 1990 was when personal computers were being popularized and the computing power available to individuals exploded and so then it was possible to use computers to separate out individual information from the data the census publishes. So the issue came up then.
At the Republican TX state convention this week, they proposed to add wording against differential privacy to their proposed platform via an amendment, justified with an example from someone supposedly involved with the census of how it was common-sense silly because one homeless guy under a bridge can become five via DP. I don't know if it passed, but that's the grassroots push behind things like this.
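For a sense of scale on the "one becomes five" example, here is a back-of-the-envelope calculation. It assumes plain Laplace noise with scale 1/epsilon on a single count followed by rounding to the nearest integer, which is a simplification and not the mechanism the Bureau actually ran; the function name and the epsilon values are illustrative only.

```python
import math


def prob_reported_at_least(true_count, reported, epsilon):
    """P(round(true_count + Laplace(1/epsilon)) >= reported).

    Uses the Laplace upper tail P(X > t) = 0.5 * exp(-epsilon * t),
    valid for t >= 0, so `reported` must sit above `true_count`.
    """
    gap = (reported - 0.5) - true_count
    if gap < 0:
        raise ValueError("reported must be greater than true_count")
    return 0.5 * math.exp(-epsilon * gap)


# One person under the bridge showing up as five or more in a release.
for eps in (0.1, 0.5, 1.0, 2.0):
    p = prob_reported_at_least(true_count=1, reported=5, epsilon=eps)
    print(f"epsilon={eps}: P(reported >= 5) = {p:.4f}")
```

Under these assumptions the anecdote is possible for any single tiny count, but how often it happens depends heavily on the privacy budget, and such errors largely cancel out once small areas are summed into larger ones.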
Having worked behind the scenes at a state convention (granted, not in Texas), there is no such thing as grassroots announcements/efforts at those things.
> Differential privacy makes this trade-off explicit, and thus impossible to ignore. Maybe banning it is a way of pretending that the problem doesn't exist, in the hope that it will go away?
Or it's saying that one of these conflicting goals is more valuable than the other, and so shouldn't be sacrificed for it.
Coming from a certain european country, you never know what answer on the census might get you into trouble.
"What is your religious affiliation". Seems perfectly innocuous, but turned out to be retroactively fatal if your answer could be attributed to you by a certain foreign occupier in the 1940s .
Right, but the entire damn point is privacy protections enable people to be more honest. The entire point is supposed to be good data so we can make informed population wide decisions.
And race is a pretty big one under the current administration which has had hundreds of legal immigrants arrested for weeks to months off of "suspicion" that for lack of concrete evidence could only amount to racial profiling.
France used to make plenty of lists. We loved lists. Lists are good. Jews lists? Sure, it's maybe useful one day when we want to do something.
Boy were the Germans happy to find these.
The American obsession with asking people for their perceived origins (AAPI, AA, Latino, ...) is more than weird: it's downright dangerous. Don't fucking ask these questions, and never, ever write it down, especially not with names.
Thankfully, now they can just buy it from data brokers and let Palantir target, so that makes life easier for them
France knows very little about managing a post-colonial multiracial society (except for terribly), I would appreciate if y'all listened and learned or at least approached the issue with more humility. France has serious racial, colonial, Muslim, immigrant, and banlieue inequalities, but its refusal to officially measure race/ethnicity makes those inequalities harder to see, litigate, quantify, or remedy.
It actually does. Religious affinity can absolutely be useful for longer trend studies, and census data is usually of much, much higher quality than other random sample studies.
> Differential privacy makes this trade-off explicit, and thus impossible to ignore.
I think he has it backwards here.
Techniques like differential privacy hide the fact that a trade-off exists, except for a small cadre of experts who live and breathe this stuff.
I don’t know enough to defend this decision, but it strikes me that if there is a real trade-off, not having access to these techniques will force people other than statisticians to confront the trade-off.
If data about the public is so dangerous that we must disguise the results, then perhaps its data we shouldn’t be collecting in the first place.
Nope, private data about people is unintentionally published regularly, Netflix history and medical records being some of the notable examples.
People are bad at making the tradeoff because they consistently underestimate the amount of information that is leaked. Forcing them to leak safe amounts of information is the right way.
Not sharing or collecting the data could in some cases be better but there is clear value in this data so the optimal amount to store and make public is not 0.
I think the real killer is that everyone knows their data has been leaked six times over, and yet nothing bad has come of it for 99% of people.
If there was an apocalyptic privacy breach that led to 40% of the population losing their savings, people would be smashing their smart TVs in the streets a day later.
But alas, nothing bad actually manifests (besides the suspicious ads that know you really like Tide detergent).
imho, one big reason why Data Science as a big org lost clout in tech companies was a tendency to treat DS as gatekeepers of data. Outsourcing the responsibility of stat thinking gave many DS a weird power trip; when one dude gets to decide the trade-offs first without anyone around them needing to understand properly.
> If data about the public is so dangerous that we must disguise the results, then perhaps its data we shouldn’t be collecting in the first place.
By this logic no one should ever collect your address for any reason ever. How do we function as a society if we can’t ever give PII in any context? Anonymization/security is critical and makes a lot of critical functions possible.
How could you receive your mail in a world where we never give out/collect info that is potentially hazardous?
Name, address, and phone number served plenty of critical functions when they were published in the White Pages. Cell phones not being listed there was kind of an accident of history. It was common to call a listed landline and be given or forwarded to a cell number. Only after most people stopped having landlines altogether did a phone number come to be considered sensitive information (unless you were a celebrity or something).
Ironically Facebook is responsible for much of this, as friending someone on Facebook became a lower stakes, less intimate alternative to exchanging phone numbers.
It would entirely be possible to limit the scope of things, by making sure the company that has your address (UPS or USPS, say) never has the other information. Each business would hand off a zero-knowledge identifier to you that you'd give to the others: Amazon would only know that the payment identifier they gave to you was fulfilled at VISA somehow, and then hand the package off to UPS with an identifier that they would never see again.
An argument about whether or not to deploy differential privacy on large statistical databases has no bearing whatsoever on whether or not you give your address to have a package delivered. If you want the package delivered, you have to give your address.
On the other hand, it’s not at all clear that people should have to involuntarily, by force of law, offer up all sorts of personal details about their lives. And questions about whether the use of differential privacy can or should justify the collection of sensitive information are quite valid.
The census is justified by the idea that it will help us plan for the future. But the track record of central planning is poor to disastrous.
A small example: in theory population changes could inform land use decisions. In practice however, the ability of population to increase is softly capped by the amount of housing that exists, or will exist. If you restrict or frustrate housing, you will also restrict people from living where they want to live. Then the planners will point to the census data and tell you that nobody wants to live there and therefore there’s no need for change.
Ironically, if you wanted to measure where people want to live in order to get information for planning purposes, the number is right there and doesn’t require any personal data collection at all - it’s the price (in this example, $ per square foot of floor space). But in my experience people who like central planning don’t believe in prices so they ignore that and they look at their reams of personal data and they conclude that all is well in the world. It is hard for me to be sympathetic if one day folks like that have less data to look at.
Can anyone explain to me the previous state and why it was desirable? I admittedly do not understand why people are getting riled up. I am not being difficult. I really don't understand the original state and the changed state here.
I know it is off topic, and the issues raised here are fairly profound, but I want to share the conformed idea of “Noise infusion banned for industries regulated by the FCC”
And a lot of countries have things like national IDs that, rightly or wrongly (given things like RealID and passports), a lot of Americans just don't like on principle.
Sure, in Europe we don't because we already have databases of all citizens. Also, recording attributes like race, skin color, religious affiliation or political leaning in a database is highly illegal, both for the government and for private use.
The only reason we ever started doing this was to track ex-slaves and their descendants, and after 1965 every other possible grouping of people started begging for a category that it could use to get government grants in some way.
The irony is that now, when censuses somehow desperately need to figure out if you're Armenian or not, they don't count the descendants of slaves at all, preferring to lump them in with every dark-skinned person of partial African descent, but sometimes not the Spanish speakers(?!).
The US Supreme Court made a good decision (on admissions, not on the need for the approval of redistricting maps in places that have continuously attacked slave and Jim Crow descended voters.) The government needs to get out of the race and religious science business. Elected and appointed officials are openly claiming jihadi eschatology as the reason that they're supporting Israel, and openly explaining how the culturally varied mix of people who happened to live in land that Zionists wanted, or the Chinese, are inhuman races that are a threat because of their inhuman behavior and their inhuman values. We've woven church and race deeply into the government again.
The idea that preferential admissions to elite schools were going to somehow offset slavery was laughable anyway. It was just a grievance engine that gave people on top an excuse to feel downtrodden during one of the first and most vulnerable times in their lives - when they find out they're too stupid or boring to get into the college they want. I've always been partial to the libertarian solution to the problem of US slavery - Murray Rothbard and others said that according to the Libertarian homesteading principle, slaves should have been awarded the land and the factories that they worked. That it was an injustice that would lead to (what was in his view) catastrophe, such as how the freeing of Russian serfs in 1861 without any of the land still controlled by their ex-masters led to the Russian Revolution 50 years later.
I think it should be noted that there was a lot of dissatisfaction from users of the census data, as far as I know. So it's not been banned just for politics' sake or because they hate privacy... Some people I talked to in the privacy field even called the whole thing a total disaster and weren't shy about putting the blame on John Abowd, who apparently pushed this through despite a lot of internal opposition and concerns. Not sure if that's true, but what is definitely true is that the way the data was released produced serious issues downstream, as most researchers and statisticians who ingested the data weren't prepared for receiving noisy data values. Differential privacy was applied in such a way that many invariants that data users cared about weren't preserved, which was to be expected, since you can't preserve all invariants and at the same time add meaningful noise to the data. The thing is, with such a differentially private data release you need to adapt all of the downstream analyses to take into account the exact mechanism by which the data was altered. And since the Census Bureau used a very intricate mechanism that didn't just add Laplace noise to data values but instead relied on a multi-stage process that preserved some invariants but not others, it was very difficult to even write routines to account for the changes being made to the data. They essentially asked every data user to rewrite their whole analysis pipeline based on the exact disclosure mechanism, which contained a large number of bespoke choices regarding which data invariants to preserve and basically produced a mix of noisy, synthesized data that was just really hard to reason about.
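To make the invariant problem concrete, here is a minimal sketch of the plain "just add Laplace noise" baseline mentioned above. It is not the Bureau's actual multi-stage mechanism; the block counts, the epsilon value, and the function names are all made up for illustration. It only shows why independently noised cells stop adding up to their noised parent total.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng, sensitivity=1.0):
    # Textbook Laplace mechanism: noise scale = sensitivity / epsilon.
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


rng = random.Random(0)              # fixed seed so the sketch is repeatable
block_counts = [12, 7, 0, 31]       # hypothetical block populations
county_total = sum(block_counts)    # invariant users expect: blocks sum to county

noisy_blocks = noisy_counts(block_counts, epsilon=0.5, rng=rng)
noisy_total = noisy_counts([county_total], epsilon=0.5, rng=rng)[0]

print(noisy_blocks)                     # fractional, possibly negative values
print(sum(noisy_blocks), noisy_total)   # independent noise: these generally disagree
```

Forcing the published numbers back into non-negative integers that sum correctly requires extra post-processing on top of this, and that extra step is the kind of thing downstream users then have to model.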
I don't even know if there even would've been a way to do this better, but the fact is that not every small county or school district has top-tier statisticians at hand that can just read a whole monograph on differentially private synthesized census data and then hotpatch their existing analysis systems to work with that data.
I was a big fan of differential privacy but now I think it might be doing more harm than good, as I haven't seen a single case where it was applied successfully in a problem where it actually mattered, and it contributed strongly to discrediting and preventing a lot of work on other anonymization techniques as it was deemed the only way to preserve privacy by the research community, so showing up with enhancements to k-anonymity or any other noise mechanism not rooted in it was a sure way to get ridiculed and ignored. And it's just not a practical mechanism, even when it works for a single disclosure you always end up having to blow up the privacy budget to a ridiculous amount in order to keep disclosing statistics as otherwise you would for almost all real-world data run out of budget after a few publications.
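The budget complaint follows from basic sequential composition: each release against the same data spends some epsilon, and the spends add up. A toy sketch, assuming simple additive composition and using a hypothetical `PrivacyBudget` class that does not come from any real library:

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        # Refuse any release that would push total spend over the cap.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent


def releases_until_exhausted(total_epsilon, per_release_epsilon):
    # Count how many equally priced releases fit in the budget.
    budget = PrivacyBudget(total_epsilon)
    n = 0
    while True:
        try:
            budget.spend(per_release_epsilon)
        except RuntimeError:
            return n
        n += 1


print(releases_until_exhausted(1.0, 0.25))
```

With a fixed cap, the only ways to keep publishing are to make each release noisier or to raise the cap, which is the "blow up the privacy budget" outcome described above. Tighter composition theorems exist, but the basic tension stays the same.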
So, for me it's a technique that works in the areas where it doesn't really matter (publishing highly aggregated statistics that pose almost zero privacy risk even without differential privacy) and doesn't work in other areas where it would actually matter (publishing fine-grained data about individuals or small groups). There are some niche use cases but in my view the privacy community has really overblown the importance of differential privacy by portraying it as the only way to reliably anonymize data.
BTW the German census bureau has an interesting approach to anonymization which they have used for several decades already, and so far I haven't heard of any cases of successful de-anonymization of the data. Maybe the US bureau should have a look at that for their own needs.
Of course there will be dissatisfaction from users of the data. Anyone that wants to use census data will prefer less privacy in the data. And anytime privacy is enforced the data becomes less useful. It would be certainly very convenient for both advertisers and gerrymandering political consultants to have detailed data on every citizen.
As the article says anytime you want to enforce privacy, the data becomes somewhat less useful, there is just no way around that.
The point of rights is that we have them and that they should not be trampled upon when they become slightly inconvenient to someone in power.
Are you sure about that? You are saying that differentially private census data couldn't be used for gerrymandering and advertisement while non differentially private data could? Hard to believe, I'm not an advertisement or gerrymandering expert but I would assume people running ads or cutting up districts are mostly interested in aggregate statistics i.e. they won't care about single households? And I would assume they can rely on voter files, party databases etc... And to the contrary there are reports [1] that indicate differential privacy actually makes gerrymandering analysis more difficult or impossible. So, not really an argument for differential privacy, discriminatory action can be equally well taken based on differentially private data as the government cares about groups not individuals and groups aren't protected by differential privacy. It seems people really fundamentally misunderstand what this technique can achieve and what it won't do.
> serious issues downstream as most researchers and statisticians that ingested the data weren't prepared for receiving noisy data values
They weren't prepared for data that was obviously noisy. The data has always been inherently inaccurate, and folks just chose to ignore that previously
No, there are dozens of articles discussing the mechanism and explaining the impact it had in different areas e.g. [1,2,3]. And the release mechanism wasn't just "add noise", far from it, you may read the original paper [4] to see how intricate it was. Anyone wanting to make real use of the resulting data would have needed to understand that approach in detail to work with it. The report of the national academies [3] is probably the most comprehensive analysis of the mechanism and the complications it introduced, so writing "it has always been inherently inaccurate" is just wrong, this new mechanism was way worse than just introducing unbiased sampling noise.
So "differential privacy" pretty much sounds like someone gets to modify the results of a census and how it gets modified is entirely up to their discretion.
Seems like something that could be abused to achieve political objectives.
there are obviously measures in place to ensure the added noise is statistically homogeneous. the changes don't affect the final aggregates significantly, just enough to avoid saying much *about any individual person*.
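a rough way to see the "aggregates barely move, individual cells do" claim, assuming simple independent zero-mean Laplace noise per block (the real release mechanism was more involved, as discussed elsewhere in the thread). the block counts here are synthetic and the scale of 2.0 is an arbitrary choice:

```python
import random


def laplace_noise(scale, rng):
    # Difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


rng = random.Random(1)
scale = 2.0  # corresponds to epsilon = 0.5 with sensitivity 1

# Synthetic data: 10,000 small blocks with 0 to 50 residents each.
true_blocks = [rng.randint(0, 50) for _ in range(10_000)]
noisy_blocks = [c + laplace_noise(scale, rng) for c in true_blocks]

# Average distortion of a single published cell.
mean_abs_cell_error = sum(
    abs(n - t) for n, t in zip(noisy_blocks, true_blocks)
) / len(true_blocks)

# Distortion of the grand total, relative to its true value.
total_rel_error = abs(sum(noisy_blocks) - sum(true_blocks)) / sum(true_blocks)

print(mean_abs_cell_error)
print(total_rel_error)
```

each cell is off by about the noise scale on average, which matters a lot for a block of three people, while the errors largely cancel in the sum over thousands of blocks.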
know how you can buy "anonymized" data from data brokers and drill down until it's not anonymous anymore and in many cases point to the exact person? differential privacy would prevent that kind of thing.
If someone actually wanted to achieve political objectives by tampering with census data, there are better means than tampering with homogeneous statistical fuzzing.
Not really, it has to be random in a predetermined fashion to be considered differential privacy. It is reversible in the way that someone shouting over an aircraft producing white noise is intelligible.
I guess someone could fiddle with the noise, but then why not nudge the originals? Or more insidiously, control what is published?
If someone modified the original dataset and it was discovered, they would be held accountable. However, if you have a departmental policy of modifying the data for "privacy reasons" and it just so happened to surreptitiously affect some sort of political outcome, then ah geez, that's just a wacky coincidence, not any individual's fault.
Any privacy-diminishing changes at federal level happening during this administration are for one reason only: to amass more power in Conservative administration/governance. At the federal level it's Project 2025, at the state level it's making sure states stay red and disenfranchise minorities.
I really have to take the anti-noise side here. I get why it's a hard problem, and I get why the Census Bureau thought this was a neat solution. But I'm imagining an accountant stepping through a similar chain of logic:
* I want to accurately report the finances of our company to the best of my ability.
* But that report would allow people to reconstruct private data about the terms of our contracts with various counterparties. I'd really like to avoid that, there's no rule that says we're supposed to release that data. In fact some of those contracts probably came with nondisclosure agreements!
* So here's what I'm going to do. I'm going to calculate our results to the best of my ability, and then I'm going to add random values to them and report only the randomized ones. Any reconstruction people try to do will be wrong because of the randomness.
* If the SEC says "no, you need to report your actual numbers", I will explain to them that there's no such thing as an actual number because all data is noisy.
Applying subjectivity to what they keep and where it's bound, implies that this was always an expression of opinion.
Science intrinsically ignores opinions.
The officials responsible for this smearing of data should be tried. This was a violation of the free speech clause as it coercively manipulated public beliefs. This was a crime against science and civil rights.
The dueling political demands of accuracy and privacy are simply incompatible at some level. After reading this, maybe Hanlon's Razor isn't the right standard. Besides malice and stupidity, there is impossibility. Some problems just aren't solvable under certain constraints. I don't envy the statisticians tasked with finding a politically palatable solution to a math problem.
But the strength of differential privacy is that you can now make this tradeoff explicit and quantify it. I always liked it because it offers a mathematical solution to a policy problem, but then of course it's up to us to decide what parameters and tradeoff to choose. Also, some data might just not get published at all if the privacy implications are too problematic, so differential privacy might buy you more signal!
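For the plain Laplace mechanism that quantification is one line: the noise scale is sensitivity divided by epsilon, the expected absolute error equals that scale, and the standard deviation is sqrt(2) times it. A small sketch; the function name and the epsilon values are just illustrative, and this is the textbook mechanism, not the one the Census Bureau deployed:

```python
import math


def laplace_error_profile(epsilon, sensitivity=1.0):
    # Laplace mechanism: scale b = sensitivity / epsilon.
    # For Laplace(0, b): E|noise| = b and std dev = sqrt(2) * b.
    scale = sensitivity / epsilon
    return {
        "scale": scale,
        "expected_abs_error": scale,
        "std_dev": math.sqrt(2) * scale,
    }


# Smaller epsilon means stronger privacy and larger error per count.
for eps in (0.1, 1.0, 10.0):
    print(eps, laplace_error_profile(eps))
```

Picking epsilon is then the policy decision; the math only tells you what each choice costs in accuracy.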
Yeah, the main issue with differential privacy is that you need competent government officials making decisions who understand math beyond a high school level.
There's a ton of information in the US that is accessible to various degrees--especially through the deep web, much less background investigations. Unless you're a wealthy person who can set up various levels of trusts you can't really hide them.
You can of course disagree about what should actually be part of a transparent public record. (Though I suspect a lot of people post-date what was generally available in a "phone book.")
I have filled out census forms in the past and it was not a big imposition. During the last census I had supposed census workers showing up at my home multiple times and pushily asking for an in-person interview. I told the guy who came initially that I was not interested as I had a full time job, a 5 year old, and newborn twins. He brazenly said “your wife can do it” with zero consideration that she was just cut open weeks prior. A couple weeks later he shows up again at like 7pm pounding on the door right in the middle of the kids' bedtime routine. I told him it was a really unwelcome visit and sent him on his way. A couple weeks later a car comes rolling up to the house on a Saturday and the woman driving tells me she is the guy's supervisor and they really want the interview. I explained to her the situation, the newborn babies, the previous encounters, etc. She seemed completely undeterred and just went right back to pestering. I told her if anybody from the census came back they should go ahead and bring the sheriff because I’d be calling for trespassing. They finally stopped bothering me.
I think it's easy to predict some things that will happen in 950 days
in 950 days there will be several hundred warehouses concentrating over a million people in this country including many thousands of children costing a quarter trillion dollars (already funded)
and the Iran War will still be happening despite over a hundred declared "deals"
and the US will be running Cuba (forcing millions to return there)
statistical noise or the lack of it will be the least of our problems
This is a gift to reactionary gerrymandering and voting restriction efforts, along with things like yesterday's FBI raid of an Ohio voting rights organization.
Representative Beatty serves her own interests and her involvement in the Kennedy Center naming was just more of the same performative politics she routinely engages in. She's on the verge of being an octogenarian and missed a number of key votes, like the bill that cut funding to NPR, PBS, and other govt. programs. Kudos to her for working to remove Trump's name from the Kennedy Center but she needs to go.
Yet another thing this admin is screwing up. News at 11. Let’s fix this in the midterms by voting out the republicans. That’s it. That should be the sum total of the platform: not republican and not crazy.
headcount doesn't have to be granular, it has to be accurate. this is about the very useful street- and block-level data.
also, how would anyone know how accurate the "transparent" number is? if Trump or Thiel can fuck with the fuzzing they can just as easily do so with the base data.
Frankly i see no reason to keep this data private. They should simply publish a full dataset of the census, with no such data coarsening/differential privacy/ etc...
Fundamentally this is public data. If it's too dangerous to make public, it's too dangerous to collect, and people should be aware of exactly what it is.
There are very few things that the state has data on that should not be made public. Census data is simply not one of those things.
publishing should be the default for any data, and to keep it unpublished should require substantially good reasons that impact the country as a whole. Frankly, if it isn't detailed national defence plans, i struggle to see any data that should not be public.
The biggest challenge with running a census is getting people to trust you enough to answer your questions.
A lot of census questions are sensitive. The ACS covers topics like citizenship status, disabilities, income, SNAP assistance, languages spoken at home.
If you want accurate information about the people who live in your country you need the census process to feel as safe for people to respond to as possible.
Are you saying the census shouldn't collect any data that people wouldn't be comfortable publishing? Because that's a recipe for a census that is far less useful for helping the country make useful decisions.
> Are you saying the census shouldn't collect any data that people wouldn't be comfortable publishing? Because that's a recipe for a census that is far less useful for helping the country make useful decisions.
I'll say that. The state representatives should provide congress and the president any data needed to inform policy decisions about the people they represent. And as others have pointed out, other departments and agencies (such as the IRS) have most of the rest of the data required to make policy decisions.
Except for gerrymandering purposes, I fail to see why income, party affiliations, etc., is useful for the purpose the census was created for.
This seems like an issue created by Congress. The Constitution only requires a headcount by state. Maybe they should use another mechanism to collect demographic data. Since the concern is not about representation, but allocation, tax returns seem like an obvious alternative and they are already private and collected at a much more granular level.
The census isn't for helping the country make any decisions other than determining the number of representatives and apportionment of taxes. It should not be collecting any data that isn't necessary for that.
I'd like to know when they stopped publishing census data. I have used it for genealogical purposes to track ancestors: you can see exactly who was living in which house, how they are related, and what their ages are (I found that women in my family often reported, both on the census and marriage documents, being younger than they actually were). I don't think I've seen data from after 1950, though.
I don't understand why the census would include SNAP data or income: surely the government already has that information. I have never doubted that the IRS knows my income better than I do. Maybe better use of existing datasets could restrict the census to less invasive questions.
>Are you saying the census shouldn't collect any data that people wouldn't be comfortable publishing? Because that's a recipe for a census that is far less useful for helping the country make useful decisions.
That seems to me like it's a good thing. Allow people to determine whether the data is actually needed, rather than closing their eyes.
This is the real reason for the fudging of the data. People don't want an ethnicity/citizenship status/birth country breakdown of things like benefit use.
Replying to the ACS with accurate information is required by law, so they don't actually need to rely on people feeling safe to get answers.
I don't trust the Census Bureau with my data, so if this is as "dangerous" as the author and some people here seem to think, they shouldn't be collecting it in the first place.
1. People give the information to the government under the expectation that this data is to be kept private or used in such a way that individual targeting is made impossible, you break that expectation and people will lie or won't give you this data.
2. Without noise injection it's rather simple to do statistical attacks to reverse engineer individual entities.
3. This data is and has already been used in the past to undermine democratic systems by targeting and disenfranchising minorities, as well as gerrymandering the US to hell.
4. "Too dangerous to make public, too dangerous to collect" - this is a false dichotomy. To govern effectively you need sensitive data, but it should be collected and used in a way that's safe for the individuals.
5. Macro level aggregates don't need individual exposure, that's why noise, anonymization and statistical functions are fine.
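As a toy illustration of point 2, here is a differencing attack on exact (un-noised) aggregates. The households, ages, and incomes are invented, and real reconstruction attacks combine many published tables rather than two queries, but the principle is the same:

```python
# Hypothetical block with three households; all values are made up.
households = [
    {"id": "A", "age": 34, "income": 52_000},
    {"id": "B", "age": 81, "income": 17_500},
    {"id": "C", "age": 45, "income": 88_000},
]


def total_income(rows, predicate=lambda r: True):
    # An exact aggregate, as it might appear in a published table.
    return sum(r["income"] for r in rows if predicate(r))


# Two harmless-looking aggregates over the same block...
everyone = total_income(households)
under_80 = total_income(households, lambda r: r["age"] < 80)

# ...whose difference is exactly one person's income, because only
# one resident of the block is 80 or older.
recovered = everyone - under_80
print(recovered)
```

Noise, suppression of small cells, or coarser geography are all ways of making that subtraction uninformative.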
That's a good default position, and I think should be our starting point.
But the devil is in the details. If we don't want advertisers constructing semi-complete profiles from simple web interactions then why would we publish 330 million census questionnaires for their use?
>If it's too dangerous to make public, it's too dangerous to collect, and people should be aware of exactly what it is.
While this may be a reasonable stance in theory, there are many examples in reality where the danger has not materialized for decades. Personally, I have access to health records, birth certificates, and death certificates collected by a state. They contain very personal information. As far as I know, they have not been leaked to the general public.
This is one of those situations where everything you hear tells you the system is failing, but that's because nobody talks about the systems which haven't failed.
Besides, this possible failing of the Census' privacy promises shouldn't convince us that "If only we hadn't given info to the despotic and cruel government using it to target people, then we'd only have a despotic and cruel government hurting people randomly." The solution to this problem isn't to withhold info, it's to get rid of the despots.
So do you believe that individual income should be public? Or do you believe that the government should not take income into account for taxation or distribution of benefits?
Then dox yourself right now with your previous census answers and PII. There are several obvious reasons to keep the data private, all you have to do is use your brain.
But why is the census asking about those attributes at all? The Constitution requires a count. That's it. A number. We don't need to know the rest of it, or if we do, it should be surveyed separately with voluntary participation.
It’s because people are significantly more likely to lie or omit some facts if you don’t guarantee their privacy, which means your census data ends up being worth less than a pile of shit.
The alternative is to water down the census questions, which also leads you down the same path (i.e. manure as data).
So you seem to have at least a surface level of understanding of incentives.
Check this then:
If the census is responsible for allocating federal funds and congressional apportionment, what are the incentives for making census data private and encouraging people that would otherwise hide their identity?
First off the census is used for determining how many seats are used for congressional apportionment and allocating federal funds.
So unless you're willing to also say that counted illegals cannot be used for either of those, then you're just being obtuse.
But if we can agree that they cannot be used for that then sure, let's identify and count them. If we can't identify (make non-private) and count them then why should we trust that those counts are accurate?
Officially adding fake data (noise) to data as important as the census is the height of weirdness of the West. The nations are totally confused between privacy and visibility requirements. Privacy and freedom are effectively working against the very foundations of the nation, as the binding force between the elements of a nation is directly affected by privacy.
Excessive obsession with equality is another thing that works to erase any cognitive abilities of the people to recognize differences in gender, race, age, culture etc. Equality is good to a reasonable extent but it shouldn't be forced to an extent to erase the cognitive capabilities gained through evolution.
I "enumerated" for the last census. Trust in my community was already not high* and I had lots of interesting encounters. I really believed the rather invasive data I was collecting with a friendly face would be used and handled responsibly. I feel for the poor souls that'll sign up to go door to door for 2030 now that the firewalls against weaponizing and monetizing all of our sensitive government data has been torn down, and even more for those that will volunteer information that can hurt them.
The comments that this rather expensive endeavour should just be about getting a head count are also amusing to me. The data collected was such an important baseline of common understanding, and this will not be a good thing for its future quality. I've grown very jaded now seeing all the things taken for granted in this country and lost or degraded recently with a whimper.
*: To be fair, they sent me specifically to places that didn't respond, so I was naturally led to believe that everyone in my region hated the government, ignored bizarrely threatening fliers, or had recently moved and had no knowledge of the inhabitants (if any) during the census period.
> The comments that this rather expensive endeavour should just be about getting a head count are also amusing to me. The data collected was such an important baseline of common understanding, and this will not be a good thing for its future quality.
Even without considering the Census data products alone, Census demographic data underlies virtually all extrapolation from other survey research. Everything from national opinion surveys based on tens of thousands of respondents, to small community surveys. A Census product with the most diverse participation pays off almost infinitely for America. It benefits everyone from national newspapers to rural counties.
If the smallest communities lose what little trust remains in the privacy of the Census, they have the most to lose in all of these ways.
I did similar and you summarized the feelings well. It's really sad and hard to rebuild that trust
And disheartening that people continue to gravitate to a political party that proudly announces desires to abuse this data.
>And disheartening that people continue to gravitate to a political party that proudly announces desires to abuse this data.
The same party that promotes distrust in the government (that is justified by the abuse the same party does when in power).
Amazing, innit.
You're playing party politics. That's the risk you take: that the party has goals beyond your (dareisay naive) utopian ideals for civic engagement.
Parties are not universally evil. When I malign them in this way it is in full acknowledgment that organization is the nearly singular path to "effect on target" as regards society-scale politics. What I mean is the party per se becomes a superorganism that has always as its first priority self-preservation (a la homeostasis) and it is very worth remembering this when subsuming oneself into their structure.
The real decline started after Edward Snowden and all the information that came out about the NSA. It really sparked distrust in the government. Trying to get people to respond to surveys was already hard, why would those general people believe the Census Bureau is actually keeping their data safe? Doesn’t matter when it comes to laws and the constitution, if you work for an Agency. You are the government. Response rates keep going down, now we have attacks from the President on statistics about the economy. I’m a little cynical and I just assume they will continue to shrink the statistical agencies and make the statistics more useless (which is what this recent policy change does), and they will shift to the private industry. Even though the private industry cannot do the work in the Field that the government does.
> The real decline started after Edward Snowden and all the information that came out about the NSA. It really sparked distrust in the government.
Do you have evidence of this? Because I'd bet 90% Americans have no idea who Edward Snowden even is.
I buy the argument that a functioning democracy requires the populace to believe that the government is honest, competent, and working in their interests. Watergate, Iran-Contra, and the Vietnam war (respectively) undermined those notions. As of ~2016, half of the US voting population had come of age after those events.
> The comments that this rather expensive endeavour should just be about getting a head count are also amusing to me
Countries conduct censuses so they can understand, in great detail, what is going on with the people who make up the country.
With this accurate information, improvement plans can be made, and life can be improved for everyone.
The comments about just making it a head count give a very interesting window into the mentality of many these days. They don’t want to - or can’t fathom how to - make life better.
It’s sad, really
Indeed, the very word "statistics" originates as an understanding or description of the state [1].
[1] https://en.wikipedia.org/wiki/Statistics#History
Or worse, they actively don't want to make life better for the "wrong" kind of people.
Eh, that’s the ‘if people do the right thing’ approach.
Many countries use census data to target (or even round up and murder) specific groups of people by religion, ethnicity, etc.
Pretty sad, in my opinion. In my ideal the state should have visibility into the shape of the people present so that we can make good decisions about our combined organization. I think we’re making a mistake we will come to regret by intentionally damaging our data collection infrastructure.
I think a large amount of the US’s success is the result of good institutions handling granular data. Policies can be adjusted to match outcomes more rapidly than otherwise.
I understand why people decide to diminish all state capacity - they feel that governments are populated by their opponents who will use state capacity against them. But as our relative strength wanes, our ability to overcome these forces of inertia does as well. And then our governments become less capable and eventually life starts getting worse.
We don’t need house-level data immediately (except perhaps in order to place census blocks within their appropriate congressional district etc). But there are aggregation units above which we should be using as good information as we possibly could be.
> I think we’re making a mistake we will come to regret by intentionally damaging our data collection infrastructure.
Intentionally damaging infrastructure is the recurring theme of this administration.
This does nothing to make government less powerful.
It just makes government stupider so even if they decide they want to do the right thing, now they can’t because they don’t have the information needed to make effective decisions.
No, it gives them data to attack specific groups of people that were previously anonymized. The two options are less granular data, or data that can be abused.
There is no question the end goal is data that can be abused, and anyone left who would protest their actions will be fired and replaced with more sycophants.
It makes the government stupider so there are more excuses to bring in better private solutions.
Handicap the public services if they are working well, then talk about how bad they are to justify for-profit replacement.
Or don’t and just exploit the gaps directly with better private data, whatever increases proximate wealth inequality.
It makes it harder for them to do things, that is both right things and wrong things. They are doing a lot of wrong things recently.
Making them less able to do whatever it is they might want to do is pretty much the definition of making them less powerful.
I’d be more interested in giving my state detailed info, letting them run programs. The country can have aggregate data.
The history of the VRA suggests that several states simply cannot be trusted to do that for all their residents.
That works great for real states, but some states are just three mining companies in a trenchcoat.
The feds have smart people who find the levers to work to get municipal, county, state and private data via voluntary/“voluntary” disclosure.
That would probably not be constitutional. I don't think the states are unable to run their own census, but the Constitution requires a federal one.
> In my ideal the state should have visibility into the shape of the people present so that we can make good decisions about our combined organization.
That ideal became tantamount to enabling genocide when the US government breached the confidentiality of the census in order to put the Japanese in prison camps on the basis of their race.
> I understand why people decide to diminish all state capacity
It's not even just a question of "all". The state should have the absolute minimum capacity to carry out its necessary tasks. Collecting race (just to give one example of many) of any form is not absolutely necessary and so it should not be done.
> they feel that governments are populated by their opponents who will use state capacity against them
Because they may be in the future. -- but even that is too strong, the greatest harm perpetrated by state actors has consistently come from trying to "help" rather than intentionally malicious acts.
Replying to a dead comment that demanded an example: Mao's killing of somewhere on the order of 30-40 million people through famine (in addition to the million straight up murdered in the Cultural Revolution), created as a result of "helping" through planned-economy food distribution and the Eliminate Sparrows Campaign.
People only kill at a truly massive scale because they believe they are doing something good or at least necessary (even in war, but especially outside of war). This is why hoping states aren't evil isn't sufficient-- in fact it may induce mass murder, because what could be less evil than to Do the Right thing.
The universal cure is to distribute power and influence in as many ways as practicable, such that the damage from erroneous thinking is contained.
But this article is about a decision to damage the census less. If you value an accurate census, you should be celebrating!
If they follow the rules, preserving privacy via cruder methods, the data will be much more damaged.
For any particular level of privacy, the banned methods can give you more accurate data. For any particular level of accuracy, the banned methods can give you better privacy.
The only way we're getting more accurate data is if the new rule causes them to largely abandon privacy. That would be bad. Harm for no benefit.
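To make the privacy/accuracy dial concrete, here is a minimal sketch of the textbook Laplace mechanism on a single count. It only illustrates how the error shrinks as the privacy parameter epsilon grows; it does not compare differential privacy against the cruder methods, and it is not the Census Bureau's actual mechanism. The function names, the true count of 100, and the epsilon values are all assumptions chosen for illustration.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    # A counting query changes by at most 1 when one person is added
    # or removed (sensitivity 1), so the Laplace scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


def mean_abs_error(epsilon, true_count=100, trials=20000, seed=0):
    # Average absolute error of the noisy count over many simulated releases.
    rng = random.Random(seed)
    return statistics.fmean(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )


# Smaller epsilon = stronger privacy = noisier answers.
errors = {eps: mean_abs_error(eps) for eps in (0.1, 1.0, 10.0)}
for eps, err in errors.items():
    print(f"epsilon={eps}: mean absolute error ~ {err:.2f}")
```

The expected absolute error of Laplace noise equals its scale, so each tenfold increase in epsilon should cut the error by roughly a factor of ten.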
TFA lays out why things don't work that way. If you erode trust in the privacy of census responses, an awful lot of folks will have to start lying on their census.
Whatever you do, there is a level of trust that is assumed when census takes place. The trust that this data is then not identified in a way that could be targeted for scams, frauds, and other such evils. But in NY, house sale records are made public but much to the detriment, many mortgage companies fake a bill for payment.
Differential privacy is absolutely necessary, and the social scientists being unable to reconstruct the data at an individual level is intended. A macroscopic description is rather enough for most purposes, and anything more is asking for a surveillance state.
In Ohio (or at least my county) the deed and mortgage are public record. As is a record when the mortgage is paid off. Interestingly, property tax charges and payments are, too.
> But in NY, house sale records are made public but much to the detriment, many mortgage companies fake a bill for payment.
That frankly sounds more like a failure of enforcement, on top of a failure of the construction of the financial system. Here in Germany, it is absolutely not a common thing that mortgages or the banks holding them get sold like a hot potato towards some other sucker, and thus such a letter would cause immediate suspicion.
Here in Germany, founding a company creates a public record. There are a number of companies that then send all newly formed companies a letter that looks like a legitimate invoice for expenses related to creating said company, but on closer inspection actually contains a dense paragraph of text explaining that this is not an invoice at all, merely an offer you accept by paying. Quite possibly even a subscription.
It's a well-known trick; our notary warned us that these letters would come and that we should scrutinize any invoice for a while. But they manage to skirt the edge of legality.
An earlier article on the same blog had some very useful information on how easily the aggregated census data from 2010 (before differential privacy) could be used to reconstruct real data for individuals: https://desfontain.es/blog/us-census-reconstruction-attack.h...
I am personally convinced that the reason noise infusion was banned was because powerful people were already reconstructing individual data from census for the purpose of gerrymandering, and they wanted to continue gerrymandering.
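For anyone wondering how aggregates can leak individuals, here is a toy sketch of the idea behind a reconstruction attack. The "published" statistics for one tiny block are made up, and the brute-force search stands in for the constraint solvers a real attack would use across many overlapping tables. Every name and number below is a hypothetical chosen for illustration.

```python
from itertools import combinations_with_replacement

# Hypothetical published tables for one block of three residents.
published = {
    "count": 3,
    "mean_age": 44,
    "median_age": 30,
    "minors": 1,              # residents under 18
    "mean_age_of_minors": 8,
}


def consistent(ages):
    # `ages` is a sorted tuple, so the middle element is the median.
    minors = [a for a in ages if a < 18]
    return (
        sum(ages) == published["mean_age"] * published["count"]
        and ages[1] == published["median_age"]
        and len(minors) == published["minors"]
        and sum(minors) == published["mean_age_of_minors"] * len(minors)
    )


# Enumerate every sorted combination of three ages from 0 to 115
# and keep the ones that match all published tables at once.
candidates = [
    ages
    for ages in combinations_with_replacement(range(116), published["count"])
    if consistent(ages)
]
print(candidates)
```

Each table alone leaves many possibilities open; it is the intersection of several exact tables that pins the residents down, which is the gap noise infusion is meant to close.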
Why do you need individual data for gerrymandering? Don't you only need area level?
To optimally crack a district to bias it in one party's favour, it is often required to literally run a boundary down a street to separate one side (close to a university, say) from the other.
Once you've tied voter preferences to actual street addresses you are no longer in the realm of "broad area cumulative averages and medians".
Or worse. It allows anyone to build really targeted data sets. Insurance companies would love such data, and many of those will use them without scruples.
Ban it from the dataset, add it to the analysis. You can choose your own flavor of noise.
I don't know what the political undertones are here, but at some level you need to have actual ground truth, including "this person/household declined".
Publishing raw data though? That seems like shooting yourself in the foot from a national security perspective, not to mention all the other reasons not to do it.
> Ban it from the dataset, add it to the analysis. You can choose your own flavor of noise.
It is introduced in the public data, not the secret data.
> Ban it from the dataset, add it to the analysis. You can choose your own flavor of noise.
Not sure exactly what you're proposing, but if the noise is added independently to different people, you can just buy multiple copies to reduce it.
There are a lot of ways to do this wrong, which is why so much analysis has gone into differential privacy.
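A minimal sketch of the averaging problem described above, assuming each purchased copy gets fresh, independent Laplace noise on the same underlying value. The value, the noise scale, and the number of copies are made-up numbers, and `averaged_release` is a hypothetical helper, not any real release mechanism.

```python
import random
import statistics


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def averaged_release(true_value, copies, scale, rng):
    # An attacker buys `copies` independently noised releases
    # of the same value and averages them.
    return statistics.fmean(
        true_value + laplace_noise(scale, rng) for _ in range(copies)
    )


rng = random.Random(1)
true_value = 52000   # hypothetical sensitive value
scale = 1000         # hypothetical noise scale per copy

single_err = statistics.fmean(
    abs(averaged_release(true_value, 1, scale, rng) - true_value)
    for _ in range(2000)
)
averaged_err = statistics.fmean(
    abs(averaged_release(true_value, 100, scale, rng) - true_value)
    for _ in range(2000)
)
print(f"one copy: ~{single_err:.0f}, average of 100 copies: ~{averaged_err:.0f}")
```

The error of the average shrinks roughly with the square root of the number of copies, which is why the noise has to be drawn once for the published product rather than per request.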
Sorry, I think you're reading more into this than I intended to say. My point was that the raw data itself doesn't need noise, but the published data necessarily does.
The replies here arguing we should publish it all are wild in the worst kind of first-order thinking way.
It’s a census: it just asks questions.
If you start publishing and weaponizing the data against people with various attributes, they’ll just lie or not answer. And then you are left with worse than nothing: bad data people try to act on.
You first gather the data while people don't know or care. Then you weaponize it later. It happened at least once not long ago in another country, so being concerned about it doesn't seem like an overreaction.
It happened a year ago in this country, with IRS sharing data with ICE (breaking a longstanding policy of keeping taxpayer data private within the government).
If this is a Nazi reference, Census data was used to send people to concentration camps here during the same era. Less awful than death camps, at least.
The US Government is the entity that weaponizes the data. The most obvious example is the Census Bureau compiling lists of people of Japanese descent to imprison during WWII. That's just the most obvious one that I know of without looking up more.
The real push for this now is to form lists of people to disenfranchise.
There is a significant movement in conservative circles that "the census should literally only be a count". This could be a wedge to prevent detailed demographic data collection by the government.
yeah,
and implicitly force them to sell the land they own for less than it's worth, which in combination with setting up very messed up tax-related laws in some states (1), which highly benefit you if you bought land long ago, effectively "killed" a budding, wealthy, land-owning Asian community and made sure it can't really regrow in that form.
(1): I think it was mainly California, but I don't remember fully.
> The US Government is the entity that weaponizes the data.
Pointing at an example from so long ago to find "the" misuser is turning a blind eye to lots of active misuse.
Databases are neutral until someone asks them for a list.
Remember “leftist “ and transgender activists are terrorists now.
First they came for…
Does anyone actually believe this crap?
You think the census is what the government would use to mass identify and imprison people, not the NSA database(s)?
You think homeland security, or the FBI, or any other alphabet agency doesn't already have access to a giant list of people?
Think about what Meta knows about everyone, or Google. You do realize that the US gov has read access to their core databases, right?
"The census" has absolutely no bearing on any of that which you're worried about.
It's just shocking the level of ignorance that gets upvoted in the comments here now.
The easy solution is to just reduce the resolution and scope of the data to the degree it is absolutely necessary. The census exists to inform representation decisions. All other concerns are addons. You can have all the data on the county or voting district level and strip data as you increase your resolution, to the point you only keep population number at the neighborhood, block level.
Knowing the racial, ethnic and socioeconomic background of the residents of a single building block is only useful to discriminate against them.
Demographic information is useful for medical, financial, educational, and so many other items.
The current admin doesn’t need it to discriminate, you can just access cameras and license plate readers and target easily that way.
The purpose is to scare people into misstating or obscuring data to reduce total house representation for an area. It’s to win votes, there are much better ways to do all these things than use this data, but affecting the vote with limited impact is a huge money savings.
There are plenty of other uses - knowing where to build stores to serve your target market, predicting possible pandemic vulnerabilities, etc.
The real question is why anyone answers these questions in the first place? I just wait until a census worker shows up and tell them how many people live at my domicile. It's needed for proper electoral representation and absolutely nothing else.
This administration does ... not ... care ... about ... facts.
Any use to identify where government resources are best used will have people thinking they should have gotten more and would have if they'd answered differently. I.e., that their answers were "weaponized" against them.
I guess the way to optimize is to find an equilibrium between an extreme of specificity and an extreme of vagueness that's still actionable from a high-level policy perspective.
Something about this conversation is fundamentally broken if there's no space to iterate towards optimization and instead it's just swinging between maximalist extremes.
This might be the point. As long as they think the people who end up under-counted are not people this government would like to have voting power for the House of Representatives.
Imagine the weaponization possibilities when combining the census data with Amazon’s and Meta’s data, and possibly several other datasets readily available to this administration. Whatever is missing from one of them can be inferred or defined from the others. This might already be happening, it can’t be checked. Some (former) dictators would be salivating.
Yes.
Extremists, or in general any faction willing to engage in systematic discrimination, harassment, terrorizing or similar, love highly detailed non-anonymized census data.
Why?
Because it gives them the perfect layout for which areas to harass (areas likely to yield), which to brutalize (areas unlikely to yield or from especially "hated" people), which best not to touch (areas with too much influence/money or likely to contain hidden sympathizers), which to systematically take apart through other means like building a highway through them (e.g. "hated" communities too strong/connected to brutalize), etc.
All of this has a lot of history, whether it's from right extremists like fascists or left extremists(1).
At which point the question is: if the data you collect is that abusable, should you even collect it? Is it even really needed?
(1): Like actual left extremists; a lot of US sources have the habit of labeling people as left extremists who by EU standards sometimes aren't even left (but centrist) and are very far away from extremism...
The term “first-order thinking” just clicked for me. So revealing. One of today’s lucky 10,000
Then maybe the data shouldn’t be collected in the first place?
It's a census: its only function is to determine the number of representatives your state should have.
Please don't ask about my toilets, my demographics, or my religion.
Thanks.
Have you not been paying attention for 10 years? At the top of the rotting snakehead they know all this; they aren't arguing in good faith.
You can’t completely trust what people say anyway. There are stated preferences and observed preferences in economics but it applies to other areas of life.
>It’s a census: it just asks questions.
That's what Dutch and French bureaucrats thought until 1940.
There's a pretty good chance that Elon Musk, plus Russia and China, have had more-or-less unrestricted access to Americans' data since the DOGE dismantling of the US government. Plus, intentionally removing security and accountability mechanisms makes it impossible to accurately determine how bad the damage actually was.
> they’ll just lie or not answer
The Harper government actively worked on destroying the efficacy of the Canadian census, to make it more difficult for subsequent governments to make data-driven decisions.
In addition to the obvious goal of making it easier to identify and target homosexuals, trans people, minorities, immigrants, it's quite possible that destroying future governments' ability to make good decisions is one of the objectives of the Republican party. Stop voting for the face-eating leopard party, already. They don't use the litterbox, shit everywhere, and actively try to eat your face.
For all the very clever people pointing out that this is nothing new, I have two responses.
1. Your cell company may track your location, and your credit rating agencies know how many nose hairs you have, but they don't always (or even usually) have the deeply personal information you're supposed to put down in a census.
2. Enough of a change in degree is a change in kind. If you disagree, remember that Imperial Russia had the Okhrana and sent over a million Sybiraks - prisoners and exiles - to Siberia, and then the fucking CHEKA and the NKVD and then the (kinder, softer, slightly less outright murderous) KGB went ahead to send 18 million people into the GULAG system, and outright murdered half a million to a million. This was all the same, right? No difference?
The entity most capable of weaponizing demographic data is the government itself. If people weren’t previously providing false information to the census, I’m skeptical that this change is what will push people over the edge.
Congress passed laws that blocked the federal government from fusing data across departments for this specific reason. The admin decided to ignore those, and a friendly Congress is deciding not to act on that.
You really, really don't want a government who can build a unified profile on you in that way.
i have such a hard time reconciling stuff like this:
> The census bureau decided to adopt differential privacy for the 2020 Census
and:
> The consequences will be dire for utility or for privacy, and possibly both. It's hard to understate this point: future statistical releases will either be useless compared to past ones, or they will be incredibly unsafe
so we took the census for centuries before this point, and it was “ok.” and for the last census only we added some privacy items. but if we remove just one of those filters, we are in “dire” circumstances? but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
this makes it feel like an emotional overblown problem
Believe it or not, mathematical techniques and computational power have increased in the past hundreds of years, not to mention the digitization of everything.
Privacy issues that weren’t possible before due to cost are now pennies to exploit. Also keep in mind as it points out people were using census data to drive gerrymandering efforts, so these attacks are real and have been going on for a long time.
I don’t understand why gerrymandering would require privacy violation, or how differential privacy would stop it.
> but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
One notable thing we have today that we didn't have 100 years ago is a computer. Before, you could reasonably assume that recreating individual records wasn't feasible, at least not on a large scale. You can't assume that now. A 4 digit password was safe for hundreds of years, but it would be a security liability today for the same reason.
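To put a number on the password analogy: a 4 digit code has only 10,000 possibilities, which a computer can exhaust almost instantly. A small sketch, assuming the attacker holds an unsalted SHA-256 hash of the PIN; the PIN value and the `crack_pin` helper are made up for illustration.

```python
import hashlib
import itertools


def crack_pin(target_hash):
    # Try every 4 digit PIN from 0000 to 9999 (10**4 candidates).
    for digits in itertools.product("0123456789", repeat=4):
        pin = "".join(digits)
        if hashlib.sha256(pin.encode()).hexdigest() == target_hash:
            return pin
    return None


# Hypothetical secret, known to the attacker only as a hash.
secret_hash = hashlib.sha256(b"7291").hexdigest()
found = crack_pin(secret_hash)
print(found)
```

By hand this search would be tedious enough to deter most people; automated, the cost is negligible, which is the same shift that makes old disclosure protections inadequate.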
Computers and improvements in data science/machine learning are basically the entire explanation. A LOT of the techniques that we use today to de-anonymize data require computation power not previously available. Even when doable, resources limited scale. Source: statistics degree
(Also, linkage. There are more data sources to cross reference now with the internet and social media and web tracking and hacks - the record footprint of Americans even as recently as the 70s and 80s was dramatically lower!)
The concerns here, like most concerns about privacy, are hyperbolic hypothetical hypochondria, until they’re not.
> but there were no privacy features before. so we’re actually still much better off than we were for hundreds of years before this.
If you are choosing hundreds of years ago, when we had no computers and internet, I wonder how we had worse privacy than the surveillance world today.
> so we took the census for centuries before this point, and it was “ok.”
Yes because we didn't have computers to unearth patterns in the data in a millisecond and politicians could have their career ended for doing the wrong thing, when revealed, instead of being rewarded for it.
> so we took the census for centuries before this point, and it was “ok.”
It wasn't ok - it's been shown that the data released could individually identify people in releases before the 2010 Census.
For decades we were encrypting our communications with rsa, surely nothing is wrong with it?
There is nothing wrong with it, and RSA is still commonly used. In fact, RSA is better against quantum computers compared to ECC.
As the article clearly states, privacy features have been in the census since 1990. It is just that the previously used privacy feature was not very strong and could be defeated. So it was replaced by a stronger feature in 2020. Before 1990 the census. 1990 was when personal computers were being popularized and the computing power available to individuals exploded and so then it was possible to use computers to separate out individual information from the data the census publishes. So the issue came up then.
No it is not an overblown problem.
As far as I recall they did have some measures in place. Differential privacy just made it a bit more robust.
Arguably the defaults for differential privacy are too robust but that is a different story.
At the Republican TX state convention this week, they proposed to add wording against differential privacy to their proposed platform via an amendment, justified with an example from someone supposedly involved with the census of how it was common-sense silly because one homeless guy under a bridge can become five via DP. I don't know if it passed, but that's the grassroots push behind things like this.
Having worked behind the scenes at a state convention (granted, not in Texas), there is no such thing as grassroots announcements/efforts at those things.
How do you know that's grassroots?
That's what confounds the media [1]
[1] https://www.youtube.com/watch?v=sN98wmzisn4 (Excuse this poor quality, it was the only version I could find that wasn't tiktok)
> Differential privacy makes this trade-off explicit, and thus impossible to ignore. Maybe banning it is a way of pretending that the problem doesn't exist, in the hope that it will go away?
Or it's saying that one of these conflicting goals is more valuable than the other, and so shouldn't be sacrificed for it.
Coming from a certain European country, you never know what answer on the census might get you into trouble.
"What is your religious affiliation". Seems perfectly innocuous, but turned out to be retroactively fatal if your answer could be attributed to you by a certain foreign occupier in the 1940s.
Surely any such foreign occupier would just demand the unredacted data?
Exactly why a government may refrain from collecting such data, as it is not even relevant in any kind of policy decision.
That's where you hope people like Rene Carmille are around.
Yes, which is why the government shouldn't have this data at all in the first place.
They don't ask about religious affiliation on the census.
1. How many people were living or staying in this house, apartment, or mobile home on April 1, 2020?
2. Were there any additional people staying here on April 1, 2020 that you did not include in Question 1?
3. Is this house, apartment, or mobile home?
4. What is your telephone number?
5. What is Person 1’s name?
6. What is Person 1’s sex?
7. What is Person 1’s age and what is Person 1’s date of birth?
8. Is Person 1 of Hispanic, Latino, or Spanish origin?
9. What is Person 1’s race?
Nothing really stops you from lying either.
Right, but the entire damn point is privacy protections enable people to be more honest. The entire point is supposed to be good data so we can make informed population wide decisions.
And race is a pretty big one under the current administration which has had hundreds of legal immigrants arrested for weeks to months off of "suspicion" that for lack of concrete evidence could only amount to racial profiling.
France used to make plenty of lists. We loved lists. Lists are good. Jews lists? Sure, it's maybe useful one day when we want to do something.
Boy were the Germans happy to find these.
The American obsession with asking people for their perceived origins (AAPI, AA, Latino, ...) is more than weird: it's downright dangerous. Don't fucking ask these questions, and never, ever write it down, especially not with names.
Thankfully, now they can just buy it from data brokers and let Palantir target, so that makes life easier for them
France knows very little about managing a post-colonial multiracial society (except for terribly), I would appreciate if y'all listened and learned or at least approached the issue with more humility. France has serious racial, colonial, Muslim, immigrant, and banlieue inequalities, but its refusal to officially measure race/ethnicity makes those inequalities harder to see, litigate, quantify, or remedy.
"What is your religious affiliation" makes absolutely no sense in a census exercise. IMO.
The U.S. Census Bureau collects tons of data unrelated to the decennial counting for Congressional apportionment.
https://www.census.gov/programs-surveys.html
The American Community Survey is the most well-known, as it replaced the “long form” sampling that had been an extension to the Census.
Unless you’re a government explicitly and openly aligned with Christian nationalists.
The point might be going over my head… why does it make no sense?
It actually does. Religious affinity can absolutely be useful for longer trend studies, and census data is usually of much, much higher quality than other random sample studies.
Asking about your religion on the census is against the law in the US:
no person shall be compelled to disclose information relative to his religious beliefs or to membership in a religious body.
https://www.congress.gov/94/statute/STATUTE-90/STATUTE-90-Pg...
> compelled
Doesn't that mean they can ask that question with an option for "rather not disclose"?
Religion is just an example. Don't dwell on it.
> Differential privacy makes this trade-off explicit, and thus impossible to ignore.
I think he has it backwards here.
Techniques like differential privacy hide the fact that a trade-off exists, except for a small cadre of experts who live and breathe this stuff.
I don’t know enough to defend this decision, but it strikes me that if there is a real trade-off, not having access to these techniques will force people other than statisticians to confront the trade-off.
If data about the public is so dangerous that we must disguise the results, then perhaps it's data we shouldn’t be collecting in the first place.
Nope, private data about people is published unintentionally regularly, Netflix history and medical records being some of the notable examples.
People are bad at making the tradeoff because they consistently underestimate the amount of information that is leaked. Forcing them to leak safe amounts of information is the right way.
Not sharing or collecting the data could in some cases be better but there is clear value in this data so the optimal amount to store and make public is not 0.
I think the real killer is that everyone knows their data has been leaked six times over, and yet nothing bad has come of it for 99% of people.
If there was an apocalyptic privacy breach that led to 40% of the population losing their savings, people would be smashing their smart TVs in the streets a day later.
But alas, nothing bad actually manifests (besides the suspicious ads that know you really like Tide detergent).
imho, one big reason why Data Science as a big org lost clout in tech companies was a tendency to treat DS as gatekeepers of data. Outsourcing the responsibility of stat thinking gave many DS a weird power trip; when one dude gets to decide the trade-offs first without anyone around them needing to understand properly.
> If data about the public is so dangerous that we must disguise the results, then perhaps it's data we shouldn’t be collecting in the first place.
By this logic no one should ever collect your address for any reason ever. How do we function as a society if we can’t ever give PII in any context? Anonymization/security is critical and makes a lot of critical functions possible.
How could you receive your mail in a world where we never give out/collect info that is potentially hazardous?
Name, address, and phone number served plenty of critical functions when they were published in the White Pages. Cell phones not being listed there was kind of an accident of history. It was common to call a listed landline and be given or forwarded to a cell number. Only after most people stopped having landlines altogether did a phone number come to be considered sensitive information (unless you were a celebrity or something).
Ironically Facebook is responsible for much of this, as friending someone on Facebook became a lower stakes, less intimate alternative to exchanging phone numbers.
It would entirely be possible to limit the scope of things, by making sure the company that has your address (UPS or USPS, say) never has the other information. Each business would hand off a zero-knowledge identifier to you that you'd give to the others: Amazon would only know that the payment identifier they gave to you was fulfilled at VISA somehow, and then hand the package off to UPS with an identifier that they would never see again.
This is silly.
An argument about whether or not to deploy differential privacy on large statistical databases has no bearing whatsoever on whether or not you give your address to have a package delivered. If you want the package delivered, you have to give your address.
On the other hand, it’s not at all clear that people should have to involuntarily, by force of law, offer up all sorts of personal details about their lives. And questions about whether the use of differential privacy can or should justify the collection of sensitive information are quite valid.
The census is justified by the idea that it will help us plan for the future. But the track record of central planning is poor to disastrous.
A small example: in theory population changes could inform land use decisions. In practice however, the ability of population to increase is softly capped by the amount of housing that exists, or will exist. If you restrict or frustrate housing, you will also restrict people from living where they want to live. Then the planners will point to the census data and tell you that nobody wants to live there and therefore there’s no need for change.
Ironically, if you wanted to measure where people want to live in order to get information for planning purposes, the number is right there and doesn’t require any personal data collection at all - it’s the price (in this example, $ per square foot of floor space). But in my experience people who like central planning don’t believe in prices so they ignore that and they look at their reams of personal data and they conclude that all is well in the world. It is hard for me to be sympathetic if one day folks like that had less data to look at.
Can anyone explain to me the previous state and why it was desirable? I admittedly do not understand why people are getting riled up. I am not being difficult. I really don't understand the original state and the changed state here.
Read up here for example: https://www2.census.gov/library/publications/decennial/2020/...
The real underlying issue is that (block, sex, age) is basically a unique identifier.
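A quick way to get a feel for that claim is to count how many people share their (block, sex, age) triple with nobody else. The sketch below uses a purely synthetic population with uniformly random sex and age and an assumed 30 residents per block; real blocks are more varied, so treat the result as an illustration of the effect, not as a measurement of census data.

```python
import random
from collections import Counter

rng = random.Random(42)

NUM_BLOCKS = 1000   # assumed number of census blocks
BLOCK_SIZE = 30     # assumed residents per block

# Synthetic people as (block, sex, age) triples.
people = [
    (block, rng.choice("MF"), rng.randrange(0, 90))
    for block in range(NUM_BLOCKS)
    for _ in range(BLOCK_SIZE)
]

# A person is unique if nobody else has the same triple.
counts = Counter(people)
unique_fraction = sum(1 for p in people if counts[p] == 1) / len(people)
print(f"fraction uniquely identified: {unique_fraction:.1%}")
```

Under these assumptions the large majority of records are unique, so any table published exactly at that granularity is close to a list of individuals.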
Sounds like a great way to prevent finding irregularities
Amazing how the current US gov is finding every different way to destroy the country from every aspect every single day.
I know it is off topic, and the issues raised here are fairly profound, but I want to share the conformed idea of “Noise infusion banned for industries regulated by the FCC”
Can anyone share how other countries handle this?
A lot of countries are really bad at running their census. https://asteriskmag.com/issues/11/why-governments-cant-count
And a lot of countries have things like national IDs, which a lot of Americans just don't like on principle (rightly or wrongly, given things like RealID and passports).
Sure, in Europe we don't because we already have databases of all citizens, also recording attributes like race, skin color, religious affiliation or political leaning in a database is highly illegal, both for the government and for private use.
Wait, are you saying Europe doesn't have censuses?
> recording attributes like race, skin color,
The only reason we ever started doing this was to track ex-slaves and their descendants, and after 1965 every other possible grouping of people started begging for a category that it could use to get government grants in some way.
The irony is that now, when censuses somehow desperately need to figure out if you're Armenian or not, they don't count the descendants of slaves at all, preferring to lump them in with every dark-skinned person of partial African descent, but sometimes not the Spanish speakers(?!).
The US Supreme Court made a good decision (on admissions, not on the need for the approval of redistricting maps in places that have continuously attacked slave and Jim Crow descended voters.) The government needs to get out of the race and religious science business. Elected and appointed officials are openly claiming jihadi eschatology as the reason that they're supporting Israel, and openly explaining how the culturally varied mix of people who happened to live in land that Zionists wanted, or the Chinese, are inhuman races that are a threat because of their inhuman behavior and their inhuman values. We've woven church and race deeply into the government again.
The idea that preferential admissions to elite schools were going to somehow offset slavery was laughable anyway. It was just a grievance engine that gave people on top an excuse to feel downtrodden during one of the first and most vulnerable times in their lives - when they find out they're too stupid or boring to get into the college they want. I've always been partial to the libertarian solution to the problem of US slavery - Murray Rothbard and others said that according to the Libertarian homesteading principle, slaves should have been awarded the land and the factories that they worked. Failing to do so was, in his view, an injustice that would lead to catastrophe, much as the freeing of Russian serfs in 1861 without any of the land still controlled by their ex-masters led to the Russian Revolution 50 years later.
> Maybe the goal is to force the U.S. Census to publish statistics that actually enable re-identification, to help with future gerrymandering efforts?
In case you were wondering why the government would do this, yes, that's exactly why.
I think it should be noted that there was a lot of dissatisfaction from users of the census data as far as I know. So it's not been banned just for politics' sake or because they hate privacy... Some people I talked to in the privacy field even called the whole thing a total disaster and weren't shy to put blame on John Abowd who apparently pushed this through despite a lot of internal opposition and concerns. Not sure if that's true, but what is definitely true is that the way the data was released produced serious issues downstream, as most researchers and statisticians that ingested the data weren't prepared for receiving noisy data values. Differential privacy was applied in a way such that many invariants that data users cared about weren't preserved, which was expected, as you can't preserve all invariants and at the same time add meaningful noise to the data. The thing is, with such a differentially private data release you need to adapt all of the downstream analyses to take into account the exact mechanism by which the data was altered. And since the census bureau used a very intricate mechanism that didn't just add Laplace noise to data values but instead relied on a multi-stage process that preserved some invariants but not others, it was very difficult to even write routines to account for the changes being made to the data. They essentially asked every data user to rewrite their whole analysis pipeline based on the exact disclosure mechanism, which contained a large number of bespoke choices regarding which data invariants to preserve and basically produced a mix of noisy, synthesized data that was just really hard to reason about.
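To make the invariant problem concrete, here is a minimal sketch of the plain "just add Laplace noise" baseline, which is explicitly not the multi-stage mechanism the bureau used. The block names, counts, and epsilon are made up for illustration. It only shows that once each published count gets its own noise, the noisy blocks no longer add up to the noisy county total, which is exactly the kind of invariant downstream users tend to rely on.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(true_counts, epsilon, rng):
    """Add independent Laplace noise to each count (sensitivity 1 assumed)."""
    scale = 1.0 / epsilon
    return {name: count + laplace_noise(scale, rng)
            for name, count in true_counts.items()}

# Hypothetical block-level head counts inside one county.
blocks = {"block_a": 12, "block_b": 7, "block_c": 31}
county_total = sum(blocks.values())

rng = random.Random(0)   # fixed seed so the sketch is repeatable
epsilon = 0.5            # illustrative value only

noisy_blocks = noisy_counts(blocks, epsilon, rng)
noisy_total = county_total + laplace_noise(1.0 / epsilon, rng)

# In the true data this gap is exactly zero; after noising it is not.
gap = sum(noisy_blocks.values()) - noisy_total
print(f"sum of noisy blocks minus noisy county total: {gap:.2f}")
```

A real release has to decide which of these relationships to restore by post-processing and which to give up, and every such choice changes the error structure that analysts then have to model.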
I don't even know if there even would've been a way to do this better, but the fact is that not every small county or school district has top-tier statisticians at hand that can just read a whole monograph on differentially private synthesized census data and then hotpatch their existing analysis systems to work with that data.
I was a big fan of differential privacy but now I think it might be doing more harm than good, as I haven't seen a single case where it was applied successfully in a problem where it actually mattered, and it contributed strongly to discrediting and preventing a lot of work on other anonymization techniques as it was deemed the only way to preserve privacy by the research community, so showing up with enhancements to k-anonymity or any other noise mechanism not rooted in it was a sure way to get ridiculed and ignored. And it's just not a practical mechanism, even when it works for a single disclosure you always end up having to blow up the privacy budget to a ridiculous amount in order to keep disclosing statistics as otherwise you would for almost all real-world data run out of budget after a few publications.
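The budget complaint follows from basic sequential composition, where the epsilons of successive releases on the same data simply add up. A toy accountant shows how fast a fixed budget runs out. The class and function names here are hypothetical, and real deployments use tighter accounting than a plain sum, so treat this as a sketch of the bookkeeping only.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Charge one release; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject
        # a release that exactly uses up the remaining budget.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

    @property
    def remaining(self):
        return self.total_epsilon - self.spent


def releases_until_exhausted(total_epsilon, per_release_epsilon):
    """Count how many equal-cost releases fit into the total budget."""
    budget = PrivacyBudget(total_epsilon)
    count = 0
    while True:
        try:
            budget.spend(per_release_epsilon)
        except RuntimeError:
            return count
        count += 1


# With a total budget of 1.0 and 0.25 per table, only four tables fit.
print(releases_until_exhausted(1.0, 0.25))
```

Either each release gets a tiny epsilon (and a lot of noise), or the total budget is raised until the formal guarantee stops meaning much, which is the tradeoff described above.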
So, for me it's a technique that works in the areas where it doesn't really matter (publishing highly aggregated statistics that pose almost zero privacy risk even without differential privacy) and doesn't work in other areas where it would actually matter (publishing fine-grained data about individuals or small groups). There are some niche use cases but in my view the privacy community has really overblown the importance of differential privacy by portraying it as the only way to reliably anonymize data.
BTW the German census bureau has an interesting approach to anonymization which they have used for several decades already, and so far I haven't heard of any cases of successful de-anonymization of the data. Maybe the US bureau should have a look at that for their own needs.
Of course there will be dissatisfaction from users of the data. Anyone that wants to use census data will prefer less privacy in the data. And anytime privacy is enforced the data becomes less useful. It would be certainly very convenient for both advertisers and gerrymandering political consultants to have detailed data on every citizen.
As the article says anytime you want to enforce privacy, the data becomes somewhat less useful, there is just no way around that.
The point of rights is that we have them and that they should not be trampled upon when they become slightly inconvenient to someone in power.
Are you sure about that? You are saying that differentially private census data couldn't be used for gerrymandering and advertisement while non differentially private data could? Hard to believe. I'm not an advertisement or gerrymandering expert, but I would assume people running ads or cutting up districts are mostly interested in aggregate statistics, i.e. they won't care about single households? And I would assume they can rely on voter files, party databases etc... And to the contrary there are reports [1] that indicate differential privacy actually makes gerrymandering analysis more difficult or impossible. So, not really an argument for differential privacy: discriminatory action can be equally well taken based on differentially private data, as the government cares about groups, not individuals, and groups aren't protected by differential privacy. It seems people really fundamentally misunderstand what this technique can achieve and what it won't do.
1: https://pmc.ncbi.nlm.nih.gov/articles/PMC8494446/?utm_source...
> serious issues downstream as most researchers and statisticians that ingested the data weren't prepared for receiving noisy data values
They weren't prepared for data that was obviously noisy. The data has always been inherently inaccurate, and folks just chose to ignore that previously
No, there are dozens of articles discussing the mechanism and explaining the impact it had in different areas, e.g. [1,2,3]. And the release mechanism wasn't just "add noise", far from it. You may read the original paper [4] to see how intricate it was; anyone wanting to make real use of the resulting data would have needed to understand that approach in detail to work with it. The report of the national academies [3] is probably the most comprehensive analysis of the mechanism and the complications it introduced, so writing "it has always been inherently inaccurate" is just wrong. This new mechanism was way worse than just introducing unbiased sampling noise.
1: https://www.aeaweb.org/articles?id=10.1257%2Fpandp.20191107&... 2: https://www.science.org/doi/10.1126/sciadv.abk3283?utm_sourc... 3: https://www.nationalacademies.org/read/27150/chapter/14
4: https://hdsr.mitpress.mit.edu/pub/7evz361i/release/2
https://www.npr.org/2026/06/12/nx-s1-5855734/census-bureau-d...
Tax info, criminal records, licenses, identification, and ownership, should be the only records. Census data is profiling, and that never ends well.
I guess this could be implemented externally.
Eg via some app that instructs respondents to enter a specific answer in a pseudorandomly chosen question.
Of course security would be another question.
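One well-known version of this respondent-side idea is randomized response: each person answers truthfully only with some probability and otherwise reports a coin flip, so no single answer can be held against them, yet the population rate can still be estimated. The sketch below assumes a single yes/no question and a made-up truth probability; it is not something the census does.

```python
import random

def randomized_answer(truth, p_truth, rng):
    """Report the true answer with probability p_truth, else a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(answers, p_truth):
    """Invert the known noise: observed = p_truth * true + (1 - p_truth) * 0.5."""
    observed = sum(answers) / len(answers)
    return (observed - (1.0 - p_truth) * 0.5) / p_truth

rng = random.Random(1)   # fixed seed for repeatability
p_truth = 0.5            # illustrative choice
true_rate = 0.3          # hypothetical share of "yes" in the population

# Simulate 20,000 respondents, each randomizing their own answer.
population = [rng.random() < true_rate for _ in range(20000)]
reported = [randomized_answer(t, p_truth, rng) for t in population]

estimate = estimate_true_rate(reported, p_truth)
print(f"estimated yes-rate: {estimate:.3f}")
```

The catch is the same as with any noise scheme: whoever analyzes the answers has to know the randomization procedure and correct for it.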
This is why for the census my forms said that I was a poor widow from Kazakhstan with 9 children, no education and an adherent of the Mandaean faith.
Never ever provide true information in any form.
So "differential privacy" pretty much sounds like someone gets to modify the results of a census and how it gets modified is entirely up to their discretion.
Seems like something that could be abused to achieve political objectives.
there are obviously measures in place to ensure the added noise is statistically homogeneous. the changes don't affect the final aggregates significantly, just enough to avoid saying much *about any individual person*.
know how you can buy "anonymized" data from data brokers and drill down until it's not anonymous anymore and in many cases point to the exact person? differential privacy would prevent that kind of thing.
If someone actually wanted to achieve political objectives by tampering with census data, there are better means than tampering with homogeneous statistical fuzzing.
>there are obviously measures in place to ensure the added noise is statistically homogeneous
I hope so. What are they?
Not really, it has to be random in a predetermined fashion to be considered differential privacy. It is reversible in the way that someone shouting over an aircraft producing white noise is intelligible.
I guess someone could fiddle with the noise, but then why not nudge the originals? Or more insidiously, control what is published?
If someone modified the original dataset and it was discovered they would be held accountable. However if you have a departmental policy of modifying the data for "privacy reasons" and it just so happened to surreptitiously affect some sort of political outcome then ah geez that's just a wacky coincidence, not any individual's fault.
Any privacy-diminishing changes at federal level happening during this administration are for one reason only: to amass more power in Conservative administration/governance. At the federal level it's Project 2025, at the state level it's making sure states stay red and disenfranchise minorities.
The better to sell the data, all your privates are belong to us.
The fines for non-compliance are low enough to remain silent.
Do. The American Community Survey (randomly-selected long-form questionnaire) is dangerously overinvasive.
I really have to take the anti-noise side here. I get why it's a hard problem, and I get why the Census Bureau thought this was a neat solution. But I'm imagining an accountant stepping through a similar chain of logic:
* I want to accurately report the finances of our company to the best of my ability.
* But that report would allow people to reconstruct private data about the terms of our contracts with various counterparties. I'd really like to avoid that, there's no rule that says we're supposed to release that data. In fact some of those contracts probably came with nondisclosure agreements!
* So here's what I'm going to do. I'm going to calculate our results to the best of my ability, and then I'm going to add random values to them and report only the randomized ones. Any reconstruction people try to do will be wrong because of the randomness.
* If the SEC says "no, you need to report your actual numbers", I will explain to them that there's no such thing as an actual number because all data is noisy.
I can't get behind it.
Applying subjectivity to what they keep and where it's bound, implies that this was always an expression of opinion.
Science intrinsically ignores opinions.
The officials responsible for this smearing of data should be tried. This was a violation of the free speech clause as it coercively manipulated public beliefs. This was a crime against science and civil rights.
This is a rare occasion of the Trump administration getting something right.
Why even do a census if you're just going to synthesize random data as the last step?
The dueling political demands of accuracy and privacy are simply incompatible at some level. After reading this, maybe Hanlon's Razor isn't the right standard. Besides malice and stupidity, there is impossibility. Some problems just aren't solvable under certain constraints. I don't envy the statisticians tasked with finding a politically palatable solution to a math problem.
But the strength of differential privacy is that you can now make this tradeoff explicit and quantify it. I always liked it because it offers a mathematical solution to a policy problem, but then of course it's up to us to decide what parameters and tradeoff to choose. Also, some data might just not get published at all if the privacy implications are too problematic, so differential privacy might buy you more signal!
Yeah, the main issue with differential privacy is that you need competent government officials making decisions who understand math beyond a high school level.
It offers a mathematical description of a policy tradeoff, and the policy makers are apparently setting one of the parameters to zero.
There's a ton of information in the US that is accessible to various degrees--especially through the deep web, much less background investigations. Unless you're a wealthy person who can set up various levels of trusts you can't really hide them.
You can of course disagree about what should actually be part of a transparent public record. (Though I suspect a lot of people post-date what was generally available in a "phone book.")
I have filled out census forms in the past and it was not a big imposition. During the last census I had supposed census workers showing up at my home multiple times and pushily asking for an in-person interview. I told the guy that came initially that I was not interested as I had a full time job, a 5 year old, and newborn twins. He brazenly said “your wife can do it” with zero consideration that she was just cut open weeks prior. A couple weeks later he shows up again at like 7pm pounding on the door right in the middle of the kids' bedtime routine. I told him it was a really unwelcome visit and sent him on his way. A couple weeks later a car comes rolling up to the house on a Saturday and the woman driving tells me she is the guy's supervisor and they really want the interview. I explained to her the situation, the newborn babies, the previous encounters, etc. She seemed completely undeterred and just went right back to pestering. I told her if anybody from the census came back they should go ahead and bring the sheriff because I’d be calling for trespassing. They finally stopped bothering me.
I was going to build something cool with fable, and now it's banned, feeling disappointed
Stalin's demographic researchers kept disappearing until they came up with the numbers he wanted.
The arguments I'm seeing in here are that census data will lead to a literal holocaust. Histrionic. Makes it seem like this policy was a no-brainer.
if you want to keep your sanity, I suggest silently adding the phrase
every time you read some politically spiteful news like this
because the next two years are going to become insanely miserable
It’s highly uncertain what will happen in 950 days.
I think it's easy to predict some things that will happen in 950 days
in 950 days there will be several hundred warehouses concentrating over a million people in this country including many thousands of children costing a quarter trillion dollars (already funded)
and the Iran War will still be happening despite over a hundred declared "deals"
and the US will be running Cuba (forcing millions to return there)
statistical noise or the lack of it will be the least of our problems
i think they will use AI as leverage over other countries to order them around
But why?? Differential privacy works? It's not even "woke" or whatever these people perceive. It's just math man...
Data shall set you free... or not
This is a gift to reactionary gerrymandering and voting restriction efforts, along with things like yesterday's FBI raid of an Ohio voting rights organization.
https://www.statenews.org/government-politics/2026-06-12/ohi...
Representative Joyce Beatty is from Ohio and was instrumental in stopping Trump from illegally renaming the Kennedy Center.
https://www.theatlantic.com/culture/2026/06/kennedy-center-b...
Representative Beatty serves her own interests, and her involvement in the Kennedy Center naming was just more of the same performative politics she routinely engages in. She's on the verge of being an octogenarian and missed a number of key votes, like the bill that cut funding to NPR, PBS, and other govt. programs. Kudos to her for working to remove Trump's name from the Kennedy Center but she needs to go.
The removal of his name is not performative since we're in the thick of a cult of personality president (at a bare minimum).
Yet another thing this admin is screwing up. News at 11. Let’s fix this in the midterms by voting out the republicans. That’s it. That should be the sum total of the platform: not republican and not crazy.
Census data is extremely powerful. It's why some states lost house seats and why some gained house seats.
It must therefore be maximally transparent. Do you want president Trump or palantir to decide on the "noise infusion" algorithm?
headcount doesn't have to be granular, it has to be accurate. this is about the very useful street- and block-level data.
also, how would anyone know how accurate the "transparent" number is? if Trump or Thiel can fuck with the fuzzing they can just as easily do so with the base data.
Frankly i see no reason to keep this data private. They should simply publish a full dataset of the census, with no such data coarsening/differential privacy/ etc...
Fundamentally this is public data. If it's too dangerous to make public, it's too dangerous to collect, and people should be aware of exactly what it is.
There are very few things that the state has data on that should not be made public. Census data is simply not one of those things.
publishing should be the default for any data, and to keep it unpublished should require substantially good reasons that impact the country as a whole. Frankly, if it isn't detailed national defence plans, i struggle to see any data that should not be public.
How hard have you thought about this?
The biggest challenge with running a census is getting people to trust you enough to answer your questions.
A lot of census questions are sensitive. The ACS covers topics like citizenship status, disabilities, income, SNAP assistance, languages spoken at home.
If you want accurate information about the people who live in your country you need the census process to feel as safe for people to respond to as possible.
Are you saying the census shouldn't collect any data that people wouldn't be comfortable publishing? Because that's a recipe for a census that is far less useful for helping the country make useful decisions.
> Are you saying the census shouldn't collect any data that people wouldn't be comfortable publishing? Because that's a recipe for a census that is far less useful for helping the country make useful decisions.
I'll say that. The state representatives should provide congress and the president any data needed to inform policy decisions about the people they represent. And as others have pointed out, other departments and agencies (such as the IRS) have most of the rest of the data required to make policy decisions.
Except for gerrymandering purposes, I fail to see why income, party affiliations, etc., is useful for the purpose the census was created for.
This seems like an issue created by Congress. The Constitution only requires a headcount by state. Maybe they should use another mechanism to collect demographic data. Since the concern is not about representation, but allocation, tax returns seem like an obvious alternative and they are already private and collected at a much more granular level.
The census isn't for helping the country make any decisions other than determining the number of representatives and apportionment of taxes. It should not be collecting any data that isn't necessary for that.
I'd like to know when they stopped publishing census data. I have used it for genealogical purposes to track ancestors: you can see exactly who was living in which house, how they are related, and what their ages are (I found that women in my family often reported, both on the census and marriage documents, being younger than they actually were). I don't think I've seen data from after 1950, though.
I don't understand why the census would include SNAP data or income: surely the government already has that information. I have never doubted that the IRS knows my income better than I do. Maybe better use of existing datasets could restrict the census to less invasive questions.
>Are you saying the census shouldn't collect any data that people wouldn't be comfortable publishing? Because that's a recipe for a census that is far less useful for helping the country make useful decisions.
That seems to me like it's a good thing. Allow people to determine whether the data is actually needed, rather than closing their eyes.
This is the real reason for the fudging of the data. People don't want an ethnicity/citizenship status/birth country breakdown of things like benefit use.
Thank you for writing a much more thoughtful reply to this comment than I was drafting
Replying to the ACS with accurate information is required by law, so they don't actually need to rely on people feeling safe to get answers.
I don't trust the Census Bureau with my data, so if this is as "dangerous" as the author and some people here seem to think, they shouldn't be collecting it in the first place.
1. People give the information to the government under the expectation that this data is to be kept private or used in such a way that individual targeting is made impossible, you break that expectation and people will lie or won't give you this data.
2. Without noise injection it's rather simple to do statistical attacks to reverse engineer individual entities.
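A toy illustration of point 2, under heavy assumptions: a hypothetical block of three residents, for which exact count, mean age, median age, and (in a second table) minimum age are published. Brute-forcing every age combination shows how each extra exact statistic shrinks the set of datasets consistent with the release. Real reconstruction attacks use many more tables and a constraint solver rather than enumeration.

```python
from itertools import combinations_with_replacement

def consistent_age_sets(count, mean, median, minimum=None, max_age=115):
    """List every sorted tuple of ages that matches the published statistics.

    Assumes an odd count so the median is the middle element.
    """
    matches = []
    # combinations_with_replacement yields tuples in non-decreasing order.
    for ages in combinations_with_replacement(range(max_age + 1), count):
        if sum(ages) != mean * count:
            continue
        if ages[count // 2] != median:
            continue
        if minimum is not None and ages[0] != minimum:
            continue
        matches.append(ages)
    return matches

# Published (made-up) statistics for one block: 3 people, mean 40, median 35.
candidates = consistent_age_sets(count=3, mean=40, median=35)
print(len(candidates))   # 36 possible households remain

# Add one more exact table (youngest resident is 8) and only one remains.
pinned = consistent_age_sets(count=3, mean=40, median=35, minimum=8)
print(pinned)            # [(8, 35, 77)]
```

Once the ages are pinned down, linking them to names only takes an outside dataset with ages and addresses, which is why exact small-area tables are the risky part.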
3. This data is and has already been used in the past to undermine democratic systems by targeting and disenfranchising minorities, as well as gerrymandering the US to hell.
4. "Too dangerous to make public, too dangerous to collect" - this is a false dichotomy. To govern effectively you need sensitive data, but it should be collected and used in a way that's safe for the individuals.
5. Macro level aggregates don't need individual exposure, that's why noise, anonymization and statistical functions are fine.
Re point 1, not just an expectation, an explicit legal requirement.
> They should simply publish a full dataset of the census, with no such data coarsening/differential privacy/ etc...
They do. After a substantial delay. Pretty handy for genealogical research, while protecting privacy for the living.
That's a good default position, and I think should be our starting point.
But the devil is in the details. If we don't want advertisers constructing semi-complete profiles from simple web interactions then why would we publish 330 million census questionnaires for their use?
>If it's too dangerous to make public, it's too dangerous to collect, and people should be aware of exactly what it is.
While this may be a reasonable stance in theory, there are many examples in reality where the danger has not materialized for decades. Personally, I have access to health records, birth certificates, and death certificates collected by a state. They contain very personal information. As far as I know, they have not been leaked to the general public.
This is one of those situations where everything you hear tells you the system is failing, but that's because nobody talks about the systems which haven't failed.
Besides, this possible failing of the Census' privacy promises shouldn't convince us that "If only we hadn't given info to the despotic and cruel government using it to target people, then we'd only have a despotic and cruel government hurting people randomly." The solution to this problem isn't to withhold info, it's to get rid of the despots.
So do you believe that individual income should be public? Or do you believe that the government should not take income into account for taxation or distribution of benefits?
some countries do make income available publicly.
even the USA does it for public employees in many states.
Then dox yourself right now with your previous census answers and PII. There are several obvious reasons to keep the data private, all you have to do is use your brain.
I've never met a "privacy is irrelevant" advocate that doesn't close the door when they go to the toilet
Don’t quit your day job. One guess as to what gender, sexual orientation, and skin colour you have.
But why is the census asking about those attributes at all? The Constitution requires a count. That's it. A number. We don't need to know the rest of it, or if we do, it should be surveyed separately with voluntary participation.
We can make them more accurate by leveraging ICE going door to door.
There will be a bunch of people that start off with the premise that this data should be private and make following arguments based on this premise.
So I'll just go ahead and ask, give me good reasons why this data should be private?
My guess is that most of you think we should be counting illegals because they should have representation. And I reject that
It’s because people are significantly more likely to lie or omit some facts if you don’t guarantee their privacy, which means your census data ends up being worth less than a pile of shit.
The alternative is to water down the census questions, which also leads you down the same path (i.e. manure as data).
So you seem to have at least a surface level of understanding of incentives.
Check this then:
If the census is responsible for allocating federal funds and congressional apportionment, what are the incentives for making census data private and encouraging people that would otherwise hide their identity?
How about we should be "counting illegals" so that we know how many of them there are?
(Do you reject that? As someone who uses the phrase "counting illegals" I imagine you would be interested in knowing what that number is.)
Counting illegals on a poorly defined framework that is largely self-attestation?
First off the census is used for determining how many seats are used for congressional apportionment and allocating federal funds.
So unless you're willing to also say that counted illegals cannot be used for either of those, then you're just being obtuse.
But if we can agree that they cannot be used for that then sure, let's identify and count them. If we can't identify (make non-private) and count them then why should we trust that those counts are accurate?
Adding fake data (noise) officially to important data such as the census is the height of weirdness of the West. The nations are totally confused between privacy and visibility requirements. Privacy and freedom are effectively working against the very foundations of the nation, as the binding force between elements of a nation is directly affected by privacy.
Excessive obsession with equality is another thing that works to erase any cognitive abilities of the people to recognize differences in gender, race, age, culture etc. Equality is good to a reasonable extent but it shouldn't be forced to an extent to erase the cognitive capabilities gained through evolution.