
Comment by jmole

1 day ago

Ban it from the dataset, add it to the analysis. You can choose your own flavor of noise.
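A minimal sketch of "add it to the analysis", assuming Laplace noise as the chosen flavor and a simple counting query. The function names (`laplace_noise`, `noisy_count`) and the household records are hypothetical. The raw records stay exact; noise is added only to the number that gets published.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    # The exact count is computed from the raw records, which are never altered.
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: 90 households responded, 10 declined.
households = [{"declined": False}] * 90 + [{"declined": True}] * 10
rng = random.Random(0)
print(noisy_count(households, lambda h: h["declined"], epsilon=0.5, rng=rng))
```

Smaller `epsilon` means more noise and stronger privacy; the ground truth ("this household declined") is still there for anyone with access to the raw data.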

I don't know what the political undertones are here, but at some level you need to have actual ground truth, including "this person/household declined".

Publishing raw data though? That seems like shooting yourself in the foot from a national security perspective, not to mention all the other reasons not to do it.

> Ban it from the dataset, add it to the analysis. You can choose your own flavor of noise.

It is introduced in the public data, not the secret data.

> Ban it from the dataset, add it to the analysis. You can choose your own flavor of noise.

Not sure exactly what you're proposing, but if fresh noise is added independently to each copy handed out to different people, you can just buy multiple copies and average them to reduce it.
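To illustrate the averaging attack, here is a small simulation, assuming each released copy carries independent Laplace noise. All names and parameters (`release`, `scale=2.0`, 100 copies) are hypothetical choices for the demo.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release(true_value: float, scale: float, rng: random.Random) -> float:
    # Each copy handed out gets fresh, independent noise.
    return true_value + laplace_noise(scale, rng)


def averaged_estimate(true_value: float, scale: float, copies: int,
                      rng: random.Random) -> float:
    # The attacker obtains `copies` independent releases and averages them.
    return statistics.fmean(release(true_value, scale, rng) for _ in range(copies))


def mean_abs_error(true_value: float, scale: float, copies: int, trials: int,
                   rng: random.Random) -> float:
    # Average distance between the attacker's estimate and the true value.
    errors = [abs(averaged_estimate(true_value, scale, copies, rng) - true_value)
              for _ in range(trials)]
    return statistics.fmean(errors)


rng = random.Random(42)
single = mean_abs_error(10, scale=2.0, copies=1, trials=2000, rng=rng)
many = mean_abs_error(10, scale=2.0, copies=100, trials=2000, rng=rng)
print(f"1 copy: {single:.3f}   100 copies: {many:.3f}")
```

The error with 100 copies should come out roughly ten times smaller, since the spread of an average of k independent draws shrinks like 1/√k.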

There are a lot of ways to do this wrong, which is why so much analysis has gone into differential privacy.
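One common defense, sketched here under stated assumptions, is to treat every release as spending part of a fixed privacy budget. Under basic sequential composition, k releases at epsilon each cost k·epsilon in total, so the curator refuses once the budget runs out and hands back the same cached answer for a repeated query. The `PrivateCounter` class, its query labels, and `BudgetExhausted` are hypothetical.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a query would exceed the remaining privacy budget."""


class PrivateCounter:
    """Hypothetical curator that tracks a total epsilon budget."""

    def __init__(self, records, total_epsilon: float, seed=None):
        self._records = list(records)
        self._remaining = total_epsilon
        self._rng = random.Random(seed)
        self._cache = {}

    def count(self, name: str, predicate, epsilon: float) -> float:
        # A repeated query label returns the identical answer, so it leaks nothing new.
        # A relabeled copy of the same query still spends budget below.
        if name in self._cache:
            return self._cache[name]
        if epsilon > self._remaining:
            raise BudgetExhausted(f"{self._remaining} left, {epsilon} requested")
        self._remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # Laplace(0, 1/epsilon) noise as a difference of two exponentials.
        rate = epsilon
        noisy = true_count + self._rng.expovariate(rate) - self._rng.expovariate(rate)
        self._cache[name] = noisy
        return noisy


curator = PrivateCounter(range(100), total_epsilon=1.0, seed=1)
print(curator.count("even", lambda x: x % 2 == 0, epsilon=0.5))
```

Buying more copies no longer helps: asking again returns the same number, and new questions eventually hit the budget.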

  • Sorry, I think you're reading more into this than I intended to say. My point was that the raw data itself doesn't need noise, but the published data necessarily does.