Comment by rented_mule

20 days ago

Something that seems inherently different between GP's comment and encryption is that encryption is an algorithm / tool, not a dataset. Not creating literal tools because they might have bad use cases is clearly a bad idea (e.g., fire, knives, hammers, etc.).

I'd say that one thing inherently different about datasets is that they are continually used badly, including by well-meaning actors. Data is frequently misinterpreted, with good intent, to draw bad conclusions.

You might hit your thumb with a hammer. That hurts! People would be a lot more careful if misinterpreting data had such clear, immediate effects on them.

Also, there are many different groups with different passionate opinions in any community as large as this one.

What is the distinction you are making between a "dataset" and a "tool"?

To use this specific example of the license plate dataset, this is a tool used to find and bring justice to criminals. How is it any different from any other tool at the disposal of law enforcement? Isn't this system just a scaled up version of a cop with a camera?

  • This might be too pedantic, but a dataset is not a tool in and of itself. It's something that can be processed by a tool. And it's not simple for anyone to reproduce without significant access, either to the original observational opportunities or to the dataset itself. Information about individuals is often in datasets and those people too seldom have a say in the security practices used to safeguard it.

    Tools (or pick another word that illustrates this distinction) like encryption, hammers, etc. do not contain our information. They are fairly straightforward to reproduce, and therefore nearly impossible to contain. Bad actors will have encryption and hammers whether we want them to or not. The only question is whether good actors will also have them, or whether they will be restricted by laws. Such restrictions, for example, can make it easier for datasets to fall into the wrong hands, because the datasets are then less likely to be encrypted.

    • It seems very strange to define these terms based on the difficulty of reproducing them.

      Let's look at the sibling comment's example of a nuclear bomb. That's "not simple for anyone to reproduce without significant access" and as citizens we don't "have a say in the security practices used to safeguard it." And international laws have done a relatively good job keeping them out of the hands of bad actors. Does that make them a dataset?

      Contrast that with data that is easy to reproduce, like, say, the names of the 45 different Presidents of the US. That is obviously a dataset. Yet there is no private information involved; it is all public data. Many people can even produce that list entirely from memory. But having that list on a piece of paper in front of me could still be a helpful tool if I were taking a US history test.

  • Isn't an atomic bomb just a scaled up version of a firecracker?

    Nobody denies that collecting datasets can have upsides. But the downsides are often not seen/evaluated accurately. And negative effects don't necessarily scale with the same power as positive effects.

    • >Isn't an atomic bomb just a scaled up version of a firecracker?

      Yes and no. I think radiation is a big differentiator, but absent that, I don't think it is better morally or ethically to level a city with conventional bombs than it would be to do it with a nuclear bomb.

      >Nobody denies that collection of datasets can have upsides. But the downsides are often not seen/evaluated accurately. And negative effects don't necessarily scale with the same power as positive effects.

      I'm not disagreeing with this. I'm asking why this same logic is not applied elsewhere.
