Comment by michaelt

17 hours ago

> It has given 20,000 researchers around the world access under strict agreements that prohibit sharing data further.

To me it seems rather naive to have done that.

After all, you can't un-leak medical data. So even if the "strict agreement" included huge punishments, there's no getting the toothpaste back in the tube.

If you want to ensure compliance before a leak happens you have to (ugh) audit their compliance. And that isn't something that scales to 20,000 researchers.

Too late to do anything about it now though :(

One of my favorite lessons is that anything at scale has to be designed for idiots. I am pretty sure every person reading this has had days where they did absolutely stupid things without realizing it. Now assume there are thousands of users: you could be providing tools to the smartest people in the world and still have people do stupid stuff all the time. This doesn't just apply to UX.

Then there's the question of trust. You probably have friends you know not to tell certain secrets to, because they believe they get to delegate your secrets onwards to people they trust. The further away someone is from you, the less respect they will show. Researchers have been loaning the dataset in good faith to people who they trust, but who probably didn't take the whole secrecy thing as seriously.

With 20k researchers this was inevitable. Factors like these need to be taken into account when deciding on what grounds such a dataset is released.

Not giving the data to researchers means not getting the scientific benefits from that data. Which was the point of collecting that data in the first place.

Reckless harm prevention is the root of many evils.

  • As a biostatistician who's touched epidemiological studies, I'd argue losing the trust of participants and the public is one of the biggest threats to the viability of the whole research enterprise. It's reckless to jeopardize that as well. Conversely, this dataset will be mined for at least 30-50 years - there are an infinite number of questions that can be asked of this data. Given that timescale, I think a little delay here is acceptable.

That’s insane. And what does "researcher" even mean - some random university student? What would they know about securing that data? I wonder if the people whose data is out there even know this is happening.