Comment by ACCount37

5 hours ago

Not giving the data to researchers means not getting the scientific benefits from that data. Which was the point of collecting that data in the first place.

Reckless harm prevention is the root of many evils.

As a biostatistician who's touched epidemiological studies, I'd argue losing the trust of participants and the public is one of the biggest threats to the viability of the whole research enterprise. It's reckless to jeopardize that as well. Conversely, this dataset will be mined for at least 30-50 years, and there is an effectively unlimited number of questions that can be asked of this data. Given that timescale, I think a little delay here is acceptable.

It's not a zero-sum game: you can both protect people and reap the benefits of health data. Many countries have much safer approaches. UK Biobank typically leads with the scale of the data, but not with its infrastructure.

That’s a false dichotomy.

Sensitive research systems thread that needle by giving researchers remote access while the data stays under the control and supervision of the responsible organization. That means strong internal data access controls and data siloing, alongside strict, verified extraction routines. Specifically: limited project-dedicated DB access, full logging of data interactions, and full lockouts/freezes if something feels off.
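
To make those three controls concrete, here is a rough Python sketch, not any real biobank's system. Every name in it (`ProjectGateway`, `allowed_columns`, `max_denials`) is made up for illustration, and "something feels off" is reduced to an assumed rule: too many out-of-scope requests freezes the project.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class AccessDenied(Exception):
    """Raised when a request falls outside a project's approved scope."""


@dataclass
class ProjectGateway:
    # All names here are hypothetical; this is an illustration only.
    project_id: str
    allowed_columns: frozenset    # project-dedicated scope
    max_denials: int = 3          # assumed threshold for "something feels off"
    frozen: bool = False
    denials: int = 0
    audit_log: list = field(default_factory=list)

    def _log(self, user, columns, outcome):
        # Every interaction is recorded, whether it succeeds or not.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "project": self.project_id,
            "user": user,
            "columns": sorted(columns),
            "outcome": outcome,
        })

    def query(self, user, columns, rows):
        requested = set(columns)
        if self.frozen:
            self._log(user, requested, "rejected: project frozen")
            raise AccessDenied("project is frozen pending review")
        if not requested <= self.allowed_columns:
            self.denials += 1
            if self.denials >= self.max_denials:
                self.frozen = True  # lock out until a human reviews it
            self._log(user, requested, "denied: out of scope")
            raise AccessDenied("columns outside approved project scope")
        self._log(user, requested, "allowed")
        # Return only the approved columns, never the full record.
        return [{c: row[c] for c in columns} for row in rows]
```

A project approved for `age` and `bmi` can query those columns, but repeated requests for, say, `postcode` get logged, denied, and eventually freeze the project. The friction for in-scope work is close to zero, which is the part the "real tradeoff" argument tends to skip.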

‘The five safes’ is a good presentation from the NHS(?) a decade ago covering the approaches.

Data publishing restrictions around health data aren’t reckless. Modern computing and digital permanence mean we have to be extra cautious.

  • No, this is a real tradeoff.

    Any friction you add to the "access the data" process makes it harder for legitimate researchers to get access to that data and to benefit from it.

    So, at what point do stricter data controls begin to choke you at the throat?

  • We have dozens of data / db startups - kinda odd that there isn't one (that I have seen) that focuses on this problem.

    Perhaps our future AI overlords will feel it's important to compartmentalise and log data access more aggressively.