Comment by rapatel0

2 years ago

> Basically, you want to replace statistics ("large enough samples will average out variation") with AI

No. This is a misunderstanding due to my lack of clarity. Apologies.

The biggest problems with medical data are that 1) assertions are made from incredibly small samples, and 2) the data is horrendously denormalized.

In radiology, a common validation approach is to have radiologists review on the order of 250 studies to assess a radiology product. This is considered the gold standard for the FDA. Look into it more: vaccines, treatments. The sample sizes are fucking tiny.

The statistical assertion is that these relatively small samples capture enough of the variation to demonstrate efficacy across devices and treatments. The results are then extrapolated to the US and wider populations. Do you believe that this is rigorously true?

The rationale underpinning this is simply practicality. You cannot get thousands (or hundreds of thousands) of patients/doctors/etc. to get a strong signal and confidence. For drugs, it's super hard, but for devices and software interventions, it's way easier to get data.
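To put a rough number on how much confidence a study of about 250 cases can carry, here is a minimal sketch using the standard Wilson score interval for a binomial proportion. The 90% observed rate and the larger sample sizes are hypothetical, chosen only to show how the interval narrows as n grows; they are not taken from any real study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical: the same observed 90% rate at three different sample sizes.
for n in (250, 2_500, 25_000):
    lo, hi = wilson_interval(round(0.9 * n), n)
    print(f"n={n:>6}: 95% CI = [{lo:.3f}, {hi:.3f}] (width {hi - lo:.3f})")
```

Note that this only accounts for sampling error. It says nothing about whether the 250 cases are representative of other hospitals, devices, or populations, which is the extrapolation question raised above.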

That brings us to the second big problem: the data is highly varied and denormalized.

1) From a pure structure point of view, it's basically free-text fields that doctors fill in sporadically.

2) From an underlying-truth point of view, each hospital across the world has different protocols for care delivery. A histopathological FNA procedure might have a completely different meaning in CA, NY, or the EU. This might simply be because of workflow, timeline, or just people using the words wrong.

What I mean with AI+Sensors:

AI doesn't need to solve the problem of intuition around medical problems. The biggest impact will likely come from the relatively mundane task of simply structuring and normalizing the data. Sensors simply help to generate more data.

To be more concrete, you don't want (and shouldn't trust) this:

Prompt: "Please diagnose this person"

You want this:

Prompt: "Here is 100 TB of histopathology data from 100 different hospitals, each with different workflows and patients. For each patient, synthesize a CSV with the following schema: AnonPatientID, AnonCaseID, Pathology Result, Pathology stage, Incidental findings, ..."

Then I can do the analysis myself.
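As a minimal sketch of what that normalized output could look like: the column names below come from the schema in the prompt, while the site names, per-site field names, and sample records are all hypothetical. The hard part (pulling these values out of free text) is the job being handed to the model and is not shown here; this only illustrates mapping differently labeled fields onto one schema.

```python
import csv
import io

# Columns from the schema in the prompt. The trailing "..." in the original
# is left out rather than guessed at.
SCHEMA = [
    "AnonPatientID",
    "AnonCaseID",
    "Pathology Result",
    "Pathology stage",
    "Incidental findings",
]

# Hypothetical per-site field names: each hospital labels the same concept differently.
SITE_FIELD_MAPS = {
    "hospital_a": {"pid": "AnonPatientID", "case": "AnonCaseID",
                   "dx": "Pathology Result", "stage": "Pathology stage",
                   "other": "Incidental findings"},
    "hospital_b": {"patient": "AnonPatientID", "accession": "AnonCaseID",
                   "result": "Pathology Result", "tnm": "Pathology stage",
                   "notes": "Incidental findings"},
}


def normalize(site: str, raw: dict) -> dict:
    """Map one site-specific record onto the shared schema; unknown keys are dropped."""
    mapping = SITE_FIELD_MAPS[site]
    row = {col: "" for col in SCHEMA}  # missing fields stay empty, not invented
    for key, value in raw.items():
        col = mapping.get(key)
        if col is not None:
            row[col] = str(value).strip()
    return row


def to_csv(rows: list[dict]) -> str:
    """Serialize normalized rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SCHEMA)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up example records, for illustration only.
rows = [
    normalize("hospital_a", {"pid": "P001", "case": "C17", "dx": "benign ", "stage": ""}),
    normalize("hospital_b", {"patient": "P002", "accession": "C42",
                             "result": "malignant", "tnm": "II", "ward": "3B"}),
]
print(to_csv(rows))
```

Once every site lands in the same columns, ordinary statistical tooling can be applied across hospitals instead of within each one.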

Hope this makes more sense.

(I work in healthtech and med device and I promise you: Demoralization at the state of medicine is a rite of passage after which you can begin to address practical problems)