
Comment by rockskon

13 hours ago

Isn't AI routinely making significant mistakes in analyzing medical imaging?

My understanding is that it’s better than doctors themselves. But it’s probably the same as with autonomous driving: the bar isn’t just “be as good as humans”, it’s “be flawless”.

  • It’s actually a lot worse than even doctors in training, except in highly constrained experimental settings and a few very nice applications, which are mostly either too tedious or impractical for a human to do, or very basic detection tasks.

    I am a radiologist and researcher predominantly focused on AI.

    • I work with pathologists, and radiology is way ahead of us in using AI in the clinical setting (but still not very far). The only things that get serious use are lab-developed (i.e. non-commercial) image analysis algorithms for a very limited set of biomarkers that are tedious, error-prone, and ultimately not used that often. Don't believe the hype.

      You could also look at the market: one of the biggest players, Paige, was acquired for about 30% of the money it raised.

    • A friend of mine, a dermatologist, told me that LLMs perform quite well at melanoma analysis. Based on their own statistics, LLMs are able to beat humans with ~10 years of experience in the field.

      They will never beat human instinct, though, but they can be great tools sometimes. Unfortunately, LLMs mostly produce garbage.


  • Last time I checked thoroughly (roughly two years ago), AI (in the form of small ML models) mostly outperformed radiologists in areas where the gold standard is "one level" above, imaging-wise. By that I mean training a model to detect on an X-ray what would normally require a CT, or to see on a non-contrast CT what would normally require contrast, an MRI, or a biopsy, and so on.

    Essentially, the cutting edge reaches up to 99% of human performance on the task it is trained on, which is good enough for triage but not for a final diagnosis. However, magic sometimes happens when you train a model to detect something you already know is there (from the "gold standard" examination) on an examination that is cheaper, faster, or less invasive. Conveniently, this dataset exists, since it's common to first do a cheap examination like an X-ray and then escalate if nothing is found (or if something is found that you want to see better, or for a number of other reasons).

    Examples of AI outperforming humans like this include AI detecting sacral fractures on X-rays better than radiologists (who normally order a CT to conclusively exclude them), detecting potential precursors to pancreatic cancer on non-contrast CTs (where contrast or an MRI is usually required), and detecting an occluded coronary artery on an ECG without the archetypal "ST-elevation changes".

    See the links below for references:
    https://pmc.ncbi.nlm.nih.gov/articles/PMC9478257/
    https://www.nature.com/articles/s41591-023-02640-w
    https://rebelem.com/a-winning-hand-in-cardiology/

    So AI, as a general rule, doesn't usually match or exceed the upper bound of "gold standard" medical performance. But it tends to carry the quality of that upper bound down to the faster, less expensive, and less invasive methods. In some cases, like ECGs, that's huge. In some cases it saves time, in some cases it reduces misses by tired radiologists or triages their review feed, and in some cases it's not very useful.

    LLMs don't come close to specialized radiology models at the moment, because LLMs are more about applying knowledge than finding new correlations. Of course that's also hugely useful, but that's a different topic to unpack.

  • With these kinds of things, I want to see comparisons to trained, alert humans. Cut out all the distracted, stressed, tired, incompetent, intoxicated cases from the baseline. That includes rushed doctors at the end of a long shift.

    A self-driving car doing better than a drunk on the freeway doesn't reassure me that it'll do better than sober me in a snowstorm.

    • That would be a fine bar if you could ensure your doctors or nearby drivers aren't distracted, stressed, tired, incompetent, or intoxicated.

  • I’ve seen the same. But I don’t see that as a glowing beacon of progress.

    A whole lot of doctors, if not most, didn’t pick their profession out of an interest in medicine…