Comment by stingraycharles
13 hours ago
My understanding is that it’s better than doctors themselves. But it’s probably the same as with autonomous driving: the bar isn’t just “be as good as humans”, it’s “be flawless”.
It’s actually quite a lot worse than even doctors in training, except in highly constrained experimental settings and a few very niche applications that are mostly too tedious/impractical for a human to do or are very basic detection tasks.
I am a radiologist and researcher predominately focused on AI.
I work with pathologists, and radiology is way ahead of us in AI use in clinical settings (but still not very far). The only things that get serious use are lab-developed (i.e. not commercial) image analysis algorithms for very limited (tedious, error-prone and ultimately not that often used) biomarkers. Don't believe the hype.
You could also look at the market: one of the biggest players, Paige, was acquired for about 30% of the money they raised.
First I'm hearing that it is a lot worse - do you have sources? Genuinely curious
Thanks for the informed take :)
Do you think this will result in more routine/boring/tedious tests? Is the bottleneck on these things the human time to analyse them?
I don’t think so, not beyond the current trend in medicine which is going up anyway.
For some things, like 3D volume segmentation of structure or disease (e.g. CVA/stroke volume, cardiac muscle mass, iron quantification) the bottleneck is the time it takes so we currently use approximations like single longest dimension, circular regions of interest, etc. AI will dramatically increase accuracy allowing for more accurate treatment and easier large scale research with quantitative endpoints.
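To make the gap between a single-dimension approximation and a full segmentation concrete, here is a minimal sketch. It assumes the longest dimension is converted to a volume by treating the lesion as a sphere, which is an assumption of this sketch and not something stated above, and it uses a synthetic ellipsoid as a hypothetical stand-in for a segmentation mask. All names and numbers are illustrative.

```python
import math


def sphere_volume_from_longest_diameter(d_mm: float) -> float:
    """Volume estimate that treats the lesion as a sphere of diameter d_mm."""
    return math.pi * d_mm ** 3 / 6.0


def voxel_volume(mask, voxel_mm3: float) -> float:
    """Volume from a segmentation mask: foreground voxel count times voxel size."""
    return sum(v for plane in mask for row in plane for v in row) * voxel_mm3


def ellipsoid_mask(a: float, b: float, c: float, n: int):
    """Hypothetical stand-in for a segmentation: an ellipsoid with semi-axes
    a, b, c (in mm) sampled on an n x n x n grid of 1 mm voxels, indexed
    as mask[z][y][x] and centred in the grid."""
    centre = (n - 1) / 2.0

    def inside(x: int, y: int, z: int) -> bool:
        return (
            ((x - centre) / a) ** 2
            + ((y - centre) / b) ** 2
            + ((z - centre) / c) ** 2
        ) <= 1.0

    return [
        [[1 if inside(x, y, z) else 0 for x in range(n)] for y in range(n)]
        for z in range(n)
    ]


# Illustrative lesion: 20 mm x 10 mm x 8 mm (semi-axes 10, 5 and 4 mm).
mask = ellipsoid_mask(a=10.0, b=5.0, c=4.0, n=24)
segmented = voxel_volume(mask, voxel_mm3=1.0)
approximated = sphere_volume_from_longest_diameter(20.0)

print(f"voxel-count volume:         {segmented:.0f} mm^3")
print(f"sphere from longest dim.:   {approximated:.0f} mm^3")
```

For this elongated shape the analytic ellipsoid volume is about 838 mm^3, while the sphere built from the 20 mm longest dimension is about 4189 mm^3, so the shortcut overestimates roughly fivefold under these assumptions.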
Other things people think of, like detection of aneurysms, fractures, and lung nodules, are not “hard”, but AI has already added and will continue to add the second-reader benefit, which will reduce detection errors. For this category the clinical benefit is as yet unclear: we know that increased detection does not necessarily translate into improved patient outcomes and can in fact make them worse through overdiagnosis, which means investigation-related harms and overtreatment.
We were already in a phase of “over detection” in much of radiology with advances in imaging technology so the incremental benefit of current AI remains to be seen and I personally think is going to be much more limited. I had a case recently where a 2 mm brain aneurysm was missed on 3 CT scans over 10 years but was picked up by AI so now is being followed annually. This is too small to treat considering the risks and a serious argument could be made that 10 years of stability is proof enough that this is almost certainly clinically irrelevant for this patient.
Far more interesting areas of AI in imaging are in acceleration of acquisition (i.e. the medical equivalent of upscaling), which can dramatically decrease costs and increase accessibility, as well as in analyzing imperceptible features.
It may not be a popular take here but in my opinion the future of radiology is like what we see in software engineering today - a skilled human equipped with AI will outperform humans without AI and AI without humans, the latter of which we are still several years away from prototyping due to various technical hurdles.
A friend of mine, a dermatologist, told me that LLMs are quite performant for melanoma analysis. Based on their own statistics, LLMs are able to beat humans with ~10 years of experience in the field.
They will never beat the human instinct tho, but they can be great tools sometimes. Unfortunately, LLMs mostly produce garbage.
Whenever it comes to medical diagnosis I would caution anyone to be careful with what “beat humans” really means.
In real life, pathology is a spectrum, not a binary, and physicians are not trained to be 100% accurate; instead they optimize sensitivity and specificity, considering pretest probability as well as the harms of overdiagnosis and underdiagnosis for a given scenario.
For something like melanoma, which is relatively easy to diagnose with a superficial, extremely low-risk skin biopsy and where early staging dramatically improves outcomes, you would want to design around overcalling (high sensitivity) rather than maximizing accuracy, given the significant harms of false negatives and the minimal harms of false positives.
An AI may be more accurate at classifying melanoma/not melanoma, but if it does not meaningfully improve on the clinical threshold of biopsy/no biopsy or result in fewer biopsies, that accuracy is wasted and may even be detrimental.
Note: I am just using this as an example to illustrate the considerations.
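The trade-off described in this comment can be sketched with a little arithmetic. The snippet below is a rough illustration only: the `Reader` class, the two readers, and every number in it are hypothetical and not taken from any study. It compares an "overcalling" reader with a "more accurate" one at an assumed 5% pretest probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reader:
    """A hypothetical reader (human or model) described by two rates."""
    sensitivity: float  # P(flagged | melanoma)
    specificity: float  # P(not flagged | no melanoma)


def outcomes_per_1000(reader: Reader, prevalence: float) -> dict:
    """Expected counts per 1000 lesions at a given pretest probability."""
    diseased = 1000 * prevalence
    healthy = 1000 - diseased
    tp = reader.sensitivity * diseased   # melanomas sent to biopsy
    fn = diseased - tp                   # melanomas missed
    tn = reader.specificity * healthy    # benign lesions left alone
    fp = healthy - tn                    # benign lesions biopsied
    return {
        "accuracy": (tp + tn) / 1000,
        "missed": fn,
        "biopsies": tp + fp,
        "ppv": tp / (tp + fp),
    }


# Illustrative numbers only (assumptions, not measured performance).
PREVALENCE = 0.05
overcaller = Reader(sensitivity=0.98, specificity=0.80)
accurate = Reader(sensitivity=0.85, specificity=0.95)

a = outcomes_per_1000(overcaller, PREVALENCE)
b = outcomes_per_1000(accurate, PREVALENCE)

for name, result in (("overcaller", a), ("accurate", b)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

With these made-up inputs the second reader scores higher on accuracy (about 0.945 versus 0.809) and triggers fewer biopsies (about 90 versus 239 per 1000 lesions), yet misses about 7.5 melanomas per 1000 instead of about 1. Whether that is an improvement depends on how the harms of a missed melanoma compare with the harms of an unnecessary biopsy, which a single accuracy figure does not capture.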
I don't think your friend understands Large LANGUAGE models.
What are the premiums like for this flawless AI radiologist's malpractice insurance?
Last time I checked thoroughly (roughly two years ago), AI (in the form of small ML models) mostly outperformed radiologists in areas where the gold standard is "one level" above, imaging-wise. By that I mean that you train a model to detect on an X-ray what would normally need a CT. Or train it to see on a non-contrast CT what would normally need contrast or an MRI, or biopsy, and so on.
Essentially, the cutting edge reaches up to 99% of human performance on the task it is trained on, which is good enough for triage but not for a final diagnosis. However, magic sometimes happens when you train a model to detect something, which you already know is there, on an examination that is cheaper, faster or less invasive than the human "gold standard". Conveniently, this dataset exists, since it's common to first do a cheap examination like an X-ray and then escalate if nothing is found (or if something is found that you want to see better, or a number of other possibilities).
Examples of AI outperforming humans like this include AI detecting sacral fractures on X-rays better than radiologists (who normally take a CT to conclusively exclude it), detecting potential precursors to pancreatic cancer on non-contrast CTs (where contrast or an MRI is usually required) and detecting an occluded coronary artery on an ECG without the archetypical "ST-elevation changes".
See the links below for references:
https://pmc.ncbi.nlm.nih.gov/articles/PMC9478257/
https://www.nature.com/articles/s41591-023-02640-w
https://rebelem.com/a-winning-hand-in-cardiology/
So AI, as a general rule, doesn't usually match or exceed the upper bound of the "gold standard" medical performance. But it tends to carry the quality of the upper bound downwards towards the faster, less expensive and less invasive methods. In some cases, like in the case of ECGs, that's huge. In some cases it saves time, in some cases it decreases miss rates from tired radiologists or triages their review feed. And in some cases it's not very useful.
LLMs don't come close to specialized radiology models at the moment, because LLMs are more about applying knowledge than creating new correlations. Of course that's also hugely useful, but that's a bit of a different topic to unpack.
With these kinds of things, I want to see comparisons to trained, alert humans. Cut out all the distracted, stressed, tired, incompetent, intoxicated cases from the baseline. That includes rushed doctors at the end of a long shift.
A self driving car doing better than a drunk on the freeway doesn't reassure me that it'll do better than sober me in a snowstorm.
That would be a fine bar if you could ensure your doctors or nearby drivers aren't distracted, stressed, tired, incompetent, or intoxicated.
How does sober you in a snowstorm cause the drunk on the freeway not to drive?
Non sequitur. The core idea is that if you have just self-driving cars you won't be trained enough to drive properly next time you're caught in a blizzard, because you never drove for the last 5 years.
I also question whether the kind of person who actually drives while drunk - knowing perfectly well, from thousands of societal inputs and peer pressure, that it is wrong - will care enough to buy a self-driving car.
It doesn't, but I'm not going to trust my own safety to a self driving car that can only be said to be better than the worst drivers. It's a bad baseline.
I’ve seen the same. But I don’t see that as a glowing beacon of progress.
A whole lot of doctors, if not most, didn’t pick their profession out of an interest in medicine…