Comment by nesk_
12 hours ago
A friend of mine, a dermatologist, told me that LLMs are quite performant for melanoma analysis. Based on their own statistics, LLMs are able to beat humans with ~10 years of experience in the field.
They will never beat human instinct, but they can be great tools sometimes. Unfortunately, LLMs mostly produce garbage.
When it comes to medical diagnosis, I would caution anyone to be careful about what “beat humans” really means.
In real life, pathology is a spectrum, not a binary, and physicians are not trained to be 100% accurate. Instead, they optimize sensitivity and specificity, considering pretest probability as well as the harms of overdiagnosis and underdiagnosis for a given scenario.
For something like melanoma, which is relatively easy to diagnose with a superficial, extremely low-risk skin biopsy and where early staging dramatically improves outcomes, you would want to design around overcalling (high sensitivity) rather than maximizing accuracy, given the significant harms of false negatives and the minimal harms of false positives.
An AI may be more accurate at classifying melanoma/not melanoma, but if it does not meaningfully improve on the clinical threshold of biopsy/no biopsy or result in fewer biopsies, that accuracy is wasted and may even be detrimental.
Note: I am just using this as an example to illustrate the considerations.
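To make the accuracy vs. sensitivity point concrete, here is a rough Python sketch. The scores and labels are made up for illustration (not real clinical data), and `metrics` is just a hypothetical helper. It compares a threshold that favors accuracy with one that favors sensitivity:

```python
# Hypothetical (model_score, has_melanoma) pairs. Invented for illustration only.
cases = [
    (0.95, True), (0.80, True), (0.45, True), (0.30, True),
    (0.48, False), (0.40, False), (0.35, False), (0.20, False),
    (0.15, False), (0.10, False), (0.05, False), (0.02, False),
]

def metrics(cases, threshold):
    """Treat score >= threshold as 'biopsy' and summarize the outcome."""
    tp = sum(1 for score, sick in cases if score >= threshold and sick)
    fn = sum(1 for score, sick in cases if score < threshold and sick)
    tn = sum(1 for score, sick in cases if score < threshold and not sick)
    fp = sum(1 for score, sick in cases if score >= threshold and not sick)
    return {
        "accuracy": (tp + tn) / len(cases),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "biopsies": tp + fp,   # everyone flagged gets a biopsy
        "missed": fn,          # melanomas that were not flagged
    }

for threshold in (0.5, 0.25):
    print(threshold, metrics(cases, threshold))
```

On this toy data the 0.5 threshold has the higher accuracy but misses two of the four melanomas, while the 0.25 threshold catches all four at the cost of three extra biopsies. Which trade-off is acceptable is a clinical judgment, not something the accuracy number tells you.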
I don't think your friend understands Large LANGUAGE models.
You do realize that today’s multi-modal LLMs are actually able to understand images? Images are tokenized very much like language is.
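As a rough illustration of what "tokenized" can mean for an image, here is a minimal Python sketch of patch splitting in the style of vision transformers. This is an assumption about the general approach, not the pipeline of any specific model, and `image_to_patches` is a hypothetical helper. Real models also pass each patch through learned layers to get embeddings, which is not shown here.

```python
def image_to_patches(image, patch_size):
    """Split a 2D grid of pixel values into flattened square patches.

    Each flattened patch plays the role of one "token" in the sequence.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

# A toy 4x4 grayscale "image" whose pixel values are 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, 2)
print(len(patches), patches[0])
```

The result is a sequence of four patches, much like a sentence becomes a sequence of word pieces, which is the sense in which images and text end up in the same kind of input.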