Comment by mday27
20 hours ago
hallucination on steroids, wow. I had to read through the abstract to believe it:
"In the most extreme case, our model achieved the top rank on a standard chest Xray question-answering benchmark without access to any images."
I still don't quite understand, after skimming the paper. How does it achieve high scores without access to the images (beating even humans with access to the images)?
The paper gives an example of a question:
And an example of the answer (generated without the referenced image)
How is it doing this? There are two obvious options:
1. Humans are predisposed to write questions with a certain phraseology, a certain set of incorrect answers, etc., and the machine learning model managed to pick up on those patterns.
2. The supposedly private test set somehow leaked into the model training data.
I actually suspect it is option 1, but I have no strong evidence for that.
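To make option 1 concrete, here is a rough sketch of what a "blind" baseline could look like: it never sees an image and just memorizes the most common answer for each question wording. This is not the paper's method. The function names and the toy questions are made up purely for illustration, and a real model would exploit much subtler patterns than exact string matches.

```python
from collections import Counter, defaultdict


def fit_blind_baseline(train):
    """Learn answer priors from (question_text, answer) pairs; no images used."""
    per_question = defaultdict(Counter)
    overall = Counter()
    for question, answer in train:
        per_question[question.lower()][answer] += 1
        overall[answer] += 1
    # Fallback for unseen questions: the most common answer overall.
    fallback = overall.most_common(1)[0][0]
    table = {q: c.most_common(1)[0][0] for q, c in per_question.items()}
    return table, fallback


def predict_blind(model, question):
    """Answer from the question text alone."""
    table, fallback = model
    return table.get(question.lower(), fallback)


def accuracy(model, test):
    """Fraction of (question_text, answer) pairs the blind baseline gets right."""
    correct = sum(predict_blind(model, q) == a for q, a in test)
    return correct / len(test)


# Made-up toy data, purely illustrative; not from the paper or any benchmark.
toy_train = [
    ("Is there a pleural effusion?", "no"),
    ("Is there a pleural effusion?", "no"),
    ("Is there a pleural effusion?", "yes"),
    ("Is the heart enlarged?", "no"),
]
toy_test = [
    ("Is there a pleural effusion?", "no"),
    ("Is the heart enlarged?", "no"),
    ("Is there a pneumothorax?", "yes"),
]

model = fit_blind_baseline(toy_train)
print(accuracy(model, toy_test))
```

If a baseline like this scores well above chance on a benchmark, the questions and answer distributions are giving away the answers, which would point toward option 1. It would not rule out option 2, since a leaked test set would also let an image-free model score well.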