Comment by 0xDEAFBEAD
1 day ago
So which of these benchmarks are most relevant for an ordinary user who wants to talk to AI about their health issues?
I'm guessing HeadQA, Medbullets, MedHallu, and perhaps PubMedQA? (Seems to me that "unsupported speculation" could be a good thing for a patient who has yet to receive a diagnosis...)
Maybe in practice it's better to look at RAG benchmarks, since a lot of AI tools search online for information before giving you an answer anyway? (Memorized knowledge would matter less in that scenario.)