Comment by govideo
1 day ago
ML first, then Bio and Data. Of course, interconnectedness runs high (e.g., I just read about ML for non-random missingness in med records), and data is the foundational bottleneck/need across the board.
Interesting anecdote about Stanford doctors annotating QA questions!
Each of your comments gets my mind going... I'm going to think about them more and may ping you on other channels, per your profile. Thanks!
More like alarming anecdote. :) Google did a wonderful job relabeling MedQA, a core benchmark, but even they missed some (e.g., question 448 in the test set remains wrong according to Stanford doctors).
For ML, start with MedGemma. It's a great family. 4B is tiny and easy to experiment with. Pick an area and try finetuning.
Note the new image encoder, MedSigLIP, which leverages another cool Google model, SigLIP. It's unclear if MedSigLIP is the right approach (open question!), but it's innovative and worth studying for newcomers. Follow Lucas Beyer, SigLIP's senior author and now at Meta. He'll drop tons of computer vision knowledge (and entertaining takes).
For bio, read 10 papers in a domain of passion (e.g., lung cancer). If you (or AI) can't find one biased/outdated assumption or method, I'll gift a $20 Starbucks gift card. (Ping on Twitter.) This matters because data is downstream of study design, and of course models are downstream of data.
Starbucks offer open to up to three people.