Comment by antichronology

17 hours ago

This is a really interesting direction. There is a big field of cell-free RNA (cfRNA) cancer detection. We talked to a few people in the field and think that sequence embeddings could be really valuable there. One challenge is that it's hard to set up evaluation tasks, since public data is scarce.

Maybe we can crowdsource data. My platform, currently in beta, has AI assistants for compute infrastructure and biology, and will soon let people do self-serve research on their own omics data using models like yours. So there could be a monetization path too, if enough people start looking at their own cell data (which they might once they fully understand the risks of engineered pathogens, and certainly will when those risks materialize and start hitting home). Email in bio if you want to brainstorm.

  • That would be really cool. Navigating SRA and mining out reasonable and relevant tasks is a huge bottleneck.

    I find it takes a lot of effort to work out what the authors did, whether the data is high quality, and how to preprocess it in a way that makes sense for the task at hand.

    Would love to chat more about how you're thinking of evaluating the quality of these agents.