Comment by nextos

8 months ago

Yes, but it's not dramatically different from what is out there already.

There is a concerning gap between prediction and causality. In problems like this one, where many variables are highly correlated, prediction methods that have only an implicit notion of causality don't perform well.
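A toy illustration of the correlated-variables point, as a minimal sketch rather than any real fine-mapping method. The names `simulate` and `pearson`, the effect size, and the 2% mismatch rate are all made up for the example. One variant drives the phenotype; a second variant is a near-copy of it (high LD) and has no effect of its own.

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def simulate(n=5000, flip=0.02, seed=0):
    """Hypothetical setup: one causal variant plus a tightly linked tag variant."""
    rng = random.Random(seed)
    # Causal variant: 0/1 allele per individual.
    causal = [rng.randint(0, 1) for _ in range(n)]
    # Tag variant: copies the causal allele except in a small fraction of people.
    tag = [g if rng.random() > flip else 1 - g for g in causal]
    # Phenotype depends ONLY on the causal variant, plus Gaussian noise.
    pheno = [1.0 * g + rng.gauss(0, 1) for g in causal]
    return causal, tag, pheno


causal, tag, pheno = simulate()
ld = pearson(causal, tag)          # correlation between the two variants
r_causal = pearson(causal, pheno)  # association of the true causal variant
r_tag = pearson(tag, pheno)        # association of the non-causal tag variant

print(f"LD (r) between variants:   {ld:.2f}")
print(f"causal variant vs. pheno:  {r_causal:.2f}")
print(f"tag variant vs. phenotype: {r_tag:.2f}")
```

The two association values come out nearly identical, so a purely predictive score has little basis for preferring the causal variant over its correlated neighbor.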

Right now, SOTA seems to use huge population data to infer causality within each linkage block of interest in the genome. These types of methods are quite close to Pearl's notion of causal graphs.

cysteinechapel 8 months ago

To push back a little: isn't causality inherent in these sequence-to-function models, in the sense that causality must proceed in the direction of genetics -> predicted molecular function? And the genetic contribution to phenotype must pass through molecular function.

ejstronge 8 months ago

> SOTA seems to use huge population data to infer causality within each linkage block of interest in the genome.

This has existed for at least a decade, maybe two.

> There is a concerning gap between prediction and causality.

Which can be bridged with protein prediction (AlphaFold) and non-coding regulatory predictions (AlphaGenome), among all the other tools that exist.

What is it that does not exist that you "found it disappointing that they ignored"?

nextos 8 months ago

> This has existed for at least a decade, maybe two.

Methods have evolved a lot in a decade.

Note that AlphaGenome's prediction for CAGE at 1 bp resolution is poor, with a Pearson r of just 0.49. CAGE is very often used to pinpoint causal regulatory variants.
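For a sense of scale on the r = 0.49 figure, here is a quick back-of-the-envelope calculation. It assumes the usual reading of squared Pearson r as the share of variance captured by a linear fit, which is a simplification.

```python
# Pearson r quoted above for 1 bp CAGE prediction.
r = 0.49

# Under a simple linear-fit reading, r squared is the fraction of
# variance in the measured signal that the prediction accounts for.
r_squared = round(r ** 2, 4)

print(f"r = {r}, r^2 = {r_squared}")
```

That is roughly a quarter of the variance, under that assumption.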