Comment by nextos

8 months ago

I found it disappointing that they ignored one of the biggest problems in the field, i.e. distinguishing between causal and non-causal variants among highly correlated DNA loci. In genetics jargon, this is called fine mapping. Perhaps this is something for the next version, but solving it is really important for designing effective drugs that target key regulatory regions.

One interesting example of such a problem and why it is important to solve it was recently published in Nature and has led to interesting drug candidates for modulating macrophage function in autoimmunity: https://www.nature.com/articles/s41586-024-07501-1


rattlesnakedave 8 months ago

Does this get us closer? I'm pretty uninformed, but it seems that better functional predictions make it easier to pick out which variants actually matter versus the ones just along for the ride. Presumably step 2 is integrating this with proper statistical fine-mapping methods?

nextos 8 months ago
Yes, but it's not dramatically different from what is out there already.
There is a concerning gap between prediction and causality. In problems like this one, where lots of variables are highly correlated, prediction methods that only have an implicit notion of causality don't perform well.
Right now, SOTA seems to use huge population data to infer causality within each linkage block of interest in the genome. These types of methods are quite close to Pearl's notion of causal graphs.
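
A toy sketch of why correlation makes this hard, assuming a single causal variant and the standard approximation that expected marginal z-scores equal the LD correlation matrix times the vector of true (causal) signals. The LD values and effect size below are made up for illustration, and `expected_marginal_z` is a hypothetical helper, not part of any library.

```python
def expected_marginal_z(ld, causal_signal):
    """Expected marginal z-score per variant: LD matrix times true signal vector."""
    return [sum(r * s for r, s in zip(row, causal_signal)) for row in ld]

# Hypothetical LD (correlation) matrix for three variants in one region.
# Variants 0 and 1 are in near-perfect LD; variant 2 is only weakly linked.
ld = [
    [1.00, 0.98, 0.20],
    [0.98, 1.00, 0.20],
    [0.20, 0.20, 1.00],
]

# Only variant 0 is truly causal (assumed signal strength of 6.0).
causal_signal = [6.0, 0.0, 0.0]

z = expected_marginal_z(ld, causal_signal)
for i, value in enumerate(z):
    print(f"variant {i}: expected z = {value:.2f}")
```

Variant 1 has no effect of its own, yet its expected association signal is almost as strong as the causal variant's, which is why marginal association alone cannot separate the two.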
- cysteinechapel 8 months ago
  
  To push back a little, isn't there causality inherently in these sequence-to-function models, in the sense that causality must proceed in the direction of genetics -> predicted molecular function? And the genetic contribution to phenotype must pass through molecular function.
- ejstronge 8 months ago
  
  > SOTA seems to use huge population data to infer causality within each linkage block of interest in the genome.
  This has existed for at least a decade, maybe two.
  > There is a concerning gap between prediction and causality.
  Which can be bridged with protein prediction (AlphaFold) and non-coding regulatory predictions (AlphaGenome), amongst all the other tools that exist.
  What is it that does not exist that you "found it disappointing that they ignored"?
  
cysteinechapel 8 months ago

There are existing frameworks for integrating functional and statistical fine-mapping methods (e.g. PolyFun + SuSiE/FINEMAP). They use annotation overlaps such as epigenetic or conservation tracks, but can be extended to variant effect predictions from models like this. They essentially change the prior probability of a variant being causal from a uniform one to one that depends on the functional annotation.
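
A minimal sketch of that last point, not how PolyFun or SuSiE are actually implemented. It assumes exactly one causal variant per region and uses a Wakefield-style approximate Bayes factor computed from z-scores; the shrinkage parameter `r`, the z-scores, and the functional prior values are all made-up illustration values, and the function names are hypothetical.

```python
import math

def log_approx_bayes_factor(z, r=0.5):
    """Log of a Wakefield-style approximate Bayes factor for association.

    r = W / (V + W), where W is the prior effect-size variance and V the
    sampling variance of the estimate (assumed equal across variants here).
    """
    return 0.5 * math.log(1.0 - r) + 0.5 * r * z * z

def posterior_inclusion_probs(z_scores, priors=None, r=0.5):
    """Posterior probability that each variant is the single causal one."""
    n = len(z_scores)
    if priors is None:
        priors = [1.0 / n] * n  # uniform prior: every variant equally likely
    log_w = [math.log(p) + log_approx_bayes_factor(z, r)
             for p, z in zip(priors, z_scores)]
    shift = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]

# Three variants in tight LD with almost identical association signals.
z_scores = [5.0, 5.0, 4.9]

uniform = posterior_inclusion_probs(z_scores)

# Hypothetical functional prior, e.g. derived from a variant effect
# predictor that scores the second variant as most likely functional.
functional_prior = [0.1, 0.8, 0.1]
informed = posterior_inclusion_probs(z_scores, functional_prior)

print("uniform prior:   ", [round(p, 3) for p in uniform])
print("functional prior:", [round(p, 3) for p in informed])
```

With a uniform prior the first two variants are indistinguishable; with the functional prior, the posterior shifts toward the second variant in proportion to the prior odds, since the association evidence is the same for both.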