← Back to context

Comment by dekhn

2 years ago

From what I can tell, the model DM built is mining subtle relationships between aligned columns of multiple sequence alignments and any structural information which is tangibly related to those sequences. Those relationships can be used to infer rough atomic distances ("this atom should be within 3 and 7 angstroms of this other atom"). A large matrix (partially filled out) of distances is output, and those distances are used as constraints in a force field (which also includes lots of prior knowledge about protein structure) and then they run simulations which attempt to minimize both the force field and constraint terms.

In principle you don't even need a physical force field- if you have enough distance information between pairs of atoms, you can derive a plausible structure by embedding the distances in R3 (https://en.wikipedia.org/wiki/Distance_geometry and https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21...

Presumably, the signal they extract includes both rich local interactions (amino acids near in sequence) and distant ones inferred through sequence/structure relationships, and the constraints could in fact push a model towards a novel fold, presumably through some extremely subtle statistical relationships to other evolutionarily related proteins that adopt a different fold.