Comment by COGlory
2 years ago
>I would like to correct something here: it does predict structures de novo and predict folds that haven't been seen before. That's because of the design of the NN: it uses sequence information to create structural constraints. If those constraints push the modeller in the direction of a novel fold, it will predict that.
Could you expand on this? Basically it looks at the data, and figures out what's an acceptable position in 3D space for residues to occupy, based on what's known about other structures?
I will update my original post to point out I may not be entirely correct there.
The distinction I'm trying to make is that there's a difference between looking at pre-existing data and modeling (ultimately homology modeling, but maybe slightly different) and understanding how protein folding works, being able to predict de novo how an amino acid sequence will become a 3D structure.
Also thank you for contacting CASP about this.
From what I can tell, the model DM built is mining subtle relationships between aligned columns of multiple sequence alignments and any structural information which is tangibly related to those sequences. Those relationships can be used to infer rough atomic distances ("this atom should be within 3 and 7 angstroms of this other atom"). A large matrix (partially filled out) of distances is output, and those distances are used as constraints in a force field (which also includes lots of prior knowledge about protein structure) and then they run simulations which attempt to minimize both the force field and constraint terms.
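To make the "distances as constraints" idea concrete, here is a minimal sketch of minimizing a restraint term alone. Everything in it is an assumption made for illustration: the flat-bottom quadratic penalty, the plain gradient descent, the four-atom toy system, and the names `restraint_energy` and `minimize`. It is not DeepMind's pipeline, and it leaves out the physical force field entirely.

```python
import math
import random

def restraint_energy(coords, restraints):
    """Flat-bottom penalty: zero when lo <= d <= hi, quadratic outside."""
    total = 0.0
    for i, j, lo, hi in restraints:
        d = math.dist(coords[i], coords[j])
        if d < lo:
            total += (lo - d) ** 2
        elif d > hi:
            total += (d - hi) ** 2
    return total

def minimize(coords, restraints, steps=2000, lr=0.05):
    """Plain gradient descent on the restraint energy (hypothetical helper)."""
    coords = [list(p) for p in coords]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, lo, hi in restraints:
            d = math.dist(coords[i], coords[j])
            if d < 1e-9:
                continue  # coincident points give no usable direction
            if d < lo:
                dE_dd = -2.0 * (lo - d)   # too close: push apart
            elif d > hi:
                dE_dd = 2.0 * (d - hi)    # too far: pull together
            else:
                continue                  # restraint satisfied
            for k in range(3):
                u = (coords[i][k] - coords[j][k]) / d
                grads[i][k] += dE_dd * u
                grads[j][k] -= dE_dd * u
        for p, g in zip(coords, grads):
            for k in range(3):
                p[k] -= lr * g[k]
    return coords

# Toy restraints (atom i, atom j, lower bound, upper bound), in angstroms.
restraints = [(0, 1, 3.0, 7.0), (1, 2, 3.0, 7.0),
              (2, 3, 3.0, 7.0), (0, 3, 8.0, 12.0)]
rng = random.Random(0)
start = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
final = minimize(start, restraints)
```

A real pipeline would add the force-field terms to the same objective and use a far better optimizer; the point here is only that "within 3 and 7 angstroms" translates directly into a penalty that can be minimized.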
In principle you don't even need a physical force field: if you have enough distance information between pairs of atoms, you can derive a plausible structure by embedding the distances in R3 (https://en.wikipedia.org/wiki/Distance_geometry and https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21...)
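As a rough sketch of that embedding step, assuming the idealized case of a complete and exact distance matrix (the predicted matrix described above is only partially filled and gives ranges, so this is a simplification): build the Gram matrix from the distances and factor it. Textbook treatments usually use an eigendecomposition here; the pivoted Cholesky factorization below is swapped in only to stay dependency-free, and the function name `embed_from_distances` is made up for this example.

```python
import math

def embed_from_distances(D, dim=3):
    """Recover coordinates from exact pairwise distances.

    The result matches the original points only up to rotation,
    reflection, and translation (point 0 is placed at the origin).
    """
    n = len(D)
    # Gram matrix relative to point 0:
    # G[i][j] = (d(0,i)^2 + d(0,j)^2 - d(i,j)^2) / 2 = <x_i - x_0, x_j - x_0>
    G = [[(D[0][i] ** 2 + D[0][j] ** 2 - D[i][j] ** 2) / 2.0
          for j in range(n)] for i in range(n)]
    X = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        # Pivot on the largest remaining diagonal entry.
        p = max(range(n), key=lambda i: G[i][i])
        if G[p][p] <= 1e-12:
            break  # points span fewer than `dim` dimensions
        s = math.sqrt(G[p][p])
        col = [G[i][p] / s for i in range(n)]
        for i in range(n):
            X[i][k] = col[i]
        # Remove the rank-one component just extracted.
        for i in range(n):
            for j in range(n):
                G[i][j] -= col[i] * col[j]
    return X

# Usage: distances from known points, then re-embed and compare.
points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
D = [[math.dist(a, b) for b in points] for a in points]
X = embed_from_distances(D)
```

The recovered `X` reproduces every pairwise distance in `D`, which is all the embedding can promise: absolute orientation and handedness are not determined by distances alone.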
Presumably, the signal they extract includes both rich local interactions (amino acids near in sequence) and distant ones inferred through sequence/structure relationships, and the constraints could in fact push a model towards a novel fold, presumably through some extremely subtle statistical relationships to other evolutionarily related proteins that adopt a different fold.
> The distinction I'm trying to make is that there's a difference between looking at pre-existing data and modeling (ultimately homology modeling, but maybe slightly different) and understanding how protein folding works, being able to predict de novo how an amino acid sequence will become a 3D structure.
Your objection is that AlphaFold is a Chinese room?
What does that matter? Either it generates useful results or it doesn't. That is the metric we should evaluate it on.
Because it's being presented as something that it isn't. It's a better way to analyze data that we got experimentally, and to predict how new data will fit into what we know. It's not de novo understanding, which is the holy grail and what the field is ultimately trying to accomplish. It's Tesla's adaptive cruise control being sold as full self driving. Yes, they are close things - one is an approximation of the other - but being really, really good at adaptive cruise control has basically zero carryover to full self driving. FSD isn't a linear progression from adaptive cruise control, and understanding how proteins fold isn't a linear progression from AlphaFold sequence homology/homology modeling. It's not even close to the same thing. AlphaFold doesn't even move the needle for our understanding of how proteins fold, and yet it's sucking all the air out of the conversation by presenting itself as if it had solved this problem.
It's a really good, fancy model completely reliant on data we already have empirically (and therefore subject to all the same biases as well).
I'm assuming "de novo" means from first principles?
I really don't think anyone is presenting AlphaFold as if it's a physics simulator operating from first principles.
Like obviously AlphaFold does not "understand". Maybe I have blinders on from being in the computer field, but I would assume that it goes without saying that a statistical deep learning AI model does not tell us how to solve the problem from first principles.
Like yes, AlphaFold isn't the final chapter in protein folding and that is obvious. But it seems a stretch to dismiss it on those grounds. If that's the metric we're going with then we can dismiss pretty much everything that has happened in science for the past thousand years.
> re self driving car metaphor
I think this is a bad metaphor for your purposes, because self-driving cars aren't de novo understanding, and arguably do have some carryover from things like adaptive cruise control.