Comment by COGlory

2 years ago

>Domains with novel folds usually have a low number of remotely homologous sequences and for that reason the method will fail, not just because they are novel.

How can you say this but not believe it's doing homology modeling?

Because homology search is not homology modelling. And a multiple sequence alignment is not a structural (i.e., with three-dimensional coordinates) template.

  • For someone who knows very little about this field, could you elaborate on what specific aspect of “homology modeling” AF violates/circumvents which makes you call it “homology search” instead?

    • Homology search is a method to find homologous sequences, that is, evolutionarily related sequences that possess a common ancestor. This was usually done based on how identical the sequences were, but newer algorithms can find remote homologs even when the identity between the sequences is very low. The first step in AlphaFold is to retrieve as many remotely homologous sequences as possible to generate a multiple sequence alignment (MSA) that will be used to generate the embedding.

      On the other hand, homology (or comparative) modelling is a method that generates a structural model of a query sequence based on one or more experimentally solved structures of close protein homologs. The model generation details depend on the specific protocol but, broadly speaking, spatial restraints are extracted from the template structures and mapped to the query sequence to be modelled.

      Note that AlphaFold also uses a type of geometric restraint (pairwise residue distances) in its modelling, although these are derived not from protein structures but from the MSA embeddings. The two kinds of restraint are related but not exactly the same.

      One difference between AlphaFold and homology modelling is that the latter requires templates with a certain sequence identity to the query sequence (≥30% is the rule of thumb), while the former can include in its MSA remotely homologous sequences well below any discernible identity.
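
To make the identity rule of thumb concrete, here is a minimal Python sketch of how percent identity between a query and a candidate template could be computed from an existing pairwise alignment and compared against the 30% figure. The function names, the example sequences, and the choice of denominator (columns where neither sequence has a gap) are assumptions for illustration only; real tools use several different definitions of percent identity, and this is not how AlphaFold or any particular modelling package does it.

```python
# Illustrative sketch only: names and sequences below are hypothetical.
IDENTITY_RULE_OF_THUMB = 30.0  # percent, the rule-of-thumb cutoff mentioned above


def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    matches = 0
    for q, t in zip(aligned_query.upper(), aligned_template.upper()):
        if q == "-" or t == "-":
            continue  # skip gap columns; this is one of several possible conventions
        aligned_columns += 1
        if q == t:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


def usable_as_template(aligned_query: str, aligned_template: str,
                       threshold: float = IDENTITY_RULE_OF_THUMB) -> bool:
    """True if the template meets the (assumed) identity cutoff for homology modelling."""
    return percent_identity(aligned_query, aligned_template) >= threshold


if __name__ == "__main__":
    # Close homolog: 8 identical residues over 9 gap-free columns.
    print(usable_as_template("MKT-AYIAKQR", "MKTLAYLA-QR"))
    # Remote homolog: 1 identical residue over 10 columns, below the cutoff.
    print(usable_as_template("MKTAYIAKQR", "GSSPWLLRQW"))
```

The second pair is the kind of sequence that would be rejected as a modelling template under this cutoff, yet could still end up in an MSA if a sensitive homology search detects it.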