Comment by COGlory

2 years ago

It has template structures. AlphaFold uses the following databases:

    BFD,
    MGnify,
    PDB70,
    PDB (structures in the mmCIF format),
    PDB seqres – only for AlphaFold-Multimer,
    Uniclust30,
    UniProt – only for AlphaFold-Multimer,
    UniRef90.

Those databases are used to derive the evolutionary couplings and distance matrices used by the algorithm. Several of those databases aren’t even structural ones. Furthermore, AlphaFold can function with only an MSA as an input, without retrieving a single PDB coordinate.

  • It's all about boosting signal by finding other proteins that are similar, until you get to the point that you can identify a fold to assign to a region of the protein. That's why some are structural, and some are not.

    > Furthermore, AlphaFold can function with only an MSA as an input, without retrieving a single PDB coordinate.

    Yes, it has a very nice model of what sequences should look like in 3D. That model is derived from experimental data. So if I give AlphaFold an MSA of a new, unknown protein fold (substantively away from any known fold), it cannot predict it.

    • > Yes, it has a very nice model of what sequences should look like in 3D.

      A structural model, you would say.

      > That model is derived from experimental data.

      That doesn’t make it a template-based model, or a homology one.

      > if I give AlphaFold an MSA of a new, unknown protein fold (substantively away from any known fold), it cannot predict it

      That will depend on the number of effective sequences found to derive couplings. Domains with novel folds usually have a low number of remotely homologous sequences, and for that reason the method will fail, not just because they are novel.
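
To make "number of effective sequences" concrete, here is a minimal sketch of one common way to estimate it from an alignment: weight each sequence by the inverse of the number of sequences that are near-duplicates of it, then sum the weights. The 80% identity cutoff, the identity definition, and the function names are assumptions chosen for illustration; this is not AlphaFold's own code, and published methods differ in cutoff and normalization.

```python
def sequence_identity(a, b):
    """Fraction of alignment columns where both sequences share the same non-gap residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def effective_sequences(msa, identity_threshold=0.8):
    """Estimate the effective number of sequences (Neff) in an alignment.

    Each sequence gets weight 1 / (size of its neighborhood), where the
    neighborhood is the sequence itself plus every other sequence whose
    identity to it is at least `identity_threshold` (an assumed cutoff).
    """
    neff = 0.0
    for i, seq in enumerate(msa):
        # Start at 1 so every sequence counts itself, even if it is gap-heavy.
        neighbors = 1
        for j, other in enumerate(msa):
            if i != j and sequence_identity(seq, other) >= identity_threshold:
                neighbors += 1
        neff += 1.0 / neighbors
    return neff


if __name__ == "__main__":
    # Toy alignment: three near-identical sequences plus one unrelated sequence.
    toy_msa = [
        "ACDEFGHIKL",
        "ACDEFGHIKL",
        "ACDEFGHIKV",
        "MNPQRSTVWY",
    ]
    print(effective_sequences(toy_msa))
```

In the toy alignment the three near-identical sequences collapse to a combined weight of about one and the unrelated sequence contributes one more, so four raw sequences give an effective count of roughly two. That is the sense in which a deep but redundant alignment can still carry little coupling signal.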
