Comment by yuan43

2 years ago

> Today, I’m incredibly excited to share the next stage of this journey. In partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), we’re now releasing predicted structures for nearly all catalogued proteins known to science, which will expand the AlphaFold DB by over 200x - from nearly 1 million structures to over 200 million structures - with the potential to dramatically increase our understanding of biology.

And later:

> Today’s update means that most pages on the main protein database UniProt will come with a predicted structure. All 200+ million structures will also be available for bulk download via Google Cloud Public Datasets, making AlphaFold even more accessible to scientists around the world.

This is the actual announcement.

UniProt is a large database of protein sequence and function. Including the predicted structures alongside the experimental data makes it easier to use the predictions in workflows already set up to work with the other experimental and computed properties.

It's not completely clear from the article whether any of the 200+ million predicted structures deposited to UniProt have not been previously released.

Protein structure determines function. Before AlphaFold, experimental structure determination was the only option, and that's very costly. AlphaFold's predictions appear to be good enough to jumpstart investigations without an experimental structure determination. That has the potential to accelerate many areas of science and could percolate up to therapeutics.

One area that doesn't get much discussion in the press is the difference between solid state structure and solution state structure. It's possible to obtain a solid state structure determination (X-ray crystallography) that has nothing to do with actual behavior in solution. Given that AlphaFold was trained to a large extent on solid state structures, it could be propagating that bias into its predicted structures.

This paper talks about that:

> In the recent Critical Assessment of Structure Prediction (CASP) competition, AlphaFold2 performed outstandingly. Its worst predictions were for nuclear magnetic resonance (NMR) structures, which has two alternative explanations: either the NMR structures were poor, implying that Alpha-Fold may be more accurate than NMR, or there is a genuine difference between crystal and solution structures. Here, we use the program Accuracy of NMR Structures Using RCI and Rigidity (ANSURR), which measures the accuracy of solution structures, and show that one of the NMR structures was indeed poor. We then compare Alpha-Fold predictions to NMR structures and show that Alpha-Fold tends to be more accurate than NMR ensembles. There are, however, some cases where the NMR ensembles are more accurate. These tend to be dynamic structures, where Alpha-Fold had low confidence. We suggest that Alpha-Fold could be used as the model for NMR-structure refinements and that Alpha-Fold structures validated by ANSURR may require no further refinement.

https://pubmed.ncbi.nlm.nih.gov/35537451/
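
For anyone who wants to check where a given prediction falls into that "low confidence" category, here is a minimal sketch. It assumes a PDB-format file from AlphaFold DB, where the per-residue confidence score (pLDDT, 0 to 100) is stored in the B-factor column. The function name `low_confidence_residues` and the default cutoff of 70 are my own choices for illustration, not anything from the paper.

```python
def low_confidence_residues(pdb_text, threshold=70.0):
    """Return (chain, residue_number, plddt) for residues below threshold.

    Assumes an AlphaFold-style PDB file in which the B-factor column
    (columns 61-66) holds pLDDT rather than a crystallographic B-factor.
    Only the alpha-carbon (CA) atom of each residue is inspected, so each
    residue is reported at most once.
    """
    flagged = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: only ATOM records carry residue atoms.
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue_number = int(line[22:26])
        plddt = float(line[60:66])
        if plddt < threshold:
            flagged.append((chain, residue_number, plddt))
    return flagged
```

Calling `low_confidence_residues(open(path).read())` on a downloaded model gives the residues worth treating with suspicion. Per the abstract above, those are the regions where a solution-state (NMR) structure is most likely to disagree with the prediction.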

> Before AlphaFold, experimental structure determination was the only option

Other computational methods have existed for a long time. Folding@home was founded 22 years ago.

  • Folding@home doesn't predict structures; it simulates protein folding. It's a different area with some overlap.