Comment by DevX101

4 years ago

They compare the predicted structure (computed) to a known structure (determined experimentally, e.g. by X-ray crystallography). There's a biennial competition, CASP (Critical Assessment of protein Structure Prediction), built around proteins whose structures have been solved experimentally but not yet published. The organizers keep those structures secret. Research teams across the world then use their models to predict, without advance knowledge, the structure of each protein from its amino acid sequence. Think of CASP as a held-out validation data set used to evaluate a machine learning model.

DeepMind crushes everyone else at this competition.
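To make the "compare predicted to known" step concrete, here is a rough sketch of a GDT_TS-style score, the kind of metric CASP reports. It is a simplification under stated assumptions: the two structures are taken to be already superimposed (the real GDT_TS searches over superpositions), the function name `gdt_ts_like` is made up for illustration, and the coordinates are toy values, not real protein data.

```python
import math


def gdt_ts_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score in [0, 100] for pre-aligned C-alpha coordinates.

    Assumption: both structures are already superimposed. The real metric
    searches for the best superposition at each cutoff.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Distance in angstroms between each predicted atom and its experimental counterpart.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # For each cutoff, the fraction of residues predicted within that distance.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    # Average the four fractions and scale to 0..100.
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy, made-up coordinates (hypothetical, for illustration only).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 10.0, 0.0)]
score = gdt_ts_like(predicted, experimental)
```

A perfect prediction scores 100 here, and the score drops as more residues land far from their experimental positions.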

liuliu 4 years ago

The worry is about dataset shift. Previously, the data covered a few hundred thousand structures; now it is 200M. I think there could be doubts about how the distributions differ and how that affects prediction accuracy.
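One hedged way to put a number on that kind of shift: pick a simple per-protein feature and compare its distribution in the original set against the new one with a two-sample Kolmogorov-Smirnov statistic. This is only a sketch. The choice of sequence length as the feature is my assumption, the numbers are invented, and `ks_statistic` is a hand-rolled helper, not a library call.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        # i/len(a) and j/len(b) are the two empirical CDFs evaluated at x.
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap


# Invented sequence lengths (hypothetical, for illustration only).
train_lengths = [120, 150, 200, 250, 300]
new_lengths = [100, 400, 600, 800, 1200]
shift = ks_statistic(train_lengths, new_lengths)
```

A value near 0 means the two samples look alike on that feature; a value near 1 means they barely overlap. It says nothing by itself about prediction accuracy, only that the inputs differ.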