Comment by alphabetting

2 years ago

Obtaining this dataset prior to AlphaFold would have cost on the order of $200 trillion. https://twitter.com/wintonARK/status/1552653527670857729

Anyone knowledgeable know if this estimate is accurate? Insane if true

It's impossible to really put a number on it, because the task itself was impossible. PhDs and the field's top scientists simply couldn't figure out many complicated protein structures after years of attempts, and the fact that there are so many (200M+) means that the problem space is vast.

It doesn't make any sense on multiple levels. This is a computational prediction and there was no computational alternative; many of these proteins would never have had a structure solved even if you spent the money. They are just taking $cost_per_structure_solved * number_of_remaining_structures and assuming that things scale linearly like that.

Note that crystallographers are now using these predictions to bootstrap models of proteins they've struggled to work with, which indicates the level of trust in the structural community for these predictions is pretty high.

Even if that's exaggerated, it might have taken significant time to reach this stage. Probably more than 50 years.

Off the top of my head:

($200 trillion cost) / (200 million structures predicted) = $1 million per structure.

That reflects the personnel cost (5-year PhD scholarship, postdoc/professor mentorship) plus investment and depreciation for the lab equipment. All this to crystallize 1 structure and characterize its folding behavior.

I don't know if this calculation is too simplistic, just coming up with something.
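
For anyone who wants to check the division, here is a minimal Python sketch of the same back-of-the-envelope arithmetic. The two constants are just the rough figures quoted in this thread, and the function name is made up for illustration. It assumes cost scales linearly with the number of structures, which is exactly the assumption questioned above.

```python
# Rough figures quoted in the thread; both are estimates, not measured values.
TOTAL_COST_USD = 200e12   # the claimed $200 trillion
NUM_STRUCTURES = 200e6    # roughly 200 million predicted structures


def cost_per_structure(total_cost: float, num_structures: float) -> float:
    """Implied average cost per structure under a purely linear cost model."""
    return total_cost / num_structures


per_structure = cost_per_structure(TOTAL_COST_USD, NUM_STRUCTURES)
print(f"${per_structure:,.0f} per structure")  # prints "$1,000,000 per structure"
```

The linear model says nothing about structures that could not be solved experimentally at any price, so treat the result as the implied average and not as a real quote.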