Comment by codedokode

2 years ago

Today I learned that some bacteria have a protein that helps ice form on plants [1], damaging the plants so that the bacteria can extract nutrients (though I didn't understand how the bacteria themselves survive this).

Machine learning typically uses existing data to make predictions about new data. Please explain: does this mean that AlphaFold can only use known types of interactions between atoms, and will mispredict the structure of proteins that rely on interactions that are not yet known?

And why can't we just simulate protein behaviour and interactions using quantum mechanics?

[1] https://pubs.acs.org/doi/10.1021/acs.jpcb.1c09342

> And why can't we just simulate protein behaviour and interactions using quantum mechanics?

If you wanted to simulate the behaviour of an entire protein using quantum mechanics, the sheer number of calculations required would be infeasible.

For what it's worth, I have a background in computational physics and am doing a PhD in structural biology. For any system (of any size) that you want to simulate, you have to consider how much information you're willing to 'ignore' in order to focus on the information you want to 'get out' of a set of simulations. Being aware of the approximations you make, and of how they affect your results, is crucial.

For example, if I am interested in how the electrons of a group of carbon atoms (van der Waals radius ~170 picometres) behave, I may want to use Density Functional Theory (DFT), a quantum mechanical method.

For a single, small protein (e.g. ubiquitin, radius ~2 nanometres), I may want to use atomistic molecular dynamics (AMD), which models the motion of every atom in response to thermal fluctuations, electrostatic interactions, etc., using Newton's second law. Electron/proton detail is approximated away to focus on overall atomic motion.
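
To make "using Newton's second law" concrete, here is a minimal sketch of the core loop of an MD code: velocity Verlet integration of F = m a. It assumes a toy system (two particles in 1D joined by a harmonic "bond", arbitrary units, no thermostat, no electrostatics); the function names are hypothetical, and real force fields in 3D have many more terms.

```python
"""Toy MD loop: integrate Newton's second law (F = m a) with velocity Verlet.

Assumptions (hypothetical, for illustration only): two particles in 1D joined
by a harmonic 'bond', arbitrary units, no thermostat, no electrostatics.
"""

def bond_forces(x, k=1.0, r0=1.0):
    """Forces on the two particles from a harmonic spring of rest length r0."""
    stretch = (x[1] - x[0]) - r0
    f = k * stretch
    # a stretched bond pulls particle 0 forward and particle 1 back
    return [f, -f]

def total_energy(x, v, m, k=1.0, r0=1.0):
    """Kinetic plus harmonic potential energy."""
    kinetic = sum(0.5 * mi * vi * vi for mi, vi in zip(m, v))
    potential = 0.5 * k * ((x[1] - x[0]) - r0) ** 2
    return kinetic + potential

def velocity_verlet(x, v, m, dt, n_steps):
    """Advance positions x and velocities v by n_steps of size dt."""
    x, v = list(x), list(v)
    f = bond_forces(x)
    for _ in range(n_steps):
        # half-step velocity update: a = F / m
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        # full-step position update
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # forces at the new positions
        f = bond_forces(x)
        # second half-step velocity update
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
    return x, v

if __name__ == "__main__":
    m = [1.0, 1.0]
    x0, v0 = [0.0, 1.5], [0.0, 0.0]  # bond starts stretched by 0.5
    x, v = velocity_verlet(x0, v0, m, dt=0.01, n_steps=1000)
    print(total_energy(x0, v0, m), total_energy(x, v, m))
```

Velocity Verlet is used here because it is simple and conserves energy well over long runs; the two printed energies should agree closely. Shrinking `dt` improves accuracy but costs more steps, which is exactly the trade-off that makes long simulations expensive.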

In my line of work, we are interested in how big proteins (e.g. the dynein motor protein, ~40 nanometres in length) move around and interact with other proteins at longer time (micro- to millisecond) and length (nano- to micrometre) scales than DFT or AMD can reach. We 'coarse-grain' protein structures by representing groups of atoms as tetrahedra in a continuous mesh (continuum mechanics). We approximate away atomic detail to focus on the long-term motion of the whole protein.
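
One basic ingredient of a tetrahedral continuum model is turning the mesh into per-node masses. The sketch below uses a "lumped mass" approximation, which is an assumption on my part and not necessarily what any particular code does: uniform density, linear tetrahedra, and each element's mass split equally among its four nodes. The default density of 1350 kg/m^3 is a commonly quoted average for proteins, used only for illustration; the function names are hypothetical.

```python
"""Sketch: per-node masses for a tetrahedral mesh (lumped-mass approximation).

Assumptions (illustrative only): uniform density, linear tetrahedra, each
element's mass split equally among its four vertices.
"""

def tet_volume(a, b, c, d):
    """Volume of the tetrahedron with vertices a, b, c, d (3-tuples)."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    # scalar triple product u . (v x w) is six times the volume
    cross = [v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0]]
    return abs(sum(u[i] * cross[i] for i in range(3))) / 6.0

def lumped_node_masses(nodes, tets, density=1350.0):
    """Return one mass per node; each tet gives 1/4 of its mass to each vertex.

    nodes: list of (x, y, z) coordinates (metres if density is in kg/m^3)
    tets:  list of 4-tuples of node indices
    """
    masses = [0.0] * len(nodes)
    for tet in tets:
        vol = tet_volume(*(nodes[i] for i in tet))
        for i in tet:
            masses[i] += density * vol / 4.0
    return masses

if __name__ == "__main__":
    nodes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    print(tet_volume(*nodes))                           # 1/6 for this corner tet
    print(lumped_node_masses(nodes, [(0, 1, 2, 3)], density=6.0))
```

The point of the example is the change in bookkeeping: instead of a million atoms, the simulation tracks a much smaller number of mesh nodes, and the total mass is still conserved (the node masses sum to density times total volume).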

Clearly, it's not feasible to calculate the movement of dynein for hundreds of nanoseconds using DFT! The motor domain of dynein alone contains roughly one million atoms (and several more 'subunits' are attached to it). Assuming these are mostly carbon, oxygen or nitrogen, you're looking at around ten million electrons in your DFT calculations (rounding up), for a single step in time. If you're working at the level of atomic bonds, you'll probably use time steps of between a femtosecond (10^-15 s) and a picosecond (10^-12 s). The numbers get a bit ridiculous. There are techniques that combine QM and AMD, although I'm not very knowledgeable in this area.
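
A back-of-envelope version of "the numbers get a bit ridiculous", as a sketch under stated assumptions: the ~10^7 electrons from above, a 1 fs time step, a 100 ns trajectory (the low end of "hundreds of nanoseconds"), and the roughly cubic cost scaling usually quoted for conventional Kohn-Sham DFT, compared against a hypothetical 100-electron reference molecule. These are order-of-magnitude figures only.

```python
"""Order-of-magnitude arithmetic for DFT on the dynein motor domain.

Assumptions (illustrative): ~1e7 electrons, 1 fs time step, 100 ns trajectory,
cost ~ N**3 in the number of electrons, hypothetical 100-electron reference.
"""

electrons = 10_000_000        # per time step, rounded up
timestep_s = 1e-15            # 1 femtosecond
trajectory_s = 100e-9         # 100 nanoseconds

# number of DFT calculations needed for the trajectory
n_steps = round(trajectory_s / timestep_s)

# cost of ONE step relative to a small molecule, assuming cubic scaling
reference_electrons = 100
cost_ratio = (electrons / reference_electrons) ** 3

print(f"time steps needed:            {n_steps:.0e}")
print(f"cost per step vs. reference:  {cost_ratio:.0e}")
print(f"total cost vs. one reference: {cost_ratio * n_steps:.0e}")
```

Under these assumptions that is 10^8 steps, each about 10^15 times as expensive as a single calculation on the small molecule.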

Some further reading, if you're interested (I find Wikipedia articles on these topics to generally be quite good):

DFT: https://en.wikipedia.org/wiki/Density_functional_theory

Biological continuum mechanics: https://doi.org/10.1371/journal.pcbi.1005897

Length scales in biological simulations: https://doi.org/10.1107/S1399004714026777

Electronic time scales: https://www.pnas.org/doi/10.1073/pnas.0601855103

  • To add to this comment (from someone who used to engineer proteins and, long ago, did DFT as well): DFT is only really decent at ground-state predictions. Computational chemists often have to resort to even more expensive methods to capture "chemistry", i.e. correlated electron-pair physics and excited-state details. Simulating catalysis is extremely challenging!

> And why can't we just simulate protein behaviour and interactions using quantum mechanics?

QM calculations have been done on proteins, but they’re computationally very expensive. IIRC, there are hybrid approaches (QM/MM) where only a small region of interest in the protein structure is modelled with QM and the rest with classical molecular mechanics.
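
The first step of such a hybrid setup is deciding which atoms get the expensive treatment. A minimal sketch, assuming a plain distance cutoff around one point of interest (e.g. an active site) with coordinates in angstroms; the function name and cutoff are hypothetical, and real QM/MM codes also have to cap covalent bonds cut by the boundary (e.g. with link atoms), which this ignores.

```python
"""Sketch of QM/MM partitioning: atoms near a site go into the QM region,
everything else is treated with classical molecular mechanics.

Assumptions (hypothetical): single-point distance cutoff, coordinates in
angstroms, no handling of bonds that cross the QM/MM boundary.
"""
import math

def partition_qm_mm(coords, site, cutoff=6.0):
    """Split atom indices into (qm, mm) lists by distance from `site`."""
    qm, mm = [], []
    for i, xyz in enumerate(coords):
        # math.dist: Euclidean distance (Python 3.8+)
        (qm if math.dist(xyz, site) <= cutoff else mm).append(i)
    return qm, mm

if __name__ == "__main__":
    coords = [(0, 0, 0), (3, 0, 0), (10, 0, 0), (0, 0, 7)]
    print(partition_qm_mm(coords, site=(0, 0, 0), cutoff=6.0))
```

Because QM cost grows steeply with system size, keeping the QM region to a few tens or hundreds of atoms is what makes these hybrid calculations tractable.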