Comment by dalbasal

2 years ago

Can someone put AlphaFold's problem space into perspective for me?

Why is protein folding important? Theoretical importance? Can we do something with protein folding knowledge? If so, what?

I've been hearing about AlphaFold from the CS side. There they seem to focus on protein folding primarily as an interesting space to apply their CS efforts.

If we knew:

(a) the structure of every protein (what DeepMind is doing here)

(b) how different protein structures interact (i.e. protein complexes - DeepMind is working on this but not there yet)

Then we could use those two building blocks to design new proteins (drugs) that do what we want. If we solve those two problems with very high accuracy, we can also reduce the time it takes to go from starting a drug discovery programme to approved medicine.

Obtaining all protein structures and determining how they interact is a key step towards making biology more predictable. Previously, solving the structure of a protein was very time consuming. As a result, we didn’t know the structure for a majority of proteins. Now that it’s much faster, downstream research can move faster.

Caveat: we should remember that these are all computational predictions. AlphaFold’s predictions can be wrong and protein structures will still need to be validated. Having said that, lots of validation has already occurred and confidence in the predictions grows with every new iteration of AlphaFold.

  • > Then we could use those two building blocks to design new proteins (drugs) that do what we want. If we solve those two problems with very high accuracy, we can also reduce the time it takes to go from starting a drug discovery programme to approved medicine.

    Drugs are usually not proteins, but small molecules designed to help or interfere with the operation of proteins.

  • How are the predictions validated? By waiting for the old-fashioned way, i.e. very difficult crystal structure experiments? Or something else?

    • Most of them are not validated; they are just estimates based on previous results for sequences with known structure.

      Every couple of years there is a massive competition called CASP, where labs submit previously unresolved protein structures derived from experimental EM, X-ray crystallography, or NMR studies, and other labs attempt to predict these structures using their software. AlphaFold2 absolutely destroyed the other labs in the main contest (regular monomeric targets, predominantly globular) for structure prediction two years ago, in CASP14.

      https://predictioncenter.org/casp14/zscores_final.cgi

      The latest contest, CASP15, is currently underway and expected to end this year. As with all ML, the usual caveats apply to the models Google generated: the dangers of overfitting to existing structures, artifacts based on the way the problem was modelled, etc.

    • > very difficult crystal structure experiments?

      Apart from X-ray crystallography there are other methods for structure determination such as nuclear magnetic resonance (NMR) or cryo-electron microscopy (cryo-EM). The latter has seen a dramatic improvement in resolution over the last decade.

    • If the predictions are generally good enough, one could also skip the validation and directly try to get a desired effect or reaction. That isn't, strictly speaking, validating the structure, but depending on the use case it might be easier to just go for an outcome. It is really a question of application and cost efficiency.

    • For a lot of X-ray crystallography cases, some of the difficulty is working out the actual structure from the collected data with no prior information. This makes a lot of that much easier, because with molecular replacement (https://en.wikipedia.org/wiki/Molecular_replacement) a model that is "close, but not correct" can be used to bootstrap the actual structure.
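
To make the CASP-style comparison above a bit more concrete: predictions are scored by how closely the predicted atom positions match the experimental ones, and the headline CASP score is GDT_TS. The sketch below is a deliberately simplified version of that idea, not the official scoring code. It assumes the two structures are already superimposed and given as matching lists of C-alpha coordinates in ångströms (the real score searches over many superpositions), and the function name and coordinates are made up for illustration.

```python
import math

def gdt_ts(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score in the range 0..100.

    Assumption: both inputs are equally long lists of (x, y, z) C-alpha
    coordinates that have ALREADY been superimposed. The real metric also
    searches for the superposition that maximises each fraction.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues that fall within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    # Average over the cutoffs, expressed as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical toy data: every predicted residue is off by 3 Å along x.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(x + 3.0, y, z) for x, y, z in experimental]
print(gdt_ts(predicted, experimental))  # 50.0 (passes the 4 Å and 8 Å cutoffs only)
```

A perfect prediction scores 100; the toy prediction scores 50 because every residue misses the 1 Å and 2 Å cutoffs but passes the looser two.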

You are basically made of proteins, which are basically folded sequences of amino acids. Proteins are molecular machines that are the fundamental building blocks of animals, plants, bacteria, fungi, viruses, etc.

So yeah, the applications are enormous, from medicine to better industrial chemical processes, from warfare to food manufacturing.

  • > proteins are molecular machines

    Does that imply proteins have some dynamics that need to be predicted too? I remember seeing animations of molecular machines that appeared to be "walking" inside the body - are those proteins or more complex structures?

    • Reference: https://en.wikipedia.org/wiki/Kinesin , https://en.wikipedia.org/wiki/Dynein , https://en.wikipedia.org/wiki/Myosin

      They are called motor proteins because they convert chemical energy into kinetic energy. In the case of kinesin, it forms a dimer (two copies of itself bind together to form the two "legs") and also binds to light chains (accessory proteins that modulate its behavior) so that it can walk along filaments and drag cargo around your cells. They are both proteins and more complex structures, because multiple proteins are interacting, as well as binding small molecules and catalyzing their conversion into chemical products, all to produce the motion.

    • Yes, very much so. Even proteins that seem like they are just scaffolding for a catalytic centre can have important dynamics.

      A classic example is haemoglobin, which 'just' binds to oxygen at the iron in the middle of the haem. Other binding sites, remote from the oxygen-binding one, can bind to other molecules, notably carbon dioxide. The 'Bohr effect' mechanism is outlined here: https://en.wikipedia.org/wiki/Bohr_effect#Allosteric_interac...

      Even at the lowest level, there is some evidence that ligand binding can affect the structure of the backbone of the protein. For example, peptide plane flipping https://en.wikipedia.org/wiki/Peptide_plane_flipping although I'm not sure where the research is on this nowadays.

    • Yes, proteins can be dynamic, multiple proteins can come together to form bigger structures (like a cell's membrane, for example), and they can very much interact with other proteins and molecules in complex ways.

      I think AlphaFold gives us more of a static picture that doesn't tell us much about how the protein will interact with the world. Predicting these dynamic aspects, protein-to-protein interactions, and ultimately an entire cell, is probably what they'd like to do next.

      I don't know if the walking buddy is a single protein, it may be a handful of different ones joined together.

As others have already mentioned, proteins are the machinery of the cell. They perform an immense array of functions and they must fold in a certain way to perform these functions. This is part of what's known as the structure-function relationship.

Misfolded proteins are contributors to numerous pathological conditions and the more we can understand about how and why this folding happens, the better we can treat these conditions.

Another aspect is that while we can at least partially determine the primary structure (the amino acid sequence) of proteins from DNA and RNA, we don't necessarily know their secondary or tertiary structures (the 3-dimensional conformation). This is a key piece of the puzzle for figuring out how these proteins do their proteiny things, how they interact with other proteins, and even how they form quaternary structures (assemblies of multiple proteins that perform some function; many pores are assemblies like this). Once we know these structures and understand how they work on a structural and chemical level, we can manipulate them far more easily.

In order to do rational drug design, which is designing a drug for a specific target or active site on a protein, we need to understand these structures. Working to solve protein folding is a key step in treating disease states and understanding how cells work on a fundamental level. The impact is hard to overstate.

My understanding is that protein folding is a major cost bottleneck in drug design.

Researchers can come up with candidate molecule formulas that might work as good drugs, but the problem is that these proteins organize/fold themselves physically in a hard-to-predict way. And how they fold directly affects their properties as drugs.

If AlphaFold can accurately predict folding, it’ll allow researchers to prioritize drug candidates more accurately, which will reduce research time and costs. Supposedly the major pharmaceutical companies can spend up to billions when designing a single drug. Optimistically, predicting protein folding better will allow for much more rapid and cheaper drug development.

  • I love AlphaFold, but this is a big misconception. The biggest cost bottleneck in drug development and design, by orders of magnitude, is associated with assaying (and potentially reducing) off-target binding or toxicity and assaying (and potentially increasing) efficacy. Determining a protein structure empirically with cryoEM, NMR, or crystallography will generally cost less than $1M (sometimes far less), which is tiny compared to the many millions or billions of dollars that get poured into clinical trials for a single drug. AF2 is useful in some basic research cases but isn't really that useful for traditional drug design and development.

    A machine learning approach for predicting toxicity would have a far greater impact on public health than AF2 does.

  • My understanding is that protein folding is not a bottleneck in drug design.

    Yes, once you have identified a target protein, its structure is useful for selectively targeting it. But the main bottleneck is identifying such targets. In other words, the main difficulty is figuring out what to hit, not how to hit it, and protein folding mostly helps with the "how" at the moment.

Proteins are what make everything in a cell work. Each one is produced as a "linear" structure that must fold into a proper shape to execute its function, such as acting as a pore that only lets a specific chemical through the cell membrane.

The importance here is to figure out potential targets for treatments that take into account particularities of certain proteins. That could produce better drugs with fewer side effects.

The genome, all of our DNA combined, is just a bunch of 1D strings like "cgtattctgcttgta". Those strings encode proteins, which fold up into a 3D shape once created. This 3D shape is what determines what the protein actually does inside the cell. Without understanding protein folding we don't understand what the DNA actually does.
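
As a rough illustration of the "1D string encodes a protein" step, the sketch below translates a DNA string into one-letter amino acid codes using the standard genetic code. It makes several simplifying assumptions: the string is read on the given strand, starting at its first base, three letters at a time, with no search for a start codon and no handling of introns. The example string is just the one from the comment above, not a real gene. This only gets you the amino acid sequence; predicting the 3D shape from that sequence is the part AlphaFold addresses.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a DNA string into one-letter amino acid codes.

    Assumption: the reading frame starts at the first base of the string.
    Trailing bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

print(translate("cgtattctgcttgta"))  # RILLV
```

Here the 15 letters split into five codons (cgt, att, ctg, ctt, gta), giving arginine, isoleucine, leucine, leucine, valine.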

The applications and importance have been discussed, but let me explain why what we are doing right now does not work, which will also emphasize the importance of this.

At this time, we create drugs, test them on animals, and see what the side effects and results actually are. We are very limited in our capabilities and basically throw mud at the wall and see what sticks. This would allow us to try potential drug candidates without so much randomness.

There are a million articles and podcasts explaining exactly your question. Those will be better than HN responses. I suggest you take 15 seconds to Google it.

  • Yes there are a million articles. That is why asking a question here on HN is useful. The HN community more often than not offers intelligent insight as well as curated recommended links for learning more about a topic. Yes, the signal-to-noise ratio isn't perfect on HN, but it is a lot better than random Google searches. If nothing else, it often leads to enough basic understanding so that someone can then perform more refined and therefore more productive Google searches. I appreciate the basic questions and the generous nature of many HN commenters who offer thoughtful responses.

The endgame of protein folding is nanotechnology. Life is also nanotechnology. So basically the end game is to take full control of life: change existing life forms, design new life forms, new ecosystems. Eat the earth first, then the universe. Not yet there, but getting much closer.

Proteins are the molecular machines of all living beings on this planet. They do almost everything. We need to understand their structure and how that structure changes when they perform some function. Essentially we have to reverse engineer nature's machines.

Biology has traditionally studied these things from the outside in, top down. Advances in biology are correlated with advances in microscope technology. We went from seeing with the naked eye to being able to magnify tissue to see cells and even what's inside of them.

Studying biology from the inside out, bottom up, is significantly harder. You have chemistry, which is a hard science. On top of that you have biochemistry, which is equally rigorous. Above that level of abstraction things start to get a lot less well defined. We go from organic chemistry (carbon, hydrogen, oxygen, etc.) to more complex things found in life (amino acids, carbohydrates, lipids) to polymers of those things (proteins, polysaccharides, triglycerides), to a point where they get so big and complex that their three-dimensional structure starts influencing how they interact with other molecules. Then we start seeing aggregates (quaternary proteins) of these complex things!

Important proteins like ATP synthase are well understood (it's a friggin' turbine like in a hydroelectric power plant!) but there are so many others that haven't received the same attention. Bioinformatics technology is changing that. The human genome's been mapped. Now it's time to map the proteome.