Comment by crispyambulance
2 years ago
I got a 5th grader question about how proteins are used/represented graphically that I've never been able to find a satisfying answer for.
Basically, you see these 3D representations of specific proteins as a crumple of ribbons-- literally like someone ran multi-colored ribbons through scissors to make curls and dumped it on the floor (like a grade school craft project).
So... I understand that proteins are huge organic molecules composed of thousands of atoms, right? Their special capabilities arise from their structure/shape. So basically the molecule contorts itself to a low energy state which could be very complex but which enables it to "bind?" to other molecules expressly because of this special shape and do the special things that proteins do-- that form the basis of living things. Hence the efforts, like Alphafold, to compute what these shapes are for any given protein molecule.
But what does one "do" with such 3D shapes?
They seem intractably complex. Are people just browsing these shapes and seeing patterns in them? What do the "ribbons" signify? Are they just some specific arrangement of C,H,O? Why are some ribbons different colors? Why are there also thread-like things instead of all ribbons?
Also, is that what proteins would really look like if you could see at sub-optical wavelength resolutions? Are they really like that? I recall from school the equipartition theorem-- 1/2 KT of kinetic energy for each degree of freedom. These things obviously have many degrees of freedom. So wouldn't they be "thrashing around" like rag doll in a blender at room temperature? It seems strange to me that something like that could be so central to life, but it is.
Just trying to get myself a cartoonish mental model of how these shapes are used! Anyone?
The ribbons and helices you see in those pictures are abstract representations of the underlying positions of specific arrangements of carbon atoms along the backbone.
There are tools such as DSSP https://en.wikipedia.org/wiki/DSSP_(hydrogen_bond_estimation... which will take the 3D structure determined by crystallography and spit out the helix and sheet assignments that get drawn as ribbons and helices. For helices, for example, you can see a specific arrangement along the protein's backbone in 3D space: the backbone C=O of each residue hydrogen-bonds with the backbone N-H of the residue 4 amino acids down the chain.
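To make the "4 amino acids down the chain" pattern concrete, here is a minimal sketch of a DSSP-style helix rule. It is a simplification, not the real DSSP program: the hydrogen bonds are handed in as a ready-made set (real tools estimate them from atomic coordinates), and the function name and toy data are made up for illustration.

```python
def helix_residues(hbonds, n_residues, step=4):
    """Mark residues as helical using a simplified DSSP-style rule.

    hbonds: set of (i, j) pairs meaning the backbone C=O of residue i
            accepts a hydrogen bond from the backbone N-H of residue j.
    A "turn" starts at residue i when (i, i + step) is in hbonds.
    Two consecutive turns (at i - 1 and i) mark residues
    i .. i + step - 1 as helix.
    """
    turn = [(i, i + step) in hbonds for i in range(n_residues)]
    helix = [False] * n_residues
    for i in range(1, n_residues):
        if turn[i - 1] and turn[i]:
            for k in range(i, min(i + step, n_residues)):
                helix[k] = True
    return "".join("H" if h else "-" for h in helix)


# Hypothetical 12-residue chain with i -> i+4 bonds starting at residues 1..5
toy_hbonds = {(i, i + 4) for i in range(1, 6)}
print(helix_residues(toy_hbonds, 12))  # prints --HHHHHHH---
```

The ribbon drawing is then just a rendering choice: stretches marked "H" get drawn as a coil, everything else as a thread.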
Protein motion at room temperature varies depending on the protein- some proteins are rocks that stay pretty much in the same single conformation forever once they fold, while others do thrash around wildly and others undergo complex, whole-structure rearrangements that almost seem magical if you try to think about them using normal physics/mechanical rules.
Having a magical machine that could output the full manifold of a protein during the folding process at subatomic resolution would be really nice, but there would be a lot of data to process.
Thanks, awesome! So what do molecular biologists do with these 3D representations once they have them? Do they literally just see how they fit to other proteins?
There are many uses for structure. Personally, I find the 3d structures to be useful as a mental guide for picturing things, and certainly people do try to "dock" proteins that have complementary structures, but unfortunately, the biophysics of protein complexes suggests that the conformation change on binding is so large that the predicted structures aren't super-helpful.
Certainly, in a corpo like mine (Genentech/Roche) protein structures have a long history of being used in drug discovery- not typically a simple "dock a ligand to a protein" but more for constructing lab experiments that help elucidate the actual mechanistic biology going on. That is only a tiny part of a much larger process to work on disease targets to come up with effective treatments. Genentech is different from most pharma in that their treatments are themselves typically proteins, rather than small molecules.
A structure is basically another tool for producing hypotheses. In my case, I often use structures to predict effects of genetic lesions. If your protein has a clearly defined active site, you can get a rough sense of where on the enzyme that active site is relative to other mutations. Often residues that are distant in sequence end up right next to each other in the folded structure, so certain residues can have unexpected roles.
It gives a picture of the enzyme as a machine, and lets you look at specific parts and say “this residue is probably doing this job in the whole system”.
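The "distant in sequence, close in the fold" idea can be sketched as a simple contact search over residue coordinates. This is only an illustration: the coordinates below are a made-up hairpin rather than a real structure, and the 8 angstrom cutoff and minimum sequence separation of 6 are assumed values, not anything standardized in this thread.

```python
import math


def long_range_contacts(coords, cutoff=8.0, min_separation=6):
    """Return residue index pairs that are close in space but far apart in sequence.

    coords: list of (x, y, z) positions, one per residue (e.g. alpha carbons).
    cutoff: distance in angstroms below which two residues count as in contact.
    min_separation: minimum difference in sequence index for a pair to be reported.
    """
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


# Hypothetical hairpin: residues 0-4 run along x, residues 5-9 run back 5 angstroms away
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0), (15.2, 0.0, 0.0),
    (15.2, 5.0, 0.0), (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]
for i, j in long_range_contacts(hairpin):
    print(f"residue {i} is near residue {j}")
```

In the toy hairpin, the first and last residues are nine positions apart in sequence but end up side by side, which is the kind of pairing you would check when asking whether a mutation sits near an active site.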
Often the ribbons (alpha-helices and beta-sheets) form "protein domains". Canonically, these are stable, folded structures with conserved shapes and functions that serve as the building blocks of proteins, like lego pieces. These protein domains can be assembled in different ways to form proteins of different function. Different protein domains that have the same evolutionary origin have conserved structure even when the underlying amino acid sequence, or DNA sequence, has changed beyond recognition over millions of years of evolution. In other words, molecular biologists use structure as a proxy for function. Looking at how the same protein domain works in different proteins in different species can give us clues as to how a protein might work in human biology or disease.
Basically, the shape of the protein determines how it interacts with other things. So knowing the structure enables better prediction of how the pathways it is involved in work and how other things (say, potential drugs) would affect that pathway.
All of the loops and swirls are summary representations of known atomic positions: really, knowing a protein structure means knowing the position of every atomic nucleus, relative to the other nuclei, down to some small resolution, and assuming a low temperature.
The atoms do wiggle around a bit at room temperature (and even more at body temperature), which means that simulating them usefully typically requires sampling from a probability distribution defined by the protein structure and some prior knowledge about how atoms move (often a potential energy surface fitted to match quantum mechanics).
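As a toy version of "sampling from a probability distribution defined by a potential energy", here is a minimal sketch that uses Metropolis Monte Carlo on a single harmonic "bond". Note the swap: production work typically uses molecular dynamics on a full force field, and the force constant, step size, and function names here are assumptions picked for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol*K)


def metropolis_samples(energy, x0, temperature=300.0, n_steps=20000,
                       step_size=0.1, seed=0):
    """Draw samples of x with probability proportional to exp(-energy(x) / kT)."""
    rng = random.Random(seed)
    kt = K_B * temperature
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kt):
            x, e = x_new, e_new
        samples.append(x)
    return samples


def harmonic(x, k=100.0, x_eq=1.0):
    """Toy potential in kcal/mol: a spring with a hypothetical force constant k."""
    return 0.5 * k * (x - x_eq) ** 2


samples = metropolis_samples(harmonic, x0=1.0)
mean_x = sum(samples) / len(samples)
mean_energy = sum(harmonic(x) for x in samples) / len(samples)
print(f"mean position: {mean_x:.3f}, mean potential energy: {mean_energy:.3f} kcal/mol")
```

For a harmonic degree of freedom the average potential energy should land near kT/2, which is the equipartition result mentioned in the original question; a real protein simulation does the same kind of sampling over every atom at once.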
There are many applications of these simulations. One of the most important is drug design: knowing the structure of the protein, you can zoom in on a binding pocket and design a set of drug molecules which might disable it. Within the computer simulation, you can mutate a known molecule into each of your test molecules and measure the change in binding affinity, which tells you pretty accurately which ones will work. Each of these simulations requires tens of millions of samples from the atomic probability distribution, which typically takes a few hours on a GPU given a good molecular dynamics program.
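To give a feel for what a "change in binding affinity" means in numbers, this sketch converts a change in binding free energy into a fold change in dissociation constant using the standard relation ratio = exp(ddG / RT). The function name and the example value are assumptions for illustration; the free energy difference itself is what the expensive simulation would estimate.

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol*K)


def kd_ratio(ddg, temperature=298.15):
    """Fold change in dissociation constant for a binding free energy change.

    ddg: binding free energy of the new molecule minus that of the reference,
         in kcal/mol. Negative means the new molecule binds more tightly.
    Returns Kd(new) / Kd(reference).
    """
    return math.exp(ddg / (R * temperature))


# Hypothetical result: the mutated molecule binds 1.36 kcal/mol more favorably
ratio = kd_ratio(-1.36)
print(f"Kd changes by a factor of {ratio:.2f}")
```

At room temperature, roughly 1.4 kcal/mol corresponds to a tenfold change in affinity, which is why these calculations need to be accurate to within about a kcal/mol to be useful for ranking candidates.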
If you want something that leaves a little less to the imagination, check out https://en.wikipedia.org/wiki/Staphylococcus_aureus_alpha_to... . It looks just like what it does: drill a giant hole in cell membranes.
Some proteins have 3D structures that look like abstract art only because we don't have an intuitive understanding of what shape and amino acids are necessary to convert chemical A to chemical B, which is the main purpose of many enzymes in the body. If you look at structural proteins or motor proteins, on the other hand, their function is clear from their shape.
There are a lot of other things you can do with the shape. If it has a pore, you can estimate the size and type of small molecule that could travel through it. You can estimate whether a binding site is accessible to the environment around it. You can determine if it forms a multimer or exists as a single unit. You can see if protein A and protein B have drastically different shapes given similar sequences, which might have implications for its druggability or understanding its function.
https://alphafold.ebi.ac.uk/entry/W6KDG8
The ribbon shape for GFP is a very cool barrel thing
Some of my favorites are antifreeze proteins:
* https://www.rcsb.org/structure/1m8n
* https://iiif.elifesciences.org/lax/05142%2Felife-05142-fig1-...
> Are people just browsing these shapes and seeing patterns in them
That's one approach.
The thing to understand is that proteins form "binding sites": areas that are more likely to attract other particular regions of proteins or other molecules, or even atoms. Think about hemoglobin. It holds onto oxygen molecules because it has binding sites for them.
Binding sites are great because they represent more freedom to do things than molecules typically have. Normal chemistry consists of forming strong electronic bonds between atoms, or forming rigid lattices/crystals.
Binding sites allow molecules to do things like temporarily attach to each other and let each other go under certain circumstances, for instance when another binding site is active/inactive. This can happen through "conformation change", where a molecule bound/unbound on some binding site makes the protein change shape slightly. This is how proteins can act like machines.
> What do the "ribbons" signify
Different regions of the protein have different sequences of amino acids. Amino acids have somewhat different shapes from each other. The ribbons are actually broader than the spindles (or threads), and less flexible. Not sure about the different colors, maybe someone else can fill in.
> Also, is that what proteins would really look like if you could see at sub-optical wavelength resolutions?
Not really, it's an abstraction. They're big molecules, so if you look closely they're made of atoms, which are (kinda, sorta not really, quantum stuff) spherical.
> So wouldn't they be "thrashing around" like rag doll in blender at room temperature?
Yes, but the attractions between the different parts of the molecule keeps it somewhat under control. So more like an undulating little creature, jellyfish perhaps.
> It seems strange to me that something like that could be so central to life
Yep, gotta remember that it's all statistical. These things are getting made, doing their job, breaking, and getting degraded some insane number of times per second. Swarm behavior, sort of.
Short answer is that the ribbon representation is a visual simplification based on known structures -- they are actually composed of atoms.
They certainly do "thrash around", but that thrashing is constrained by the bonds that are formed, which greatly limits the degrees of freedom. Here's a short video of a simulation to demonstrate:
https://www.youtube.com/watch?v=fggqPtaZj8g
I've been going through MIT's online Introduction to Biology course[0] that answers some of your questions here with regards to the shapes and what they signify - specifically the "Proteins and Protein Structure" lessons in the second unit, although some of the previous lectures are helpful setup as well - really interesting and engaging stuff, taught by Eric Lander (who ended up being one of the CRISPR pioneers featured in Isaacson's latest book)
[0]https://learning.edx.org/course/course-v1:MITx+7.00x+2T2022/...
That's cool, I just happened to have picked up a used copy of the text on which the course is based... "Molecular Biology of the Cell" -- the huge grey book. Geez, there's a lot of material in there!
Back in the day, I had steered away from chemistry in college because I didn't like to memorize stuff. Now I realize I missed out on some amazing knowledge.
> I recall from school the equipartition theorem-- 1/2 KT of kinetic energy for each degree of freedom. These things obviously have many degrees of freedom. So wouldn't they be "thrashing around" like rag doll in a blender at room temperature?
It's funny you say that, because the first image on the English Wikipedia page for Equipartition Theorem[1] is an animation of the thermal motion of a peptide.
[1]: https://en.wikipedia.org/wiki/Equipartition_theorem
BTW, in terms of protein dynamics, before you even think about the thrashing around: thermal energy at room temperature (kT is roughly 0.6 kcal/mol) is enough to form and break hydrogen bonds in real time (around 1-2 kcal/mol), so presumably protein H-bonds are breaking and reforming spontaneously at scale.
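The comparison between thermal energy and hydrogen-bond strength can be put in rough numbers. This is a back-of-the-envelope sketch: the function names are made up, and treating a hydrogen bond as a single energy of 1-2 kcal/mol (the figure quoted above) is a big simplification.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def thermal_energy(temperature=298.15):
    """kT expressed per mole, in kcal/mol."""
    return R_KCAL * temperature


def boltzmann_factor(energy, temperature=298.15):
    """Relative weight exp(-E / kT) of a state that costs `energy` kcal/mol."""
    return math.exp(-energy / thermal_energy(temperature))


print(f"kT at room temperature: {thermal_energy():.2f} kcal/mol")
for bond_energy in (1.0, 2.0):
    print(f"weight of a broken {bond_energy} kcal/mol bond: {boltzmann_factor(bond_energy):.3f}")
```

The weights come out at a few percent to nearly twenty percent, not vanishingly small, which fits the picture of individual hydrogen bonds flickering open and closed all the time.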
Your "now what?" question is legitimate and reminiscent of reactions after the completion of the Human Genome Project.
Just like having a human genome sequence, this is not a magic key that solves all problems of biology but a building block for use by researchers. An investigator may look up the folded structure of a protein and use that information to glean certain context-specific insights from it such as how exactly two interacting proteins interact mechanically.
The other significant benefit is that this frees up resources that were spent having to figure out the structure in other ways. It's an efficiency improvement.
Watch this video on DNA polymerase [1]. Obviously it’s an illustration, but I think it helps answer your question because cartoons are great. (MD, not PhD biologist)
[1] https://youtu.be/sKe3UgH1AKg
The ability for another molecule (probably another protein) to "react" or interact with the protein depends not only on the chemistry but also the shape. An otherwise compatible sequence of atoms might not be able to react because it and the binding site are just incompatibly shaped.
This is hugely important for developing drugs and vaccines.
To see the effect of this, look no further than prions. Prions are the exact same protein folded in a different, abnormal way. Worse, they can "transmit" this misfolded shape to otherwise normal copies of the protein. Prions behave differently just because of the different shape and can lead to disease. This is exactly what mad cow disease (BSE) is.
What we get taught in high school about chemistry is incredibly oversimplified.
One example of this I like is the geometry of a water molecule. When we first learn about atoms, we learn the "solar system" model (aka Bohr). The reality is instead that we have 3D probability distributions of where electrons might be. These clouds are in pairs. I believe this has to do with the inverted wavefunction, but really we're getting beyond my knowledge of quantum mechanics here, so that's just a guess.
Well, those clouds additionally form valence shells. We learn about these and how atoms want to complete their valence shells. So in water, oxygen ends up with 8 valence electrons, i.e. 4 pairs of electrons. When bonding with 2 hydrogen atoms we end up with a bent geometry of ~104.5 degrees between the two hydrogen atoms because of how these pairs repel each other. The naive assumption might be that the two hydrogen atoms are 180 degrees apart.
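For comparison with the ~104.5 degree figure, here is a small sketch of the angle four identical electron pairs would make if they spread out as evenly as possible around the oxygen. The cube-corner vectors are just a convenient assumed way to build a regular tetrahedron; nothing here is a quantum calculation.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))


# Four directions pointing at alternating corners of a cube form a regular tetrahedron
pair_directions = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
tetrahedral_angle = angle_between(pair_directions[0], pair_directions[1])
print(f"evenly spaced pairs: {tetrahedral_angle:.1f} degrees")
```

That comes out to about 109.5 degrees. Water's measured angle is a bit smaller, which is usually explained by the two lone pairs pushing harder than the two bonding pairs.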
So back to proteins: you may have learned about hydrogen bonds. These affect molecular shape because when a hydrogen atom shares its electron with an electronegative atom, it often ends up partially positively charged. That partial positive charge is attracted to partially negative atoms (such as oxygen or nitrogen) elsewhere and pushes away other positive charges. This is the real difficulty in protein folding because with a molecule of thousands of atoms and weird geometry you may find distant parts of the molecule interacting through hydrogen bonds.
So a single cell consists of thousands (IIRC) of different proteins. Figuring out those interactions is important but incredibly difficult.
In addition to /u/dekhn 's excellent description, this phenomenon is referred to as a protein's "secondary structure" [0]
[0] https://en.m.wikipedia.org/wiki/Protein_secondary_structure