Comment by spwa4
2 days ago
This is just repeating the fact that the proteins life actually uses are a very small part of the total possible ones. First, there's no real length limit, but all life's proteins are limited to a few thousand amino acids. Most barely get past a hundred.
(note: there are bigger proteins, including ones so big you can see them with the naked eye (e.g. a hair), but they consist of multiple repeats of the same small building block. There are many such building blocks. And the very few exceptions to that are "not really" part of eukaryotic cells, but of cell organelles that have their own DNA)
But even if you just take the first 4 amino acids, there are 160,000 possible combinations (20^4, with the 20 standard amino acids). Life uses fewer than 1000 of those.
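For what it's worth, the counting is easy to check. A minimal sketch, assuming only the 20 standard amino acids (the alphabet string and the function name are just illustrative): for four positions it comes to 20^4 = 160,000, and it grows very quickly from there.

```python
# One-letter codes for the 20 standard amino acids (assumption: no
# selenocysteine, pyrrolysine, or other non-standard residues).
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_count(length: int, alphabet_size: int = len(STANDARD_AMINO_ACIDS)) -> int:
    """Number of distinct sequences of `length` residues over the alphabet."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (4, 10, 100):
        # Python integers are arbitrary precision, so n = 100 is exact.
        print(f"length {n}: {sequence_count(n)} possible sequences")
```

The point is only the scale: even at length 100 the count is far beyond anything life could have sampled exhaustively.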
In other words: DNA and evolution, even with billions of years to think about it, is really a bit of a beginner when it comes to protein design. Or at least, it is pretty obvious that it's possible to do A LOT better than natural selection.
This is about folds, not amino acids - even if you used a larger alphabet of residues, I somehow doubt that you would get many more folds.
Thinking more about the question of protein _length_ - I'm also not convinced that longer proteins (more than say 750aa) would produce more novel folds. Larger proteins tend to be multi-domain; that is, a longer chain will fold into multiple compact domains, each one a separate fold.
I suppose there could be 'megafolds' out there in fold space, beyond 1000aa - like a 12-bladed beta propeller, or a beta-helix with alpha helices on the outside or some other wacky thing. Whether that would substantially increase the total number of folds, I doubt, but that is of course a guess.
(ref - https://pmc.ncbi.nlm.nih.gov/articles/PMC10251718/ for protein lengths)
The amino acid sequence defines the fold.
And really? Just any random sequence gets you a new fold. I mean, it won't be very useful if you pick a random one, but it'll work and be a new one.
I think this is just an artifact of natural selection basing new proteins on existing ones, not an actual useful ("rational" if you can call natural selection rational) selection limit. I don't think that if you designed proteins from first principles you'd see this limitation in your results.
A random sequence may not fold at all! I seem to remember a paper that tried this, creating a bunch of random proteins, and checking how much structure they had - I think they were helical bundles, but don't quote me.
The nice thing about stable folds is that 'nearby' sequences in sequence space - as in, those a point mutation or two away - generally adopt the same fold. If each sequence had a completely different fold, then mutation would be much more destructive. Surprisingly, however, sequences that are far apart in sequence space can also adopt the same fold (convergent evolution).
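To make 'nearby' concrete, here is a rough sketch that enumerates the one-step neighbourhood of a sequence, i.e. every single point mutant. It assumes the 20 standard amino acids; the peptide and the function names are made up for illustration, and it says nothing about whether any given neighbour actually keeps the fold.

```python
# One-letter codes for the 20 standard amino acids (assumption).
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def point_mutants(sequence: str):
    """Yield every sequence differing from `sequence` at exactly one position."""
    for i, original in enumerate(sequence):
        for residue in STANDARD_AMINO_ACIDS:
            if residue != original:
                yield sequence[:i] + residue + sequence[i + 1:]


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    peptide = "MKTAYIAK"  # hypothetical 8-residue example, not a real protein
    neighbours = list(point_mutants(peptide))
    # 19 alternatives at each of 8 positions
    print(len(neighbours))  # 152
    # Every neighbour is exactly one substitution away.
    print(all(hamming_distance(peptide, m) == 1 for m in neighbours))  # True
```

A length-L sequence has 19 × L such neighbours, so the claim amounts to saying that most of those still land on the same fold.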
> DNA and evolution, even with billions of years to think about it, is really a bit of a beginner when it comes to protein design.
I like how you say evolution is able to think when in reality it's just a mysterious function of variation, selection, and time.
I find it completely daunting to speak of evolution's processes without some anthropomorphism sneaking in, despite being a hardcore atheist.
It's all so complex, and our verbs that more literally describe the billions of nanosecond operations going on in the cells feel inadequate. "When a protein molecule in an appropriate folded shape and orientation happens to be bounced by kinetic energy into the attractive region of a corresponding protease..." versus "The protease grabs the protein and cuts it into..."