Comment by jyounker

2 days ago

None of this seems particularly surprising to someone who has an undergraduate level of biochemistry knowledge. Thirty years ago the professor in my Proteins class made a few relevant points in his lectures:

1) Only a handful of amino acids in an enzyme's structure were highly conserved. (Out of hundreds, generally fewer than ten.)

2) Those were generally in the reaction center.

3) Almost all single sequence replacements had no measurable effect on protein structure and function.

4) Across species the "same" protein can diverge in sequence by up to 40%, while keeping the same structure. Sometimes this goes as far as 80%.

Given these basic facts, the findings in the paper aren't really surprising to anyone who studies proteins.

[Note: As with everything in biology, you can find counterexamples. The histone proteins involved in DNA packing have an incredibly conserved sequence.]

So what are the lessons here?

- that structure is as/more important than sequence ?

- that "reaction centers" are what matter, and the rest is just "protection" ?

What do you mean by "reaction center" - surely not physically central within the folded structure (isn't it the surface shape that determines reactivity) ?

  • > What do you mean by "reaction center"

    An enzymatic reaction center is also known as an "active site". It's the location within an enzyme's 3D structure where catalysis happens.

  • > that structure is as/more important than sequence?

    Structure is determined by sequence, so they are equally important. Structure is more conserved than sequence, mainly due to the physicochemical constraints that govern protein folding.

    > that "reaction centers" are what matter, and the rest is just "protection"?

    Sometimes not even protection. Many enzymes can have plenty of their sequence/structure removed and still be functional. Natural proteins carry lots of evolutionary cruft.

    > What do you mean by "reaction center" - surely not physically central within the folded structure

    I think they borrowed the term from photosystems/photosynthesis. But, to be more precise, what they actually meant is the active site of an enzyme; the location where the catalyzed reaction takes place.

    > (isn't it the surface shape that determines reactivity) ?

    Shape is not enough; the chemical nature of the amino acid residues involved is also important. A single mutation in a key catalytic residue will shut down the enzyme even if the shape stays the same.

You are missing the point - sure, a particular enzyme's function is resilient to large levels of substitution because:

1. The number of residues actively involved in catalysis might be small.

2. Most other residues can be safely replaced: with something similar if the residue is part of the structure, or with almost anything if the side chain points out from the surface.

However, the point the article is making is that for different functions the same basic folds seem to be used again and again.

Is that because the stable protein fold structural space is actually small (due to the limited secondary structure patterns used, etc.), or is that because evolution hasn't had time to search the enormous available structural space?

i.e., is it a sampling problem or an intrinsic property of protein space?

The fact that some of the ML approaches mentioned can now design completely novel folds suggests it is at least partially a sampling problem.

This isn't surprising to me. The idea that evolution is somehow complete and has already explored all possible solutions seems unlikely. A lot of evolution happens via gene duplication followed by gradual functional drift, which would favour reuse of existing folds over the generation of completely new ones.

  • It seems just obvious that it's at least a sampling problem. Assuming an average protein length of 400 amino acids and 20 possible amino acids, that's about 10^520 different possibilities for sequences, which is a mind-bogglingly large number.

    We haven't even begun to explore the biological universe.
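A quick way to check that estimate, as a minimal sketch: the size of sequence space is 20^400, and its base-10 exponent is 400 × log10(20). The length of 400 is the assumed average from the comment above; the function name is made up for illustration.

```python
import math

ALPHABET_SIZE = 20   # the 20 standard amino acids
LENGTH = 400         # assumed average protein length, taken from the comment


def log10_sequence_space(alphabet_size: int, length: int) -> float:
    """Return log10(alphabet_size ** length) without building the huge number."""
    return length * math.log10(alphabet_size)


exponent = log10_sequence_space(ALPHABET_SIZE, LENGTH)
print(f"20^400 is roughly 10^{exponent:.1f}")

# Python integers are arbitrary precision, so the exact value can be
# cross-checked by counting its decimal digits.
digits = len(str(ALPHABET_SIZE ** LENGTH))
print(f"20^400 has {digits} decimal digits")
```

The exponent comes out a little above 520, which matches the "about 10^520" figure.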

    • Sure - though because of the functional overlap of amino acids already discussed, the functional/structural space could be a lot smaller (though still massive) - i.e., is choosing D or E at a particular position "different" in most situations?

      And if you take it up a level of abstraction and say there are 4 (ish) basic types of secondary structure (helix, turn, sheet, disordered), then you could argue the structural space is smaller still.

      Or, to put it another way, if you can have sequences with 30% identity or lower that share the same fold, that's an awful lot of unique combinations that collapse into a single structural space.

      And on the flip side, what we don't know is what percentage of sequence space doesn't actually result in a functional fold, i.e., results in instability and multiple stable or unstable conformations.

      So it could be that we are already close to having all the possible folds (where a fold is a single stable form; obviously there are quite a lot of disordered states, but I'm not including those in a 'fold' even if evolution uses unstructured states as well).
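To put rough numbers on the "collapsing" argument in this sub-thread, here is a hedged sketch. The six-class grouping of amino acids is an illustrative assumption (not a standard reduced alphabet), the sequences are invented, and counting 4 secondary-structure states independently per position overcounts badly, since real structures are not assigned residue by residue.

```python
import math

# Hypothetical grouping of the 20 amino acids by broad chemistry.
# Illustrative only; real reduced alphabets differ.
REDUCED_CLASSES = {
    "acidic": "DE",
    "basic": "KRH",
    "polar": "STNQ",
    "hydrophobic": "AVLIM",
    "aromatic": "FWY",
    "special": "CGP",
}
RESIDUE_TO_CLASS = {
    residue: name
    for name, residues in REDUCED_CLASSES.items()
    for residue in residues
}


def log10_space(alphabet_size: int, length: int) -> float:
    """Return log10 of the number of strings of this length over the alphabet."""
    return length * math.log10(alphabet_size)


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match exactly (sequences must be pre-aligned)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def percent_class_identity(a: str, b: str) -> float:
    """Percent of positions whose residues fall in the same chemical class."""
    classes_a = "".join(RESIDUE_TO_CLASS[x][0] + RESIDUE_TO_CLASS[x][1] for x in a)
    classes_b = "".join(RESIDUE_TO_CLASS[y][0] + RESIDUE_TO_CLASS[y][1] for y in b)
    # Compare two-character class tags position by position.
    tags_a = [classes_a[i:i + 2] for i in range(0, len(classes_a), 2)]
    tags_b = [classes_b[i:i + 2] for i in range(0, len(classes_b), 2)]
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(tags_a, tags_b))
    return 100.0 * matches / len(tags_a)


# Made-up 5-residue fragments: every position differs, but only by a
# chemically similar swap (D/E, K/R, S/T, A/V, F/Y).
seq_a, seq_b = "DKSAF", "ERTVY"
print(percent_identity(seq_a, seq_b))        # exact identity
print(percent_class_identity(seq_a, seq_b))  # identity after class collapse

for label, size in [("20 residues", 20), ("6 classes", len(REDUCED_CLASSES)), ("4 SS states", 4)]:
    print(f"{label}: ~10^{log10_space(size, 400):.0f} strings of length 400")
```

Under these assumptions the space shrinks from roughly 10^520 to roughly 10^311 (six classes) or 10^241 (four states), which is much smaller but still far beyond anything evolution could have sampled exhaustively.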