
Comment by DrScientist

1 day ago

You are missing the point - sure, a particular enzyme's function is resilient to large levels of substitution because:

1. The number of residues actively involved in catalysis might be small, and
2. Most other residues can be safely replaced, either with something similar if they are part of the structure, or with anything if the side chain is pointing out on the surface.

However, the point the article is making is that for different functions the same basic folds seem to be used again and again.

Is that because the space of stable protein folds is actually small (due to the limited secondary structure patterns used, etc.), or is that because evolution hasn't had time to search the enormous available structural space?

i.e. is it a sampling problem or an intrinsic property of protein space?

The fact that some of the ML approaches mentioned can now design completely novel folds suggests it is at least partially a sampling problem.

This to me isn't surprising - the idea that evolution is somehow complete and all possible solutions have already been explored seems to me to be unlikely - a lot of evolution happens via gene duplication and then gradual functional drift - which would favour reuse of existing folds over the generation of completely new ones.

It seems just obvious that it's at least a sampling problem. Assuming an average protein length of 400 amino acids and 20 possible amino acids, that's 20^400, or about 10^520, different possible sequences, which is a mind-bogglingly large number.
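
For anyone who wants to check that figure, here is a quick sketch of the arithmetic. It takes the 400-residue length and 20-letter alphabet from the estimate above as given; the function name `log10_sequence_space` is just an illustrative choice, not from any library.

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """Return log10(alphabet_size ** length) without building the huge integer."""
    return length * math.log10(alphabet_size)


# Assumed values from the comment: 400 residues, 20 amino acids.
exponent = log10_sequence_space(400, 20)
print(f"20^400 is about 10^{exponent:.1f}")
```

Working in log space keeps the number readable; the exponent comes out just above 520, which matches the estimate.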

We haven't even begun to explore the biological universe.

  • Sure - though because of the functional overlap of amino acids already discussed, the functional/structural space could be a lot smaller (though still massive) - i.e. is choosing D or E at a particular position "different" in most situations?

    And if you take it up a level of abstraction and say there are 4 (ish) basic types of secondary structure (helix, turn, sheet, disordered), then you could argue the structural space is smaller still.

    Or, to put it another way, if you can have sequences with 30% identity or lower that share the same fold, that's an awful lot of different unique combinations that collapse into a single structural space.

    And on the flip side - what we don't know is what percentage of sequence space doesn't actually result in a functional fold - i.e. results instead in instability or in multiple stable or unstable conformations.

    So it could be that we are already close to having seen all the possible folds (where a fold is a single stable form - obviously there are quite a lot of disordered states, but I'm not including those as a 'fold', even if evolution uses unstructured states as well).
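
To put rough numbers on the "smaller but still massive" argument in the reply above, here is a back-of-the-envelope sketch. The assumptions are loud ones: the 10-class reduced alphabet is purely hypothetical (the thread only says similar residues such as D and E overlap), and treating each of 400 positions as an independent choice among 4 secondary-structure states overcounts wildly, since real structures are strongly constrained. It says nothing about how many of these combinations are stable folds.

```python
import math

LENGTH = 400  # assumed protein length, taken from the thread


def log10_space(length: int, states: int) -> float:
    """log10 of the number of length-long strings over `states` symbols."""
    return length * math.log10(states)


# Hypothetical coarse-grainings; only the 20 and the 4 come from the thread.
alphabets = {
    "20 amino acids": 20,
    "10 residue classes (hypothetical grouping)": 10,
    "4 secondary-structure states": 4,
}

for label, states in alphabets.items():
    print(f"{label}: about 10^{log10_space(LENGTH, states):.0f}")
```

Even the crudest 4-state description leaves an exponent in the hundreds, so coarse-graining alone does not make the space small; whether the stable part of it is small is the open question.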