Comment by Windchaser
1 day ago
It seems obvious that it's at least a sampling problem. Assuming an average protein length of 400 amino acids and 20 possible amino acids at each position, that's 20^400, or about 10^520, possible sequences, which is a mind-bogglingly large number.
We haven't even begun to explore the biological universe.
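As a quick sanity check on that figure, here is a minimal Python sketch of the arithmetic. The function name `log10_sequence_space` is just an illustrative label, and the 400-residue length is the assumption from the comment above, not a measured average.

```python
import math

def log10_sequence_space(length: int, alphabet_size: int) -> float:
    """Return log10 of the number of sequences, i.e. log10(alphabet_size ** length)."""
    return length * math.log10(alphabet_size)

# Assumed values from the comment: 400 residues, 20 amino acids per position.
exponent = log10_sequence_space(400, 20)
print(f"20^400 is about 10^{exponent:.0f}")

# Python integers are arbitrary precision, so the digit count can be checked exactly.
digits = len(str(20 ** 400))
print(f"20^400 has {digits} decimal digits")
```

Working in log10 keeps the numbers readable; the exact integer is only used to confirm the order of magnitude.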
Sure, though because of the functional overlap between amino acids already discussed, the functional/structural space could be a lot smaller (though still massive). For example, is choosing D or E at a particular position really "different" in most situations?
And if you take it up a level of abstraction and say there are four-ish basic types of secondary structure (helix, turn, sheet, disordered), then you could argue the structural space is smaller still.
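To put rough numbers on how much these abstractions shrink the space, here is a hedged back-of-the-envelope sketch in Python. The reduced alphabet size of 10 is purely hypothetical (a stand-in for merging similar residues such as D and E), and treating each of the 400 positions as independently taking one of 4 secondary-structure states is a crude upper bound, since real secondary-structure elements span many residues.

```python
import math

def log10_space(length: int, states_per_position: int) -> float:
    """Return log10 of states_per_position ** length."""
    return length * math.log10(states_per_position)

LENGTH = 400  # assumed protein length, as in the original estimate

# Labels and state counts; only the first (20 amino acids) comes from the thread.
scenarios = {
    "full alphabet (20 amino acids)": 20,
    "hypothetical reduced alphabet (10 groups)": 10,
    "per-residue secondary structure (4 states)": 4,
}

exponents = {name: log10_space(LENGTH, n) for name, n in scenarios.items()}
for name, exp in exponents.items():
    print(f"{name}: about 10^{exp:.0f}")
```

Even the coarsest of these is still astronomically large, which is consistent with "smaller, though still massive".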
Or, put another way: if sequences with 30% identity or lower can have the same fold, that's an awful lot of unique combinations that collapse onto a single fold.
And on the flip side, what we don't know is what percentage of sequence space doesn't actually result in a functional fold, i.e. results in instability, or in multiple stable or unstable conformations.
So it could be that we are already close to having seen all the possible folds (where a fold is a single stable form; obviously there are quite a lot of disordered states, but I'm not including those in a 'fold', even if evolution uses unstructured states as well).