Comment by adrian_b

29 minutes ago

Living beings use many more amino acids than the ones that make up proteins.

The relatively low number of amino acids used in proteins appears to be caused by the difficulty of modifying the genetic code, i.e. of adding not yet encoded amino acids to the encoded set.

Variations of the genetic code are known in various living beings, but they are very rare, because a change in the genetic code requires many other coordinated changes. A new kind of transfer RNA must be encoded in the genome (the only likely origin of such a new tRNA is a mutation in one of the existing ones), and that RNA molecule must be able to bind preferentially both to the codons that are repurposed to encode the new amino acid and to molecules of that amino acid, which requires many favorable changes in the molecular structure of that RNA.

It seems that the earliest form of the genetic code had only 4 distinct symbols, i.e. of the 3 nucleobases of a codon only the central one was meaningful and the 2 peripheral nucleobases did not encode information.

The 4 original symbols selected among 4 major kinds of amino acids: the special amino acid glycine, an acidic amino acid, a hydrophobic amino acid and an amino acid with intermediate behavior, like alanine or proline.

These variations would have been enough to build proteins with specific conformations.
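A toy sketch of such a 4-symbol code, where only the middle base of each codon is read. The assignment of bases to amino acid classes is my assumption, not something established: it is borrowed from the G-initial codons of the modern code (GGN glycine, GAN aspartate/glutamate, GUN valine, GCN alanine). The names MIDDLE_BASE_CLASS and decode_primitive are made up for the illustration.

```python
# Toy model: only the middle base of each codon carries information.
# ASSUMPTION: the base-to-class mapping below is hypothetical, patterned on
# the G-initial codons of the modern code (GGN Gly, GAN Asp/Glu, GUN Val, GCN Ala).
MIDDLE_BASE_CLASS = {
    "G": "glycine",
    "A": "acidic",
    "U": "hydrophobic",
    "C": "intermediate",
}


def decode_primitive(rna):
    """Read an RNA string three bases at a time, using only each middle base."""
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    return [MIDDLE_BASE_CLASS[rna[i + 1]] for i in range(0, usable, 3)]


# The two outer bases of each codon do not affect the result.
print(decode_primitive("GGUCAAAUGGCC"))
```

Changing the first or the third base of any codon in the input leaves the decoded sequence unchanged, which is the sense in which the peripheral bases did not encode information.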

The fact that a codon had 3 nucleobases, presumably to ensure binding to transfer RNA molecules, even though only one of them encoded information, appears to have been a great stroke of luck: it later allowed the genetic code to expand, because 3 bases with 4 possible values each give 4^3 = 64 combinations, enough to encode up to 64 symbols.
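The counting argument can be checked with a few lines of Python. This is only a sketch of the arithmetic: it enumerates every triplet over the 4 RNA bases and compares that with the number of symbols available when only the middle base is read.

```python
from itertools import product

BASES = "UCAG"  # the 4 RNA nucleobases

# Every possible codon: 3 positions, 4 choices each.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# If only the middle base is meaningful, the code has just 4 symbols.
middle_only_symbols = {codon[1] for codon in codons}

print(len(codons))               # 4 ** 3 codons in total
print(len(middle_only_symbols))  # symbols when only the middle base is read
```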

Most of the possible codons have remained redundant to this day (several codons still encode the same amino acid), but the number of encoded amino acids has slowly increased over time, up to 21. The most recent additions to the encoded set are the sulfur-containing, aromatic and selenium-containing amino acids.
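To see how much of the 64-codon space is still shared between codons with the same meaning, here is a small sketch that counts the codons per amino acid in the standard genetic code (NCBI translation table 1). The table lists 20 amino acids plus stop; selenocysteine, the 21st, does not appear in it because it is inserted by recoding the UGA stop codon in a special context. The variable names are my own.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), one letter per codon,
# with each codon position running through U, C, A, G; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons map to each amino acid (or to stop).
codons_per_symbol = Counter(CODON_TABLE.values())
encoded = sorted(aa for aa in codons_per_symbol if aa != "*")

print(len(CODON_TABLE), "codons,", len(encoded), "amino acids")
for symbol, count in codons_per_symbol.most_common():
    print(symbol, count)
```

Only methionine (M) and tryptophan (W) have a single codon each; every other amino acid is encoded by between 2 and 6 codons.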

As you say, there are disadvantages in using many kinds of amino acids, but there are also advantages: a larger set allows the creation of proteins with properties that are not achievable with a smaller one.

The balance between advantages and disadvantages appears to have continuously slowed the rate at which new amino acids were added to the set encoded in the genetic code, so that the majority of today's living beings have not added any new amino acid for several billion years.

Most of the expansions of the genetic code happened before the last common ancestor of all living beings of today, so very few living beings today have more recent modifications of the genetic code.