ESM3: Simulating 500M years of evolution with a language model

2 years ago (evolutionaryscale.ai)

It looks like "500M years of evolution" isn't a description (however indirect) of an iterative process, but a metric that measures differences in results:

> But in order for ESM3 to solve its training task of predicting the next masked token the model must learn how evolution moves through the space of potential proteins. In this sense, ESM3 can be thought of as an evolutionary simulator. A traditional evolutionary analysis of the ancestry of esmGFP is paradoxical as the protein was created outside natural processes, but still we can draw insight from the tools of evolutionary biology on the amount of time it would take for a protein to diverge from its closest sequence neighbor through natural evolution. We find naturally occurring GFPs with similar levels of sequence identity are separated by hundreds of millions of years of evolution. Using an analysis similar to one might perform on a new protein found in the natural world, we estimate that esmGFP represents an equivalent of over 500 million years of natural evolution performed by an evolutionary simulator.

  • Yeah, it seems that they created a new protein and then said it’s equivalent to 500 million years of evolution, which it clearly is not.
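
To make the "metric, not process" reading concrete, here is a rough sketch of how a sequence-identity number can be turned into a time estimate. This is not the authors' method (the quote only says they used standard tools of evolutionary biology); it is a textbook Poisson-corrected molecular clock, and the function names, the toy sequences, and the substitution rate are all hypothetical values chosen for illustration.

```python
import math


def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned sequences.

    Assumes the sequences are already aligned to equal length; columns
    containing a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x == y for x, y in pairs) / len(pairs)


def poisson_distance(identity: float) -> float:
    """Expected substitutions per site under the Poisson correction.

    With p = 1 - identity as the observed fraction of differing sites,
    d = -ln(1 - p) = -ln(identity).
    """
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity must be in (0, 1]")
    return -math.log(identity)


def divergence_time_myr(identity: float, rate_per_site_per_myr: float) -> float:
    """Time since two lineages split, in millions of years.

    Both lineages accumulate substitutions independently, hence the
    factor of 2. The rate is an assumed (hypothetical) clock rate.
    """
    if rate_per_site_per_myr <= 0:
        raise ValueError("rate must be positive")
    return poisson_distance(identity) / (2.0 * rate_per_site_per_myr)


if __name__ == "__main__":
    # Toy aligned fragments, not real GFP sequences.
    ident = percent_identity("MSKGEELFTG", "MSKGAELFSG")
    # Hypothetical clock rate: 0.001 substitutions per site per million years.
    print(ident, divergence_time_myr(ident, rate_per_site_per_myr=0.001))
```

The point is that the output is a distance converted into years under an assumed rate. Nothing in the calculation steps through generations, which fits the reading that "500 million years" is a measure of how far the result sits from known proteins.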

This is just so cool. I wonder if they’ll release the large (98B) model or gatekeep it in some way. Something that can generate a novel fluorescing sequence is amazingly cool; I bet there’s a lot of interesting work to do from here on model tuning, preference training, and the biology research side.

This is incredibly exciting! It's such a relief from all the dystopian views about how our planet will crumble. I guess if people have unknowingly created problems for themselves, they are also capable of solving them! (Talking about plastics here.)

This is the sort of thing that could optimize proteins in multiple species across an entire ecosystem to accelerate adaptation to climate change. I am not an expert in this area but it seems so promising.

  • Naive question, but do you think that's a good idea? Many attempts to meddle with biology where I live have backfired, such as introduced biocontrol species becoming pests themselves, human-bred varieties of plants outcompeting the wild ones, stuff like that. Life seems to have a will of its own, and I worry that we could make things worse by meddling with an ecosystem's gene pool.