Brian Hie leads the Laboratory of Evolutionary Design at Stanford, focusing on the intersection of artificial intelligence and biology. Recently, Hie considered a compelling question: If a tool like ChatGPT can generate original sentences by analyzing patterns in vast collections of written text, what would happen if written words were substituted with genetic code?
Evo, a generative AI model that creates genetic code, is the answer to that seemingly simple question. Hie and his colleagues from the Arc Institute and the University of California, Berkeley presented Evo in a study published in the journal Science.
According to Hie, scientists could use Evo to better understand how viral and microbial genomes function, to design previously unimagined proteins that could serve as drugs, and to rewire microbes to perform remarkable tasks, such as consuming microplastics from the oceans or enhancing photosynthesis to sequester carbon and increase crop yields.
“Instead of having to use brute force testing or mining promising sequences from nature, all of which are quite unpredictable, we now have an AI model for generating systems of interest, allowing researchers to focus only on the most promising possibilities. Evo puts the genomes of whole lifeforms within reach and accelerates the bioengineering design process.”
Brian Hie, Assistant Professor, Arc Institute, Stanford University
Evo may even lead to new treatments, a better understanding of genetic illnesses, and a deeper comprehension of evolution itself, all accomplished on a computer rather than in a lab.
Natural Insight
Nature itself is the source of inspiration. DNA contains the instructions for all life. A better grasp of the intricate interactions among DNA, RNA, and proteins, and of how those interactions have changed over time, should deepen scientific knowledge and make it possible to rewire microbes into practical technologies.
However, things are not as simple as they appear. The genomes of even the most basic microorganisms contain millions of base pairs.
Two of Evo's main improvements over comparable existing tools are finer resolution, down to individual nucleotides, the building blocks of DNA, and a longer “context window,” the length of sequence a model can process at once, which grows from about 8,000 base pairs to over 131,000 base pairs.
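To make "single-nucleotide resolution" and "context window" concrete, here is a minimal Python sketch. It is an illustration only: the token mapping and the helper names `tokenize` and `split_into_windows` are assumptions made for this example, not Evo's actual vocabulary or code, and the tiny window of 4 stands in for the 131,000-base-pair window described above.

```python
# Hypothetical mapping for illustration; not Evo's real vocabulary.
NUCLEOTIDE_TO_ID = {"A": 0, "C": 1, "G": 2, "T": 3}


def tokenize(sequence: str) -> list[int]:
    """Turn each nucleotide into exactly one token (single-nucleotide resolution)."""
    return [NUCLEOTIDE_TO_ID[base] for base in sequence.upper()]


def split_into_windows(tokens: list[int], window: int) -> list[list[int]]:
    """Cut a token list into consecutive chunks of at most `window` tokens."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


tokens = tokenize("ATGCGTACGT")
windows = split_into_windows(tokens, window=4)

print(tokens)   # [0, 3, 2, 1, 2, 3, 0, 1, 2, 3]
print(windows)  # [[0, 3, 2, 1], [2, 3, 0, 1], [2, 3]]
```

A model with a small window sees only one such chunk at a time, so a longer window lets it take in more of a genome in a single pass.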
Evo was trained on about 300 billion nucleotides drawn from the genomes of 2.7 million prokaryotic and phage organisms, 80,000 microbes, and smaller DNA loops called plasmids. To prevent Evo from being used to develop bioweapons, however, the team omitted the genomes of viruses known to infect humans and some other organisms.
According to Hie, Evo can produce DNA sequences of over a million base pairs, more than seven times the context window of 131,000 base pairs, and can learn how slight variations in nucleotide sequences impact the evolutionary fitness of entire organisms. The researchers point out that the smallest “minimal” bacterial genomes are roughly 580,000 base pairs long.
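The figures in the paragraph above can be checked with a few lines of Python. This is a quick sanity check that uses the article's rounded numbers, so the ratios are approximate; the variable names are chosen for this example.

```python
# Rounded figures quoted in the article, in base pairs.
CONTEXT_WINDOW_BP = 131_000    # "over 131,000 base pairs"
GENERATED_BP = 1_000_000       # "over a million base pairs"
MINIMAL_GENOME_BP = 580_000    # "roughly 580,000 base pairs"

# How many context windows each length spans.
ratio_generated = GENERATED_BP / CONTEXT_WINDOW_BP
ratio_minimal = MINIMAL_GENOME_BP / CONTEXT_WINDOW_BP

print(f"{ratio_generated:.1f}")  # 7.6
print(f"{ratio_minimal:.1f}")    # 4.4
```

On these numbers, a million-base-pair sequence spans more than seven context windows and is also longer than the smallest minimal bacterial genomes.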
Proof of Concept
As a proof of concept of Evo's design capabilities, Hie and colleagues prompted Evo to generate novel synthetic CRISPR-Cas molecular complexes and systems. CRISPR-Cas systems resemble tiny molecular machines in which proteins and RNA work together to modify DNA.
In response to that prompt, Evo produced a fully functional, previously unknown CRISPR system, which was verified after 11 candidate designs were tested. According to Hie, Evo's CRISPR work is the first instance of simultaneous protein-RNA codesign with a language model.
Hie is already working on extending his research beyond the microbial world to human and other genomes, improving Evo's capacity to process larger genomic sequences, and gaining more control over its outputs.
“Evo opens up a lot of very interesting research at the intersection of machine learning and biology. It creates opportunities for discoveries that were previously unimaginable and accelerates our ability to engineer life itself.”
Brian Hie, Assistant Professor, Arc Institute, Stanford University
Journal reference:
Nguyen, E., et al. (2024) Sequence modeling and design from molecular to genome scale with Evo. Science. doi.org/10.1126/science.ado9336.