Proteins are known to perform complex tasks and catalyze chemical reactions, making them crucial for cell survival.
A ribbon model of a protein. Image Credit: istockphoto.com.
For a long time, researchers and engineers have sought to exploit this power by developing artificial proteins capable of carrying out new functions such as energy harvesting, carbon capture, and disease treatment. However, most of the procedures developed to synthesize such proteins are complex and slow, and they fail at a high rate.
In a groundbreaking discovery that could hold implications across the energy, agriculture, and healthcare industries, a research team headed by scientists in the Pritzker School of Molecular Engineering at the University of Chicago has created a new process based on artificial intelligence. This process employs big data to engineer novel proteins.
The scientists initially developed machine-learning models that can assess protein data obtained from genomic databases and then identified comparatively simple rules of design for constructing artificial proteins.
When the researchers built these artificial proteins in the laboratory, they found that the molecules carried out their chemical reactions well enough to rival those found in nature.
“We have all wondered how a simple process like evolution can lead to such a high-performance material as a protein. We found that genome data contains enormous amounts of information about the basic rules of protein structure and function, and now we’ve been able to bottle nature’s rules to create proteins ourselves.”
Rama Ranganathan, Joseph Regenstein Professor, Department of Biochemistry and Molecular Biology, Pritzker School of Molecular Engineering, University of Chicago
The study results were published in the journal Science on July 24, 2020.
Using artificial intelligence to learn design rules
Proteins are composed of hundreds or thousands of amino acids. The sequences of these amino acids specify the structure and function of the proteins. However, it has been difficult to understand how to construct these sequences to produce novel proteins. While previous work has led to techniques that can specify structure, function has remained more elusive.
Over the past 15 years, Ranganathan and his colleagues realized that genomic databases, which are growing rapidly, contain enormous amounts of information about the fundamental rules of protein structure and function.
Based on this information, Ranganathan’s research team created mathematical models and then started using machine-learning techniques to uncover new information about the fundamental design rules of proteins.
In this study, the researchers analyzed the chorismate mutase family of metabolic enzymes, a type of protein that is important for life in many fungi, plants, and bacteria. With the help of machine-learning models, the scientists revealed the simple design rules behind these proteins.
The model revealed that conservation at individual amino acid positions, together with correlations in the evolution of pairs of amino acids, is sufficient to predict new artificial sequences that would have the properties of the protein family.
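As a rough illustration of the two statistics named above, the following Python sketch computes per-position conservation and a pairwise correlation score from a small sequence alignment. It is not the study's model: the toy alignment, the function names, and the use of Shannon entropy and mutual information as stand-ins for "conservation" and "correlation" are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy alignment (not data from the study): equal-length sequences.
ALIGNMENT = [
    "ACDA",
    "ACDA",
    "AEKA",
    "AEKC",
]


def column_frequencies(alignment):
    """Return, for each position, a dict mapping amino acid -> frequency."""
    n = len(alignment)
    return [
        {aa: count / n for aa, count in Counter(column).items()}
        for column in zip(*alignment)
    ]


def column_entropy(freqs):
    """Shannon entropy in bits of one position; 0 means fully conserved."""
    return -sum(p * math.log2(p) for p in freqs.values())


def mutual_information(alignment, i, j):
    """Mutual information in bits between positions i and j.

    Used here as a simple proxy for how strongly two positions co-vary.
    """
    n = len(alignment)
    site = column_frequencies(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    total = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        total += p_ab * math.log2(p_ab / (site[i][a] * site[j][b]))
    return total


if __name__ == "__main__":
    freqs = column_frequencies(ALIGNMENT)
    for pos, f in enumerate(freqs):
        print(f"position {pos}: entropy = {column_entropy(f):.3f} bits")
    for i, j in combinations(range(len(freqs)), 2):
        print(f"positions {i},{j}: MI = {mutual_information(ALIGNMENT, i, j):.3f} bits")
```

In this toy alignment, position 0 never changes (fully conserved), while positions 1 and 2 always change together, so they score highest on the pairwise measure. A generative model of the kind described would go further and fit parameters that reproduce such statistics before sampling new sequences; that step is not shown here.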
“We generally assume that to build something, you have to first deeply understand how it works. But if you have enough data examples, you can use deep learning methods to learn the rules of design, even as you are understanding how it works or why it’s built that way.”
Rama Ranganathan, Joseph Regenstein Professor, Department of Biochemistry and Molecular Biology, Pritzker School of Molecular Engineering, University of Chicago
Ranganathan and his colleagues then synthesized genes encoding the proteins, cloned them into microorganisms, and observed the microbes producing the synthetic proteins with their normal cellular machinery. The team found that the artificial proteins had the same catalytic function as natural chorismate mutase proteins.
A platform to understand other complex systems
Because the design rules are relatively simple, the number of artificial proteins that scientists could potentially create with them is extremely large.
Ranganathan added, “The constraints are much smaller than we ever imagined they would be. There is a simplicity in nature’s design rules, and we believe similar approaches could help us search for models for design in other complex systems in biology, like ecosystems or the brain.”
Although the models uncovered the design rules, Ranganathan and his colleagues do not yet understand why the models work. Their next step is to work out how the models arrived at this result. “There is much more work to be done,” Ranganathan added.
Meanwhile, the researchers also hope to apply this platform to create proteins that can tackle urgent societal issues, such as climate change. Ranganathan and Associate Professor Andrew Ferguson have founded a new company known as Evozyne that is set to commercialize this new technology with applications in agriculture, catalysis, environment, and energy.
Ranganathan has also worked with the University of Chicago’s Polsky Center for Entrepreneurship and Innovation to file patents and license the intellectual property to the company.
“This system gives us a platform for rationally engineering protein molecules in a way that we always dreamed we could. Not only can it teach us the physics of how proteins work and how they evolve, it can help us find solutions for issues like carbon capture and energy harvesting. Even more generally, the studies in proteins might even help teach us how the deep neural networks behind modern machine learning actually work,” Ranganathan concluded.
Journal reference:
Russ, W. P., et al. (2020) An evolution-based model for designing chorismate mutase enzymes. Science. doi.org/10.1126/science.aba3304.