A recent study published in Nature Communications reveals that the three-dimensional structure of proteins can help uncover long-standing evolutionary relationships, offering fresh insights into the tree of life.
For the first time, researchers have combined protein shape data with genomic sequences to improve the accuracy of evolutionary trees. Such trees are crucial tools for tracking the spread of pathogens, exploring life’s origins, and developing new disease treatments.
The method remains effective even for proteins whose structures have been computationally predicted rather than experimentally verified. This matters because tools like AlphaFold 2 continue to generate vast amounts of predicted structural data.
There are 250 million known protein sequences but only 210,000 experimentally determined structures. The study highlights the method's enormous potential as projects like the Earth BioGenome Project promise to produce billions more protein sequences in the coming years.
Overcoming Evolutionary Challenges with Protein Structures
For decades, biologists have used evolutionary or phylogenetic trees to map how species and genes diverge from common ancestors. These trees are traditionally built by comparing DNA or protein sequences to identify similarities and differences. However, one of the greatest challenges in reconstructing ancient relationships is saturation. Over long periods, genomic sequences can change so drastically that they no longer resemble their ancestral forms, erasing signals of shared origins.
“The issue of saturation dominates phylogeny and represents the main obstacle to the reconstruction of ancient relationships,” explained Dr. Cedric Notredame, lead author of the study and researcher at the Centre for Genomic Regulation. “It is like the erosion of an ancient text. The letters become indistinct, and the message is lost.”
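Saturation can be illustrated with a small numerical sketch. It assumes the simplest textbook substitution model, in which every site changes at the same rate and all replacements are equally likely (a Jukes-Cantor style model). The study does not specify this model, and the function name and the default of 20 amino-acid states are illustrative choices. Under that assumption, the fraction of sites that visibly differ between two sequences levels off even as the true number of substitutions keeps growing.

```python
import math

def expected_observed_difference(substitutions_per_site, n_states=20):
    """Expected fraction of sites that differ between two sequences.

    Assumes an equal-rates model (Jukes-Cantor style) with n_states
    possible characters per site: 20 for amino acids, 4 for DNA.
    """
    # Two unrelated random sequences differ at (n - 1) / n of their sites,
    # so the observed difference can never exceed this ceiling.
    ceiling = (n_states - 1) / n_states
    return ceiling * (1.0 - math.exp(-substitutions_per_site / ceiling))

# The true divergence grows steadily, but the observed difference flattens out.
for true_divergence in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    observed = expected_observed_difference(true_divergence)
    print(f"{true_divergence:5.1f} substitutions/site -> {observed:.3f} observed difference")
```

Once the observed difference approaches its ceiling (0.95 for 20 states), very different divergence times become almost indistinguishable from the sequences alone, which is the loss of signal the quote describes.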
To address this issue, the team turned to the physical structures of proteins. The intricate shapes that proteins fold into—critical to their cellular functions—are more conserved over evolutionary time than the sequences themselves. These structures tend to change more slowly, preserving ancestral features for longer periods.
Measuring Protein Structures to Build Evolutionary Trees
The researchers hypothesized that intra-molecular distances (IMDs)—the distances between pairs of amino acids within a protein—could reveal how much protein structures diverge over time. They analyzed a vast collection of proteins with known structures from various species and calculated IMDs to create phylogenetic trees.
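As a rough sketch of the idea, the snippet below computes IMDs from 3D coordinates and scores how far two structures have diverged. It is not the authors' implementation: the function names, the toy coordinates, the use of one point per residue, and the choice of mean absolute IMD difference as the divergence score are all assumptions made for illustration. It also assumes the two proteins are already aligned, so that residue i in one corresponds to residue i in the other.

```python
import math
from itertools import combinations

def intra_molecular_distances(coords):
    """Distances between every pair of residues, one (x, y, z) point per residue."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def imd_divergence(coords_a, coords_b):
    """Mean absolute difference between corresponding IMDs of two structures.

    Hypothetical score for illustration only; assumes the structures are
    aligned residue by residue and have the same length.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of aligned residues")
    if len(coords_a) < 2:
        raise ValueError("at least two residues are needed to form a distance")
    imd_a = intra_molecular_distances(coords_a)
    imd_b = intra_molecular_distances(coords_b)
    return sum(abs(x - y) for x, y in zip(imd_a, imd_b)) / len(imd_a)

# Toy three-residue "proteins" (made-up coordinates, arbitrary units).
protein_a = [(0, 0, 0), (3, 0, 0), (3, 4, 0)]
protein_b = [(0, 0, 0), (3, 0, 0), (3, 0, 4)]  # same shape as A, rotated
protein_c = [(0, 0, 0), (3, 0, 0), (6, 0, 0)]  # a different, stretched-out shape

print(imd_divergence(protein_a, protein_b))  # identical shapes give zero divergence
print(imd_divergence(protein_a, protein_c))  # different shapes give a positive score
```

Because IMDs depend only on distances within each molecule, rotating or shifting a structure does not change them, so no structural superposition is needed. Pairwise scores of this kind could then be collected into a distance matrix and passed to a distance-based tree-building method such as neighbor joining.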
The results were promising. Trees based on structural data closely resembled those derived from genetic sequences but proved significantly less prone to saturation. Even when genetic sequences had diverged extensively, structural data continued to provide consistent signals.
Recognizing the value of both approaches, the team developed a strategy that combines sequence and structural information. The combined method improved the reliability of tree branches and better distinguished correct evolutionary relationships from artifacts.
“It is akin to having two witnesses describe an event from different angles,” said Dr. Leila Mansouri, study co-author from the Centre for Genomic Regulation. “Each provides unique details, but together they give a fuller, more accurate account.”
Practical Applications in Human Health and Beyond
One real-world application of the combined approach is understanding the evolutionary relationships of kinases—proteins central to numerous cellular processes.
“The genome of most mammals, including humans, contains about 500 protein kinases that regulate nearly all aspects of our biology,” explained Dr. Notredame. “These kinases are major targets for cancer therapy, with drugs like imatinib for humans or toceranib for dogs.”
Over the past billion years, repeated duplications of kinase genes have contributed to their diversity. Dr. Notredame added, “Within the human genome, the most distantly related kinases are about a billion years apart. They duplicated in the common ancestor of the common ancestor of our common ancestor.”
However, the vast evolutionary timescales make building accurate kinase evolutionary trees a daunting task. “As imperfect as it may be, the kinase evolutionary tree is widely used to understand how kinases interact with drugs. Improving this tree—or those of other important protein families—would be a significant advance for human health,” Dr. Notredame said.
Beyond cancer research, the study’s method has broader implications. More precise evolutionary trees could illuminate how diseases evolve, aiding the development of treatments and vaccines. They could also guide the engineering of new enzymes for biotechnology, shed light on the origins of complex traits, and track species’ responses to climate change.
Journal reference:
Baltzis, A., et al. (2025) Multistrap: boosting phylogenetic analyses with structural information. Nature Communications. doi.org/10.1038/s41467-024-55264-0.