DeepUrfold: A New AI Model Uncovers Hidden Protein Relationships and Redefines Fold Space

Traditional methods of grouping proteins have relied largely on structural similarity to understand their evolution, function, and folding. However, this approach can often miss subtle similarities between distantly related proteins.

In a recent study published in Nature Communications, researchers from the University of Virginia introduced a novel method of modeling protein relationships that combines the structure, sequence, and biophysical properties of proteins.

This method, called DeepUrfold, indicated that the protein fold space was continuous rather than distinctly divided and uncovered previously unknown evolutionary relationships between proteins.

Study: Deep generative models of protein structure uncover distant relationships across a continuous fold space. Image Credit: unoL/Shutterstock.com

Background

The evolution of proteins is a complex process that continues to be extensively researched. It is believed that proteins evolved from small peptide fragments to form more complex domains, with natural processes such as recombination and mutation playing a major role.

Recent advances in artificial intelligence, including the use of deep learning models, have provided new opportunities to explore the fold space of proteins and map how different protein structures are related to one another.

Studies have found that even distant proteins can share common structural fragments. However, traditional methods of studying protein structures that group proteins into categories based on structural similarity often miss the subtle structural links between distant proteins.

About the Study

The present study introduced the DeepUrfold framework, which uses machine learning to analyze the structure, function, and evolutionary history of proteins, along with their biophysical properties, to identify relationships between proteins.

The researchers created a protein structure dataset and trained the deep learning models to predict the protein structure.

The dataset was created using the Prop3D computational toolkit and included protein domains from 20 different protein superfamilies. The protein structures for the dataset were prepared from these domains, with missing residues and atoms filled in.

Various properties, such as hydrophobicity, secondary structure, and solvent accessibility, were also calculated for the proteins in the dataset. The dataset created through Prop3D was divided into three parts. Eighty percent was used for training the deep learning models, and the remaining 20 percent was split evenly between validation and testing (10 percent each).
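The 80/10/10 division can be illustrated with a short Python sketch. It assumes a simple random shuffle with a fixed seed; the article does not say how the domains were assigned to each part, and the function name and domain identifiers below are hypothetical.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to testing
    return train, val, test


# Hypothetical domain identifiers, used only for illustration.
domains = [f"domain_{i}" for i in range(100)]
train, val, test = split_dataset(domains)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three parts reproducible between runs, which matters when several models are compared on the same held-out data.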

Each atom in the protein was categorized based on secondary structure, atom type, and charge, and the information was converted into a format readable by the model. The proteins were then represented in three dimensions on a grid. Furthermore, the protein structures underwent random rotations during the model training to avoid bias associated with protein orientation.
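The grid representation and random rotations can be sketched as follows. This is a minimal illustration, not the study's pipeline: the grid size, voxel width, one-hot atom features, and all function names are assumptions, and the grid is stored sparsely as a dictionary for brevity.

```python
import math
import random


def random_rotation_matrix(rng):
    """Return a 3x3 rotation matrix built from a random unit quaternion."""
    # Normalizing four Gaussian samples gives a uniformly random unit quaternion.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(coords, matrix):
    """Apply a rotation matrix to every (x, y, z) coordinate."""
    return [
        tuple(sum(matrix[i][j] * p[j] for j in range(3)) for i in range(3))
        for p in coords
    ]


def voxelize(coords, features, grid_size=16, voxel_width=1.0):
    """Map atoms onto a sparse grid: {(i, j, k): summed feature vector}."""
    n = len(coords)
    centroid = [sum(p[a] for p in coords) / n for a in range(3)]
    half = grid_size / 2
    grid = {}
    for p, feat in zip(coords, features):
        idx = tuple(
            int(math.floor((p[a] - centroid[a]) / voxel_width + half))
            for a in range(3)
        )
        if all(0 <= i < grid_size for i in idx):  # drop atoms outside the grid
            cell = grid.setdefault(idx, [0.0] * len(feat))
            for k, value in enumerate(feat):
                cell[k] += value
    return grid


# Hypothetical toy input: three atoms with one-hot [carbon, nitrogen, oxygen] features.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0)]
features = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

rng = random.Random(42)
rotated = rotate(coords, random_rotation_matrix(rng))
grid = voxelize(rotated, features)
print(len(grid), "occupied voxels")
```

Drawing a fresh rotation each time a structure is shown to the model means no single orientation is favored, which is the bias the random rotations are meant to remove.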

The researchers then used a deep learning model called the 3D convolutional neural network along with a variational autoencoder to model the protein structures. Furthermore, they used oversampling and models known as “one-class classifiers” to account for imbalanced protein structure representation across the 20 superfamilies.
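For readers unfamiliar with variational autoencoders, the sketch below shows the two terms such a model typically balances during training: a reconstruction error and a Kullback-Leibler penalty that keeps the latent code close to a standard normal distribution. The 3D convolutional encoder and decoder are omitted, the squared-error reconstruction term is an assumption, and none of the names or values come from the study.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term (an assumed choice; other likelihoods are possible)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def negative_elbo(x, x_hat, mu, log_var):
    """Training loss: reconstruction error plus the KL penalty."""
    return squared_error(x, x_hat) + kl_to_standard_normal(mu, log_var)


# Hypothetical encoder outputs and flattened voxel values, for illustration only.
rng = random.Random(0)
mu, log_var = [0.5, -0.2], [0.0, -1.0]
z = reparameterize(mu, log_var, rng)
x, x_hat = [1.0, 0.0, 0.0, 1.0], [0.9, 0.1, 0.2, 0.7]
loss = negative_elbo(x, x_hat, mu, log_var)
print(loss)
```

A model trained this way on a single superfamily can act as a one-class classifier: structures that resemble the training data tend to receive a lower loss than structures that do not.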

To assess the model's performance, the precision-recall curve and area under the receiver operating characteristic (ROC) curve were calculated.
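Both metrics can be computed from a list of true labels and model scores. The sketch below uses the pairwise (rank-based) definition of the area under the ROC curve and lists the points of a precision-recall curve; the labels and scores are invented for illustration, and tied scores are not given special treatment in the precision-recall points.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_points(labels, scores):
    """Return (recall, precision) after each example, ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    true_pos = 0
    points = []
    for k, (_, y) in enumerate(ranked, start=1):
        true_pos += y
        points.append((true_pos / total_pos, true_pos / k))
    return points


# Invented example: 1 = member of the superfamily, 0 = non-member.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(labels, scores)
pr_points = precision_recall_points(labels, scores)
print(round(auc, 3))
for recall, precision in pr_points:
    print(round(recall, 3), round(precision, 3))
```

Precision-recall curves are often reported alongside ROC curves when classes are imbalanced, because precision responds directly to the number of false positives among the top-ranked examples.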

Furthermore, the model’s sensitivity to protein structure was tested by generating permuted protein structures and comparing them to the original ones. This allowed the researchers to evaluate how well the model could distinguish between different protein topologies.
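One way to picture the permutation test is to treat a domain as an ordered list of structural segments and enumerate every alternative ordering. The sketch below does only that bookkeeping step, with hypothetical segment labels; building a three-dimensional model for each reordered chain requires structure-modeling software and is not shown.

```python
from itertools import permutations


def permuted_orders(segments):
    """Yield every reordering of the segments except the original order."""
    original = tuple(segments)
    for order in permutations(original):
        if order != original:
            yield order


# Hypothetical segment labels for a small domain.
segments = ["strand_a", "strand_b", "strand_c", "helix_d"]
variants = list(permuted_orders(segments))
print(len(variants))
```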

Major Findings

The study demonstrated that DeepUrfold could detect and analyze the structural similarities between distantly related proteins without relying on traditional methods such as specific topology and alignment. This deep learning-based framework offered a more sensitive approach to understanding how proteins were related beyond just the amino acid sequence or secondary structure.

DeepUrfold effectively compared proteins by creating a simplified version of the protein structure. This model was then used to detect similarities between proteins from different families and superfamilies. The model did not rely on rigid structural criteria, which allowed it to capture relationships that go beyond the standard hierarchical classification of proteins.

Additionally, the deep learning model used a latent space similarity metric to group proteins into communities based on structural features, which the researchers believe indicated evolutionary connections and similarities.
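The idea of grouping proteins by similarity in latent space can be illustrated with a deliberately simple stand-in. The article does not name the metric or the community detection method, so the sketch below swaps in cosine similarity with a fixed threshold and groups connected domains using a union-find structure; the embeddings, domain names, and threshold are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two latent vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def communities(embeddings, threshold=0.9):
    """Group names whose embeddings are linked by similarity >= threshold."""
    names = list(embeddings)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical two-dimensional latent vectors for four domains.
embeddings = {
    "dom_a": [1.0, 0.1],
    "dom_b": [0.9, 0.2],
    "dom_c": [0.0, 1.0],
    "dom_d": [0.1, 0.9],
}
groups = communities(embeddings)
print(groups)
```

Because groups are formed through chains of pairwise links, two domains can land in the same community without being directly similar to each other, which is one way a continuous fold space can show up in a grouping of this kind.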

The study also showed that proteins that were previously classified into distinct groups based on traditional methods of classification could overlap in structure, suggesting that the protein fold space was more of a continuum than distinct categories.

Furthermore, the model detected recurring structural fragments known as ‘urfolds,’ primitive structural motifs thought to represent early building blocks of protein topology. These urfolds could explain how complex proteins evolved from simpler components and provide insights into the relatedness between distant proteins of diverse shapes and structures.

Conclusions

To summarize, the findings showed that the deep learning-based model DeepUrfold could uncover relationships between distant proteins based on structural, functional, and biophysical information.

The study also detected recurring fragments known as urfolds across various proteins, indicating that different families of proteins share common structures and furthering our understanding of protein evolution.

Journal reference:

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Sidharthan, Chinta. (2024, October 21). DeepUrfold: A New AI Model Uncovers Hidden Protein Relationships and Redefines Fold Space. AZoLifeSciences. Retrieved on November 21, 2024 from https://www.azolifesciences.com/news/20241021/DeepUrfold-A-New-AI-Model-Uncovers-Hidden-Protein-Relationships-and-Redefines-Fold-Space.aspx.

  • MLA

    Sidharthan, Chinta. "DeepUrfold: A New AI Model Uncovers Hidden Protein Relationships and Redefines Fold Space". AZoLifeSciences. 21 November 2024. <https://www.azolifesciences.com/news/20241021/DeepUrfold-A-New-AI-Model-Uncovers-Hidden-Protein-Relationships-and-Redefines-Fold-Space.aspx>.

  • Chicago

    Sidharthan, Chinta. "DeepUrfold: A New AI Model Uncovers Hidden Protein Relationships and Redefines Fold Space". AZoLifeSciences. https://www.azolifesciences.com/news/20241021/DeepUrfold-A-New-AI-Model-Uncovers-Hidden-Protein-Relationships-and-Redefines-Fold-Space.aspx. (accessed November 21, 2024).

  • Harvard

    Sidharthan, Chinta. 2024. DeepUrfold: A New AI Model Uncovers Hidden Protein Relationships and Redefines Fold Space. AZoLifeSciences, viewed 21 November 2024, https://www.azolifesciences.com/news/20241021/DeepUrfold-A-New-AI-Model-Uncovers-Hidden-Protein-Relationships-and-Redefines-Fold-Space.aspx.
