Researchers may be able to learn more about the inner workings of cells with the aid of a novel artificial intelligence (AI) technology that makes logical deductions regarding the function of unidentified proteins.
Developed by KAUST bioinformatics researcher Maxat Kulmanov and his team, the tool surpasses current analytical methods in predicting protein functions and can even analyze proteins lacking clear matches in existing datasets.
The DeepGO-SE model leverages large language models, akin to those employed by ChatGPT and other generative AI tools. Then, using broad scientific principles regarding the functioning of proteins, it applies logical entailment to derive meaningful inferences about molecular activities.
“By building models of a portion of the world (in this example, protein function) and deducing the most likely scenario from what makes logical sense in these world models, it effectively gives computers the ability to interpret results logically.
This method has many applications, especially when it is necessary to reason over data and hypotheses generated by a neural network or another machine learning model.”
Robert Hoehndorf, Head of Bio-Ontology Research Group, King Abdullah University of Science and Technology
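The idea of deducing what holds across "world models" can be illustrated with a toy sketch. The example below is not the DeepGO-SE implementation: the function names, the three protein-function statements, and the single background rule are all hypothetical, and the worlds are enumerated by brute force rather than learned. It only shows the underlying notion of entailment: a statement counts as entailed when it is true in every world consistent with the background knowledge and the observed facts, and the fraction of worlds in which it holds gives a graded score.

```python
from itertools import product

# Hypothetical toy vocabulary of protein-function statements (not from the paper).
ATOMS = ["kinase_activity", "catalytic_activity", "binding"]

# Hypothetical background knowledge as implications: premise -> conclusion.
AXIOMS = [("kinase_activity", "catalytic_activity")]


def satisfies(world, axioms, facts):
    """Return True if a truth assignment respects every axiom and observed fact."""
    axioms_ok = all((not world[p]) or world[q] for p, q in axioms)
    facts_ok = all(world[f] for f in facts)
    return axioms_ok and facts_ok


def entailment_score(query, axioms, facts, atoms=ATOMS):
    """Fraction of consistent world models in which `query` is true.

    A score of 1.0 means the query holds in every model, i.e. it is entailed.
    """
    # Enumerate every possible truth assignment over the toy vocabulary.
    worlds = [dict(zip(atoms, values))
              for values in product([False, True], repeat=len(atoms))]
    # Keep only the worlds that agree with the axioms and the observed facts.
    models = [w for w in worlds if satisfies(w, axioms, facts)]
    if not models:
        raise ValueError("facts contradict the axioms; no model exists")
    return sum(w[query] for w in models) / len(models)


if __name__ == "__main__":
    observed = ["kinase_activity"]
    for atom in ATOMS:
        print(atom, entailment_score(atom, AXIOMS, observed))
```

Given the observed fact `kinase_activity`, the statement `catalytic_activity` is true in every consistent world and is therefore entailed, while `binding` is true in only half of them and remains undetermined.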
Kulmanov and Hoehndorf worked with Stefan Arold of KAUST and scientists at the Swiss Institute of Bioinformatics to evaluate the model's capacity to predict the functions of proteins whose physiological roles are unclear.
Using a poorly characterized protein's amino acid sequence and its known interactions with other proteins, the tool was able to accurately predict the protein's molecular activities. In an international competition of function prediction tools, DeepGO-SE ranked in the top twenty out of over 1,600 algorithms, demonstrating the accuracy of the model.
With this technique, the KAUST team is now able to explore the roles of mysterious proteins found in plants that survive in the harsh Saudi Arabian desert. They would like other researchers to use the method and expect that the results will be helpful in identifying new proteins for biotechnological applications.
“DeepGO-SE’s ability to analyze uncharacterized proteins can facilitate tasks such as drug discovery, metabolic pathway analysis, disease associations, protein engineering, screening for specific proteins of interest, and more.”
Maxat Kulmanov, Research Scientist, King Abdullah University of Science and Technology
Journal reference:
Kulmanov, M., et al. (2024) Protein function prediction as approximate semantic entailment. Nature Machine Intelligence. doi.org/10.1038/s42256-024-00795-w