The “Unknome” Dataset Contains Protein Sequences That Are Still Unknown

Reviewed by Danielle Ellis, B.Sc. Aug 9 2023

Researchers from the UK expect that a new, freely accessible database they have developed will shrink rather than grow over time. That is because it compiles the many understudied proteins encoded by genes in the human genome: proteins whose existence is recognized but whose functions are mostly unknown.

The “unknome” database, created by Sean Munro of the MRC Laboratory of Molecular Biology in Cambridge, England, Matthew Freeman of the Dunn School of Pathology at the University of Oxford, and colleagues, is described in the open access journal PLOS Biology. Their own investigation of a subset of proteins in the database found that many of them support critical biological processes, such as growth and resistance to stress.

Sequencing of the human genome has shown that it encodes thousands of proteins whose identities and functions are still unknown. Several factors contribute to this, such as the tendency to concentrate limited research funds on targets that are already well known and the dearth of tools, such as antibodies, for probing the function of these proteins in cells.

The risks of disregarding these proteins are substantial, however, according to the scientists. It is probable that some, if not many, play key roles in essential cell processes, and they could provide both new insights and potential targets for therapeutic intervention.

To encourage more rapid exploration of such proteins, the authors developed the unknome database (www.unknome.org), which assigns each protein a “knownness” score based on information from the scientific literature about function, conservation across species, subcellular compartmentalization, and other factors.

By this measure, almost nothing is known about thousands of proteins. Along with proteins from the human genome, proteins from model organisms are also included. The database is freely accessible to everyone and customizable, enabling users to assign their own weights to the various factors and produce their own set of knownness scores to prioritize their own research.

To evaluate the database's usefulness, the scientists selected 260 human genes that have comparable genes in flies and that have knownness scores of 1 or less in both species, indicating that little or nothing was known about them.
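The weighting and selection steps described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the evidence categories, the weights, the weighted-sum formula, and the gene names are all hypothetical and are not taken from the unknome database, which defines its own scoring.

```python
# Hypothetical evidence categories and weights, for illustration only.
DEFAULT_WEIGHTS = {"experimental": 1.0, "literature": 0.5, "computational": 0.25}


def knownness(evidence_counts, weights=DEFAULT_WEIGHTS):
    """Assumed scoring rule: weighted sum of annotation counts per category."""
    return sum(weights.get(cat, 0.0) * n for cat, n in evidence_counts.items())


def select_understudied(genes, weights=DEFAULT_WEIGHTS, threshold=1.0):
    """Return genes whose score is at or below the threshold in both species."""
    selected = []
    for name, per_species in genes.items():
        scores = [knownness(per_species[s], weights) for s in ("human", "fly")]
        if all(score <= threshold for score in scores):
            selected.append(name)
    return selected


# Made-up annotation counts for three made-up genes.
genes = {
    "geneA": {"human": {"computational": 2}, "fly": {}},
    "geneB": {"human": {"experimental": 3}, "fly": {"experimental": 1}},
    "geneC": {"human": {"literature": 1}, "fly": {"computational": 4}},
}

print(select_understudied(genes))
```

Passing a different `weights` dictionary changes which genes fall under the threshold, which mirrors the idea of users producing their own set of knownness scores.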

For many of these genes, a complete knockout proved incompatible with life in the fly. Partial or tissue-specific knockdowns revealed that a significant portion of them contribute to crucial processes influencing fertility, development, tissue growth, protein quality control, or stress resistance.

The findings imply that, even after decades of in-depth research, thousands of fly genes are still not understood at even the most basic level, and the same is undoubtedly true for the human genome.

“These uncharacterized genes have not deserved their neglect. Our database provides a powerful, versatile and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents.”

Sean Munro, Group Leader, Medical Research Council Laboratory of Molecular Biology

Munro added, “The role of thousands of human proteins remains unclear and yet research tends to focus on those that are already well understood. To help address this we created an Unknome database that ranks proteins based on how little is known about them, and then performed functional screens on a selection of these mystery proteins to demonstrate how ignorance can drive biological discovery.”

Source:

PLOS

Journal reference:

Rocha, J. J., et al. (2023). Functional unknomics: Systematic screening of conserved genes of unknown function. PLOS Biology. doi.org/10.1371/journal.pbio.3002222
