P10K Database: A Comprehensive Genetic Resource Database for Protists

Protists are single-celled eukaryotic organisms that live mainly in aquatic environments. They include unicellular algae and protozoans. They are important sources of human nutrition and bioenergy, as well as food for aquatic animals. They also play key roles in the carbon cycle as primary producers and oxygen generators. However, some protists cause harm, producing toxic algal blooms and red tides, and protists can act both as pathogens and as beneficial partners in symbiotic relationships.

Image Credit: Ekky Ilham/Shutterstock.com

According to the NCBI Taxonomy database, more than 60,000 protist species have been identified. The Protist 10,000 Genomes Project (P10K) was launched in December 2019 by a team of scientists led by the Institute of Hydrobiology (IHB) of the Chinese Academy of Sciences (CAS). The initiative's main goal is to build a comprehensive genetic resource database for protists.

The first dataset from the P10K project was recently released by the teams of Prof. Wei Miao of the IHB and Prof. Zhang Zhang of the CAS Beijing Institute of Genomics (China National Center for Bioinformation). It is available at https://ngdc.cncb.ac.cn/p10k/, and the corresponding study was published in Nucleic Acids Research.

This first public release from P10K comprises 2,959 protist datasets: 1,601 genomes and 1,358 transcriptomes. Of these, 1,858 datasets were compiled from publicly available databases, while the P10K team newly sequenced 1,101 datasets, with a major focus on ciliates. The newly sequenced data increased the total size of the protist dataset by 37%.

To address the analytical challenges posed by large-scale single-cell omics data, the P10K team developed a standardized analysis workflow designed specifically for protist single-cell sequencing data.

This pipeline covers assembly, decontamination, species identification, gene annotation, and assessment. Quality evaluations showed that genomes annotated with this pipeline contain a proportion of high- and medium-quality data comparable to that of genomes in public databases.

Beyond advancing research on protist diversity, the origin of eukaryotes, and interactions among microbes, the P10K database will support the use of protist genetic resources in ecological conservation, pollutant degradation, nutrition, health, and disease prevention. By enabling planktonic species to be identified from environmental DNA (eDNA), the database will also help assess the health of aquatic ecosystems.

Journal reference:

Gao, X., et al. (2023). The P10K database: a data portal for the protist 10 000 genomes project. Nucleic Acids Research. https://doi.org/10.1093/nar/gkad1179