Protists are single-celled eukaryotic organisms that live in aquatic environments. They include unicellular algae and protozoans. They are important sources of human nutrition, bioenergy, and food for aquatic animals. As primary producers and oxygen generators, they also play important roles in the carbon cycle. They can cause problems as well, such as toxic algal blooms and red tides, and they act both as pathogens and as beneficial partners in symbiotic relationships.
More than 60,000 protist species have been recognized according to the NCBI taxonomy system. The Protist 10,000 Genomes Project (P10K) was initiated in December 2019 by a team of scientists under the direction of the Chinese Academy of Sciences (CAS) Institute of Hydrobiology (IHB). The creation of an extensive genetic resource database for protists is the main goal of this initiative.
The first dataset from the P10K project was recently released by the teams of Prof. Wei Miao of the IHB and Prof. Zhang Zhang of the Beijing Institute of Genomics of CAS (China National Center for Bioinformation). It can be accessed at https://ngdc.cncb.ac.cn/p10k/, and the corresponding study was published in Nucleic Acids Research.
The first public release from P10K is a comprehensive collection of 2,959 protist datasets, comprising 1,601 genomes and 1,358 transcriptomes. Of these, 1,858 datasets were integrated from publicly accessible databases, and the P10K team newly sequenced the remaining 1,101, with a major focus on ciliates. The newly sequenced data account for about 37% of the collection (1,101 of 2,959 datasets).
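The dataset counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in this article; the variable names are illustrative and are not part of the P10K database. It suggests that the 37% figure is the share of newly sequenced datasets in the full collection, not the growth relative to the public data alone.

```python
# Dataset counts as reported in the article.
genomes = 1601
transcriptomes = 1358
from_public_databases = 1858
newly_sequenced = 1101

# Both breakdowns should add up to the same total.
total = genomes + transcriptomes
assert total == from_public_databases + newly_sequenced

# Share of the whole collection contributed by new sequencing.
share_of_total = newly_sequenced / total

# Growth relative to the previously public datasets alone.
growth_over_public = newly_sequenced / from_public_databases

print(f"Total datasets: {total}")
print(f"New data as share of total: {share_of_total:.1%}")
print(f"Growth over public data: {growth_over_public:.1%}")
```

Under these numbers, the new sequencing makes up roughly 37% of the collection, which corresponds to an increase of roughly 59% over the 1,858 datasets that were already public.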
The P10K team created a standardized analysis workflow specifically designed for protist single-cell sequencing data to address the analytical problems presented by large-scale single-cell omics data.
This pipeline covers assembly, decontamination, species identification, gene annotation, and quality assessment. According to quality evaluations, the genomes annotated with this workflow contain a proportion of high- and medium-quality data comparable to that of genomes in public databases.
In addition to advancing research on protist diversity, the origin of eukaryotes, and interactions among microbes, the P10K database will support the use of protist genetic resources in ecological conservation, pollutant degradation, nutrition, health, and disease prevention. By enabling the identification of planktonic species from environmental DNA (eDNA), the database will also aid in assessing the health of aquatic ecosystems.
Journal reference:
Gao, X., et al. (2023). The P10K database: a data portal for the protist 10 000 genomes project. Nucleic Acids Research. doi.org/10.1093/nar/gkad1179