A single cell’s genome or transcriptome can reveal considerably more about that cell’s place in a biological system than sequencing a whole batch of cells can, just as interviewing one person about their health yields specific, personal information that is hard to obtain from a large poll.
However, until recently, the technology to obtain high-resolution genomic data did not exist, and there was no dependable way to ensure that the data was of high quality and utility.
Dr Weijun Luo and Dr Cory Brouwer of the University of North Carolina at Charlotte created an artificial intelligence system to “clean” noisy single-cell RNA sequencing (scRNA-Seq) data. The research was published in Nature Communications on April 7th, 2022.
Since the Human Genome Project in the 1990s, researchers have been examining genomes to understand the secrets of life, from discovering specific genes associated with sickle cell anemia and breast cancer to developing mRNA vaccines in the continuing COVID-19 pandemic.
Technology has advanced since those early days of batching thousands of cells together to decode the millions of base pairs that make up genetic data, and in 2009, scientists developed scRNA-Seq, which sequences only the transcriptome, or the expressed portion of the genome, in a single living cell. It is now commonly used in medical research.
Unfortunately, scRNA-Seq data is noisy and prone to errors and poor quality. When a single cell is sequenced rather than a large number of cells, “dropouts” (genes missing from the data) occur often.
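To see why dropouts affect single cells far more than pooled samples, consider a toy simulation. Everything in it is an illustrative assumption rather than something taken from the study: the Poisson and binomial sampling model, the 10% capture rate, and the function name `simulate_counts` are hypothetical choices made only to show the effect.

```python
import numpy as np

def simulate_counts(n_cells, n_genes, capture_rate, seed=0):
    """Toy model (assumed, not from the paper): each cell holds a Poisson
    number of transcripts per gene, and only a fraction is captured."""
    rng = np.random.default_rng(seed)
    # Hypothetical average transcript count per gene.
    true_mean = rng.gamma(shape=2.0, scale=5.0, size=n_genes)
    # Transcripts actually present in each cell.
    molecules = rng.poisson(true_mean, size=(n_cells, n_genes))
    # Each transcript is detected independently with probability capture_rate.
    captured = rng.binomial(molecules, capture_rate)
    return molecules, captured

molecules, captured = simulate_counts(n_cells=200, n_genes=500, capture_rate=0.1)

# Fraction of zero entries when every cell is read on its own.
single_cell_zero_frac = np.mean(captured == 0)
# Fraction of genes with zero reads after pooling all cells together.
pooled_zero_frac = np.mean(captured.sum(axis=0) == 0)

print(f"zeros per single cell: {single_cell_zero_frac:.2%}")
print(f"zeros after pooling:   {pooled_zero_frac:.2%}")
```

In this sketch, many genes read as zero in an individual cell even though their transcripts are present, while pooling the cells makes most of those zeros disappear.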
A single cell, like a single person, may have its own health problems or be at an inconvenient stage of its life cycle (it may have just divided or be on the verge of cell death), which can introduce additional errors or technical variation into the scRNA-Seq data. Beyond these single-cell concerns, genomic profiling is also subject to ordinary sequencing errors.
Before the data can be used or comprehended, all of these flaws must be “cleaned” from it, which is where the new AI algorithm comes in.
The AutoClass algorithm is a step forward from current statistical approaches. Most present approaches presume that errors (or noise) follow a predetermined distribution, which specifies how likely they are to occur and how large they can be.
Existing approaches are frequently unable to clean the data well enough to reveal biological signals, and may even introduce additional errors as a result of incorrect distributional assumptions. AutoClass, on the other hand, makes no distributional assumptions and, as a result, can effectively correct a wide range of noise and technical variation.
“AutoClass is an AI algorithm based on a special deep neural network designed to maximize both noise removal and signal retention. The AI teaches itself to differentiate signal vs noise in the data by seeing enough data. Usually, the more data it sees, the better it performs.”
Dr Weijun Luo, University of North Carolina at Charlotte
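The idea of separating signal from noise by learning structure from the data itself can be illustrated with a much simpler stand-in. The sketch below is not AutoClass and is not a neural network: it uses a truncated singular value decomposition (a low-rank reconstruction) on synthetic data. The data sizes, the Gaussian noise, and the name `low_rank_denoise` are assumptions made purely for illustration; the article does not describe the actual network architecture.

```python
import numpy as np

def low_rank_denoise(noisy, rank):
    """Keep only the `rank` strongest components of the data matrix.
    A linear stand-in for learned denoising, not the AutoClass method."""
    u, s, vt = np.linalg.svd(noisy, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

rng = np.random.default_rng(0)
n_cells, n_genes, n_programs = 300, 50, 3

# Synthetic "biological signal": a few shared expression programs (assumed).
signal = rng.normal(size=(n_cells, n_programs)) @ rng.normal(size=(n_programs, n_genes))
# Synthetic technical noise added on top of the signal (assumed Gaussian).
noisy = signal + rng.normal(scale=1.0, size=signal.shape)

denoised = low_rank_denoise(noisy, rank=n_programs)

# Mean squared error against the known signal, before and after cleaning.
mse_noisy = np.mean((noisy - signal) ** 2)
mse_denoised = np.mean((denoised - signal) ** 2)
print(f"error before: {mse_noisy:.3f}, after: {mse_denoised:.3f}")
```

Because the shared structure appears across many cells while the noise does not, the reconstruction lands closer to the true signal than the raw data does. The same intuition underlies the remark that more data usually means better performance.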
Dr Luo and his colleagues demonstrated in this paper that AutoClass can reconstruct high-quality scRNA-Seq data and improve downstream analysis in a variety of ways. AutoClass is also robust, performing well across a variety of scRNA-Seq data types and conditions.
AutoClass is extremely efficient and scalable, and it performs well with data of various sample sizes and feature sizes. It also operates smoothly on a standard PC or laptop.
Journal reference:
Li, H., et al. (2022) A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data. Nature Communications. doi.org/10.1038/s41467-022-29576-y.