Researchers develop a universal AI algorithm for deep clean-up of single cell genomic data

Reviewed by Emily Henderson, B.Sc. Apr 11 2022

A single cell’s genome or transcriptome can reveal considerably more data about its place in biological systems than sequencing a full batch of cells, just as interviewing a single person about their health will provide specialized, personalized information hard to obtain from a big poll.

Figure: AutoClass integrates a classifier into a regular autoencoder so as to fully reconstruct scRNA-Seq data. (a) AutoClass consists of a regular autoencoder and a classifier branch from the bottleneck layer. The input raw expression data is compressed in the encoder and reconstructed in the decoder; the classifier branch helps to retain signal during data compression. The output of the autoencoder is the desired imputed data (see Methods in the paper for details). (b) t-SNE plots of Dataset 1 without dropout, with dropout, and with dropout imputed by a regular autoencoder and by AutoClass. Image Credit: The University of North Carolina at Charlotte

However, until recently, the technology to obtain high resolution genomic data did not exist, and there was no dependable means to assure that the data was of high quality and utility.

Dr Weijun Luo and Dr Cory Brouwer of the University of North Carolina at Charlotte created an artificial intelligence system to “clean” noisy single-cell RNA sequencing (scRNA-Seq) data. The research was published in Nature Communications on April 7, 2022.

Since the Human Genome Project in the 1990s, researchers have been examining genomes to understand the secrets of life, from discovering specific genes associated with sickle cell anemia and breast cancer to developing mRNA vaccines in the continuing COVID-19 pandemic.

Technology has advanced since those early days of batching thousands of cells together to decode the millions of base pairs that make up genetic data. In 2009, scientists developed scRNA-Seq, which sequences only the transcriptome, or the expressed portion of the genome, of a single cell. It is now commonly used in medical research.

Unfortunately, scRNA-Seq data is noisy, error-prone, and often of poor quality. When a single cell is sequenced rather than a large number of cells, “dropouts” (genes missing from the data) occur often.
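To make the idea of dropouts concrete, here is a minimal sketch that zeroes out random entries of a toy cells-by-genes count matrix. The function name `simulate_dropout`, the Poisson toy data, and the uniform masking rate are illustrative assumptions, not the procedure used in the paper.

```python
import numpy as np

def simulate_dropout(counts, dropout_rate, seed=0):
    """Set each entry of a cells x genes count matrix to zero
    independently with probability dropout_rate.

    Uniform random masking is an illustrative assumption, not a model
    of how dropouts arise in a real scRNA-Seq experiment.
    """
    rng = np.random.default_rng(seed)
    keep = rng.random(counts.shape) >= dropout_rate  # True where the value survives
    return counts * keep

# Hypothetical toy data: 5 cells x 8 genes of Poisson-distributed counts.
rng = np.random.default_rng(42)
true_counts = rng.poisson(lam=5.0, size=(5, 8))
observed = simulate_dropout(true_counts, dropout_rate=0.4)

# Fraction of zero entries before and after the simulated dropout.
zero_fraction_true = float(np.mean(true_counts == 0))
zero_fraction_observed = float(np.mean(observed == 0))
print(zero_fraction_true, zero_fraction_observed)
```

Every surviving entry equals its true value and every dropped entry reads as zero, which is why a downstream analysis cannot tell a dropout from a gene that is genuinely not expressed.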

A single cell, like a single person, may have its own health problems or be at an awkward stage of its life cycle (it may have just divided or be on the verge of cell death), which can introduce additional errors or technical variation into the scRNA-Seq data. Beyond these single-cell issues, genomic profiling also carries the usual sequencing errors.

Before the data can be used or comprehended, all of these flaws must be “cleaned” from it, which is where the new AI algorithm comes in.

The AutoClass algorithm is a step forward from current statistical approaches. Most existing approaches presume that errors (or noise) follow a predetermined distribution, which specifies how likely they are to occur and how large they can be.

Existing approaches are frequently unable to clean the data well enough to reveal biological signals, and may even introduce additional errors because of incorrect assumptions about the data distribution. AutoClass, on the other hand, makes no distributional assumptions and, as a result, can effectively correct a wide range of noise or technical variation.

“AutoClass is an AI algorithm based on a special deep neural network designed to maximize both noise removal and signal retention. The AI teaches itself to differentiate signal vs noise in the data by seeing enough data. Usually the more data it sees, the better it performs.”

Dr Weijun Luo, University of North Carolina at Charlotte
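The architecture described in the figure caption (an autoencoder with a classifier branch attached to the bottleneck layer) can be sketched as a single forward pass in NumPy. This is a rough illustration only: the single-layer encoder and decoder, the layer sizes, the random weights, the placeholder class labels, and the loss weighting `weight` are all assumptions made here, not details taken from the published AutoClass implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_genes, n_bottleneck, n_classes, seed=0):
    """Random weights for a one-layer encoder, decoder, and classifier branch."""
    rng = np.random.default_rng(seed)
    scale = 0.1  # assumed initialization scale
    return {
        "W_enc": rng.normal(0, scale, (n_genes, n_bottleneck)),
        "W_dec": rng.normal(0, scale, (n_bottleneck, n_genes)),
        "W_cls": rng.normal(0, scale, (n_bottleneck, n_classes)),
    }

def forward(params, x):
    code = relu(x @ params["W_enc"])         # encoder: compress to the bottleneck
    recon = code @ params["W_dec"]           # decoder: reconstruct expression values
    probs = softmax(code @ params["W_cls"])  # classifier branch off the bottleneck
    return recon, probs

def combined_loss(x, recon, probs, labels, weight=0.5):
    """Weighted sum of reconstruction error and classification cross-entropy."""
    recon_loss = np.mean((x - recon) ** 2)
    picked = probs[np.arange(len(labels)), labels]  # probability of each cell's label
    cls_loss = -np.mean(np.log(picked + 1e-12))
    return (1 - weight) * recon_loss + weight * cls_loss

# Hypothetical toy input: 6 cells x 20 genes, with placeholder class labels.
rng = np.random.default_rng(1)
x = rng.random((6, 20))
labels = np.array([0, 1, 2, 0, 1, 2])

params = init_params(n_genes=20, n_bottleneck=4, n_classes=3)
recon, probs = forward(params, x)
loss = combined_loss(x, recon, probs, labels)
print(recon.shape, probs.shape, loss)
```

Training would adjust the weights to reduce the combined loss, so the bottleneck has to keep whatever information separates the classes while still allowing the input to be reconstructed. In this sketch `recon` plays the role of the imputed output.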

Dr Luo and his colleagues demonstrated in this paper that AutoClass can reconstruct high-quality scRNA-Seq data and improve downstream analysis in a variety of ways. AutoClass is also robust, performing well across a variety of scRNA-Seq data types and conditions.

AutoClass is extremely efficient and scalable, and it performs well with data of various sample sizes and feature sizes. It also operates smoothly on a standard PC or laptop.

Source:

The University of North Carolina at Charlotte

Journal reference:

Li, H., et al. (2022) A universal deep neural network for in-depth cleaning of single-cell RNA-Seq data. Nature Communications. doi.org/10.1038/s41467-022-29576-y.
