Interpreting Next-Generation Sequencing Results: Key Considerations


By Dr. Priyom Bose, Ph.D. Reviewed by Lily Ramsey, LLM

Next-generation sequencing (NGS) is a relatively new technique for sequencing nucleic acids and detecting genomic mutations.¹

This technology builds on advances in sequencing chemistries, sequencing matrices, and bioinformatics. NGS has revolutionized genomics due to its high throughput, speed, and scalability.

Accurate interpretation of NGS results is important. It depends on the quality of the raw sequencing data, which can be affected by factors such as library preparation.

Image Credit: Elpisterra/Shutterstock.com

Basic Steps in NGS

The four key steps of NGS are nucleic acid extraction, library preparation, sequencing, and data analysis and interpretation.² These steps are briefly discussed below:

Nucleic acid extraction

Nucleic acids, i.e., DNA or RNA, are isolated from biological samples, such as individual cells, bulk tissues, or biofluids.³ The purity of the extracted genetic materials is assessed via UV spectrophotometry, and fluorometric methods are used for nucleic acid quantitation.⁴
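As a rough illustration of the spectrophotometric purity check, the sketch below computes the two absorbance ratios that are conventionally reported. The function names, the sample readings, and the thresholds (about 1.8 for A260/A280 and about 2.0 for A260/A230) are assumptions based on common rules of thumb, not values given in this article.

```python
def purity_ratios(a260, a280, a230):
    """Return the (A260/A280, A260/A230) absorbance ratios."""
    if a280 <= 0 or a230 <= 0:
        raise ValueError("absorbance readings must be positive")
    return a260 / a280, a260 / a230


def looks_pure(a260, a280, a230, min_260_280=1.8, min_260_230=2.0):
    """Apply rule-of-thumb cutoffs (assumed values, adjust per lab protocol)."""
    r280, r230 = purity_ratios(a260, a280, a230)
    return r280 >= min_260_280 and r230 >= min_260_230


# Hypothetical readings for one DNA sample.
ratio_280, ratio_230 = purity_ratios(a260=1.00, a280=0.50, a230=0.40)
print(round(ratio_280, 2), round(ratio_230, 2))
print(looks_pure(1.00, 0.50, 0.40))
```

Acceptable ranges differ between DNA and RNA and between laboratories, so the cutoffs are left as parameters.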

Library preparation

Before genomic DNA or cDNA (synthesized from RNA) is sequenced, it is fragmented, end-repaired, and made into a sequencing library. Library preparation converts genomic DNA or cDNA into pools of DNA fragments that carry adapter sequences compatible with a specific NGS platform and index barcodes that identify individual samples.²
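To show how index barcodes let pooled samples be told apart after sequencing, here is a minimal demultiplexing sketch. It assumes, purely for illustration, that a 6-nucleotide barcode sits at the start of each read; the barcode sequences, sample names, and the `demultiplex` function are hypothetical, and real platforms often read the index separately.

```python
from collections import defaultdict

# Hypothetical 6-nt index barcodes; real kits define their own sequences.
BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}


def demultiplex(reads, barcodes, barcode_len=6):
    """Group reads by their leading barcode and strip the barcode off."""
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        # Reads with an unknown barcode go to a catch-all bin.
        sample = barcodes.get(tag, "undetermined")
        bins[sample].append(insert)
    return dict(bins)


reads = ["ACGTACGGGTTT", "TGCATGCCCAAA", "NNNNNNGATTACA"]
for sample, inserts in demultiplex(reads, BARCODES).items():
    print(sample, inserts)
```

Exact matching is used here for brevity; production demultiplexers usually tolerate one or more barcode mismatches.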

The library preparation protocol is selected based on the sequencing platform and the downstream analysis. Two commonly used methods are ligation-based library preparation and amplicon library preparation.⁵

Sequencing

Nucleotides are read on a sequencer (e.g., Illumina) at a read length or depth based on a particular application.⁶ Read length refers to the length of a DNA fragment read on a sequencer, while depth refers to the number of reads obtained per sample. Many sequencers are available that support a broad range of throughputs and applications.
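Read count and read length together determine a related quantity, the mean coverage, which is the average number of times each base of the target is sequenced. The sketch below uses the standard relation coverage = N x L / G; the function names and the example genome size are assumptions chosen for illustration.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average number of times each base is sequenced: N * L / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Reads required to reach a target mean coverage, rounded up."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical run: one million 150-bp reads over a 5-Mb genome.
print(mean_coverage(1_000_000, 150, 5_000_000))
print(reads_needed(30, 150, 5_000_000))
```

This is an average only; actual coverage varies along the genome, so applications usually plan for a margin above the minimum they need.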

Data analysis and interpretation

Bioinformatic tools interpret the reads obtained from sequencers.⁷ Many current NGS platforms have in-built data analysis systems that simplify data analysis. For instance, the Illumina Connected Software offers a versatile and accessible data analysis solution that supports high-end research.


Critical Factors for NGS Data Interpretation

To ensure data reproducibility, standard protocols for sample preparation, library preparation, and sequencing must be followed.⁸

This strategy supports the generation of high-quality data for downstream analysis. Quality control (QC) and pre-processing of NGS data are essential steps to interpret downstream analyses accurately.⁹ Some key considerations of these steps are discussed below:

Quality Control

Quality control (QC) is performed at various stages of the NGS workflow to identify potential problems that may affect the accuracy of downstream analyses. QC involves assessing data quality metrics, detecting adapter contamination, and removing low-quality reads. Using multiple QC tools helps ensure that the data are of high quality.

Quality metrics provide information about overall data quality in terms of sequencing depth, read length, and base quality. Bioinformatic tools, such as FastQC and HTSQualC, are used to assess the quality of a given set of sequencing reads.¹⁰
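The three metrics named above can be computed directly from a FASTQ file. The following sketch is not how FastQC or HTSQualC work internally; it is a simplified illustration that assumes four lines per record and Phred+33 quality encoding, and the function names and demo reads are made up.

```python
import io


def fastq_records(handle):
    """Yield (sequence, quality) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def summarize(handle, offset=33):
    """Return read count, mean read length, and mean base quality."""
    n_reads = n_bases = qual_sum = 0
    for seq, qual in fastq_records(handle):
        n_reads += 1
        n_bases += len(seq)
        qual_sum += sum(ord(c) - offset for c in qual)
    return {
        "reads": n_reads,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "mean_quality": qual_sum / n_bases if n_bases else 0.0,
    }


# Two made-up reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n555555\n")
print(summarize(demo))
```

Dedicated QC tools report far more than this, such as per-position quality and overrepresented sequences, which a summary average can hide.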

Adapter contamination can be detected and removed from the reads by various tools, including Trimmomatic and Cutadapt.

Typically, adapter contamination occurs when adapter sequences are used in library preparation and are not completely removed from the sequencing data. It is important to detect and remove adapter contamination because it may lead to false positives, which could affect the accuracy of downstream analyses.¹¹
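A simplified view of 3' adapter removal is sketched below. It uses exact string matching only, whereas tools such as Trimmomatic and Cutadapt tolerate mismatches and offer many more options. The `trim_adapter` function and its `min_overlap` parameter are hypothetical, and the adapter string is a commonly cited Illumina adapter prefix used here as an assumption.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix, for illustration only


def trim_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Remove a 3' adapter, including a partial adapter at the read end."""
    # Case 1: the full adapter occurs somewhere inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter (longest first).
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    return read


print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT"))  # full adapter inside the read
print(trim_adapter("ACGTACGTAGATC"))             # partial adapter at the end
print(trim_adapter("ACGTACGT"))                  # no adapter present
```

The `min_overlap` guard matters because very short suffixes match an adapter prefix by chance, and trimming them would remove genuine sequence.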

Low-quality reads contain different types of sequencing errors (e.g., phasing errors, base-calling errors, insertion-deletion errors), which may reduce the accuracy of downstream analyses. Bioinformatic tools, such as Trimmomatic and Cutadapt, remove low-quality reads based on quality score thresholds.
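Threshold-based filtering can be sketched as follows. This is one simple policy (discard reads whose mean Phred score falls below a cutoff), assuming Phred+33 encoding; the function names and the cutoff of 20 are illustrative assumptions, and real tools also support sliding-window and end trimming.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a read from its ASCII-encoded quality string."""
    scores = [ord(c) - offset for c in quality_string]
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(records, min_mean_q=20):
    """Keep (sequence, quality) records whose mean quality meets the cutoff."""
    return [(s, q) for s, q in records if mean_phred(q) >= min_mean_q]


# Made-up records: 'I' is Q40 and '!' is Q0 under Phred+33.
records = [("ACGT", "IIII"), ("ACGT", "!!!!"), ("ACGT", "II!!")]
kept = filter_reads(records)
print(len(kept), "of", len(records), "reads kept")
```

A mean can mask a run of poor bases at the read end, which is why trimming and filtering are often combined.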

Pre-processing of NGS Data

Pre-processing of NGS data ensures its suitability for downstream analyses, such as differential expression analysis, variant calling, and functional annotation. Data pre-processing involves several steps, including transcript quantification, read alignment, and differential expression analysis.¹²

RNA-seq data typically contain tens of millions of short reads from different transcripts. Transcript quantification estimates the abundance of each transcript from these reads.

Several bioinformatic tools (e.g., kallisto and Salmon) are designed to assess transcript abundance based on different algorithms.¹³ Scientists select a specific tool for transcript quantification depending on the reference transcriptome, the type of sequencing data, and the downstream analyses.
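Abundance is often reported in transcripts per million (TPM), which corrects read counts for transcript length and sequencing depth. The sketch below shows only this normalization step on counts that are assumed to be already assigned to transcripts; it does not reproduce the assignment algorithms of kallisto or Salmon. The transcript names, counts, and lengths are invented.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and transcript lengths."""
    # Step 1: reads per kilobase corrects for transcript length.
    rpk = {t: counts[t] / (lengths_bp[t] / 1000) for t in counts}
    # Step 2: scale so that all values in the sample sum to one million.
    scale = sum(rpk.values()) / 1_000_000
    return {t: (v / scale if scale else 0.0) for t, v in rpk.items()}


# Hypothetical counts: tx2 has 3x the reads of tx1 but is also 3x longer.
counts = {"tx1": 100, "tx2": 300, "tx3": 0}
lengths = {"tx1": 1000, "tx2": 3000, "tx3": 500}
for name, value in tpm(counts, lengths).items():
    print(name, round(value))
```

In this example tx1 and tx2 receive the same TPM, which illustrates why raw counts alone can overstate the abundance of long transcripts.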

Read alignment enables the detection of the differences between the read and the reference genome. This step involves mapping the sequencing reads to a reference genome/transcriptome, which is critical for interpreting biological data.

Several bioinformatic tools, such as Bowtie and STAR, are available to align reads to the reference.
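The idea of mapping a read and then reporting where it differs from the reference can be shown with a brute-force sketch. This is not how Bowtie or STAR work; real aligners rely on indexed data structures and handle gaps, whereas this hypothetical `best_alignment` function simply tries every gap-free position. The sequences are invented.

```python
def best_alignment(read, reference):
    """Slide the read along the reference; return (offset, mismatches)."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset : offset + len(read)]
        # Record (reference position, reference base, read base) per mismatch.
        mismatches = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


reference = "TTGACCGATTACAGGCTA"
read = "GATTGCAG"  # differs from the reference at one base
offset, mismatches = best_alignment(read, reference)
print("offset:", offset, "mismatches:", mismatches)
```

The reported mismatches are the raw material for variant detection, although a single read is never enough to distinguish a true variant from a sequencing error.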

Differential expression analysis enables the identification of differentially expressed genes in varied conditions. Bioinformatic tools, such as edgeR, DESeq2, and limma, are used to identify these genes.
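The quantity at the heart of such comparisons is the fold change in expression between conditions. The sketch below computes a log2 fold change on counts-per-million values for one sample per condition. It is a deliberately simplified illustration with invented gene names and counts; edgeR, DESeq2, and limma additionally model variability across replicates and test for statistical significance, which this sketch does not attempt.

```python
import math


def cpm(counts):
    """Counts per million for one sample (dict of gene -> raw count)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {g: c * 1_000_000 / total for g, c in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) on CPM, with a pseudocount to avoid log(0)."""
    cpm_c, cpm_t = cpm(control), cpm(treated)
    return {
        g: math.log2((cpm_t[g] + pseudocount) / (cpm_c[g] + pseudocount))
        for g in control
    }


# Hypothetical counts with equal library sizes in both conditions.
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treated = {"geneA": 400, "geneB": 100, "geneC": 500}
for gene, lfc in log2_fold_change(control, treated).items():
    print(gene, round(lfc, 2))
```

A positive value indicates higher expression in the treated sample, a negative value lower expression, and a value near zero no change.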

NGS Data Analysis and Representation

High-throughput NGS data is analyzed via cleaning, data exploration, visualization, and deepening. NGS data cleaning is associated with rescuing meaningful biological data from raw data fresh off the sequencer.

Several computational algorithms are designed to remove small sequences and adapters from the library. Subsequently, data quality is assessed via the Phred score, which indicates the probability that a base was called incorrectly. This process increases confidence in the downstream analysis.
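The Phred score Q is related to the base-calling error probability P by Q = -10 log10(P). The short sketch below converts in both directions; the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to an error probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

Each 10-point increase in Q corresponds to a tenfold drop in error probability, so Q20 means roughly 1 error in 100 bases and Q30 roughly 1 in 1,000.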

NGS data can be graphically interpreted using Circos or MethGet. NGS data visualization helps extract meaningful information from a high data volume.¹⁴

This process enables data summarization and highlights important information. For example, heatmaps describe the differences in gene expression between two or more treatments. Network graphs are used for co-expression (correlation) analyses.
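Both heatmaps and network graphs start from a numeric summary of the expression data. As one possible illustration, the sketch below computes Pearson correlations between gene expression profiles and keeps strongly correlated pairs as the edges of a network. The gene names, expression values, and the 0.9 threshold are assumptions, and the plotting step itself is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; report 0.0 instead.
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expr, threshold=0.9):
    """Return gene pairs whose |r| across samples meets the threshold."""
    genes = sorted(expr)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1 :]:
            r = pearson(expr[g1], expr[g2])
            if abs(r) >= threshold:
                edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression values for three genes across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
print(coexpression_edges(expr))
```

The resulting edge list can be handed to any graph-drawing tool; the threshold controls how dense the network appears.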

Circular layouts are commonly used to display whole genome sequencing data. This layout represents the overall presence of genes or genomes. In epigenomic profiling studies, histograms and heatmaps are commonly used to understand the differences in methylation rates.

Whole genome sequence data is used to perform variant analyses, sequencing of plasmids in cloning protocols, and microsatellite marker detection. Different bioinformatic tools are used for each analysis; for example, Platypus is used for variant analysis.
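To give a sense of what variant analysis involves, the sketch below piles up aligned reads over a reference and reports positions where a non-reference base dominates. It is a naive illustration and not the method used by Platypus; the function name, the depth and allele-fraction thresholds, and the sequences are all assumptions, and only gap-free alignments and single-base substitutions are handled.

```python
from collections import Counter, defaultdict


def call_variants(reference, aligned_reads, min_depth=3, min_alt_fraction=0.5):
    """Naive SNV calling from gap-free alignments given as (offset, sequence)."""
    # Count the bases observed at each reference position.
    pileup = defaultdict(Counter)
    for offset, seq in aligned_reads:
        for i, base in enumerate(seq):
            pileup[offset + i][base] += 1
    variants = []
    for pos in sorted(pileup):
        depth = sum(pileup[pos].values())
        # Most frequent non-reference base at this position, if any.
        alt, alt_count = max(
            ((b, c) for b, c in pileup[pos].items() if b != reference[pos]),
            key=lambda bc: bc[1],
            default=(None, 0),
        )
        if depth >= min_depth and alt and alt_count / depth >= min_alt_fraction:
            variants.append((pos, reference[pos], alt, alt_count, depth))
    return variants


reference = "ACGTACGTAC"
aligned_reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC"), (1, "CGTACGT")]
for pos, ref, alt, alt_count, depth in call_variants(reference, aligned_reads):
    print(f"pos {pos}: {ref}>{alt} ({alt_count}/{depth} reads)")
```

Requiring a minimum depth and allele fraction is a crude guard against sequencing errors; dedicated callers replace these fixed cutoffs with statistical models.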

Applications of NGS

NGS has wide-ranging applications that include the identification of novel pathogens, the analysis of epigenetic factors such as genome-wide DNA methylation, the discovery of novel RNA variants, and the quantification of mRNAs for gene expression analysis.

The technique also enables the study of the human microbiome and the identification of tumor subclones.

A key advantage of NGS for clinical applications is its ability to analyze hundreds, thousands, or even millions of targets simultaneously. For instance, a single tumor in a cancer patient may harbor multiple mutations.

Conventional molecular techniques are less accurate and require a larger amount of tissue for identifying these mutations.

In contrast, NGS technology requires significantly less tissue and performs high-throughput analyses to identify these mutations rapidly.

References

1. Qin D. Next-generation sequencing and its clinical application. Cancer Biol Med. 2019;16(1):4-10. doi: 10.20892/j.issn.2095-3941.2018.0055.
2. Head SR, et al. Library construction for next-generation sequencing: overviews and challenges. Biotechniques. 2014;56(2):61-4, 66, 68, passim. doi: 10.2144/000114133.
3. Mäki A, et al. Sample Preservation, DNA or RNA Extraction and Data Analysis for High-Throughput Phytoplankton Community Sequencing. Front Microbiol. 2017;8:1848. doi: 10.3389/fmicb.2017.01848.
4. Bruijns B, et al. Performance of Spectrophotometric and Fluorometric DNA Quantification Methods. Analytica. 2022;3(3):371-384. doi: 10.3390/analytica3030025.
5. Chiniquy J, et al. Fluorescent amplification for next generation sequencing (FA-NGS) library preparation. BMC Genomics. 2020;21(1):85. doi: 10.1186/s12864-020-6481-8.
6. Nakamura K, et al. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 2011;39(13):e90. doi: 10.1093/nar/gkr344.
7. Pereira R, Oliveira J, Sousa M. Bioinformatics and Computational Tools for Next-Generation Sequencing Analysis in Clinical Genetics. J Clin Med. 2020;9(1):132. doi: 10.3390/jcm9010132.
8. Socea JN, Stone VN, Qian X, Gibbs PL, Levinson KJ. Implementing laboratory automation for next-generation sequencing: benefits and challenges for library preparation. Front Public Health. 2023;11:1195581. doi: 10.3389/fpubh.2023.1195581.
9. Zhou Q, Su X, Wang A, Xu J, Ning K. QC-Chain: fast and holistic quality control method for next-generation sequencing data. PLoS One. 2013;8(4):e60234. doi: 10.1371/journal.pone.0060234.
10. Bedre R, Avila C, Mandadi K. HTSQualC is a flexible and one-step quality control software for high-throughput sequencing data analysis. Sci Rep. 2021;11(1):18725. doi: 10.1038/s41598-021-98124-3.
11. Liao Y, Shi W. Read trimming is not required for mapping and quantification of RNA-seq reads at the gene level. NAR Genom Bioinform. 2020;2(3):lqaa068. doi: 10.1093/nargab/lqaa068.
12. Federico A, et al. Transcriptomics in Toxicogenomics, Part II: Preprocessing and Differential Expression Analysis for High Quality Data. Nanomaterials (Basel). 2020;10(5):903. doi: 10.3390/nano10050903.
13. Patro R, et al. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14(4):417-419. doi: 10.1038/nmeth.4197.
14. Georgiou G, van Heeringen SJ. fluff: exploratory analysis and visualization of high-throughput sequencing data. PeerJ. 2016;4:e2209. doi: 10.7717/peerj.2209.

Citation

Bose, Priyom. (2024, August 02). Interpreting Next-Generation Sequencing Results: Key Considerations. AZoLifeSciences. https://www.azolifesciences.com/article/Interpreting-Next-Generation-Sequencing-Results-Key-Considerations.aspx.