Interpreting Next-Generation Sequencing Results: Key Considerations

Next Generation Sequencing (NGS) is a relatively new technique for sequencing nucleic acids and detecting genomic mutations.1

This technology builds on advances in sequencing chemistries, sequencing matrices, and bioinformatics. NGS has revolutionized genomics due to its high throughput, speed, and scalability.

Accurate interpretation of NGS results depends on the quality of the raw sequencing data, which is affected by factors such as sample quality and library preparation.

Image Credit: Elpisterra/Shutterstock.com

Basic Steps in NGS

The four key steps of NGS involve nucleic acid extraction, library preparation, sequencing, and data analysis and interpretation.2 These steps are briefly discussed below:

Nucleic acid extraction

Nucleic acids, i.e., DNA or RNA, are isolated from biological samples, such as individual cells, bulk tissues, or biofluids.3 The purity of the extracted genetic materials is assessed via UV spectrophotometry, and fluorometric methods are used for nucleic acid quantitation.4
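As a rough illustration of the spectrophotometric purity check, the sketch below computes the A260/A280 absorbance ratio, a widely used purity indicator that the article does not name explicitly. The function names and the tolerance are assumptions made for illustration; the targets of about 1.8 for DNA and about 2.0 for RNA are commonly quoted rules of thumb, not a laboratory standard.

```python
def purity_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 absorbance ratio."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def looks_pure(a260: float, a280: float,
               nucleic_acid: str = "DNA", tolerance: float = 0.2) -> bool:
    """Check whether the ratio is close to the commonly quoted target.

    Targets (~1.8 for DNA, ~2.0 for RNA) and the tolerance are illustrative.
    """
    target = {"DNA": 1.8, "RNA": 2.0}[nucleic_acid]
    return abs(purity_ratio(a260, a280) - target) <= tolerance
```

A ratio well below the target is usually read as a hint of protein or reagent carry-over, which is why purity is checked before library preparation.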

Library preparation

Before genomic DNA or cDNA (synthesized from RNA) is sequenced, it is fragmented, end-repaired, and made into a sequencing library. Library preparation converts genomic DNA or cDNA into pools of DNA fragments that carry adapter sequences compatible with a specific NGS platform and indexing barcodes that identify individual samples.2

Based on the type of sequencing platform and downstream analysis, the library preparation protocol is selected. Two commonly used methods are ligation-based library preparation and amplicon library preparation.5
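To show how indexing barcodes let pooled samples be separated again, here is a minimal demultiplexing sketch. It assumes, purely for illustration, that every barcode has the same length and sits at the start of the read, and that matching is exact; real platforms often read the index separately and tolerate mismatches. The function name and the sample labels are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact match on a leading barcode.

    Assumes all barcodes share one length and prefix the read (illustrative).
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    by_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose barcode is not recognised go to a catch-all bin.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)
```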

Sequencing

Nucleotides are read on a sequencer (e.g., Illumina) at a read length and depth suited to the application.6 Read length refers to the number of bases read from each DNA fragment, while depth refers to the number of reads obtained per sample (or, for a genome, the average number of times each base is read). Many sequencers are available that support a broad range of throughputs and applications.
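Read length and read count combine into an expected average coverage of roughly (number of reads x read length) / genome size. The sketch below applies that standard approximation; the function names and the example figures are illustrative assumptions, and the estimate ignores unmapped reads and uneven coverage.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected average coverage: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)
```

For example, one million 150-base reads over a 5-megabase genome give `mean_coverage(1_000_000, 150, 5_000_000)`, which evaluates to 30.0.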

Data analysis and interpretation

Bioinformatic tools interpret the reads obtained from sequencers.7 Many current NGS platforms have in-built data analysis systems that simplify data analysis. For instance, the Illumina Connected Software offers a versatile and accessible data analysis solution that supports high-end research.

Critical Factors for NGS Data Interpretation

To ensure data reproducibility, standard protocols for sample preparation, library preparation, and sequencing must be followed.8

This strategy supports the generation of high-quality data for downstream analysis. Quality control (QC) and pre-processing of NGS data are essential steps to interpret downstream analyses accurately.9 Some key considerations of these steps are discussed below:

Quality Control

Quality control (QC) is performed at various stages of the NGS workflow to identify potential problems that may affect the accuracy of downstream analyses. QC involves assessing data quality metrics, detecting adapter contamination, and removing low-quality reads. Using multiple QC tools helps ensure that the resulting data are of high quality.

Quality metrics provide information about the overall data quality concerning sequencing depth, read length, and base quality. Bioinformatic tools, such as FastQC and HTSQualC, are used to assess the quality of a given set of sequencing reads.10
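The kind of summary these tools report can be approximated in a few lines. The sketch below is not how FastQC or HTSQualC are implemented; it simply reads four-line FASTQ records and reports read count, mean read length, and mean base quality. It assumes Phred+33 quality encoding and unwrapped (single-line) sequences, and the function names are hypothetical.

```python
import io


def fastq_records(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual


def summarize(handle, offset: int = 33):
    """Return read count, mean read length, and mean base quality."""
    n_reads = n_bases = qual_sum = 0
    for _, seq, qual in fastq_records(handle):
        n_reads += 1
        n_bases += len(seq)
        qual_sum += sum(ord(c) - offset for c in qual)
    if n_bases == 0:
        raise ValueError("no bases found in input")
    return {
        "reads": n_reads,
        "mean_length": n_bases / n_reads,
        "mean_quality": qual_sum / n_bases,
    }


# Usage with an in-memory example instead of a real file.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n!!\n")
stats = summarize(example)
```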

Adapter contamination can be detected and removed from the reads by various tools, including Trimmomatic and Cutadapt.

Typically, adapter contamination occurs when adapter sequences are used in library preparation and are not completely removed from the sequencing data. It is important to detect and remove adapter contamination because it may lead to false positives, which could affect the accuracy of downstream analyses.11
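A simplified view of adapter trimming is sketched below. It is not the algorithm used by Trimmomatic or Cutadapt, which also tolerate sequencing errors; it only removes an exact copy of the adapter, or an exact adapter prefix that runs off the 3' end of the read. The adapter string used in the usage note is the commonly cited Illumina adapter prefix, taken here as an example, and `min_overlap` is an assumed parameter.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter by exact matching (illustrative, error-intolerant)."""
    # Case 1: the whole adapter occurs inside the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a prefix of the adapter of length k.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read
```

With `adapter = "AGATCGGAAGAGC"`, the read `"ACGTACGTAGATC"` is trimmed to `"ACGTACGT"` because its last five bases match the start of the adapter.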

Low-quality reads contain different types of sequencing errors (e.g., phasing errors, base-calling errors, insertion-deletion errors), which may influence the accuracy of downstream analyses. Bioinformatic tools, such as Trimmomatic and Cutadapt, trim or remove low-quality reads based on quality score thresholds.
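Threshold-based filtering can be sketched as follows. This is a deliberately simple stand-in, assuming Phred+33 encoding and a mean-quality cutoff of 20 chosen for illustration; production tools typically offer sliding-window and end-trimming strategies as well. The function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read's quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(records, min_mean_quality: float = 20.0):
    """Keep (name, sequence, quality) records whose mean quality passes."""
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if qual and mean_phred(qual) >= min_mean_quality
    ]
```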

Pre-processing of NGS Data

Pre-processing of NGS data ensures its suitability for downstream analyses, such as differential expression analysis, variant calling, and functional annotation. Data pre-processing involves several steps, including transcript quantification, read alignment and differential expression analysis.12

RNA-seq data contain tens of millions of short reads from different transcripts. Transcript quantification estimates the abundance of each transcript from these reads.

Several bioinformatic tools (e.g., Kallisto and Salmon) are designed to assess transcript abundance based on different algorithms.13 Scientists select a specific tool for transcript quantification depending on the reference transcriptome, the type of sequencing data, and the downstream analyses.
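Abundance is commonly reported in transcripts per million (TPM). The sketch below shows only that normalization step, starting from per-transcript read counts and transcript lengths that are assumed to be already available; it does not reproduce how Kallisto or Salmon assign reads to transcripts. The function name and inputs are illustrative.

```python
def tpm(counts, lengths_bp):
    """Convert read counts to transcripts per million (TPM).

    counts and lengths_bp are dicts keyed by transcript ID (assumed inputs).
    """
    # Reads per base corrects for longer transcripts attracting more reads.
    rates = {t: counts[t] / lengths_bp[t] for t in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero")
    # Rescale so that the values sum to one million.
    return {t: rate / total * 1e6 for t, rate in rates.items()}
```

Two transcripts with equal counts but lengths of 1,000 and 2,000 bases receive TPM values in a 2:1 ratio, reflecting the length correction.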

Read alignment enables the detection of the differences between the read and the reference genome. This step involves mapping the sequencing reads to a reference genome/transcriptome, which is critical for interpreting biological data.

Several bioinformatic tools, such as Bowtie and STAR, are available to align reads to the reference.
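The idea of mapping a read and recording where it differs from the reference can be shown with a brute-force sketch. This is not how Bowtie or STAR work (they use indexed, far more efficient strategies and handle gaps and splicing); it slides the read along the reference and keeps the ungapped position with the fewest mismatches. The function name and the `max_mismatches` parameter are assumptions.

```python
def best_alignment(read: str, reference: str, max_mismatches: int = 2):
    """Return (position, mismatches) for the best ungapped placement, or None.

    Each mismatch is (reference_position, reference_base, read_base).
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = [
            (pos + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        if len(mismatches) > max_mismatches:
            continue
        # Keep the first position with the lowest mismatch count.
        if best is None or len(mismatches) < len(best[1]):
            best = (pos, mismatches)
    return best
```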

Differential expression analysis identifies genes whose expression differs between conditions. Bioinformatic tools, such as edgeR, DESeq2, and limma, are used to identify these genes.
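The quantity at the heart of such comparisons is the fold change in normalized expression. The sketch below computes counts per million and a log2 fold change with a pseudocount; it is a simplification, since edgeR, DESeq2, and limma additionally model variability across replicates and perform statistical testing. The function names and the pseudocount value are assumptions.

```python
import math


def cpm(counts):
    """Scale raw counts to counts per million for one sample."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_change(control, treated, pseudocount: float = 1.0):
    """Log2 ratio of treated to control CPM; pseudocount avoids log of zero."""
    c, t = cpm(control), cpm(treated)
    return {
        gene: math.log2((t[gene] + pseudocount) / (c[gene] + pseudocount))
        for gene in control
    }
```

A positive value indicates higher expression in the treated sample, a negative value lower expression; without replicates, no significance can be attached to either.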

NGS Data Analysis and Representation

High-throughput NGS data are analyzed through cleaning, exploration, visualization, and deeper analysis. Data cleaning recovers meaningful biological information from the raw data fresh off the sequencer.

Several computational algorithms are designed to remove small sequences and adapters from the library. Subsequently, the data quality is assessed via the Phred score, which indicates the probability that a base was called incorrectly. This step increases confidence in the downstream analysis.
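The Phred score Q is defined from the error probability P as Q = -10 x log10(P), so Q20 corresponds to a 1 in 100 chance of a wrong base and Q30 to 1 in 1,000. The helper functions below are a small sketch of that relationship; the names are hypothetical and the decoder assumes the Phred+33 character encoding used in most current FASTQ files.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to error probability p."""
    return -10 * math.log10(p)


def decode_quality(quality: str, offset: int = 33):
    """Turn a FASTQ quality string into a list of Phred scores."""
    return [ord(c) - offset for c in quality]
```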

NGS data can be graphically interpreted using Circos or MethGet. NGS data visualization helps extract meaningful information from a high data volume.14

This process enables data summarization and highlights important information. For example, heatmaps describe the differences in gene expression between two or more treatments. Network graphs are used for co-expression (correlation) analyses.
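The edges of a co-expression network graph are often derived from pairwise correlations between gene expression profiles. The sketch below computes Pearson correlations and keeps gene pairs above an absolute cutoff; the cutoff of 0.9, the function names, and the toy data are assumptions, and genes with constant expression would need to be excluded beforehand to avoid division by zero.

```python
import math
from itertools import combinations


def pearson(x, y) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold: float = 0.9):
    """Return (gene1, gene2, r) for pairs with |r| at or above the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges
```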

Circular layouts are commonly used to display whole genome sequencing data. This layout represents the overall presence of genes or genomes. In epigenomic profiling studies, histograms and heatmaps are commonly used to understand the differences in methylation rates.

Whole genome sequence data is used to perform variant analyses, sequencing of plasmids in cloning protocols, and microsatellite marker detection. Different bioinformatic tools are used for each analysis; for example, Platypus is used for variant analysis.
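To make the idea of variant analysis concrete, the following sketch calls single-base substitutions from a pileup of already aligned, ungapped reads. It is not how Platypus works; dedicated callers use statistical models and handle insertions and deletions. The depth and allele-fraction thresholds and the function name are assumptions chosen for illustration.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=3, min_alt_fraction=0.5):
    """Call substitutions from (start_position, sequence) ungapped alignments.

    Returns (position, ref_base, alt_base, alt_count, depth) tuples.
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            if 0 <= start + offset < len(reference):
                pileup[start + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_alt_fraction:
            variants.append((pos, reference[pos], base, n, depth))
    return variants
```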

Applications of NGS

NGS has wide-ranging applications that include the identification of novel pathogens, the analysis of epigenetic factors, the discovery of novel RNA variants, and the quantification of mRNAs for gene expression analysis.

The epigenetic factors that can be analyzed include genome-wide DNA methylation. NGS also enables the study of the human microbiome and the identification of tumor subclones.

A key advantage of NGS for clinical applications is its ability to simultaneously analyze multiple targets, i.e., hundreds, thousands, or even millions of targets. For instance, multiple mutations could be present in any given tumor of a cancer patient.

Conventional molecular techniques are less accurate and require a larger amount of tissue for identifying these mutations.

In contrast, NGS technology requires significantly less tissue and performs high-throughput analyses to identify cellular mutations rapidly.

References

  1. Qin D. Next-generation sequencing and its clinical application. Cancer Biol Med. 2019;16(1):4-10. doi: 10.20892/j.issn.2095-3941.2018.0055.
  2. Head SR, et al. Library construction for next-generation sequencing: overviews and challenges. Biotechniques. 2014;56(2):61-4, 66, 68, passim. doi: 10.2144/000114133.
  3. Mäki A, et al. Sample Preservation, DNA or RNA Extraction and Data Analysis for High-Throughput Phytoplankton Community Sequencing. Front Microbiol. 2017;8:1848. doi: 10.3389/fmicb.2017.01848.
  4. Bruijns B, et al. Performance of Spectrophotometric and Fluorometric DNA Quantification Methods. Analytica. 2022;3(3):371-384. doi: 10.3390/analytica3030025.
  5. Chiniquy J, et al. Fluorescent amplification for next generation sequencing (FA-NGS) library preparation. BMC Genomics. 2020;21(1):85. doi: 10.1186/s12864-020-6481-8.
  6. Nakamura K, et al. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 2011;39(13):e90. doi: 10.1093/nar/gkr344.
  7. Pereira R, Oliveira J, Sousa M. Bioinformatics and Computational Tools for Next-Generation Sequencing Analysis in Clinical Genetics. J Clin Med. 2020;9(1):132. doi: 10.3390/jcm9010132.
  8. Socea JN, Stone VN, Qian X, Gibbs PL, Levinson KJ. Implementing laboratory automation for next-generation sequencing: benefits and challenges for library preparation. Front Public Health. 2023;11:1195581. doi: 10.3389/fpubh.2023.1195581.
  9. Zhou Q, Su X, Wang A, Xu J, Ning K. QC-Chain: fast and holistic quality control method for next-generation sequencing data. PLoS One. 2013;8(4):e60234. doi: 10.1371/journal.pone.0060234.
  10. Bedre R, Avila C, Mandadi K. HTSQualC is a flexible and one-step quality control software for high-throughput sequencing data analysis. Sci Rep. 2021;11(1):18725. doi: 10.1038/s41598-021-98124-3.
  11. Liao Y, Shi W. Read trimming is not required for mapping and quantification of RNA-seq reads at the gene level. NAR Genom Bioinform. 2020;2(3):lqaa068. doi: 10.1093/nargab/lqaa068.
  12. Federico A, et al. Transcriptomics in Toxicogenomics, Part II: Preprocessing and Differential Expression Analysis for High Quality Data. Nanomaterials (Basel). 2020;10(5):903. doi: 10.3390/nano10050903.
  13. Patro R, et al. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14(4):417-419. doi: 10.1038/nmeth.4197.
  14. Georgiou G, van Heeringen SJ. fluff: exploratory analysis and visualization of high-throughput sequencing data. PeerJ. 2016;4:e2209. doi: 10.7717/peerj.2209.

Last Updated: Aug 2, 2024

Written by

Dr. Priyom Bose

Priyom holds a Ph.D. in Plant Biology and Biotechnology from the University of Madras, India. She is an active researcher and an experienced science writer. Priyom has also co-authored several original research articles that have been published in reputed peer-reviewed journals. She is also an avid reader and an amateur photographer.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Bose, Priyom. (2024, August 02). Interpreting Next-Generation Sequencing Results: Key Considerations. AZoLifeSciences. Retrieved on November 21, 2024 from https://www.azolifesciences.com/article/Interpreting-Next-Generation-Sequencing-Results-Key-Considerations.aspx.

  • MLA

    Bose, Priyom. "Interpreting Next-Generation Sequencing Results: Key Considerations". AZoLifeSciences. 21 November 2024. <https://www.azolifesciences.com/article/Interpreting-Next-Generation-Sequencing-Results-Key-Considerations.aspx>.

  • Chicago

    Bose, Priyom. "Interpreting Next-Generation Sequencing Results: Key Considerations". AZoLifeSciences. https://www.azolifesciences.com/article/Interpreting-Next-Generation-Sequencing-Results-Key-Considerations.aspx. (accessed November 21, 2024).

  • Harvard

    Bose, Priyom. 2024. Interpreting Next-Generation Sequencing Results: Key Considerations. AZoLifeSciences, viewed 21 November 2024, https://www.azolifesciences.com/article/Interpreting-Next-Generation-Sequencing-Results-Key-Considerations.aspx.
