A new study has investigated fragments of cancer DNA that arise from the human genome, whose sequence is the outcome of millions of years of evolution and has been shaped by “copy-paste-edit” processes and co-evolution with parasitic elements.
For instance, 8% of human DNA derives from past viral infections. The insidious mutational processes that have shaped the human genome intensify and turn life-threatening in cancer cell genomes, resulting in anarchic cell mutation and proliferation.
Apart from being fossils of past human evolution, the repeated DNA sequences in the human genome also record how a specific cancer type has evolved, which enables researchers to understand and analyze the development and progression of cancer.
Existing technologies enable researchers to read and assemble billions of short DNA sequences to investigate cancer genomes and detect mutations within them.
However, the investigation of repeated DNA has been hampered by a basic characteristic of the human genome: how can short, quasi-identical sequences, usually pasted from the same ancestral copy, be assigned back to their original location in the genome? And how can mutations in those sequences be identified?
The new paper, published in Nature Biotechnology, harnesses artificial intelligence to resolve this issue. Applying the resulting tool to the largest collection of primary cancer genomes to date has enabled interesting discoveries.
For instance, mutations that common tools had previously been unable to detect were found even in the coding sequences of well-known cancer genes. This implies that patients whose cancers carry those mutations might benefit from therapies targeting those genes.
The researchers identified other mutations in families of genes duplicated several times across the human genome. Although a few of these families had already been linked to cancer, their mutations had gone undetected. The researchers have made this rich resource available to the scientific community, further enriching a gold mine in cancer genomics.
The algorithm created by Maxime Tarabichi and his colleagues is restricted neither to cancer nor to the human genome. It is a versatile tool for data produced with existing sequencing technologies, accessible to all researchers across the globe who study the evolution of life.