In a recent study published in Fundamental Research, researchers propose MULGONET, a novel interpretable neural network model that applies deep learning to multi-omics information to predict tumor recurrence.
The model explores the interactions between genes, biological networks, and molecular processes that the scientific community can target to identify novel biomarkers for tumor recurrence. Accurate predictions could improve decision-making and patient outcomes.
Introduction
Cancer has led to unprecedented morbidity and mortality worldwide. Despite advancements in surgery, chemotherapy, and immunotherapy, the five-year survival rates for pancreatic, liver, and lung cancer patients are low due to tumor recurrence and metastasis. Therefore, risk evaluation among cancer patients is necessary.
Multi-omics information provides valuable insights into the cellular pathways and metabolic processes that drive tumor development and progression, a detailed knowledge of which could help design optimal treatment.
Researchers have integrated multi-omics information into deep learning-based neural network models to predict tumor recurrence; however, inherent challenges such as heterogeneity, sparsity, noise, and high dimensionality exist. Moreover, deep learning models lack interpretability, posing a significant barrier to biological understanding.
About the study
In the present study, researchers developed an interpretable biological framework, MULGONET, that integrates multi-omics information and uses gene ontology (GO) graphic representations to predict tumor recurrence and identify potential biomarkers.
The biological network model comprises gene layers with two GO hierarchical networks from the Gene Ontology (GO) database: molecular function (MF) and biological process (BP).
The model analyzes annotation relationships between Gene Ontology terms and genes to integrate multi-omics information. In the model, network nodes denote specific biological entities (genes and Gene Ontology terms), whereas edges denote inter-entity relationships.
To develop MULGONET, the team annotated genes, selected non-redundant features, and constructed matrices based on GO-BP-MF relationships.
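The matrix-construction step can be pictured with a small sketch. The gene names, GO term labels, and the helper `build_mask` below are hypothetical placeholders, not taken from the study. The sketch only illustrates how annotation relationships might be turned into a binary connectivity matrix.

```python
def build_mask(rows, cols, links):
    """Return a len(rows) x len(cols) matrix of 0/1 values.

    An entry is 1 when the (row, col) pair appears in `links`.
    Pairs that mention an unknown row or column are ignored.
    """
    row_index = {r: i for i, r in enumerate(rows)}
    col_index = {c: j for j, c in enumerate(cols)}
    mask = [[0] * len(cols) for _ in rows]
    for r, c in links:
        if r in row_index and c in col_index:
            mask[row_index[r]][col_index[c]] = 1
    return mask


# Hypothetical toy annotations (placeholders, not real GO data).
genes = ["gene_1", "gene_2", "gene_3"]
go_terms = ["term_A", "term_B"]
annotations = [
    ("gene_1", "term_A"),
    ("gene_2", "term_A"),
    ("gene_3", "term_B"),
    ("gene_9", "term_B"),  # gene not in the feature list, so it is skipped
]

gene_to_go = build_mask(genes, go_terms, annotations)
```

The same helper could be reused for term-to-term links (for example, child term to parent term) to describe connections between successive GO layers, assuming the hierarchy is supplied as a list of pairs.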
After matrix construction, they integrated multi-omics information into a biological framework, which underwent computational modeling to enable deep learning. Importance scores from the computational model revealed individual gene contributions to tumor recurrence prediction. Preprocessing procedures removed non-protein-coding genes, low-expressed genes, and genes without annotations from the datasets.
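The preprocessing filters described above could look roughly like the following. This is a minimal sketch: the function name `filter_genes`, the mean-expression cutoff `min_mean`, and the toy values are assumptions made for illustration, and the article does not state which thresholds the authors used.

```python
def filter_genes(expression, protein_coding, annotated, min_mean=1.0):
    """Keep genes that are protein-coding, annotated, and not low-expressed.

    expression: dict mapping gene name -> list of expression values (one per sample).
    protein_coding, annotated: sets of gene names.
    min_mean: assumed cutoff on mean expression (hypothetical value).
    """
    kept = {}
    for gene, values in expression.items():
        if gene not in protein_coding:
            continue  # drop non-protein-coding genes
        if gene not in annotated:
            continue  # drop genes without GO annotations
        if not values or sum(values) / len(values) < min_mean:
            continue  # drop low-expressed (or empty) genes
        kept[gene] = values
    return kept


# Hypothetical toy data.
expression = {
    "gene_1": [5.0, 7.0, 6.0],
    "gene_2": [0.1, 0.0, 0.2],   # low expression
    "gene_3": [4.0, 4.5, 5.0],   # not annotated
    "gene_4": [9.0, 8.0, 7.0],   # not protein-coding
}
kept = filter_genes(
    expression,
    protein_coding={"gene_1", "gene_2", "gene_3"},
    annotated={"gene_1", "gene_2", "gene_4"},
)
```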
To evaluate model performance, the team analyzed datasets from the Cancer Genome Atlas (TCGA), including pancreatic adenocarcinoma (PAAD), bladder carcinoma (BLCA), and stomach adenocarcinoma (STAD). These datasets include messenger ribonucleic acid (mRNA) expressions, copy number variations (CNV), and deoxyribonucleic acid (DNA) methylations.
The TCGA database provided clinical data corresponding to these datasets. Model evaluation metrics included the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR), F1 scores, and recall (REC).
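For readers unfamiliar with these metrics, the sketch below computes them on toy labels using only the Python standard library. It makes two assumptions that are not from the study: AUPR is approximated by average precision (one common estimate, which here ignores tied scores), and predictions are thresholded at 0.5 for recall and F1.

```python
def recall_and_f1(y_true, y_pred):
    """Recall and F1 for binary labels (1 = recurrence, 0 = no recurrence)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy predictions.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

rec, f1 = recall_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
aupr = average_precision(y_true, scores)
```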
To ensure reliability, the team performed five-fold cross-validation. They divided each dataset into five folds, trained on four and tested on the remaining one, and repeated the analysis five times so that each fold served once as the test set.
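The five-fold procedure can be sketched as follows. The helper `k_fold_indices` and the fixed random seed are illustrative assumptions. The article does not say whether the authors stratified the folds by recurrence status, so this sketch uses a plain shuffled split.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each of the five rounds trains on four folds and tests on the fifth.
splits = list(k_fold_indices(10, k=5))
```

Reported metrics would then typically be averaged across the five rounds.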
To evaluate effectiveness, the researchers compared MULGONET with other advanced classification methods, such as random forest (RF), stochastic gradient descent (SGD), decision tree (DT), and logistic regression (LR). Other methods included adaptive boosting with decision trees (AdaBoost), radial basis function support vector machines (RBFSVM), multi-omics graph convolutional networks (MOGONET), multi-omics integration framework with auxiliary classifiers-enhanced autoencoders (MOCAT), PathCNN, and ProgCAE.
Results
The study findings showed that MULGONET effectively overcomes dimensionality and interpretability issues in multi-omics information integration. The model identified prognostic genes and Gene Ontology terms associated with tumor recurrence.
MULGONET outperformed other classification techniques, with AUPR values of 0.70, 0.87, and 0.77 for the stomach, pancreatic, and bladder cancer datasets, respectively. Integrating four omics data types into a five-layer network achieved the best performance.
The model has distinct advantages. It maps different genomic features to the same gene ontology space, facilitating better data integration. The model's hierarchical structure simplifies biological concepts, making additional omics data integration easy.
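One way to picture mapping different omics features into a shared GO space is a masked projection, in which a gene contributes to a GO term only when an annotation links the two. The sketch below is an illustration under stated assumptions, not the published architecture: the function `masked_projection`, the weights, the mask, and the input values are all hypothetical.

```python
def masked_projection(gene_values, weights, mask):
    """Project gene-level values onto GO terms using only annotated links.

    gene_values: one value per gene.
    weights, mask: genes x terms matrices; mask entries are 0 or 1.
    """
    n_terms = len(mask[0])
    out = [0.0] * n_terms
    for i, value in enumerate(gene_values):
        for j in range(n_terms):
            # Unannotated gene-term pairs are zeroed out by the mask.
            out[j] += value * weights[i][j] * mask[i][j]
    return out


# Hypothetical annotation mask: 3 genes x 2 GO terms.
mask = [[1, 0], [1, 0], [0, 1]]
# Hypothetical weights; in a trained network these would be learned.
weights = [[0.5, 0.3], [0.2, 0.9], [0.7, 0.4]]

mrna = [1.0, 2.0, 3.0]   # toy expression values per gene
cnv = [0.0, 1.0, -1.0]   # toy copy number values per gene

go_from_mrna = masked_projection(mrna, weights, mask)
go_from_cnv = masked_projection(cnv, weights, mask)
```

Both outputs have one value per GO term, so features from different omics types become directly comparable at the GO level. For simplicity the sketch shares one weight matrix across omics types, whereas a real model might learn separate weights for each.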
MULGONET identified biomarkers for cancers of the bladder and pancreas. The findings associate increased expression of cyclin-dependent kinase 6 (CDK6), Wnt family member 5A (WNT5A), Kit ligand (KITLG), and secernin 1 (SCRN1) with poor prognosis and low survival rates among bladder cancer patients.
GO terms related to bladder cancer include the negative regulation of cell differentiation, positive regulation of mitochondrial fission, and regulation of signaling pathways involving insulin and G protein-coupled receptors.
The model identified several high-risk genes related to pancreatic cancer, including myelocytomatosis oncogene (MYC), transforming growth factor beta-2 (TGFB2), gonadotropin-releasing hormone 2 (GNRH2), and CDK6.
Potential GO terms for pancreatic cancer recurrence identified by MULGONET relate to lipopolysaccharides, G protein-coupled receptors, and serine/threonine kinase. Lipopolysaccharides (LPS) increase tumor invasiveness, G protein-coupled receptors are crucial for tumor proliferation, and serine/threonine kinase aids tumor progression.
Based on the study findings, MULGONET represents a significant advancement in integrating multi-omics information for tumor recurrence prediction and biomarker discovery. The model could improve clinical outcomes and deepen our understanding of tumor biology.
However, the model has certain limitations, including dependency on the quality and size of data. Future studies could focus on improving data quality and updating biological networks to enhance model performance.