EpiBERT AI Model Reveals How Genes Are Regulated in Human Cells

Summary: A team of investigators from Dana-Farber Cancer Institute, The Broad Institute of MIT and Harvard, Google, and Columbia University has created an artificial intelligence model that can predict which genes are expressed in any type of human cell. The model, called EpiBERT, was inspired by BERT, a deep learning model designed to understand and generate human-like language.

EpiBERT was trained in multiple phases on data from hundreds of human cell types. It was fed the genomic sequence, which is about 3 billion base pairs long, along with maps of chromatin accessibility that indicate which of these sequences are unwound from the chromosome and can be read by the cell. The model was first trained to learn the relationship between DNA sequence and chromatin accessibility across large chunks of the genome in a specific cell type. It then used these learned relationships to predict which genes are active in the corresponding cell type. It accurately identified regulatory elements (parts of the genome recognized by transcription factors) and their influence on gene expression across many cell types, building a "grammar" that is generalizable and predictable. This grammar-building process can be likened to the way a large language model, such as ChatGPT, learns to build meaningful sentences and paragraphs from many examples of text. The EpiBERT model can process accessibility data and predict functional bases as well as RNA expression for a never-before-seen cell type.
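As a rough illustration of the first training phase described above, the sketch below pairs a one-hot encoded DNA sequence with a chromatin accessibility track and hides random stretches of the track, so that a model would have to reconstruct them from the sequence and the surrounding signal. This is a toy example, not the authors' implementation: the span-masking scheme is an assumption drawn from the BERT analogy, and the function names, the 15% masking fraction, the span length, and the one-value-per-base track are all hypothetical simplifications.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Unknown bases such as N become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def mask_accessibility(track, mask_fraction=0.15, span=4, seed=0):
    """Hide random contiguous spans of an accessibility track.

    Returns (masked_track, mask), where mask[i] is True at hidden positions.
    The fraction and span length are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(track)
    n_spans = max(1, round(mask_fraction * n / span))
    mask = [False] * n
    for _ in range(n_spans):
        start = rng.randrange(0, max(1, n - span + 1))
        for i in range(start, min(n, start + span)):
            mask[i] = True
    masked = [0.0 if hidden else float(v) for v, hidden in zip(track, mask)]
    return masked, mask


def build_pretraining_example(seq, track, **mask_kwargs):
    """Combine sequence and masked accessibility into per-base features.

    Each feature row has 5 values: 4 one-hot bases plus the masked signal.
    The targets are the true accessibility values at the hidden positions.
    """
    if len(seq) != len(track):
        raise ValueError("sequence and track must have the same length")
    masked, mask = mask_accessibility(track, **mask_kwargs)
    features = [row + [value] for row, value in zip(one_hot(seq), masked)]
    targets = [float(v) for v, hidden in zip(track, mask) if hidden]
    return features, mask, targets


if __name__ == "__main__":
    seq = "ACGTNACGTACGTACGTACG"
    track = list(range(1, 21))  # made-up accessibility signal
    features, mask, targets = build_pretraining_example(seq, track)
    print(len(features), len(features[0]), sum(mask), targets)
```

In a real pipeline the accessibility signal would be binned at a coarser resolution than the sequence and the reconstruction would be done by a transformer; the sketch only shows how the inputs and the hidden targets relate.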

Significance: Every cell in the body has the same genome sequence, so the difference between two types of cells is not the genes in the genome, but which genes are turned on, when, and how much. Approximately 20% of the genome codes for regulatory elements that determine which genes are turned on, but very little is known about where those codes are in the genome, what their instructions look like, or how mutations affect function in a cell. EpiBERT will shed light on how genes are regulated in cells and, potentially, how a cell's regulatory system can be mutated in ways that lead to diseases such as cancer.

Funding: The Broad Institute, the Novo Nordisk Foundation, the National Genome Research Institute, the Sharf Green Cancer Research Fund, the Richard and Nancy Lubin Family, and the American Cancer Society. Tensor Processing Unit (TPU) access and support were provided by Google.

Journal reference:

Javed, N., et al. (2025). A multi-modal transformer for cell type-agnostic regulatory predictions. Cell Genomics. doi.org/10.1016/j.xgen.2025.100762.
