Machine Learning Addresses Ancestral Diversity in Disease Prediction

To ensure that medical genetic research is more inclusive and benefits people from all backgrounds, researchers at the University of Florida are addressing a critical gap in the field.

The team is tackling "ancestral bias" in genetic data, an issue that arises when most research is based on data from a single ancestral group. Leading this effort is Kiley Graim, Ph.D., an Assistant Professor in the Department of Computer & Information Science & Engineering.

According to Graim, this bias limits the effectiveness of precision medicine and leaves large segments of the global population underserved in disease prevention and treatment.

To address this, the team developed PhyloFrame, a machine-learning tool that incorporates ancestral diversity into genetic data analysis. Funded by the National Institutes of Health, the project aims to improve disease prediction, diagnosis, and treatment for all individuals, regardless of their genetic background.

The study, published in Nature Communications, reported that PhyloFrame significantly improved precision medicine outcomes.

Graim, a member of the UF Health Cancer Center, was inspired to focus on ancestral bias after speaking with a physician frustrated by a study’s lack of applicability to his diverse patient base. This conversation led her to explore how AI could bridge the gap in genetic research.

"I thought to myself, ‘I can fix that problem.’ If our training data doesn’t match real-world data, machine learning can help. It’s not perfect, but it can go a long way in addressing the issue."

Kiley Graim, Assistant Professor, University of Florida

Graim, who specializes in machine learning and precision medicine, has a background in population genomics. PhyloFrame integrates small disease-specific datasets with large databases of healthy human genomes, such as the population genomics database gnomAD, to develop more inclusive and effective models.

For example, PhyloFrame can predict variations among subtypes of diseases like breast cancer, ensuring treatment recommendations are tailored to individuals regardless of ancestry.

Processing this vast amount of data requires significant computational power. The team uses UF’s HiPerGator, one of the nation’s most powerful supercomputers, to analyze genomic data from millions of people, with each genome comprising roughly three billion DNA base pairs.

"I didn’t expect it to work as well as it did. What started as a small project to illustrate the impact of population genomics data has grown into securing funding for more advanced models and refining how populations are defined," Graim said, crediting her doctoral student, Leslie Smith, for significant contributions to the study.

What sets PhyloFrame apart is its ability to account for genetic variations linked to ancestry, ensuring accurate predictions across populations. Most existing models are built on non-representative data, making them less effective for diverse groups.

Currently, much of the available genetic data comes from patients who trust the healthcare system and research institutions. However, populations in smaller communities or those with historical mistrust of medical systems are often underrepresented.

Graim highlights that 97% of sequenced genetic samples come from individuals of European ancestry. This disparity is largely due to funding priorities at federal and state levels, as well as socioeconomic factors like insurance coverage, which influence access to medical care and genetic sequencing.

"Some countries, like China and Japan, are working to close this gap, so we have more data from these populations than before. But it’s still nowhere near the volume of European data. Poorer populations are generally excluded entirely."

Kiley Graim, Assistant Professor, University of Florida

Graim emphasizes that diverse training data benefits everyone—including Europeans.

"We want these models to work for any patient, not just the ones in our studies. Having diverse training data actually improves models for Europeans as well. Population genomics data helps prevent overfitting, making models more robust for all populations."

Looking ahead, Graim believes PhyloFrame and similar AI-driven tools will eventually replace traditional models in clinical settings, enabling truly personalized treatment plans based on each patient’s genetic profile. The team plans to refine PhyloFrame further and expand its application to additional diseases.

"My dream is to advance precision medicine through machine learning so people can be diagnosed earlier and receive treatments tailored specifically to them, with minimal side effects. Our goal is to get the right treatment to the right person at the right time."

Kiley Graim, Assistant Professor, University of Florida
