Most human genetics research has traditionally concentrated on individuals with European ancestry, a long-standing bias that may reduce the precision of scientific predictions for individuals from other populations.
Scientists from Johns Hopkins University have recently created a new global catalog of human gene expression data. Thanks to the increased representation of understudied populations, researchers should be able to gain more precise insights into the genetic factors influencing human diversity, including characteristics like height, hormone levels, and disease risk.
Through this work, the scientific community gains a deeper understanding of gene expression in populations from South and East Asia, Latin America, and other regions where data have been scarce.
The study, published in Nature, could inform future research on human variation and evolution.
"We now have this global view of how gene expression contributes to the world's diversity, the broadest picture to date in populations poorly represented in previous studies. We're trying to better understand the connection between variation at the level of our DNA and variation at the level of our traits, which previous genetic studies have looked at but with a really persistent bias that often excludes non-European ancestry populations."
Rajiv C. McCoy, Study Senior Author and Geneticist, Johns Hopkins University
The process by which genes in DNA are "transcribed" into RNA molecules is known as gene expression. While genetic research typically focuses on variations in DNA, the researchers set out to investigate this process. RNA acts as a blueprint to direct the assembly of amino acids into proteins, which give cells their structure and perform various functions.
However, genetic mutations can modify the amount of RNA that genes produce or the structure of the RNA itself, which can alter how genes are expressed. The development of traits and diseases can be significantly impacted by these mutations and their effects on gene expression.
The researchers measured RNA in cells from 731 participants in the 1000 Genomes Project, an earlier international collaboration that characterized the DNA sequence of the same individuals, to find mutations that alter gene expression.
"We know not only their genome sequences, which were previously published, but we now have measurements of their gene expression. By combining these data, we can understand at a very basic level the genetic sources of gene expression differences between individuals. Ultimately that's what contributes most to the differences between you and me, even though at the level of DNA we are 99.9% identical."
Rajiv C. McCoy, Study Senior Author and Geneticist, Johns Hopkins University
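To make the idea of combining DNA and RNA measurements concrete, here is a minimal Python sketch of one common way to ask whether a variant is associated with a gene's expression: regress expression on the number of copies of the variant each person carries. This is an illustration only, not the study's actual pipeline; the function name `expression_slope` and all values are invented for the example.

```python
from statistics import mean


def expression_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1, or 2 copies).

    A slope far from zero suggests the variant is associated with
    higher or lower expression of the gene.
    """
    mean_dosage = mean(dosages)
    mean_expr = mean(expression)
    sxx = sum((x - mean_dosage) ** 2 for x in dosages)
    sxy = sum(
        (x - mean_dosage) * (y - mean_expr)
        for x, y in zip(dosages, expression)
    )
    return sxy / sxx


# Toy, made-up data: six people, copies of a hypothetical variant,
# and a hypothetical expression level for one gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope = expression_slope(dosages, expression)
print(f"Change in expression per extra copy of the variant: {slope:.2f}")
```

In this toy data, each additional copy of the variant goes with roughly one more unit of expression. A real analysis would also account for factors such as ancestry, sex, and technical noise, and would test many variants and genes at once.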
Even though the 731 people represent 26 distinct groups spread over five continents, the scientists discovered that gene expression patterns are frequently shared across groups, a phenomenon also seen in patterns of DNA variation. Most variation in gene expression was found within populations rather than between them.
“The distribution of our diversity is more complex than these geographically, politically, or socially defined labels,” said McCoy.
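The statement that most variation lies within populations can be illustrated by splitting the total spread of a measurement into a within-group part and a between-group part. The Python sketch below does this for made-up numbers; the group labels, values, and the function name `variance_partition` are hypothetical, and the study's own statistical approach may differ.

```python
from statistics import mean


def variance_partition(groups):
    """Return (within_fraction, between_fraction) of total sum of squares.

    `groups` maps a group label to a list of measurements.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Spread of the group means around the overall mean.
    between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Spread of individuals around their own group's mean.
    within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )
    total = between + within
    return within / total, between / total


# Toy, made-up expression values for one gene in two hypothetical groups.
toy = {"group_A": [1.0, 3.0, 5.0], "group_B": [2.0, 4.0, 6.0]}
within_frac, between_frac = variance_partition(toy)
print(f"within groups: {within_frac:.0%}, between groups: {between_frac:.0%}")
```

Here the two groups overlap heavily, so differences among individuals in the same group account for most of the total variation, which is the kind of pattern the researchers describe.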
According to McCoy, the cohort's diverse makeup allowed the scientists to identify potential links between mutations and certain traits and health risks. These connections included mutations exclusive to subsets of populations that had not been previously studied.
“We are demonstrating that by having this more diverse cohort, we can really hone in on specific mutations that could be driving these gene expression changes, and ultimately how they might be driving variation and how that affects traits or susceptibility to a disease,” added McCoy.
According to lead author Dylan Taylor, a Ph.D. candidate in biology, these discoveries may lead to better-tailored treatments.
“We can't really use these studies in a predictive fashion for personalized medicine equitably unless we have more diverse datasets. If you try to use results from a study using only European individuals to predict gene expression in individuals from an underrepresented population, South Asians for example, your results won't necessarily be very reliable.”
Dylan J. Taylor, Study Lead Author and Ph.D. Candidate, Department of Biology, Johns Hopkins University
There are still significant gaps. The 1000 Genomes dataset has few samples from the Americas and Africa and few groups from the Pacific Islands, Australia, and the Middle East.
“The field is starting to move in this exciting direction to include diverse individuals in human genetic studies. Our research is a proof of concept for other scientists. We are demonstrating we can really do this, and we should, and it's valuable,” said Taylor.
Journal reference:
Taylor, D. J., et al. (2024) Sources of gene expression variation in a globally diverse human cohort. Nature. doi.org/10.1038/s41586-024-07708-2