A fish on land can still wave its fins, but the outcomes are very different when the creature is in water. The comparison, attributed to prominent computer scientist Alan Kay, is used to demonstrate the importance of context in clarifying research issues.
In a first for artificial intelligence (AI), a tool known as PINNACLE applies Kay's insight to understanding protein activity in its proper context, as established by the tissues and cells in which these proteins work and interact. Notably, PINNACLE addresses some of the shortcomings of current AI models, which typically evaluate how proteins function and malfunction in isolation, one cell and tissue type at a time.
Harvard Medical School researchers led the creation of the new AI model, as detailed in Nature Methods.
“The natural world is interconnected, and PINNACLE helps identify these linkages, which we can use to gain more detailed knowledge about proteins and safer, more effective medications.”
Marinka Zitnik, Study Senior Author and Assistant Professor, Department of Biomedical Informatics in the Blavatnik Institute, Harvard Medical School
Zitnik added, “It overcomes the limitations of current, context-free models and suggests the future direction for enhancing analyses of protein interactions.”
According to the researchers, this breakthrough could improve awareness of the role of proteins in health and disease and provide novel drug targets for developing more precise, better-personalized medicines.
A Major Step Forward
It is difficult to untangle protein interactions and the effects of a protein's biological neighbors. Current analytical technologies play an important role in providing information on the structural characteristics and forms of individual proteins. These techniques, however, are not intended to address the contextual complexities of the larger protein environment. Instead, they generate context-free protein representations, which lack cell- and tissue-type contextual information.
Proteins, however, perform varied roles depending on the cellular and tissue settings in which they are found and on whether that tissue or cell is healthy or diseased. Single-protein representation approaches cannot identify protein activities that change across these settings.
When it Comes to Protein Behavior, It’s Location, Location, Location
Proteins, made up of twenty different amino acids, serve as the building blocks of cells and tissues and are required for a variety of life-sustaining biological functions, including transporting oxygen throughout the body, contracting muscles for breathing and walking, enabling digestion, and fighting infection.
Scientists believe the human body has between 20,000 and hundreds of thousands of proteins.
The complicated interplay between and across proteins results in tangled protein interaction networks. These networks, located within and between cells, interact with other proteins and protein networks in various intricate ways.
PINNACLE's benefit comes from its capacity to understand that protein behavior varies by cell and tissue type. The same protein can behave differently in a healthy lung cell than in a healthy kidney or diseased colon cell.
PINNACLE reveals how these cells and tissues affect the same proteins differently, which is impossible with existing models.
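The difference between a context-free and a contextual representation can be pictured as a change in what the lookup key is. The sketch below is purely illustrative and is not PINNACLE's code or API: the protein name, cell-type labels, and vector values are all hypothetical placeholders chosen to show the idea.

```python
from typing import Dict, List, Tuple

# Context-free model: one vector per protein, regardless of where it acts.
# (Hypothetical protein name and toy values, for illustration only.)
context_free: Dict[str, List[float]] = {
    "PROTEIN_A": [0.5, 0.5],
}

# Contextual model: one vector per (protein, cell type) pair, so the same
# protein can be represented differently in different cellular settings.
contextual: Dict[Tuple[str, str], List[float]] = {
    ("PROTEIN_A", "lung cell"): [0.9, 0.1],
    ("PROTEIN_A", "kidney cell"): [0.1, 0.8],
}


def lookup_context_free(protein: str) -> List[float]:
    """Return the single representation a context-free model would hold."""
    return context_free[protein]


def lookup_contextual(protein: str, cell_type: str) -> List[float]:
    """Return the representation of a protein within a given cell type."""
    return contextual[(protein, cell_type)]


lung = lookup_contextual("PROTEIN_A", "lung cell")
kidney = lookup_contextual("PROTEIN_A", "kidney cell")
print(lung != kidney)  # the same protein maps to different vectors
```

In the context-free table there is no way to ask how the protein behaves in a particular cell type, which is the limitation the article describes.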
PINNACLE can determine which proteins participate in certain conversations and which remain silent based on the cell type in which a protein network is located. This enables PINNACLE to better analyze protein crosstalk and behavior, ultimately allowing it to predict precisely tailored therapeutic targets for dysfunctional proteins that cause diseases.
The researchers highlighted that PINNACLE does not replace but rather complements single-representation models by analyzing protein interactions in a variety of cellular contexts.
Thus, PINNACLE might help researchers better understand and predict protein function and illuminate important cellular processes and disease pathways.
This ability can assist in identifying “druggable” proteins that can be used as targets for individual treatments and predict the effects of various drugs in different cell types. As a result, PINNACLE might become a crucial tool for scientists and drug makers looking to narrow down prospective targets more effectively.
Zitnik, who is also an associate faculty member at Harvard University's Kempner Institute for the Study of Natural and Artificial Intelligence, believes that such streamlining of the drug development process is critical.
It can take 10 to 15 years and cost up to one billion dollars to bring a new treatment to market. The path from discovery to drug is notoriously rough, with uncertain outcomes. Indeed, over 90% of drug candidates never become approved medicines.
Building and Training PINNACLE
Using human cell data from a comprehensive multiorgan atlas, as well as multiple networks of protein-protein interactions, cell type-to-cell type interactions, and tissues, the researchers trained PINNACLE to generate panoramic graphic protein representations of 156 cell types and 62 tissues and organs.
To date, PINNACLE has produced roughly 395,000 multidimensional representations, compared to around 22,000 potential representations under existing single-protein models. Each of its 156 cell types has context-rich protein interaction networks with around 2,500 proteins.
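As a rough sanity check, the figures above are consistent with one another: multiplying the number of cell types by the approximate network size lands close to the reported total. The short calculation below uses only the rounded numbers quoted in this article, so its result is an estimate and not the model's exact count; the variable and function names are illustrative.

```python
# Approximate figures as quoted in the article.
CELL_TYPES = 156
PROTEINS_PER_CELL_TYPE = 2_500          # "around 2,500 proteins" per network
CONTEXT_FREE_REPRESENTATIONS = 22_000   # roughly one per protein


def estimate_contextual_representations(cell_types: int, proteins_per_type: int) -> int:
    """Estimate the total as one representation per protein per cell type."""
    return cell_types * proteins_per_type


estimate = estimate_contextual_representations(CELL_TYPES, PROTEINS_PER_CELL_TYPE)
ratio = estimate / CONTEXT_FREE_REPRESENTATIONS

print(f"Estimated contextual representations: {estimate:,}")  # 390,000
print(f"Increase over context-free models: about {ratio:.1f}x")
```

The estimate of 390,000 sits close to the roughly 395,000 reported, with the gap explained by the rounding in "around 2,500".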
The current number of cell types, tissues, and organs does not represent the model's upper limit. To date, the analyzed cell types have originated from living human donors and span most, but not all, of the human body's cell types. Furthermore, many cell types have yet to be identified, while others are rare or difficult to study, such as neurons in the brain.
Zitnik intends to employ a data platform containing tens of millions of cells collected from the entire human body to broaden PINNACLE's cellular repertoire.
Journal reference:
Li, M. M., et al. (2024) Contextual AI models for single-cell protein biology. Nature Methods. doi.org/10.1038/s41592-024-02341-3