EMBL's European Bioinformatics Institute (EMBL-EBI) has launched the Pathogens Portal – an online platform that enables researchers, clinicians, and policymakers to access the most comprehensive collection of biomolecular data about pathogens. The portal features data spanning over 200,000 pathogen species and strains and is set to become a key tool for infection biology and pathogen surveillance.
The list of pathogens featured in the portal was collated using the UK Health and Safety Executive's list of approved biological agents and the WHO's global priority pathogens list. It includes well-known pathogens that affect humans, such as HIV, influenza, hepatitis B, and the malaria parasite Plasmodium falciparum. It also covers lesser-known human pathogens, such as Lassa mammarenavirus, the cause of Lassa hemorrhagic fever, which can lead to deafness and even death in severe cases. The portal also contains hundreds of pathogens that affect other animals, which makes it a useful tool for work on food security and biodiversity.
The Pathogens Portal currently contains nucleotide sequences, raw genomic data, sample metadata, and relevant scientific literature. The intention is to integrate additional data types from other public data resources, including protein sequences and structures and chemical data.
"The unique feature of the Pathogens Portal is that it brings together different data types, which are currently scattered in lots of different places. This new approach enables researchers, clinical scientists, and public health agencies to access all publicly available data about their pathogen of interest with just one quick search. The portal also contains intuitive tools for discovery, which make it easy for users to refine their searches," said Guy Cochrane, Team Leader at EMBL-EBI.
"The Pathogens Portal is an important step in preparing for the next pandemic," said Marion Koopmans, Head of the Erasmus Medical Centre's Department of Viroscience. "Pulling together multiple open biological data resources for a breadth of pathogens is a key knowledge base to ready ourselves for future pandemics."
Pandemic Preparedness
"The COVID-19 pandemic demonstrated that having robust and easy-to-use data sharing structures in place can save lives because these enable a quick and informed public health response," explained Marianna Ventouratou, Data Platform Manager at EMBL-EBI. "Building on the lessons learned from the COVID-19 pandemic, EMBL-EBI and partners have now developed the Pathogens Portal, which researchers and public health authorities around the world can use to enhance global pathogen surveillance efforts."
Importantly, the data accessible through the Pathogens Portal is open, meaning it is available to anyone with an internet connection, and FAIR (Findable, Accessible, Interoperable, and Reusable). This approach is particularly valuable during a public health emergency, when the speed of data sharing is critical.
"It is invaluable to have a data portal like the Pathogens Portal, which represents the pathogen world beyond viruses, and takes a much more holistic and flexible view of where the next threatening pathogen may come from," explained Frank Møller Aarestrup, Head of Genomic Epidemiology at the Technical University of Denmark.
Private Data and Cohort Data
A key component of the portal, the Data Hubs system, allows researchers and health agencies to keep their data private in the first instance. The system runs on EMBL-EBI's existing infrastructure, including the European Nucleotide Archive (ENA). This is important for countries and researchers who wish to keep their data private before publication but still want to analyze it alongside the other public records available through the portal.
Another exciting feature of the portal is the Cohort Browser, which contains highly sought-after clinical-epidemiological data from patient cohorts. There is currently only one pilot study, focusing on SARS-CoV-2, available in the browser, provided through the ReCoDID project by the Erasmus Medical Centre with the help of the University Hospital Heidelberg. The Pathogens Portal team is actively encouraging researchers to submit more cohort data.
"The Cohort Browser interoperates genomic data with clinical epidemiological data, which enables deep interrogation of disease data by linking information on the pathogen and the host it directly infected," said Lauren Maxwell, Group Leader at the Universitätsklinikum Heidelberg.
Building on Success
The Pathogens Portal is built on the same framework as the European COVID-19 Data Portal, which EMBL-EBI and collaborators set up during the COVID-19 pandemic to support international data sharing essential for the pandemic response. Since launch, the COVID-19 Data Portal has been accessed by almost 300,000 users in 187 countries and geographical areas.
Three EMBL-EBI resources already feed data into the Pathogens Portal, with more coming soon:
- European Nucleotide Archive (ENA), which provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information, and functional annotation.
- BioSamples, which stores and supplies descriptions and metadata about biological samples used in research and development by academia and industry.
- Europe PMC, which provides comprehensive access to over 40 million life sciences publications from trusted sources.
The Pathogens Portal is part of EMBL's Infection Biology transversal theme, within EMBL's scientific programme, Molecules to Ecosystems. The theme aims to enhance our understanding of the biology and mechanisms of infection, as well as the diagnosis and treatment of infectious diseases.
The portal offers both raw reads and nucleotide sequences because they serve different users. Analyzing raw reads requires more bioinformatics expertise but supports deeper, tailored analyses, whereas nucleotide sequences can be applied more readily to downstream applications, which benefits non-specialist users. The two data types are complementary, and together they make the portal useful to a wider audience.
The Pathogens Portal is a community-driven initiative, and users are invited to send feedback and questions to the project team at [email protected].