Novel AI Models Enhance Disease Diagnosis and Pathogen Identification

The widespread availability of AI tools is accelerating progress across the technical and natural sciences. Nowhere is this more apparent than in biotechnology, where AI is unlocking new possibilities in drug discovery, precision medicine, gene editing, food security, and beyond.

One area seeing particularly exciting developments is proteomics, the large-scale study of proteins. Researchers in this field compare sample data against vast protein databases to identify which proteins, and by extension which microorganisms, are present. These insights support critical tasks such as diagnosing diseases, evaluating treatment responses, and detecting pathogens in patient samples.

Yet, despite the power of these tools, there are notable limitations, according to Timothy Patrick Jenkins, Associate Professor at DTU Bioengineering and corresponding author of a new study.

“No database includes everything, so you need to know which ones are relevant to your needs,” Jenkins explains. “Deep searches take a lot of time and computing power, and it’s nearly impossible to identify proteins that haven’t been registered yet.”

To overcome these obstacles, some teams have turned to de novo sequencing algorithms—techniques that infer protein sequences without relying solely on existing databases.
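To make the idea concrete, here is a toy Python sketch of the classical intuition behind de novo sequencing: the gaps between consecutive fragment masses correspond to individual amino acid residues, so a sequence can be read directly from the measurements without a database lookup. This is an illustration only and not how InstaNovo works. The helper names `make_ladder` and `read_ladder`, the reduced residue table, and the idealized, noise-free list of prefix masses are all assumptions made for this example.

```python
# Monoisotopic residue masses in daltons for a subset of amino acids.
# A real tool would cover all 20 residues plus modifications.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "T": 101.04768,
    "L": 113.08406,  # isoleucine has the same mass and cannot be distinguished here
    "D": 115.02694,
    "K": 128.09496,
    "E": 129.04259,
    "F": 147.06841,
}


def make_ladder(peptide):
    """Build an idealized list of cumulative prefix masses for a peptide (hypothetical helper)."""
    masses = [0.0]
    for residue in peptide:
        masses.append(masses[-1] + RESIDUE_MASS[residue])
    return masses


def read_ladder(prefix_masses, tolerance=0.01):
    """Read a sequence from mass gaps alone; gaps matching no residue become '?'."""
    sequence = []
    for lower, upper in zip(prefix_masses, prefix_masses[1:]):
        gap = upper - lower
        # Pick the residue whose mass is closest to the observed gap.
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        matched = abs(RESIDUE_MASS[best] - gap) <= tolerance
        sequence.append(best if matched else "?")
    return "".join(sequence)


ladder = make_ladder("PEPTLDE")
recovered = read_ladder(ladder)
```

Real spectra contain noise, missing peaks, and several fragment ion types, which is why modern approaches rely on learned models rather than a simple gap lookup like this one.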

While these approaches promise greater flexibility and reduced computational cost, Jenkins and his colleagues from DTU, Delft University of Technology, and British AI company InstaDeep found that current algorithms often fall short.

Outperforming the State-of-the-Art

In a study published in Nature Machine Intelligence, the research team introduced two new AI models, InstaNovo and InstaNovo+, designed to help scientists, clinicians, and industry professionals more efficiently analyze large proteomics datasets.

The models are available to researchers via the InstaDeep platform.

“Our models go beyond the current state-of-the-art in both precision and versatility,” said Kevin Michael Eloff, co-first author of the study and a research engineer at InstaDeep. “As we show in the paper, these tools aren’t limited to one research area—they have the potential to advance any field that uses proteomics.”

To validate the models, the researchers tested them across a variety of real-world use cases.

One such case involved wound fluid from patients with venous leg ulcers, which are notoriously difficult to treat and prone to becoming chronic. Accurate identification of the microbial makeup in these wounds is crucial for guiding therapy. Compared with traditional database searches, the InstaNovo models mapped ten times as many protein sequences and detected E. coli and Pseudomonas aeruginosa, a multidrug-resistant bacterium.

Another application focused on peptides—small protein fragments presented on cell surfaces that help the immune system recognize threats like cancer. The models identified thousands of previously undetected peptides, which could serve as valuable targets in immunotherapy, where personalized treatments aim to amplify the body’s immune response.

“Our results in complex scenarios—like those involving unknown proteins or organisms—show that these models can substantially boost our understanding,” said Konstantinos Kalogeropoulos, co-first author and Assistant Professor at DTU. “This holds real promise for advancing microbiome research, personalized medicine, and cancer immunology.”

Beyond Biomedicine

The paper also presents six additional case studies, showing how the models improve therapeutic sequencing, uncover novel peptides, detect previously unreported organisms, and enhance the overall performance of proteomics searches.

Jenkins believes the benefits reach well beyond clinical settings.

“From a technical and scientific perspective, these tools offer the chance to deepen our understanding of biology more broadly—not just in healthcare, but also in fields like industrial biotech, environmental monitoring, plant science, veterinary science, and even archaeology,” Jenkins said. “Wherever proteomics is applied, these models open access to protein landscapes we couldn’t see before.”

Journal reference:

Eloff, K., et al. (2025) InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments. Nature Machine Intelligence. doi.org/10.1038/s42256-025-01019-5.