Using a new algorithm called FLSHclust (“flash clust”), scientists have searched billions of protein sequences and uncovered 188 rare, previously unknown CRISPR-associated gene modules, including a novel type VII CRISPR-Cas system.
This innovative approach and its revelations open up fresh possibilities for utilizing CRISPR systems and gaining insights into the extensive functional diversity of microbial proteins. CRISPR systems have been pivotal in developing a diverse range of biomolecular methods, notably CRISPR/Cas-mediated genome editing.
The revelation of previously unknown CRISPR systems holds promise for advancing biotechnologies, leading to safer and more efficient genomic therapeutics.
While the CRISPR toolbox has expanded through computational searches of protein sequence databases, the prevalent algorithmic methods have become impractical for navigating the exponentially growing datasets containing billions of proteins.
In response to this limitation, Han Altae-Tran and collaborators devised FLSHclust (fast locality-sensitive hashing-based clustering) – an algorithm designed for clustering proteins based on sequence similarity. Unlike current methods, FLSHclust can rapidly and efficiently analyze extensive protein sequence databases.
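The general idea of clustering by locality-sensitive hashing can be illustrated with a minimal sketch. This is not the authors' implementation, whose details are not described here. As an assumption, it substitutes one common locality-sensitive hashing scheme, MinHash over short k-mers with banding, and every function name and parameter value below (kmers, minhash_signature, lsh_cluster, k=3, 32 hashes, 16 bands) is hypothetical. Sequences whose signatures agree in at least one band fall into the same bucket and are merged, so similar proteins are grouped without comparing every pair.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-mers (a short sequence yields itself)."""
    return {seq[i:i + k] for i in range(max(1, len(seq) - k + 1))}


def minhash_signature(seq, num_hashes=32, k=3):
    """For each seeded hash function, keep the minimum hash over all k-mers."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{km}".encode(), digest_size=8).digest(),
                "big",
            )
            for km in kmers(seq, k)
        ))
    return signature


def lsh_cluster(seqs, num_hashes=32, bands=16, k=3):
    """Merge sequences that share at least one identical signature band."""
    rows = num_hashes // bands
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    buckets = {}  # (band index, band values) -> first sequence seen in that bucket
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            if key in buckets:
                parent[find(idx)] = find(buckets[key])
            else:
                buckets[key] = idx

    clusters = defaultdict(list)
    for idx in range(len(seqs)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Toy input (made-up sequences): two identical proteins and one unrelated one.
example = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQISFVKSHFSRQ",
    "GGGGGGGGGGWWWWWWWWWW",
]
print(lsh_cluster(example))
```

Because each sequence is hashed once and only bucket collisions trigger a merge, the work grows roughly linearly with the number of sequences instead of quadratically, which is the property that makes hashing-based approaches attractive for databases of this size. The band and hash counts control how similar two sequences must be before they are likely to collide.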
To validate their methodology, Altae-Tran et al. used FLSHclust to search for rare CRISPR systems in an 8.8-terabase-pair metagenomic database containing 8 billion proteins and 10.2 million CRISPR arrays. The analysis brought to light 188 previously unknown CRISPR-associated gene modules.
Furthermore, the researchers identified and characterized a novel Cas14-containing CRISPR system, designated type VII, which acts on RNA. The newly identified systems are rare, with many confined to a single cluster among the nearly 130,000 CRISPR-linked clusters identified by FLSHclust.
“The discovery of previously unknown cas genes and CRISPR systems substantially expands the known CRISPR diversity, emphasizing the functional versatility of CRISPR whereby previously undiscovered proteins and domains are often recruited, either replacing preexisting components or conferring newly identified functions to the preexisting scaffold of Cas proteins.”
Han Altae-Tran, Massachusetts Institute of Technology
Altae-Tran added, “Taken together, the results of the work reveal unprecedented organizational and functional flexibility and modularity of CRISPR systems but also demonstrates that most variants are rare and only found in relatively unusual bacteria and archaea.”
Journal reference:
Altae-Tran, H., et al. (2023) Uncovering the functional diversity of rare CRISPR-Cas systems with deep terascale clustering. Science. doi.org/10.1126/science.adi1910.