Publication
Title
Large‐scale tandem mass spectrum clustering using fast nearest neighbor searching
Author
Abstract
Rationale: Advanced algorithmic solutions are necessary to process the ever-increasing amounts of mass spectrometry data that are being generated. Here we describe the falcon spectrum clustering tool for efficient clustering of millions of MS/MS spectra.
Methods: falcon efficiently clusters large amounts of mass spectral data using advanced techniques for fast spectrum similarity searching. First, high-resolution spectra are binned and converted to low-dimensional vectors using feature hashing. Next, the spectrum vectors are used to construct nearest neighbor indexes for fast similarity searching. The nearest neighbor indexes are used to efficiently compute a sparse pairwise distance matrix without having to exhaustively perform all pairwise spectrum comparisons within the relevant precursor mass tolerance. Finally, density-based clustering is performed to group similar spectra into clusters.
Results: Several state-of-the-art spectrum clustering tools were evaluated using a large draft human proteome dataset consisting of 25 million spectra, indicating that alternative tools produce clustering results with different characteristics. Notably, falcon generates larger highly pure clusters than alternative tools, leading to a larger reduction in data volume without the loss of relevant information for more efficient downstream processing.
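The four steps named in the Methods can be illustrated with a minimal, standard-library-only sketch. It is not falcon's implementation: the function names, the bin width, the vector dimension, the multiplicative hash, and the distance and tolerance thresholds are all assumptions chosen for illustration, and a brute-force scan over a precursor-sorted window stands in for the nearest neighbor index.

```python
import math
from collections import defaultdict


def hash_vector(peaks, bin_width=0.05, dim=64):
    """Bin (m/z, intensity) peaks and fold the bins into `dim` slots by hashing."""
    vec = [0.0] * dim
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)
        # Multiplicative hash, an assumed stand-in for a production hash function.
        slot = ((bin_index * 2654435761 % 2**32) >> 16) % dim
        vec[slot] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def sparse_distances(spectra, precursor_tol=0.5, max_dist=0.3):
    """Cosine distances only for pairs within the precursor m/z tolerance.

    Assumption: a brute-force scan over a sorted precursor window replaces
    the nearest neighbor index described in the abstract.
    """
    order = sorted(range(len(spectra)), key=lambda i: spectra[i][0])
    vectors = [hash_vector(peaks) for _, peaks in spectra]
    neighbors = defaultdict(dict)
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if spectra[j][0] - spectra[i][0] > precursor_tol:
                break  # sorted by precursor m/z, so later spectra are farther away
            dist = 1.0 - sum(x * y for x, y in zip(vectors[i], vectors[j]))
            if dist <= max_dist:
                neighbors[i][j] = dist
                neighbors[j][i] = dist
    return neighbors


def dbscan(n, neighbors, min_samples=2):
    """Minimal DBSCAN over a precomputed sparse neighbor map; -1 marks noise."""
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        # The +1 counts the point itself toward min_samples.
        if labels[start] != -1 or len(neighbors[start]) + 1 < min_samples:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            point = stack.pop()
            if len(neighbors[point]) + 1 < min_samples:
                continue  # border point: keeps its label but is not expanded
            for other in neighbors[point]:
                if labels[other] == -1:
                    labels[other] = cluster
                    stack.append(other)
        cluster += 1
    return labels


# Hypothetical toy input: (precursor m/z, [(fragment m/z, intensity), ...]).
spectra = [
    (500.25, [(100.02, 1.0), (200.02, 0.5), (300.02, 0.8)]),
    (500.26, [(100.03, 0.9), (200.03, 0.6), (300.02, 0.8)]),
    (500.27, [(100.02, 1.0), (200.02, 0.5), (300.03, 0.7)]),
    (800.42, [(150.02, 1.0), (250.02, 1.0)]),
]
labels = dbscan(len(spectra), sparse_distances(spectra))
print(labels)  # the three similar spectra share a cluster; the fourth is noise
```

Only pairs inside the precursor window are ever compared, so the distance structure stays sparse. At the scale mentioned in the abstract, an approximate nearest neighbor index would replace the inner loop.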
Language
English
Source (journal)
Rapid communications in mass spectrometry. - London
Publication
London : 2021
ISSN
0951-4198
DOI
10.1002/RCM.9153
Volume/pages
(2021) , p. 1-20
Article Reference
e9153
ISI
000677764600001
Pubmed ID
34169593
Medium
E-only publication
UAntwerpen
Faculty/Department
Research group
Project info
Intelligent quality control for mass spectrometry-based proteomics
CalcUA as central calculation facility: supporting core facilities.
Publication type
Subject
Affiliation
Publications with a UAntwerp address
Record
Identifier
Creation 29.06.2021
Last edited 02.10.2024