Publication
Title
Current and future deep learning algorithms for MS/MS‐based small molecule structure elucidation
Author
Abstract
Rationale: Structure elucidation of small molecules has been one of the cornerstone applications of mass spectrometry for decades. Despite the increasing availability of software tools, structure elucidation from tandem mass spectrometry (MS/MS) data remains a challenging task, leaving many spectra unidentified. However, as an increasing number of reference MS/MS spectra are being curated at a repository scale and shared on public servers, there is an exciting opportunity to develop powerful new deep learning (DL) models for automated structure elucidation.
Architectures: Recent early-stage DL frameworks mostly follow a "two-step approach" that translates MS/MS spectra to database structures after first predicting molecular descriptors. The related architectures could suffer from: (1) computational complexity because of the separate training of descriptor-specific classifiers, (2) the high-dimensional nature of mass spectral data and information loss due to data preprocessing, and (3) the low substructure coverage and class imbalance problem of predefined molecular fingerprints. Inspired by successful DL frameworks employed in drug discovery fields, we have conceptualized and designed hypothetical DL architectures to tackle the above issues. For (1), we recommend multitask learning to achieve better performance with fewer classifiers by grouping structurally related descriptors. For (2) and (3), we introduce feature engineering to extract condensed and higher-order information from spectra and structure data. For instance, encoding spectra with subtrees and pre-calculated spectral patterns adds peak interactions to the model input. Encoding structures with graph convolutional networks incorporates connectivity within a molecule. The joint embedding of spectra and structures can enable simultaneous spectral library and molecular database search.
Conclusions: In principle, given enough training data, adapted DL architectures, optimal hyperparameters and computing power, DL frameworks can predict small molecule structures, completely or at least partially, from MS/MS spectra. However, their performance and general applicability should be fairly evaluated against classical machine learning frameworks.
Language
English
Source (journal)
Rapid communications in mass spectrometry. - London
Publication
London : 2021
ISSN
0951-4198
DOI
10.1002/RCM.9120
Volume/pages
(2021), p. 1-29
Article Reference
e9120
ISI
000654460700001
Pubmed ID
33955607
Medium
E-only publication
UAntwerpen
Faculty/Department
Research group
Project info
Efficient mining for unexpected patterns in complex biological data.
Publication type
Subject
Affiliation
Publications with a UAntwerp address
Record
Identifier
Creation 20.05.2021
Last edited 17.11.2024