Publication
Title
Library of Two Million Unique Small Molecules with Precalculated Fingerprints, Descriptors, and Cardiotoxicity Inhibition Data
Author
Abstract
This repository contains a dataset of ~2 million unique compounds stored in an HDF5 small-molecule library store. The store includes the following fields for each molecule:
- InChI key
- Standardized SMILES string
- Compound source
- ChEMBL identifier, if the compound exists in this open-access database
- 1024-bit Morgan fingerprint
- 2048-bit Morgan fingerprint
- 881-bit PubChem fingerprint
- Vector of 854 preprocessed and standardized Mordred descriptors
- Cardiotoxicity inhibition predictions for each of the three cardiac ion channels (hERG, Nav1.5, and Cav1.2) from CtoxPred2, along with the model confidence scores
The repository also includes a Jupyter notebook that serves as an initial guide for querying the small-molecule library store. To get started, export both files to the same folder, make sure approximately 40 GB of disk space is available, unzip the library store, and then launch the notebook to begin querying.
If you use this dataset, please cite this publication: Issar Arab, Kris Laukens, Wout Bittremieux, Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set, Journal of Chemical Information and Modeling (2024). doi:10.1021/acs.jcim.4c01102
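The setup steps described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the notebook shipped with the dataset: the file name `library_store.h5` and both helper functions are hypothetical, and the internal layout of the store is not described in this record, so the sketch only lists whatever groups and datasets the file contains instead of assuming field names. Inspecting the store requires the third-party `h5py` package; the disk-space check uses only the standard library.

```python
import shutil
from pathlib import Path

REQUIRED_GB = 40  # approximate figure quoted in the abstract


def has_enough_space(folder: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if `folder` has at least `required_gb` GiB of free disk space."""
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= required_gb * 1024**3


def list_store_contents(store_path: str) -> list[str]:
    """List the name of every group and dataset in an HDF5 file.

    Makes no assumption about how the library store is organized.
    """
    import h5py  # imported lazily so the disk check works without h5py

    names: list[str] = []
    with h5py.File(store_path, "r") as store:
        # visit() calls the function once per object name; append returns
        # None, so the traversal continues through the whole file.
        store.visit(names.append)
    return names


if __name__ == "__main__":
    store_file = Path("library_store.h5")  # hypothetical file name
    if not has_enough_space("."):
        print("Less than ~40 GB free; unzipping the store may fail.")
    elif store_file.exists():
        print(list_store_contents(str(store_file)))
```

Running the disk check before unzipping avoids a partially extracted store; the accompanying Jupyter notebook remains the authoritative guide to the actual query interface.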
Language
English
Related publication(s)
Publication
Zenodo, 2024
DOI
10.5281/ZENODO.11066707
UAntwerpen
Affiliation
Publications with a UAntwerp address
Record
Identifier c:irua:208064
Creation 23.09.2024
Last edited 24.09.2024