Title
|
|
|
|
Library of Two Million Unique Small Molecules with Precalculated Fingerprints, Descriptors, and Cardiotoxicity Inhibition Data
| |
Author
|
|
|
|
| |
Abstract
|
|
|
|
This repository comprises a dataset of ~2 million unique compounds saved in an HDF5 small molecule library store. The store includes the following fields for each molecule: InChI key; standardized SMILES string; compound source; ChEMBL identifier, if the compound exists in this open-access database; 1024-bit Morgan fingerprint; 2048-bit Morgan fingerprint; 881-bit PubChem fingerprint; a vector of 854 preprocessed and standardized Mordred descriptors; and cardiotoxicity inhibition predictions for each of the three cardiac ion channels (hERG, Nav1.5, and Cav1.2) made with CtoxPred2, along with the model confidence scores. The repository also includes a Jupyter notebook that serves as an initial guide for querying the small molecule library store. To get started, place both files in the same folder, make sure approximately 40 GB of disk space is available, unzip the library store, and then launch the notebook to begin querying. If you use this dataset, please cite the following publication: Issar Arab, Kris Laukens, Wout Bittremieux, Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set, Journal of Chemical Information and Modeling (2024). doi:10.1021/acs.jcim.4c01102 |
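The setup steps in the abstract (check for roughly 40 GB of free disk space, then unzip the library store) can be sketched in Python as below. This is a minimal illustration using only the standard library, not the authors' notebook code. The function names and the `required_gb` parameter are introduced here for illustration, and the 40 GB threshold is the approximate figure quoted above.

```python
import shutil
import zipfile
from pathlib import Path

# Approximate space requirement quoted in the abstract (an estimate, not a hard limit).
REQUIRED_GB = 40


def free_space_gb(folder):
    """Return the free space, in GB (10**9 bytes), on the volume holding `folder`."""
    return shutil.disk_usage(folder).free / 1e9


def has_enough_space(folder, required_gb=REQUIRED_GB):
    """Return True if `folder` sits on a volume with at least `required_gb` GB free."""
    return free_space_gb(folder) >= required_gb


def unzip_store(archive, folder, required_gb=REQUIRED_GB):
    """Extract `archive` into `folder` after checking the free disk space.

    Returns the list of member names found in the archive.
    """
    folder = Path(folder)
    if not has_enough_space(folder, required_gb):
        raise OSError(
            f"Need about {required_gb} GB free in {folder}, "
            f"found {free_space_gb(folder):.1f} GB"
        )
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(folder)
        return zf.namelist()
```

A call such as `unzip_store("library_store.zip", ".")` would extract the store next to the notebook. The archive name here is hypothetical, so substitute the actual file name downloaded from the repository.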
| |
Language
|
|
|
|
English
| |
Related publication(s)
|
|
|
|
| |
Publication
|
|
|
|
Zenodo, 2024
| |
DOI
|
|
|
|
10.5281/ZENODO.11066707
| |
Volume/pages
|
|
|
|
| |
Full text (Publisher's DOI)
|
|
|
|
| |
|