Mol Divers. 2006 Aug;10(3):389-403. Epub 2006 Sep 21.

Managing, profiling and analyzing a library of 2.6 million compounds gathered from 32 chemical providers.

Author information

Institut de Chimie Organique et Analytique, UMR CNRS 6005, Université d'Orléans, Orléans Cedex 2, France. aurelien.monge@univ-orleans.fr

Abstract

Data for 3.8 million compounds from the structural databases of 32 providers were gathered and stored in a single chemical database. Duplicates were removed using the IUPAC International Chemical Identifier (InChI), leaving 2.6 million compounds. Each provider database and the merged database were studied in terms of uniqueness, diversity, frameworks, and 'drug-like' and 'lead-like' properties. The study shows that the database contains more than 87,000 frameworks and 2.1 million 'drug-like' molecules, of which more than one million are 'lead-like'. The study was carried out using 'ScreeningAssistant', a software package dedicated to chemical database management and screening set generation. Compounds are stored in a MySQL database, and all operations on this database are carried out by Java code. Drug-likeness and lead-likeness are estimated with 'in-house' scores that use functions to estimate how suitable a compound's properties are; uniqueness is assessed using the InChI code, and diversity using molecular frameworks and fingerprints. The software was designed to make updating the database easy. 'ScreeningAssistant' is freely available under the GPL license.
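The duplicate-removal and uniqueness steps described in the abstract can be illustrated with a minimal sketch. It is not the ScreeningAssistant implementation (which the abstract says uses Java and MySQL); it assumes each record already carries a precomputed InChI string, and the provider names, compound IDs, and the functions `merge_by_inchi` and `unique_per_provider` are hypothetical names introduced only for this example.

```python
from collections import defaultdict


def merge_by_inchi(records):
    """Group (provider, compound_id, inchi) records by their InChI string.

    Records sharing an InChI are treated as the same structure, so the
    number of keys is the library size after duplicate removal.
    """
    merged = {}
    for provider, compound_id, inchi in records:
        merged.setdefault(inchi, []).append((provider, compound_id))
    return merged


def unique_per_provider(merged):
    """Count structures that are offered by exactly one provider."""
    counts = defaultdict(int)
    for sources in merged.values():
        providers = {provider for provider, _ in sources}
        if len(providers) == 1:
            counts[next(iter(providers))] += 1
    return dict(counts)


# Hypothetical catalogue entries; the InChI strings are assumed to have
# been generated beforehand by a cheminformatics toolkit (not shown).
records = [
    ("ProviderA", "A-001", "InChI=1S/CH4/h1H4"),
    ("ProviderB", "B-042", "InChI=1S/CH4/h1H4"),
    ("ProviderA", "A-002", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("ProviderB", "B-043", "InChI=1S/H2O/h1H2"),
]

merged = merge_by_inchi(records)
print(len(records), "records ->", len(merged), "distinct structures")
print(unique_per_provider(merged))
```

Keying on the identifier string rather than on provider-specific IDs means the same structure listed by several providers collapses to one entry, while the per-entry source list keeps track of who supplies it.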

PMID: 17031540
DOI: 10.1007/s11030-006-9033-5
[Indexed for MEDLINE]