Mol Divers. 2006 Aug;10(3):495-509. Epub 2006 Sep 12.

Toward automated biochemotype annotation for large compound libraries.

Author information

Research Center of Modernization of Chinese Traditional Medicine, Department of Chemistry, Central South University, Lu Shan Road, Chang Sha, PR China.

Abstract

Combinatorial chemistry allows scientists to probe a large, synthetically accessible chemical space. However, identifying the subspace that is selectively associated with a biological target of interest is crucial to drug discovery and the life sciences. This paper describes a process for automatically annotating the biochemotypes of compounds in a library, and thus for identifying bioactivity-related chemotypes (biochemotypes) in a large compound library. The process consists of two steps: (1) predicting all possible bioactivities for each compound in the library, and (2) deriving possible biochemotypes from those predictions. The Prediction of Activity Spectra for Substances program (PASS) was used in the first step. In the second step, structural similarity and scaffold-hopping techniques are used to derive biochemotypes from the bioactivity predictions and the corresponding annotated biochemotypes in the MDL Drug Data Report (MDDR) database. A commercially available compound library (CACL) of nearly one million (982,889) compounds was tested with this process. The results demonstrate the feasibility of automatically annotating biochemotypes for large compound libraries. Nevertheless, several issues must be addressed to improve the process. First, the prediction accuracy of the PASS program shows no significant correlation with the number of compounds in a training set: larger training sets do not necessarily increase the maximal error of prediction (MEP) or the structural diversity of hits, and smaller training sets do not necessarily decrease either. Second, the success of systematic bioactivity prediction depends on the modeling approach, the training data, and the definition of the bioactivities themselves (the biochemotype ontology). Unfortunately, the biochemotype ontology is not well developed in the PASS program, so "ill-defined" bioactivities can reduce the quality of predictions. This paper suggests ways in which the systematic bioactivity prediction program should be improved.
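
The second step of the process, checking each predicted bioactivity against structurally similar annotated reference compounds, can be illustrated with a minimal sketch. It assumes that fingerprints are plain sets of integer feature IDs, that the step-1 predictions arrive as a list of activity labels (a stand-in, not actual PASS output), and that a hypothetical Tanimoto cutoff of 0.5 decides whether a reference biochemotype is transferred. The names `Reference`, `tanimoto`, and `annotate` are illustrative only, and scaffold hopping is not modeled here; only direct fingerprint similarity is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical annotated reference compound (stand-in for an MDDR-style record)."""
    name: str
    fingerprint: frozenset
    activity: str
    biochemotype: str


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprint feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def annotate(fingerprint, predicted_activities, references, threshold=0.5):
    """For each predicted activity, find the most similar reference compound
    annotated with that activity; keep its biochemotype only if the
    similarity reaches the (assumed) threshold."""
    annotations = {}
    for activity in predicted_activities:
        candidates = [r for r in references if r.activity == activity]
        if not candidates:
            continue  # no annotated reference for this activity
        best = max(candidates, key=lambda r: tanimoto(fingerprint, r.fingerprint))
        score = tanimoto(fingerprint, best.fingerprint)
        if score >= threshold:
            annotations[activity] = (best.biochemotype, round(score, 3))
    return annotations


# Toy data, invented purely for illustration.
refs = [
    Reference("ref-A", frozenset({1, 2, 3, 4}), "kinase inhibitor", "scaffold-1"),
    Reference("ref-B", frozenset({10, 11, 12}), "kinase inhibitor", "scaffold-2"),
    Reference("ref-C", frozenset({1, 2, 20}), "GPCR antagonist", "scaffold-3"),
]
query = frozenset({1, 2, 3, 5})
predicted = ["kinase inhibitor", "GPCR antagonist", "ion channel blocker"]
result = annotate(query, predicted, refs)
print(result)  # {'kinase inhibitor': ('scaffold-1', 0.6)}
```

In this toy run the GPCR prediction is dropped because its closest reference scores only 0.4, and the ion channel prediction has no annotated reference at all. Filtering predictions this way is one plausible reading of "deriving biochemotypes from predictions", not the paper's exact procedure.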

PMID: 16967195
DOI: 10.1007/s11030-006-9047-z
[Indexed for MEDLINE]
