J Chem Inf Model. 2013 Jul 22;53(7):1595-601. doi: 10.1021/ci4002712. Epub 2013 Jul 3.

Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening.

Author information

LIMES Program Unit, Chemical Biology and Medicinal Chemistry, Department of Life Science Informatics, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany.


The choice of negative training data for machine learning is a little-explored issue in chemoinformatics. In this study, the influence of alternative sets of negative training data and different background databases on support vector machine (SVM) modeling and virtual screening has been investigated. Target-directed SVM models have been derived on the basis of differently composed training sets containing confirmed inactive molecules or randomly selected database compounds as negative training instances. These models were then applied to search background databases consisting of biological screening data or randomly assembled compounds for available hits. Negative training data were found to systematically influence compound recall in virtual screening. In addition, different background databases had a strong influence on the search results. Our findings also indicated that typical benchmark settings lead to an overestimation of SVM-based virtual screening performance compared to search conditions that are more relevant for practical applications.
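
The abstract reports results as compound recall but does not define the measure. As a rough illustration only, recall is commonly taken to be the fraction of known active compounds found among the top-ranked database compounds. The sketch below assumes that definition; the function name `recall_at_k`, the compound identifiers, and the scores are hypothetical and are not taken from the study.

```python
def recall_at_k(ranked_ids, active_ids, k):
    """Fraction of known actives found among the top-k ranked compounds."""
    actives = set(active_ids)
    if not actives:
        raise ValueError("active_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & actives) / len(actives)


# Hypothetical model scores for five background-database compounds
# (illustrative values, not data from the paper).
scores = {"cpd_a": 0.9, "cpd_b": -0.2, "cpd_c": 0.4, "cpd_d": -1.1, "cpd_e": 0.1}
actives = {"cpd_a", "cpd_e"}  # assumed known hits in the background database

# Rank compounds from highest to lowest score, as a screening model would.
ranked = sorted(scores, key=scores.get, reverse=True)

print(recall_at_k(ranked, actives, k=2))
print(recall_at_k(ranked, actives, k=3))
```

Under this reading, the comparison described in the abstract amounts to computing such a value for the same set of actives while varying the negative training set or the background database.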

[Indexed for MEDLINE]
