Mol Pharm. 2019 Apr 1;16(4):1620-1632. doi: 10.1021/acs.molpharmaceut.8b01297. Epub 2019 Feb 26.

Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets.

Author information

Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
The Rutgers Center for Computational and Integrative Biology, Camden, New Jersey 08102, United States.
Molecular Materials Informatics, Inc., 2234 Duvernay Street, Montreal, Quebec H3J2Y3, Canada.
Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, Leninsky Prospekt 33-2, Moscow 119071, Russia.


The human immunodeficiency virus (HIV) causes over a million deaths every year and has a huge economic impact in many countries. The first class of drugs approved was nucleoside reverse transcriptase inhibitors. A newer generation of reverse transcriptase inhibitors has become susceptible to drug-resistant strains of HIV, and hence, alternatives are urgently needed. We have recently pioneered the use of Bayesian machine learning to generate models with public data to identify new compounds for testing against different disease targets. The current study used the NIAID ChemDB HIV, Opportunistic Infection and Tuberculosis Therapeutics Database for machine learning studies. We curated and cleaned data from HIV-1 wild-type cell-based and reverse transcriptase (RT) DNA polymerase inhibition assays. Compounds from this database with ≤1 μM HIV-1 RT DNA polymerase activity inhibition and cell-based HIV-1 inhibition are correlated (Pearson r = 0.44, n = 1137, p < 0.0001). Models were trained using multiple machine learning approaches (Bernoulli Naive Bayes, AdaBoost Decision Tree, Random Forest, support vector classification, k-Nearest Neighbors, and deep neural networks, as well as consensus approaches), and their predictive abilities were then compared. Our comparison of the machine learning methods demonstrated that support vector classification, deep learning, and a consensus approach were generally comparable and not significantly different from each other, using 5-fold cross validation and 24 training and test set combinations. These findings are in line with our previous studies on various targets: training and testing with multiple data sets does not reveal a significant difference between support vector machines and deep neural networks.
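
As a rough illustration of two quantities the abstract mentions, the sketch below computes a Pearson correlation coefficient and generates shuffled 5-fold cross-validation splits using only the Python standard library. It is not the authors' pipeline: the function names `pearson_r` and `k_fold_indices`, the fixed seed, and the activity values are all hypothetical and chosen only for demonstration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Made-up activity values (hypothetical, not taken from the NIAID ChemDB data).
rt_activity = [6.1, 6.8, 7.2, 7.9, 8.4, 6.5, 7.0, 7.7, 6.3, 8.1]
cell_activity = [5.9, 6.2, 7.5, 7.1, 8.0, 6.9, 6.4, 7.8, 6.6, 7.3]

r = pearson_r(rt_activity, cell_activity)
splits = list(k_fold_indices(len(rt_activity), k=5))
```

In a comparison like the one described, each classifier would be trained on `train_idx` and scored on `test_idx` for every split, and the per-fold scores would then be compared across methods.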


HIV; Naïve Bayes; assay central; deep learning; drug discovery; machine learning; reverse transcriptase; support vector machine
