J Chem Inf Comput Sci. 2001 Sep-Oct;41(5):1237-47.

Prediction of aqueous solubility of heteroatom-containing organic compounds from molecular structure.

Author information

Department of Chemistry, 152 Davey Laboratory, The Pennsylvania State University, University Park, PA 16802, USA.


The use of quantitative structure-property relationships (QSPRs) to predict aqueous solubilities (log S) of heteroatom-containing organic compounds from their molecular structure is presented. Three data sets are examined. Data set 1 contains 176 compounds with one or more nitrogen atoms, some of which also contain oxygen (log S[mol/L] range is -7.41 to 0.96). Data set 2 contains 223 compounds with one or more oxygen atoms and no nitrogen (log S[mol/L] range is -8.77 to 1.57). Data set 3 contains all 399 compounds from sets 1 and 2 (log S[mol/L] range is -8.77 to 1.57). After descriptor generation and feature selection, multiple linear regression (MLR) and computational neural network (CNN) models are developed for aqueous solubility prediction. The best results were obtained with nonlinear CNN models. Root-mean-square (rms) errors for training with the three data sets ranged from 0.3 to 0.6 log units. All models were validated with external prediction sets, with rms errors ranging from 0.6 to 1.5 log units.
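The abstract does not list the selected descriptors or the network architecture, so the sketch below illustrates only two generic pieces it names. The first is an ordinary least-squares MLR model mapping a row of numeric descriptors to log S. The second is the rms error used to report training and external-prediction performance, rms = sqrt((1/n) Σ (log S_obs − log S_pred)²), in log units. The function names (`fit_mlr`, `predict`, `rms_error`) and the toy data are hypothetical and are not the authors' code. The CNN models are not reproduced here.

```python
import math
from typing import Sequence


def rms_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted log S values (log units)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> list[float]:
    """Hypothetical ordinary least-squares MLR: returns [intercept, b1, ..., bk].

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination
    with partial pivoting, where A is X with a leading column of ones.
    """
    A = [[1.0, *row] for row in X]
    n_coef = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(A[r][i] * A[r][j] for r in range(len(A))) for j in range(n_coef)]
        + [sum(A[r][i] * y[r] for r in range(len(A)))]
        for i in range(n_coef)
    ]
    for col in range(n_coef):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, n_coef), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("descriptor matrix is rank-deficient")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n_coef):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][-1] for i in range(n_coef)]


def predict(coef: Sequence[float], X: Sequence[Sequence[float]]) -> list[float]:
    """Apply fitted MLR coefficients to descriptor rows to predict log S."""
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]


if __name__ == "__main__":
    # Toy, noise-free data with two hypothetical descriptors: log S = 0.5 - x1 + 0.2 * x2.
    X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
    y_train = [0.5 - x1 + 0.2 * x2 for x1, x2 in X_train]
    coef = fit_mlr(X_train, y_train)
    print("coefficients:", [round(c, 6) for c in coef])
    print("training rms:", round(rms_error(y_train, predict(coef, X_train)), 6))
```

In a real QSPR workflow, the rms error would be computed separately on the training set and on an external prediction set that was held out from fitting, which is how the abstract reports its two error ranges.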

