J Med Chem. 2003 Aug 14;46(17):3572-80.

Prediction of aqueous solubility of a diverse set of compounds using quantitative structure-property relationships.

Author information

ADMET R&D, Accelrys, subsidiary of Pharmacopeia, Inc., CN5375, Princeton, NJ 08543-5375, USA.


"Fail early and fail fast" is the paradigm that the pharmaceutical industry has widely adopted. Removing non-drug-like compounds from the drug discovery lifecycle in the early stages can save substantial resources. Thus, fast screening methods are needed to profile the large collections of synthesized and virtual libraries involved at this stage. Solubility is one of the filters applied extensively to ensure that compounds are reasonably soluble, so that their synthesis and assay studies of pharmacokinetics and toxicity are feasible. To address this need, we have developed a fast quantitative structure-property relationship (QSPR) model for the prediction of aqueous solubility (at 298 K, unbuffered solution) from molecular structure. Multiple linear regression and genetic algorithms were used to develop the model. The model was based on a diverse set of compounds including small organic molecules and drugs and drug-like species. The predicted solubility for the training and test sets agrees well with the experimental values. The coefficient of determination is R² = 0.84 for the training set of 775 compounds, with an RMS error of 0.87. The model was validated on four sets of compounds. The RMS error for the 1665 compounds from the four validation data sets (including compounds from the Physicians' Desk Reference and Comprehensive Medicinal Chemistry databases) is 1 log unit and the unsigned error is 0.77. The model does not require 3-D structure generation, which is rather time-consuming. Using 2-D structures as input, it can compute solubility for 90,000 to 700,000 compounds/h on an SGI Origin 2000 workstation. This speed allows the model to be used in data mining and screening of large synthesized or virtual libraries.
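
The fit statistics quoted in the abstract (R², RMS error, unsigned error) can be computed from paired experimental and predicted log-solubility values. The following is a minimal sketch, not the authors' code: the function name `fit_statistics` and the four data points are hypothetical, R² is assumed to be defined as 1 - SS_res/SS_tot, and the unsigned error is assumed to be the mean absolute error, which the abstract does not state explicitly.

```python
import math


def fit_statistics(experimental, predicted):
    """Return R^2, RMS error and mean unsigned error for paired values.

    Assumptions (not confirmed by the abstract): R^2 = 1 - SS_res / SS_tot,
    and "unsigned error" means the mean absolute error.
    """
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")

    n = len(experimental)
    residuals = [e - p for e, p in zip(experimental, predicted)]

    # Residual sum of squares and total sum of squares about the mean.
    ss_res = sum(r * r for r in residuals)
    mean_exp = sum(experimental) / n
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    if ss_tot == 0:
        raise ValueError("experimental values are all identical")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rms_error": math.sqrt(ss_res / n),
        "mean_unsigned_error": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical log-solubility values, for illustration only.
experimental_log_s = [-1.0, -2.0, -3.0, -4.0]
predicted_log_s = [-1.5, -2.0, -2.5, -4.0]

stats = fit_statistics(experimental_log_s, predicted_log_s)
print(stats)
```

Because solubility is modeled on a logarithmic scale, an RMS error of 1 log unit corresponds to a typical deviation of about a factor of ten in solubility.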

[Indexed for MEDLINE]
