Display Settings:

Format

Send to:

Choose Destination
We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
    J Chem Inf Model. 2006 Nov-Dec;46(6):2412-22.

    Melting point prediction employing k-nearest neighbor algorithms and genetic parameter optimization.

    Source

    Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom.

    Abstract

    We have applied the k-nearest neighbor (kNN) modeling technique to the prediction of melting points. A data set of 4119 diverse organic molecules (data set 1) and an additional set of 277 drugs (data set 2) were used to compare performance in different regions of chemical space, and we investigated the influence of the number of nearest neighbors using different types of molecular descriptors. To compute the prediction on the basis of the melting temperatures of the nearest neighbors, we used four different methods (arithmetic and geometric average, inverse distance weighting, and exponential weighting), of which the exponential weighting scheme yielded the best results. We assessed our model via a 25-fold Monte Carlo cross-validation (with approximately 30% of the total data as a test set) and optimized it using a genetic algorithm. Predictions for drugs based on drugs (separate training and test sets each taken from data set 2) were found to be considerably better [root-mean-squared error (RMSE)=46.3 degrees C, r2=0.30] than those based on nondrugs (prediction of data set 2 based on the training set from data set 1, RMSE=50.3 degrees C, r2=0.20). The optimized model yields an average RMSE as low as 46.2 degrees C (r2=0.49) for data set 1, and an average RMSE of 42.2 degrees C (r2=0.42) for data set 2. It is shown that the kNN method inherently introduces a systematic error in melting point prediction. Much of the remaining error can be attributed to the lack of information about interactions in the liquid state, which are not well-captured by molecular descriptors.

    PMID:
    17125183
    [PubMed - indexed for MEDLINE]

      Supplemental Content

      Icon for American Chemical Society

      Save items

      Recent activity

      Your browsing activity is empty.

      Activity recording is turned off.

      Turn recording back on

      See more...
      Write to the Help Desk