Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression

Environ Toxicol Chem. 2014 Dec;33(12):2688-93. doi: 10.1002/etc.2746. Epub 2014 Oct 15.

Abstract

Biodegradation is the principal environmental dissipation process for chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods for assessing biodegradability based on a large, heterogeneous set of 825 organic compounds, using 3 techniques: the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out on 2 independent test sets of 777 and 27 chemicals. The functional inner regression tree exhibited the best predictability, with predictive accuracies of 81.5% on the training set (825 chemicals) and 81.0% on test set I (777 chemicals). The performance of the developed models on the 2 test sets was then compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models; in this comparison, too, the functional inner regression tree model showed better predictability. The model built in the present study thus offers reasonable predictability compared with existing models while retaining a transparent algorithm. The mechanisms of biodegradation were also interpreted on the basis of the developed models.
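
The train-then-externally-validate workflow described above can be illustrated with a minimal sketch. It is not the authors' method: a depth-1 decision tree (a "stump") on a single numeric descriptor stands in for the C4.5 and functional tree models, and the descriptor values, labels (1 = biodegradable, 0 = not), and function names are all hypothetical toy values chosen for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_stump(xs, ys):
    """Fit a depth-1 decision tree on one numeric descriptor.

    Returns (threshold, label_at_or_below, label_above), chosen to
    maximize accuracy on the training data.
    """
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in candidates:
        for low, high in ((0, 1), (1, 0)):
            preds = [low if x <= thr else high for x in xs]
            acc = accuracy(ys, preds)
            if best is None or acc > best[0]:
                best = (acc, thr, low, high)
    if best is None:
        raise ValueError("need at least two distinct descriptor values")
    return best[1:]


def predict_stump(model, xs):
    """Apply a fitted stump to new descriptor values."""
    thr, low, high = model
    return [low if x <= thr else high for x in xs]


# Hypothetical toy data, NOT taken from the study.
train_x = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
train_y = [1, 1, 1, 1, 0, 0, 0, 1]
test_x = [0.8, 1.2, 2.8, 3.8, 4.2]   # external set, never used for fitting
test_y = [1, 1, 0, 0, 1]

model = fit_stump(train_x, train_y)
train_acc = accuracy(train_y, predict_stump(model, train_x))
test_acc = accuracy(test_y, predict_stump(model, test_x))
print(f"threshold={model[0]}, train accuracy={train_acc}, test accuracy={test_acc}")
```

The key point of the sketch is that the external test set plays no part in fitting, so similar accuracies on both sets (as reported for the functional inner regression tree) suggest the model is not merely memorizing its training data.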

Keywords: Biodegradability; Decision trees; Functional trees; In silico models; Logistic regression.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biodegradation, Environmental*
  • Decision Trees
  • Logistic Models
  • Models, Theoretical*
  • Organic Chemicals / chemistry
  • Organic Chemicals / metabolism*

Substances

  • Organic Chemicals