Comput Biol Chem. 2014 Oct;52:51-9. doi: 10.1016/j.compbiolchem.2014.09.002. Epub 2014 Sep 15.

newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation.

Author information

1. Department of Mathematics, School of Science, Hebei University of Engineering, Handan 056038, PR China.
2. College of Mathematical Sciences and LPMC, Nankai University, No. 94 Weijin Road, Tianjin 300071, PR China.
3. School of Computer Science and Software Engineering, Tianjin Polytechnic University, No. 399 Binshui Road, Tianjin 300387, PR China. Electronic address: kchen1.tjpu@hotmail.com.

Abstract

Identification of DNA-binding proteins is essential in studying cellular activities, as DNA-binding proteins play a pivotal role in gene regulation. In this study, we propose newDNA-Prot, a DNA-binding protein predictor that employs a support vector machine classifier and a comprehensive feature representation. The sequence features are categorized into 6 groups: primary sequence based, evolutionary profile based, predicted secondary structure based, predicted relative solvent accessibility based, physicochemical property based and biological function based features. The mRMR, wrapper and two-stage feature selection methods are employed to remove irrelevant features and reduce redundant features. Experiments demonstrate that the two-stage method performs better than the mRMR and wrapper methods. We also perform a statistical analysis of the selected features; the results show that more than 95% of the selected features are statistically significant and that they cover all 6 feature groups. The newDNA-Prot method is compared with several state-of-the-art algorithms, including iDNA-Prot, DNAbinder and DNA-Prot. The results demonstrate that newDNA-Prot outperforms the iDNA-Prot, DNAbinder and DNA-Prot methods. More specifically, newDNA-Prot improves on the runner-up method, DNA-Prot, by around 10% on several evaluation measures. The proposed newDNA-Prot method is available at http://sourceforge.net/projects/newdnaprot/.
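
The abstract does not spell out how the two-stage feature selection works, so the following is only a minimal sketch of the general idea, assuming the first stage is an mRMR-style filter and the second stage is a forward-selection wrapper applied to the filtered candidates. To keep it dependency-free, two substitutions are made that are not from the paper: absolute Pearson correlation stands in for the mutual information used by mRMR, and a nearest-centroid classifier scored by leave-one-out accuracy stands in for the SVM. All function names and the toy data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def column(X, j):
    """Values of feature j across all samples."""
    return [row[j] for row in X]


def filter_stage(X, y, k):
    """Stage 1 (assumed): greedy mRMR-style ranking, relevance minus mean redundancy.

    Correlation replaces mutual information here purely for brevity.
    """
    n_features = len(X[0])
    relevance = [abs(pearson(column(X, j), y)) for j in range(n_features)]
    selected, remaining = [], list(range(n_features))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = mean(
            abs(pearson(column(X, j), column(X, s))) for s in selected
        )
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            if rows:
                centroids[label] = [mean(row[f] for row in rows) for f in features]
        point = [X[i][f] for f in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def wrapper_stage(X, y, candidates):
    """Stage 2 (assumed): forward selection that keeps a feature only if accuracy improves."""
    chosen, best_acc = [], 0.0
    improved = True
    while improved:
        improved = False
        for f in candidates:
            if f in chosen:
                continue
            acc = loo_accuracy(X, y, chosen + [f])
            if acc > best_acc:
                best_acc, best_f, improved = acc, f, True
        if improved:
            chosen.append(best_f)
    return chosen, best_acc


# Toy data (hypothetical): feature 0 is informative, feature 1 duplicates it,
# feature 2 is mostly noise. Labels: 1 = DNA-binding, 0 = non-binding.
X = [
    [0.1, 0.1, 5.0],
    [0.2, 0.2, 1.0],
    [0.0, 0.0, 3.0],
    [0.9, 0.9, 2.0],
    [1.0, 1.0, 4.0],
    [0.8, 0.8, 0.0],
]
y = [0, 0, 0, 1, 1, 1]

candidates = filter_stage(X, y, k=2)            # stage 1: drop redundant features
final, accuracy = wrapper_stage(X, y, candidates)  # stage 2: keep what helps the classifier
print(candidates, final, accuracy)
```

On this toy data the filter stage discards the duplicated feature because its redundancy cancels its relevance, and the wrapper stage then discards the noisy feature because adding it does not improve accuracy. This illustrates the stated goals of removing irrelevant features and reducing redundant ones, not the paper's actual procedure or results.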

KEYWORDS:

DNA-binding proteins; Feature selection methods; Features; ROC; SVM

[Indexed for MEDLINE]
