J Theor Biol. 2014 Feb 21;343:186-92. doi: 10.1016/j.jtbi.2013.10.009. Epub 2013 Nov 1.

Predicting DNA binding proteins using support vector machine with hybrid fractal features.

Author information

1. College of Science, Huazhong Agricultural University, Wuhan, PR China.
2. College of Science, Huazhong Agricultural University, Wuhan, PR China. Electronic address: huxuehai@mail.hzau.edu.cn.

Abstract

DNA-binding proteins play a vital role in many biological processes. Predicting DNA-binding proteins from amino acid sequence is a significant scientific problem that has not yet been satisfactorily resolved. Chaos game representation (CGR) investigates the patterns hidden in protein sequences and visually reveals previously unknown structure. Fractal dimensions (FD) are good tools for measuring the size of complex, highly irregular geometric objects. To extract the intrinsic correlation between protein sequences and the DNA-binding property, this paper applies the CGR algorithm, fractal dimension, and amino acid composition to formulate numerical features of protein samples. Seven groups of features are extracted, all of which can be computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test and the jackknife test. Across the numerical experiments, the group combining amino acid composition and fractal dimension (a 21-dimensional vector) gives the best result, with an average accuracy of 81.82% and an average Matthews correlation coefficient (MCC) of 0.6017. The resulting predictor is also compared with the existing method DNA-Prot and shows better performance.
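
As a rough illustration of two quantities named in the abstract, the sketch below computes an amino acid composition vector and the Matthews correlation coefficient from confusion-matrix counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the residue ordering, and the handling of non-standard residues are all hypothetical choices. The abstract does not say how the 21-dimensional vector is assembled; it is assumed here to be 20 composition values plus one fractal dimension, and the fractal dimension itself is not computed because the abstract does not specify the CGR construction.

```python
from collections import Counter
from math import sqrt

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid (20 values).

    Assumption: residues outside the 20 standard codes are ignored.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Under the assumption above, appending a single fractal dimension value to the list returned by `amino_acid_composition` would give a 21-dimensional feature vector that could be passed to a support vector machine.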

KEYWORDS:

Chaos game representation; Cross validation; Fractal dimension; Protein classification

PMID: 24189096
DOI: 10.1016/j.jtbi.2013.10.009
[Indexed for MEDLINE]
