Biomed Res Int. 2014;2014:294279. doi: 10.1155/2014/294279. Epub 2014 May 26.

enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.

Author information

1. School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
2. School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
3. School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Shanghai Key Laboratory of Intelligent Information Processing, Shanghai 518055, China; Gordon Life Science Institute, Belmont, Massachusetts, USA.
4. PKU-HKUST ShenZhen-Hong Kong Institution, Shenzhen, Guangdong 518055, China; Peking University Shenzhen Graduate School, Shenzhen, Guangdong 518055, China.
5. School of Engineering & Applied Science, Aston University, Birmingham B4 7ET, UK.
6. School of Information Science and Technology, Xiamen University, Xiamen, Fujian 316005, China.

Abstract

DNA-binding proteins are crucial for various cellular processes, such as recognition of specific nucleotides, regulation of transcription, and regulation of gene expression. Developing an effective model for identifying DNA-binding proteins is an urgent research problem. Many methods have been proposed to date, but most of them rely on a single classifier and cannot make full use of the large number of negative samples to improve predictive performance. This study proposes a predictor called enDNA-Prot for DNA-binding protein identification that employs ensemble learning. Experimental results showed that enDNA-Prot was comparable with DNA-Prot and outperformed DNAbinder and iDNA-Prot, with improvements in the range of 3.97-9.52% in accuracy (ACC) and 0.08-0.19 in Matthews correlation coefficient (MCC). Furthermore, when the benchmark dataset was expanded with negative samples, enDNA-Prot outperformed the three existing methods by 2.83-16.63% in ACC and 0.02-0.16 in MCC. These results indicate that enDNA-Prot is an effective method for DNA-binding protein identification and that expanding the training dataset with negative samples can improve its performance. For the convenience of experimental scientists, we developed a user-friendly web server for enDNA-Prot that is freely accessible to the public.
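
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how ACC and MCC are conventionally computed from binary confusion-matrix counts. It uses the standard textbook definitions; the function names and the example counts are illustrative assumptions and are not taken from the paper or its web server.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct (ACC)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the paper).
    tp, tn, fp, fn = 40, 45, 5, 10
    print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike ACC, MCC takes all four cells of the confusion matrix into account, which makes it more informative when negative samples greatly outnumber positive ones, as in the expanded dataset the abstract describes.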

PMID: 24977146
PMCID: PMC4058174
DOI: 10.1155/2014/294279
[Indexed for MEDLINE]
