J Theor Biol. 2016 Apr 7;394:223-230. doi: 10.1016/j.jtbi.2016.01.020. Epub 2016 Jan 22.

pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach.

Author information

1. Computer Department, Jingdezhen Ceramic Institute, Jing-De-Zhen, 333403, China; Computer science, University of Birmingham, B29 2TT, UK; Gordon Life Science Institute, Boston, MA 02478, USA. Electronic address: jjia@gordonlifescience.org.
2. Computer Department, Jingdezhen Ceramic Institute, Jing-De-Zhen, 333403, China. Electronic address: liuzi189836@163.com.
3. Computer Department, Jingdezhen Ceramic Institute, Jing-De-Zhen, 333403, China; Gordon Life Science Institute, Boston, MA 02478, USA. Electronic address: xxiao@gordonlifescience.org.
4. Computer Department, Jingdezhen Ceramic Institute, Jing-De-Zhen, 333403, China. Electronic address: lbx1966@163.com.
5. Gordon Life Science Institute, Boston, MA 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia. Electronic address: kcchou@gordonlifescience.org.

Abstract

As one type of post-translational modification (PTM), protein lysine succinylation plays an important role in regulating a variety of biological processes, but it is also implicated in some diseases. Consequently, for both basic research and drug development, we face a challenging problem: given an uncharacterized protein sequence containing many Lys residues, which ones can be succinylated and which cannot? To address this problem, we have developed a predictor called pSuc-Lys by (1) incorporating sequence-coupled information into the general pseudo amino acid composition (PseAAC), (2) balancing the skewed training dataset by random downsampling, and (3) constructing an ensemble predictor by fusing a series of individual random forest classifiers. Rigorous cross-validations indicated that it remarkably outperformed the existing methods. A user-friendly web server for pSuc-Lys has been established at http://www.jci-bioinfo.cn/pSuc-Lys, through which users can easily obtain their desired results without needing to work through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics.
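Steps (2) and (3) can be illustrated with a minimal sketch. It is not the pSuc-Lys implementation: the abstract does not give the feature encoding, the number of classifiers, or the fusion rule, so the code assumes that each ensemble member is trained on its own randomly downsampled, balanced subset and that members are fused by majority vote. The names `balanced_subset`, `train_ensemble`, `predict_vote`, and `CentroidClassifier` are hypothetical, and the nearest-centroid learner is only a small stand-in for a random forest so that the example runs with the standard library alone.

```python
import random
from collections import Counter


def balanced_subset(samples, rng):
    """Randomly downsample so both classes have equal size.

    `samples` is a list of (feature_vector, label) pairs with labels 0 or 1.
    """
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


class CentroidClassifier:
    """Stand-in base learner (assumption): NOT a random forest."""

    def fit(self, samples):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, y in samples if y == label]
            # Column-wise mean of the feature vectors of this class.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def train_ensemble(samples, n_members=5, seed=0, make_learner=CentroidClassifier):
    """Train each member on its own balanced random subset (assumed scheme)."""
    rng = random.Random(seed)
    return [make_learner().fit(balanced_subset(samples, rng))
            for _ in range(n_members)]


def predict_vote(ensemble, x):
    """Fuse member predictions by majority vote (assumed fusion rule)."""
    votes = Counter(member.predict(x) for member in ensemble)
    return votes.most_common(1)[0][0]


# Toy, skewed data: 5 "succinylated" (1) vs 50 "non-succinylated" (0) samples.
toy_pos = [([1.0 + 0.01 * i, 1.0], 1) for i in range(5)]
toy_neg = [([0.0, 0.01 * i], 0) for i in range(50)]
ensemble = train_ensemble(toy_pos + toy_neg)
```

An odd `n_members` avoids ties in the binary vote. To get closer to the approach named in the abstract, `make_learner` could be swapped for any object with the same `fit`/`predict` shape that wraps a random forest.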

KEYWORDS:

Ensemble random forest; General PseAAC; Lysine succinylation; Random downsampling; Sequence-coupling model; pSuc-Lys web-server

PMID: 26807806
DOI: 10.1016/j.jtbi.2016.01.020
[Indexed for MEDLINE]
