Prediction of protein kinase-specific phosphorylation sites in hierarchical structure using functional information and random forest

Amino Acids. 2014 Apr;46(4):1069-78. doi: 10.1007/s00726-014-1669-3. Epub 2014 Jan 23.

Abstract

Reversible protein phosphorylation is one of the most important post-translational modifications and regulates a wide range of cellular processes. Identifying kinase-specific phosphorylation sites helps in understanding the mechanism of phosphorylation and the processes it regulates. Although a number of computational approaches have been developed, few studies consider the hierarchical structure of kinases, and most existing tools use only local sequence information to construct predictive models. In this work, we conduct a systematic, hierarchy-specific investigation of protein phosphorylation site prediction in which protein kinases are clustered into a hierarchical structure with four levels: kinase, subfamily, family and group. To enhance phosphorylation site prediction at all hierarchical levels, functional information on proteins, including gene ontology (GO) and protein-protein interaction (PPI), is used in addition to primary sequence to construct prediction models based on random forest. Analysis of the selected GO and PPI features shows that functional information is critical in determining protein phosphorylation sites at every hierarchical level. Furthermore, the prediction results on Phospho.ELM and an additional testing dataset demonstrate that the proposed method remarkably outperforms existing phosphorylation prediction methods at all hierarchical levels. The proposed method is freely available at http://bioinformatics.ustc.edu.cn/phos_pred/.
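
The feature construction described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoding, so everything below is an assumption made for illustration: the one-hot sequence window, the binary GO and PPI indicator vectors, the window size, the helper names, and the toy hierarchy (`KIN_A1`, `SUB_A`, and so on are hypothetical placeholders, not real kinases). It shows only how a candidate site might be turned into one feature vector and how a kinase label expands to the four hierarchical levels; it is not the authors' implementation.

```python
from typing import Dict, List, Sequence, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LEVELS = ("kinase", "subfamily", "family", "group")

# Hypothetical toy hierarchy: kinase -> (subfamily, family, group).
TOY_HIERARCHY = {
    "KIN_A1": ("SUB_A", "FAM_A", "GRP_X"),
    "KIN_A2": ("SUB_A", "FAM_A", "GRP_X"),
    "KIN_B1": ("SUB_B", "FAM_B", "GRP_X"),
}


def labels_at_all_levels(kinase: str) -> Dict[str, str]:
    """Expand a kinase name into its label at each of the four levels."""
    subfamily, family, group = TOY_HIERARCHY[kinase]
    return dict(zip(LEVELS, (kinase, subfamily, family, group)))


def sequence_window(protein: str, site: int, flank: int = 3, pad: str = "-") -> str:
    """Return the residues from site-flank to site+flank (0-based site).

    Positions that fall outside the protein are filled with `pad`.
    """
    padded = pad * flank + protein + pad * flank
    return padded[site : site + 2 * flank + 1]


def one_hot(window: str) -> List[int]:
    """Encode each residue as 20 binary values; pad characters become all zeros."""
    vector: List[int] = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def indicator(present: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """Binary presence vector of `present` over a fixed, ordered vocabulary."""
    return [1 if term in present else 0 for term in vocabulary]


def build_features(
    protein: str,
    site: int,
    go_terms: Set[str],
    partners: Set[str],
    go_vocab: Sequence[str],
    ppi_vocab: Sequence[str],
    flank: int = 3,
) -> List[int]:
    """Concatenate sequence, GO and PPI features for one candidate site."""
    window = sequence_window(protein, site, flank)
    return one_hot(window) + indicator(go_terms, go_vocab) + indicator(partners, ppi_vocab)


if __name__ == "__main__":
    go_vocab = ["GO:0005524", "GO:0005634", "GO:0006468"]  # example vocabulary
    ppi_vocab = ["PARTNER_1", "PARTNER_2"]                 # hypothetical partners
    features = build_features(
        "MKRSPTAYL", 3, {"GO:0005524"}, {"PARTNER_2"}, go_vocab, ppi_vocab
    )
    print(len(features))                   # 7 * 20 + 3 + 2 = 145
    print(labels_at_all_levels("KIN_A1"))
```

Under these assumptions, one binary classifier per label at each level could be trained on such vectors with any random forest implementation, for example scikit-learn's `RandomForestClassifier`. Sites phosphorylated by `KIN_A1` would then count as positives for `KIN_A1`, `SUB_A`, `FAM_A` and `GRP_X` alike, which is what makes the higher levels usable when data for an individual kinase are sparse.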

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs
  • Computational Biology
  • Databases, Protein
  • Gene Regulatory Networks
  • Humans
  • Phosphorylation
  • Protein Interaction Maps
  • Protein Kinases / metabolism*
  • Proteins / chemistry*
  • Proteins / metabolism*

Substances

  • Proteins
  • Protein Kinases