iAcet-Sumo: Identification of lysine acetylation and sumoylation sites in proteins by multi-class transformation methods

Comput Biol Med. 2018 Sep 1;100:144-151. doi: 10.1016/j.compbiomed.2018.07.006. Epub 2018 Jul 10.

Abstract

Motivation: Posttranslational modification (PTM) is the enzymatic modification of proteins after their translation by ribosomes. When two or more modifications can occur at one residue, predicting them can be framed as a multi-label problem. Two or more simultaneous modifications on a residue are more common than a single PTM. Lysine residues in proteins can be subjected to a variety of PTMs, such as ubiquitination, acetylation, sumoylation, methylation, and succinylation. Identifying modification sites in uncharacterized protein sequences remains an important and active problem. In particular, to handle lysine residues that carry more than one label, it is highly desirable to develop computational methods that predict lysine acetylation and sumoylation.

Results: In this paper, we propose an integrated approach, the five-step prediction method (FSPM), which addresses this problem by (1) using one-sided selection (OSS) to deal with imbalanced data, (2) extracting binary features from protein sequences, (3) applying binary relevance, classifier chains, and multi-class transformation methods to simplify the multi-label problem, (4) constructing different classifiers, and (5) evaluating these classifiers by cross-validation. In 10-fold cross-validation, FSPM achieved an accuracy of 61.49% and an absolute-true rate of 60.17%. These results suggest that FSPM can serve as a useful prediction engine for multi-label systems. We also performed statistical analyses of the predicted results to explore the biological functions of lysine acetylation and sumoylation.
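To make steps (2), (3), and (5) more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a one-hot ("binary") encoding over the 20 standard amino acids, a label pair ordered as (acetylation, sumoylation), a multi-class transformation that maps each label combination to one class, and the commonly used definitions of absolute-true (exact match) and multi-label accuracy (mean intersection over union of the label sets). All function names, the window contents, and the handling of a site with neither label are hypothetical; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

# Assumption: features are built from the 20 standard amino acids only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Labels = Tuple[int, int]  # assumed order: (acetylation, sumoylation)


def binary_encode(window: str) -> List[int]:
    """One-hot encode a peptide window centred on a lysine.

    Characters outside AMINO_ACIDS (e.g. padding 'X') become all zeros.
    """
    features: List[int] = []
    for residue in window:
        vec = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            vec[idx] = 1
        features.extend(vec)
    return features


def to_multiclass(labels: Labels) -> int:
    """Multi-class transformation: map a label pair to one class in 0..3."""
    acet, sumo = labels
    return 2 * acet + sumo


def from_multiclass(cls: int) -> Labels:
    """Inverse of to_multiclass."""
    return (cls // 2, cls % 2)


def absolute_true(y_true: Sequence[Labels], y_pred: Sequence[Labels]) -> float:
    """Fraction of samples whose predicted label set matches exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def multilabel_accuracy(y_true: Sequence[Labels], y_pred: Sequence[Labels]) -> float:
    """Mean of |true AND pred| / |true OR pred| over samples.

    Assumption: a sample with no true and no predicted label scores 1.0.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a == 1 and b == 1)
        union = sum(1 for a, b in zip(t, p) if a == 1 or b == 1)
        total += inter / union if union else 1.0
    return total / len(y_true)
```

With this transformation, any ordinary single-label classifier can be trained on the class indices and its predictions mapped back to label pairs with `from_multiclass` before scoring. Binary relevance would instead train one independent classifier per label, and classifier chains would feed the first label's prediction into the second classifier as an extra feature.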

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acetylation
  • Databases, Protein*
  • Lysine
  • Methylation
  • Sequence Analysis, Protein*
  • Software*
  • Sumoylation / physiology*
  • Ubiquitinated Proteins* / chemistry
  • Ubiquitinated Proteins* / genetics

Substances

  • Ubiquitinated Proteins
  • Lysine