Bioinformatics. 2014 Jul 15;30(14):1974-82. doi: 10.1093/bioinformatics/btu165. Epub 2014 Mar 28.

Motifs tree: a new method for predicting post-translational modifications.

Author information

1
Department of Computer Science, University of Geneva, 1227 Carouge and Swiss Institute of Bioinformatics, Centre Médical Universitaire, Geneva 4, Switzerland.
2
Department of Computer Science, University of Geneva, 1227 Carouge and Swiss Institute of Bioinformatics, Centre Médical Universitaire, Geneva 4, Switzerland.

Abstract

MOTIVATION:

Post-translational modifications (PTMs) are important steps in protein maturation. Several models exist to predict specific PTMs, ranging from manually detected patterns to machine learning methods. On the one hand, manually detected patterns do not yield the most efficient classifiers and require substantial manual effort; on the other hand, models built by machine learning methods are hard to interpret and do not add to biological knowledge. We therefore developed a novel method based on pattern discovery and decision trees to predict PTMs. The proposed algorithm builds a decision tree by coupling the C4.5 algorithm with genetic algorithms, producing high-performance white-box classifiers. We tested the method on initiator methionine cleavage (IMC) and N(α)-terminal acetylation (N-Ac), two of the most common PTMs.
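The abstract does not detail how discovered patterns enter the tree. As a minimal sketch, assuming each candidate motif is a regular expression tested at the protein N-terminus and used as a binary split, the C4.5 gain ratio for such a split can be computed as below. The function names, toy sequences, labels, and candidate motifs are hypothetical; the genetic-algorithm search over motifs is not shown, so candidates are simply listed by hand. The motif `M[ACGPSTV]` reflects the well-known rule that the initiator methionine is usually removed when the second residue is small.

```python
import math
import re
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(sequences, labels, motif):
    """C4.5 gain ratio of a binary split: does the motif match at the N-terminus?"""
    pattern = re.compile(motif)
    # Pattern.match anchors at the start of the string, i.e. the N-terminus.
    hits = [pattern.match(s) is not None for s in sequences]
    matched = [y for y, h in zip(labels, hits) if h]
    unmatched = [y for y, h in zip(labels, hits) if not h]
    n = len(labels)
    weights = [len(matched) / n, len(unmatched) / n]
    info_gain = (entropy(labels)
                 - weights[0] * entropy(matched)
                 - weights[1] * entropy(unmatched))
    # Split information penalizes splits into very unequal branches.
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 1 = initiator methionine cleaved, 0 = retained.
seqs = ["MAKL", "MSTE", "MGQW", "MKLV", "MEEF", "MRPL"]
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical candidate motifs; in the described method a genetic
# algorithm would search this space instead of a fixed list.
candidates = ["M[ACGPSTV]", "M[AK]", "ME"]
best = max(candidates, key=lambda m: gain_ratio(seqs, labels, m))
print(best, gain_ratio(seqs, labels, best))
```

Because each split is a readable motif, the resulting tree stays interpretable, which is the white-box property the abstract emphasizes.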

RESULTS:

The resulting classifiers perform well compared with existing models. On a set of eukaryotic proteins, they achieve a cross-validated Matthews correlation coefficient (MCC) of 0.83 for IMC and 0.65 for N-Ac. When used to predict potential substrates of N-terminal acetyltransferase B and N-terminal acetyltransferase C, our classifiers perform better than the state of the art. Moreover, we analyze the model predicting IMC for Homo sapiens proteins and show that we are able to extract experimentally known facts without prior knowledge. These results confirm that our method produces white-box models.
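The MCC values reported above summarize a binary confusion matrix in a single number between -1 and 1: MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal Python sketch of this standard formula follows; the function name and the counts in the usage line are hypothetical and not taken from the paper's datasets. Returning 0 when a row or column of the matrix is empty is a common convention, assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any row or column sum is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(round(mcc(tp=90, tn=80, fp=10, fn=20), 3))
```

Unlike accuracy, MCC stays informative when positive and negative classes are imbalanced, which is common in PTM substrate datasets.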

AVAILABILITY AND IMPLEMENTATION:

Predictors for IMC and N-Ac and all datasets are freely available at http://terminus.unige.ch/.

PMID:
24681905
DOI:
10.1093/bioinformatics/btu165
[Indexed for MEDLINE]
