Recognition of translation initiation sites of eukaryotic genes based on an EM algorithm

J Comput Biol. 2003;10(5):699-708. doi: 10.1089/106652703322539042.

Abstract

With the rapid increase of DNA databases of human and other eukaryotic model organisms, a large great number of genes need to be distinguished from the DNA databases. Exact recognition of translation initiation sites (TISs) of eukaryotic genes is very important to understand the translation initiation process, predict the detailed structure of eukaryotic genes, and annotate uncharacterized sequences. The problem has not been solved satisfactorily, especially for recognizing TISs of the eukaryotic genes with shorter first exons. It is an important task for extracting new features and finding new powerful algorithms for recognizing TISs of eukaryotic genes. In this paper, the important characteristics of shorter flanking fragments around TISs are extracted and an expectation-maximization (EM) algorithm based on incomplete data is used to recognize TISs of eukaryotic genes. The accuracy is up to 87.8% over a six-fold cross-validation test. The result shows that the identification variables are effectively extracted and the EM algorithm is a powerful tool to predict the TISs of eukaryotic genes. The algorithm also can be applied to other classification or clustering tasks in bioinformatics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Base Sequence
  • Computational Biology / methods
  • Databases, Nucleic Acid
  • Models, Genetic
  • Peptide Chain Initiation, Translational / genetics*
  • Reproducibility of Results