Computational detection of prokaryotic core promoters in genomic sequences

J Microbiol. 2005 Oct;43(5):411-6.

Authors

Ki-Bong Kim¹, Jeong Seop Sim

Affiliation

¹ Department of Bioinformatics Engineering, Sangmyung University, Cheonan 330-180, Republic of Korea. kbkim@smu.ac.kr

PMID: 16273032

Abstract

High-throughput sequencing of microbial genomes has led to the rapid accumulation of an enormous amount of genomic sequence data. In this context, the computational detection of promoters in genomic DNA sequences has attracted considerable research attention in recent years. This paper describes the development of a predictive model, the dependence decomposition weight matrix model (DDWMM), designed to detect the core promoter region, including the -10 region and the transcription start site (TSS), in prokaryotic genomic DNA sequences. Detecting these regions is important for genome annotation. The model captures the most significant dependencies between positions, both adjacent and non-adjacent, via the maximal dependence decomposition (MDD) procedure, which iteratively decomposes a data set into subsets based on significant dependence between positions in the promoter region being modeled. Such dependencies may be closely related to biological and structural factors, since promoter elements occur in a variety of combinations and are separated by various distances. In this respect, the DDWMM may prove appropriate for detecting core promoter regions and TSSs in long microbial genomic contigs. To demonstrate the effectiveness of the model, we performed 10-fold cross-validation experiments on 607 experimentally verified promoter sequences; the model showed good performance in terms of sensitivity.
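
To make the decomposition idea concrete, the following is a minimal Python sketch of a single MDD-style split followed by weight matrix estimation. It is an illustration under stated assumptions, not the authors' implementation: the dependence measure is a plain Pearson chi-square over the full base-by-base contingency table, the split is on the most common base at the selected position, and the function names (`chi_square`, `most_dependent_position`, `split_on_position`, `weight_matrix`), the pseudocount, and the toy sequences are all hypothetical. The abstract does not specify the significance threshold or stopping rule, so none is included here.

```python
from collections import Counter
from itertools import combinations

BASES = "ACGT"


def chi_square(seqs, i, j):
    """Pearson chi-square statistic for independence of the bases at positions i and j."""
    n = len(seqs)
    row = Counter(s[i] for s in seqs)
    col = Counter(s[j] for s in seqs)
    joint = Counter((s[i], s[j]) for s in seqs)
    stat = 0.0
    # Only bases actually observed at each position are iterated,
    # so every expected count is strictly positive.
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat


def most_dependent_position(seqs):
    """Return the position whose summed chi-square against all other positions is largest."""
    length = len(seqs[0])
    totals = [0.0] * length
    for i, j in combinations(range(length), 2):
        x = chi_square(seqs, i, j)
        totals[i] += x
        totals[j] += x
    best = max(range(length), key=totals.__getitem__)
    return best, totals


def split_on_position(seqs, pos):
    """Split sequences by whether they carry the most common base at pos."""
    consensus = Counter(s[pos] for s in seqs).most_common(1)[0][0]
    match = [s for s in seqs if s[pos] == consensus]
    rest = [s for s in seqs if s[pos] != consensus]
    return consensus, match, rest


def weight_matrix(seqs, pseudocount=1.0):
    """Per-position base frequencies with an additive pseudocount (assumed value)."""
    total = len(seqs) + 4 * pseudocount
    matrix = []
    for p in range(len(seqs[0])):
        counts = Counter(s[p] for s in seqs)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix


# Toy aligned sequences (hypothetical): positions 0 and 2 always agree,
# position 1 is constant and therefore independent of everything.
toy = ["ACA", "ACA", "ACA", "TCT", "TCT", "GCG"]
pos, totals = most_dependent_position(toy)
base, match, rest = split_on_position(toy, pos)
wm_match = weight_matrix(match)
wm_rest = weight_matrix(rest)
```

In a full procedure, the split would be applied recursively to each subset while significant dependence remains and enough sequences are left, and a separate weight matrix would be estimated for each resulting leaf subset.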

MeSH terms

Base Sequence / genetics*
Computational Biology / methods*
Escherichia coli / genetics*
Forecasting
Genome, Bacterial*
Markov Chains
Models, Biological
Promoter Regions, Genetic / genetics*
Software Validation
Transcription Initiation Site