Nucleic Acids Res. 2012 Jun;40(11):4765-73. doi: 10.1093/nar/gks154. Epub 2012 Feb 16.

Exploiting mid-range DNA patterns for sequence classification: binary abstraction Markov models.

Author information

  • Department of Medicine, University of Toledo, Health Science Campus, Toledo, OH 43614, USA.

Abstract

Messenger RNA sequences possess specific nucleotide patterns that distinguish them from non-coding genomic sequences. In this study, we explore the use of modified Markov models to analyze sequences up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. To analyze nucleotide sequences of this length, their information content is first reduced by converting them into shorter binary patterns using numerous abstraction schemes. After the genomic sequences are converted to binary strings, homogeneous Markov models (MMs) trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best MM classifiers are then combined using support vector machines into a single classifier. With this approach, over 95% classification accuracy is achieved without taking reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code, such as ncRNAs and 5'-untranslated regions.
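
The general idea described in the abstract (abstract nucleotides to a binary alphabet, train one homogeneous Markov model per class, classify by comparing likelihoods) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the purine/pyrimidine mapping stands in for one possible abstraction scheme, the function names are hypothetical, the toy training strings are not real exons or introns, and the optimization of schemes and the support vector machine combination step are omitted.

```python
import math
from collections import defaultdict

# Hypothetical abstraction scheme (an assumption, not one from the paper):
# purines (A, G) map to "1", pyrimidines (C, T) map to "0".
PURINE_SCHEME = {"A": "1", "G": "1", "C": "0", "T": "0"}


def abstract(seq, scheme=PURINE_SCHEME):
    """Convert a nucleotide string into a binary string under a scheme."""
    return "".join(scheme[base] for base in seq.upper())


def train_markov(binary_seqs, order, pseudocount=1.0):
    """Fit a homogeneous Markov model of the given order on binary strings.

    Returns a dict mapping each observed context to a pair of
    log-probabilities (next symbol is "0", next symbol is "1").
    """
    counts = defaultdict(lambda: [pseudocount, pseudocount])
    for s in binary_seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][int(s[i])] += 1
    model = {}
    for context, (c0, c1) in counts.items():
        total = c0 + c1
        model[context] = (math.log(c0 / total), math.log(c1 / total))
    return model


def log_likelihood(model, s, order):
    """Sum transition log-probabilities; unseen contexts fall back to 0.5."""
    uniform = (math.log(0.5), math.log(0.5))
    return sum(
        model.get(s[i - order:i], uniform)[int(s[i])]
        for i in range(order, len(s))
    )


def classify(seq, exon_model, intron_model, order):
    """Label a sequence by the sign of its exon-vs-intron log-odds score."""
    b = abstract(seq)
    score = log_likelihood(exon_model, b, order) - log_likelihood(intron_model, b, order)
    return ("exon" if score > 0 else "intron"), score


if __name__ == "__main__":
    order = 2
    # Toy training strings, chosen only so the two classes differ.
    exon_model = train_markov([abstract("AGAGAGAGAG")], order)
    intron_model = train_markov([abstract("ACACACACAC")], order)
    print(classify("AGAGAG", exon_model, intron_model, order))
    print(classify("ACACAC", exon_model, intron_model, order))
```

Because the alphabet has only two symbols, a context of length k has 2^k possible values rather than 4^k, which is why longer contexts remain tractable after abstraction. This sketch maps one nucleotide to one bit; the abstract describes schemes that produce shorter binary patterns, which would need a different mapping.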

PMID: 22344692
PMCID: PMC3367190
DOI: 10.1093/nar/gks154
[PubMed - indexed for MEDLINE]
Free PMC Article