Display Settings:


Send to:

Choose Destination
See comment in PubMed Commons below
J Comput Biol. 1995 Fall;2(3):417-37.

Exceptional motifs in different Markov chain models for a statistical analysis of DNA sequences.

Author information

  • 1INRA, D├ępartement de Biom├ętrie et Intelligence Artificielle, Jouy-en-Josas, France.


Identifying exceptional motifs is often used for extracting information from long DNA sequences. The two difficulties of the method are the choice of the model that defines the expected frequencies of words and the approximation of the variance of the difference T(W) between the number of occurrences of a word W and its estimation. We consider here different Markov chain models, either with stationary or periodic transition probabilities. We estimate the variance of the difference T(W) by the conditional variance of the number of occurrences of W given the oligonucleotides counts that define the model. Two applications show how to use asymptotically standard normal statistics associated with the counts to describe a given sequence in terms of its outlying words. Sequences of Escherichia coli and of Bacillus subtilis are compared with respect to their exceptional tri- and tetranucleotides. For both bacteria, exceptional 3-words are mainly found in the coding frame. E. coli palindrome counts are analyzed in different models, showing that many overabundant words are one-letter mutations of avoided palindromes.

[PubMed - indexed for MEDLINE]
PubMed Commons home

PubMed Commons

How to join PubMed Commons

    Supplemental Content

    Loading ...
    Write to the Help Desk