Nucleic Acids Res. 1987 Mar 25; 15(6): 2611–2626.
PMCID: PMC340672

Mono- through hexanucleotide composition of the Escherichia coli genome: a Markov chain analysis.


Several statistical methods were tested for accuracy in predicting observed frequencies of di- through hexanucleotides in 74,444 bp of E. coli DNA. A Markov chain was most accurate overall, whereas other methods, including a random model based on mononucleotide frequencies, were very inaccurate. When ranked from highest to lowest abundance, the observed frequencies of oligonucleotides up to six bases in length in E. coli DNA were highly asymmetric. All ordered abundance plots had a wide linear range containing the majority of the oligomers, with sharp deviations at the high and low ends of the curves. In general, values predicted by a Markov chain closely followed the overall shape of the ordered abundance curves. A simple equation was derived by which the frequency of any oligonucleotide longer than four bases in the E. coli genome (or any genome) can be estimated with reasonable accuracy from the nested set of component tri- and tetranucleotides by serial application of a 3rd order Markov chain. The equation yielded a mean ratio of 1.03 ± 0.94 for the observed-to-expected frequencies of the 4,096 hexanucleotides. Hence, the method is a relatively accurate but not perfect predictor of the length in nucleotides between hexanucleotide sites. Higher accuracy can be achieved using a 4th order Markov chain and larger data sets. The high asymmetry in oligonucleotide abundance means that in the E. coli genome of 4.2 × 10^6 bp many relatively short sequences of 7-9 bp are very rare or absent.
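
The abstract does not reproduce the equation itself, so the following is only a minimal sketch of the usual form of a serially applied 3rd order Markov estimate, under that assumption: the expected frequency of an oligomer is the product of the frequencies of its overlapping tetranucleotides divided by the product of the frequencies of the internal trinucleotides they share. For a hexanucleotide n1...n6 this would be f(n1n2n3n4) f(n2n3n4n5) f(n3n4n5n6) / [f(n2n3n4) f(n3n4n5)]. The function names (kmer_frequencies, markov3_expected, mean_spacing) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequency of every overlapping k-mer observed in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def markov3_expected(oligo, f4, f3):
    """Expected frequency of an oligomer (length >= 4) under a 3rd order chain.

    Assumed form: product of all overlapping tetranucleotide frequencies
    divided by the product of the internal trinucleotide frequencies.
    """
    if len(oligo) < 4:
        raise ValueError("oligomer must be at least four bases long")
    numerator = 1.0
    for i in range(len(oligo) - 3):
        numerator *= f4.get(oligo[i:i + 4], 0.0)
    denominator = 1.0
    # Internal trinucleotides only: skip the first and last trimer.
    for i in range(1, len(oligo) - 3):
        denominator *= f3.get(oligo[i:i + 3], 0.0)
    return numerator / denominator if denominator > 0 else 0.0


def mean_spacing(frequency):
    """Expected distance in bases between sites, taken as 1 / frequency."""
    return float("inf") if frequency == 0 else 1.0 / frequency


if __name__ == "__main__":
    # Toy sequence for illustration only; real use needs a long genomic sequence.
    seq = "ATGCGCGATTAGCGCGATATCGCGCTAGCGATCGATCGCGCGATTA" * 20
    f3 = kmer_frequencies(seq, 3)
    f4 = kmer_frequencies(seq, 4)
    observed = kmer_frequencies(seq, 6)
    for hexamer in ("GCGCGA", "ATCGAT"):
        expected = markov3_expected(hexamer, f4, f3)
        print(hexamer, observed.get(hexamer, 0.0), expected, mean_spacing(expected))
```

Dividing an observed hexanucleotide frequency by the value from markov3_expected gives the kind of observed-to-expected ratio the abstract summarizes. For a tetranucleotide the estimate reduces to the observed tetranucleotide frequency, and any oligomer containing an unobserved tetranucleotide is assigned a frequency of zero.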


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

