Proc Natl Acad Sci U S A. Jul 19, 1994; 91(15): 7134–7138.

Atypical regions in large genomic DNA sequences.


Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. We describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Databases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences longer than 1000 nt and from human sequences longer than 10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. We consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
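The general idea of scoring a sequence by log-probabilities under a seventh-order Markov chain can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the add-one pseudocount, the sliding-window size and step, and the use of a per-base mean are all assumptions made for illustration, since the abstract does not specify them. Octanucleotide counts from a reference set give the probability of each base given the seven bases before it, and windows with unusually low mean log-probability are candidates for atypical regions.

```python
import math
from collections import defaultdict

ORDER = 7       # seventh-order chain: each base is conditioned on the 7 preceding bases
BASES = "ACGT"

def count_octamers(sequences):
    """Tally every overlapping octanucleotide as (7-base context, next base)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - ORDER):
            context, nxt = seq[i:i + ORDER], seq[i + ORDER]
            # Skip octamers containing ambiguity codes such as N.
            if nxt in BASES and all(c in BASES for c in context):
                counts[context][nxt] += 1
    return counts

def log_prob(counts, context, nxt, pseudocount=1.0):
    """Natural log of P(next base | context); the pseudocount is an assumption."""
    row = counts.get(context, {})
    total = sum(row.values()) + pseudocount * len(BASES)
    return math.log((row.get(nxt, 0) + pseudocount) / total)

def window_scores(counts, seq, window=500, step=250):
    """Return (start, mean log-probability per base) for each sliding window.

    The window and step sizes are illustrative defaults, not values from the paper.
    """
    seq = seq.upper()
    scores = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        total, n = 0.0, 0
        for i in range(len(chunk) - ORDER):
            context, nxt = chunk[i:i + ORDER], chunk[i + ORDER]
            if nxt in BASES and all(c in BASES for c in context):
                total += log_prob(counts, context, nxt)
                n += 1
        if n:
            scores.append((start, total / n))
    return scores
```

In use, `count_octamers` would be run once over the reference compilation for an organism, and `window_scores` over each query sequence; windows scoring well below the typical range for that organism would then be examined further. Deciding how low is significant is a separate question that this sketch does not address.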



Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences

