Proc Natl Acad Sci U S A. 1986 Jul; 83(14): 5155–5159.
PMCID: PMC323909

A measure of the similarity of sets of sequences not requiring sequence alignment.


Determination of first- and second-order Markov chain homogeneity of sets of nuclear eukaryotic DNA sequences, both coding and noncoding, finds similarities imperceptible to the standard Needleman-Wunsch base matching or dot-matrix algorithms. These measures of the similarities of the distributions of adjacent pairs or triplets are in agreement with accepted evolutionary-tree topologies. Hierarchical clustering of the distributions of doublets of 30 miscellaneous coding sequences gives clusters in reasonable agreement with accepted biological classifications. In addition to similarity by homology, similarity is also observed among disparate genes in the same organism: for example, all three disparate yeast genes (two enzymes and actin) form a well-distinguished cluster.
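
The idea of comparing sequences through their distributions of adjacent base pairs, without aligning them, can be illustrated with a minimal Python sketch. It is not the paper's procedure: the abstract does not give the exact statistic, so the sketch assumes overlapping doublet counts and substitutes a plain Pearson chi-square homogeneity statistic on the resulting 2 x 16 count table. The names `doublet_counts` and `chi_square_homogeneity` are hypothetical and introduced only for this illustration.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def doublet_counts(seq):
    """Count overlapping adjacent base pairs (doublets) in a DNA string.

    Assumption: doublets are counted at every position (overlapping);
    doublets containing ambiguous symbols such as N are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    # Fixed order AA, AC, AG, AT, CA, ... so two count vectors line up.
    return [counts.get(a + b, 0) for a, b in product(BASES, repeat=2)]


def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x 16 table of doublet counts.

    A stand-in for a homogeneity measure: 0 means the two doublet
    distributions are proportionally identical; larger means less similar.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each sequence must contain at least one doublet")
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            continue  # doublet absent from both sequences
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * col / grand
            stat += (observed - expected) ** 2 / expected
    return stat
```

A pairwise matrix of such statistics over a set of sequences could then be passed to any standard hierarchical clustering routine; which linkage rule the paper used is not stated in the abstract.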

Selected References

These references are in PubMed. This may not be the complete list of references from this article.
  • Blaisdell BE. Markov chain analysis finds a significant influence of neighboring bases on the occurrence of a base in eucaryotic nuclear DNA sequences both protein-coding and noncoding. J Mol Evol. 1984;21(3):278–288. [PubMed]
  • Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970 Mar;48(3):443–453. [PubMed]
  • Konkel DA, Maizel JV, Jr, Leder P. The evolution and sequence comparison of two recently diverged mouse chromosomal beta-globin genes. Cell. 1979 Nov;18(3):865–873. [PubMed]
  • Blaisdell BE. Choice of base at silent codon site 3 is not selectively neutral in eucaryotic structural genes: it maintains excess short runs of weak and strong hydrogen bonding bases. J Mol Evol. 1983;19(3-4):226–236. [PubMed]
  • Proudfoot NJ, Maniatis T. The structure of a human alpha-globin pseudogene and its relationship to alpha-globin gene duplication. Cell. 1980 Sep;21(2):537–544. [PubMed]
  • Lawn RM, Efstratiadis A, O'Connell C, Maniatis T. The nucleotide sequence of the human beta-globin gene. Cell. 1980 Oct;21(3):647–651. [PubMed]
  • Slightom JL, Blechl AE, Smithies O. Human fetal G gamma- and A gamma-globin genes: complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell. 1980 Oct;21(3):627–638. [PubMed]
  • Spritz RA, DeRiel JK, Forget BG, Weissman SM. Complete nucleotide sequence of the human delta-globin gene. Cell. 1980 Oct;21(3):639–646. [PubMed]
  • Baralle FE, Shoulders CC, Proudfoot NJ. The primary structure of the human epsilon-globin gene. Cell. 1980 Oct;21(3):621–626. [PubMed]
  • Nishioka Y, Leder P. The complete sequence of a chromosomal mouse alpha-globin gene reveals elements conserved throughout vertebrate evolution. Cell. 1979 Nov;18(3):875–882. [PubMed]
  • van Ooyen A, van den Berg J, Mantei N, Weissmann C. Comparison of total sequence of a cloned rabbit beta-globin gene and its flanking regions with a homologous mouse sequence. Science. 1979 Oct 19;206(4416):337–344. [PubMed]
  • Richards RI, Shine J, Ullrich A, Wells JR, Goodman HM. Molecular cloning and sequence analysis of adult chicken beta globin cDNA. Nucleic Acids Res. 1979 Nov 10;7(5):1137–1146. [PMC free article] [PubMed]
  • Hieter PA, Max EE, Seidman JG, Maizel JV, Jr, Leder P. Cloned human and mouse kappa immunoglobulin constant and J region genes conserve homology in functional segments. Cell. 1980 Nov;22(1 Pt 1):197–207. [PubMed]
  • Altenburger W, Neumaier PS, Steinmetz M, Zachau HG. DNA sequence of the constant gene region of the mouse immunoglobulin kappa chain. Nucleic Acids Res. 1981 Feb 25;9(4):971–981. [PMC free article] [PubMed]
  • Takahashi N, Kataoka T, Honjo T. Nucleotide sequences of class-switch recombination region of the mouse immunoglobulin gamma 2b-chain gene. Gene. 1980 Oct;11(1-2):117–127. [PubMed]
  • Nishioka Y, Leder P. Organization and complete sequence of identical embryonic and plasmacytoma kappa V-region genes. J Biol Chem. 1980 Apr 25;255(8):3691–3694. [PubMed]
  • Sakano H, Maki R, Kurosawa Y, Roeder W, Tonegawa S. Two types of somatic recombination are necessary for the generation of complete immunoglobulin heavy-chain genes. Nature. 1980 Aug 14;286(5774):676–683. [PubMed]
  • Ullrich A, Dull TJ, Gray A, Brosius J, Sures I. Genetic variation in the human insulin gene. Science. 1980 Aug 1;209(4456):612–615. [PubMed]
  • Bell GI, Pictet RL, Rutter WJ, Cordell B, Tischer E, Goodman HM. Sequence of the human insulin gene. Nature. 1980 Mar 6;284(5751):26–32. [PubMed]
  • Lomedico P, Rosenthal N, Efstratiadis A, Gilbert W, Kolodner R, Tizard R. The structure and evolution of the two nonallelic rat preproinsulin genes. Cell. 1979 Oct;18(2):545–558. [PubMed]
  • Perler F, Efstratiadis A, Lomedico P, Gilbert W, Kolodner R, Dodgson J. The evolution of genes: the chicken preproinsulin gene. Cell. 1980 Jun;20(2):555–566. [PubMed]
  • Chang AC, Cochet M, Cohen SN. Structural organization of human genomic DNA encoding the pro-opiomelanocortin peptide. Proc Natl Acad Sci U S A. 1980 Aug;77(8):4890–4894. [PMC free article] [PubMed]
  • Gubbins EJ, Maurer RA, Lagrimini M, Erwin CR, Donelson JE. Structure of the rat prolactin gene. J Biol Chem. 1980 Sep 25;255(18):8655–8662. [PubMed]
  • Goeddel DV, Yelverton E, Ullrich A, Heyneker HL, Miozzari G, Holmes W, Seeburg PH, Dull T, May L, Stebbing N, et al. Human leukocyte interferon produced by E. coli is biologically active. Nature. 1980 Oct 2;287(5781):411–416. [PubMed]
  • Lawn RM, Adelman J, Franke AE, Houck CM, Gross M, Najarian R, Goeddel DV. Human fibroblast interferon gene lacks introns. Nucleic Acids Res. 1981 Mar 11;9(5):1045–1052. [PMC free article] [PubMed]
  • Holland JP, Holland MJ. The primary structure of a glyceraldehyde-3-phosphate dehydrogenase gene from Saccharomyces cerevisiae. J Biol Chem. 1979 Oct 10;254(19):9839–9845. [PubMed]
  • Tschumper G, Carbon J. Sequence of a yeast DNA fragment containing a chromosomal replicator and the TRP1 gene. Gene. 1980 Jul;10(2):157–166. [PubMed]
  • Young RA, Hagenbüchle O, Schibler U. A single mouse alpha-amylase gene specifies two different tissue-specific mRNAs. Cell. 1981 Feb;23(2):451–458. [PubMed]
  • Ng R, Abelson J. Isolation and sequence of the gene for actin in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 1980 Jul;77(7):3912–3916. [PMC free article] [PubMed]
  • Sures I, Lowry J, Kedes LH. The DNA sequence of sea urchin (S. purpuratus) H2A, H2B and H3 histone coding and spacer regions. Cell. 1978 Nov;15(3):1033–1044. [PubMed]
  • Robertson MA, Staden R, Tanaka Y, Catterall JF, O'Malley BW, Brownlee GG. Sequence of three introns in the chick ovalbumin gene. Nature. 1979 Mar 22;278(5702):370–372. [PubMed]
  • Bell GI, Pictet R, Rutter WJ. Analysis of the regions flanking the human insulin gene and sequence of an Alu family member. Nucleic Acids Res. 1980 Sep 25;8(18):4091–4109. [PMC free article] [PubMed]
  • Pan J, Elder JT, Duncan CH, Weissman SM. Structural analysis of interspersed repetitive polymerase III transcription units in human DNA. Nucleic Acids Res. 1981 Mar 11;9(5):1151–1170. [PMC free article] [PubMed]
  • Tsujimoto Y, Suzuki Y. The DNA sequence of Bombyx mori fibroin gene including the 5' flanking, mRNA coding, entire intervening and fibroin protein coding regions. Cell. 1979 Oct;18(2):591–600. [PubMed]
  • Baralle FE, Shoulders CC, Goodbourn S, Jeffreys A, Proudfoot NJ. The 5' flanking region of human epsilon-globin gene. Nucleic Acids Res. 1980 Oct 10;8(19):4393–4404. [PMC free article] [PubMed]
  • Sakano H, Hüppi K, Heinrich G, Tonegawa S. Sequences at the somatic recombination sites of immunoglobulin light-chain genes. Nature. 1979 Jul 26;280(5720):288–294. [PubMed]
  • Newell N, Richards JE, Tucker PW, Blattner FR. J genes for heavy chain immunoglobulins of mouse. Science. 1980 Sep 5;209(4461):1128–1132. [PubMed]
  • Efstratiadis A, Posakony JW, Maniatis T, Lawn RM, O'Connell C, Spritz RA, DeRiel JK, Forget BG, Weissman SM, Slightom JL, et al. The structure and evolution of the human beta-globin gene family. Cell. 1980 Oct;21(3):653–668. [PubMed]
  • Josse J, Kaiser AD, Kornberg A. Enzymatic synthesis of deoxyribonucleic acid. VIII. Frequencies of nearest neighbor base sequences in deoxyribonucleic acid. J Biol Chem. 1961 Mar;236:864–875. [PubMed]
  • Bird AP. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 1980 Apr 11;8(7):1499–1504. [PMC free article] [PubMed]
  • Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F. The mosaic genome of warm-blooded vertebrates. Science. 1985 May 24;228(4702):953–958. [PubMed]
  • Smith TF, Waterman MS, Sadler JR. Statistical characterization of nucleic acid sequence functional domains. Nucleic Acids Res. 1983 Apr 11;11(7):2205–2220. [PMC free article] [PubMed]
  • Grantham R, Gautier C, Gouy M, Mercier R, Pavé A. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 1980 Jan 11;8(1):r49–r62. [PMC free article] [PubMed]
  • Zuckerkandl E. The appearance of new structures and functions in proteins during evolution. J Mol Evol. 1975 Dec 31;7(1):1–57. [PubMed]
