Nucleic Acids Res. Jan 11, 1984; 12(1 Pt 1): 215–226.
PMCID: PMC320998

On the statistical significance of nucleic acid similarities.


When sequence similarities among nucleic acids are evaluated by the usual methods, statistical significance is often found even when the biological significance of the similarity is dubious. We demonstrate that the known statistical properties of nucleic acid sequences strongly affect the statistical distribution of similarity values calculated by standard procedures. We propose a series of models that account for some of these known statistical properties. The utility of the method is demonstrated by evaluating high relative similarity scores in four specific cases in which there is little biological context by which to judge the similarities. In two of the cases we identify the statistical properties that are responsible for the apparent similarity. In the other two cases the statistical significance of the similarity persists even when the known statistical properties of the sequences are modelled. For one of these cases biological significance is likely, while the other case remains an enigma.
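
The general idea of judging a similarity score against null models of increasing realism can be sketched as follows. This is only an illustration under stated assumptions, not the models proposed in the article, which the abstract does not specify. The scoring parameters, the two null models (a composition-preserving shuffle and a first-order Markov chain fitted to each sequence), and all function names are hypothetical choices made for this example.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman recurrence, linear gaps).

    The scoring values are illustrative assumptions, not taken from the article.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            s = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


def shuffle_null(seq, rng):
    """Assumed null model 1: keep base composition, destroy all ordering."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def markov_null(seq, rng):
    """Assumed null model 2: sample a first-order Markov chain fitted to seq.

    This roughly preserves nearest-neighbour (dinucleotide) tendencies.
    Expects a non-empty sequence.
    """
    followers = {}
    for x, y in zip(seq, seq[1:]):
        followers.setdefault(x, []).append(y)
    out = [seq[0]]
    while len(out) < len(seq):
        nxt = followers.get(out[-1])
        # Fall back to a composition draw if the last base has no observed successor.
        out.append(rng.choice(nxt) if nxt else rng.choice(seq))
    return "".join(out)


def empirical_p(a, b, null, trials=200, seed=0):
    """Monte Carlo estimate of P(null score >= observed score)."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    hits = sum(
        local_score(null(a, rng), null(b, rng)) >= observed
        for _ in range(trials)
    )
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (trials + 1)
```

Calling `empirical_p(a, b, shuffle_null)` and `empirical_p(a, b, markov_null)` on the same pair gives two estimates. In this sketch, a score that looks significant under the shuffle model but not under the Markov model would suggest that nearest-neighbour statistics, rather than shared ancestry or function, account for the apparent similarity.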



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

