Proc Natl Acad Sci U S A. Jun 15, 1993; 90(12): 5873–5877.
PMCID: PMC46825

Applications and statistics for multiple high-scoring segments in molecular sequences.

Abstract

Score-based measures of molecular-sequence features provide versatile aids for the study of proteins and DNA. They are used by many sequence data base search programs, as well as for identifying distinctive properties of single sequences. For any such measure, it is important to know what can be expected to occur purely by chance. The statistical distribution of high-scoring segments has been described elsewhere. However, molecular sequences will frequently yield several high-scoring segments for which some combined assessment is in order. This paper describes the statistical distribution for the sum of the scores of multiple high-scoring segments and illustrates its application to the identification of possible transmembrane segments and the evaluation of sequence similarity.
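
As a rough illustration of the kind of calculation involved, the sketch below computes chance expectations for high-scoring segments under the extreme-value model of Karlin and Altschul (1990). In that model, the expected number of segments scoring at least S when comparing sequences of lengths m and n is E = K m n e^(-lambda S). The sketch also includes a large-score tail approximation for the sum t of r normalized scores, e^(-t) t^(r-1) / (r! (r-1)!). That form is an assumption of this sketch, not something stated in the abstract, and should be checked against the full text. All function names and the values used for lambda and K are hypothetical placeholders.

```python
import math


def expected_count(raw_score, lam, K, m, n):
    """Expected number of segments scoring >= raw_score: E = K*m*n*exp(-lam*S)."""
    return K * m * n * math.exp(-lam * raw_score)


def normalized_score(raw_score, lam, K, m, n):
    """Normalized score x = lam*S - ln(K*m*n), so that E = exp(-x)."""
    return lam * raw_score - math.log(K * m * n)


def p_at_least_r_segments(raw_score, r, lam, K, m, n):
    """Poisson approximation to the chance of r or more segments scoring >= raw_score."""
    e = expected_count(raw_score, lam, K, m, n)
    fewer_than_r = sum(e ** i / math.factorial(i) for i in range(r))
    return 1.0 - math.exp(-e) * fewer_than_r


def sum_score_tail(raw_scores, lam, K, m, n):
    """Assumed large-score tail approximation for the sum of r normalized scores.

    Uses exp(-t) * t**(r-1) / (r! * (r-1)!), clamped to 1.0. This is only
    meaningful when the summed normalized score t is large and positive.
    """
    r = len(raw_scores)
    if r == 0:
        raise ValueError("need at least one segment score")
    t = sum(normalized_score(s, lam, K, m, n) for s in raw_scores)
    if t <= 0:
        return 1.0  # outside the range where the approximation applies
    tail = math.exp(-t) * t ** (r - 1) / (math.factorial(r) * math.factorial(r - 1))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical parameters; real values depend on the scoring system
    # and on the residue composition of the sequences.
    lam, K, m, n = 0.3, 0.1, 100, 100
    print(expected_count(50, lam, K, m, n))
    print(p_at_least_r_segments(40, 2, lam, K, m, n))
    print(sum_score_tail([40, 38], lam, K, m, n))
```

For a single segment (r = 1) the sum approximation reduces to e^(-x), which equals E. This agrees with the single-segment result for large scores and is a useful consistency check on the assumed form.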

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
