Nucleic Acids Res. Sep 1, 1998; 26(17): 3986–3990.
PMCID: PMC147803

Protein sequence similarity searches using patterns as seeds.

Abstract

Protein families often are characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HSP90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.
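The two-stage strategy summarized above (find database occurrences of a pattern that the query contains, then build local alignments around those occurrences) can be illustrated with a minimal Python sketch. It is not the PHI-BLAST implementation. It assumes a PROSITE-style pattern syntax restricted to a subset (single residues, `x`, `[...]`, `{...}`, and `(n)` or `(n,m)` repeats) and uses a toy +2/-1 identity score rather than a substitution matrix such as BLOSUM62. It also extends each seed without gaps using an X-drop rule, where the program described in the paper builds gapped alignments and computes significance statistics; neither of these is reproduced here. All function names and the `MATCH`, `MISMATCH`, and `xdrop` values are hypothetical.

```python
import re

# Toy identity scores (hypothetical); a real search would use a
# substitution matrix such as BLOSUM62 plus gap penalties.
MATCH, MISMATCH = 2, -1


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (assumed subset) into a Python regex.

    Supported: single residues, 'x' (any residue), [ABC] (any of),
    {ABC} (none of), and repeats e(n) or e(n,m); elements are separated by '-'.
    """
    element_re = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a residue or [ABC] class is already valid regex syntax
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def pattern_hits(regex, seq):
    """Return (start, end) of a pattern match at every start position.

    The lookahead lets overlapping occurrences be reported; for variable-length
    elements only the greedy match at each start is kept.
    """
    return [(m.start(1), m.end(1)) for m in re.finditer(f"(?=({regex}))", seq)]


def xdrop_extend(q, s, qi, si, step, xdrop):
    """Ungapped extension from q[qi]/s[si] in direction step (+1 or -1).

    Stops when the running score falls more than xdrop below the best
    score seen so far, and returns that best score (never below 0).
    """
    score = best = 0
    while 0 <= qi < len(q) and 0 <= si < len(s):
        score += MATCH if q[qi] == s[si] else MISMATCH
        if score > best:
            best = score
        elif best - score > xdrop:
            break
        qi += step
        si += step
    return best


def seeded_hits(query, pattern, database, xdrop=5):
    """Score database sequences that contain the pattern, seeding on its hits.

    Simplifications (not PHI-BLAST's exact procedure): only the first pattern
    occurrence in the query is used, the pattern region itself is an unscored
    anchor, and the flanks are extended without gaps.
    """
    regex = prosite_to_regex(pattern)
    q_hits = pattern_hits(regex, query)
    if not q_hits:
        raise ValueError("the query must contain the pattern")
    qs, qe = q_hits[0]
    results = []
    for name, subject in database.items():
        for ss, se in pattern_hits(regex, subject):
            left = xdrop_extend(query, subject, qs - 1, ss - 1, -1, xdrop)
            right = xdrop_extend(query, subject, qe, se, +1, xdrop)
            results.append((left + right, name, ss))
    return sorted(results, reverse=True)
```

Consider the P-loop (Walker A) motif pattern `[AG]-x(4)-G-K-[ST]`, the query `MLAGESGSGKSTLE`, and the database `{"s1": "MLAGQSGSGKSTLE", "s2": "WWWWW"}`. With these inputs, `seeded_hits` returns `[(12, 's1', 3)]`. The sequence `s2` lacks the pattern and is never aligned at all. This is the intended effect of using the pattern both as a filter and as the seed for extension.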

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press