Nucleic Acids Res. Sep 1, 1997; 25(17): 3389–3402.
PMCID: PMC146917

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Abstract

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
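The idea of combining aligned sequences into a position-specific score matrix and then searching with that matrix can be illustrated with a minimal Python sketch. It is not the PSI-BLAST procedure itself: the abstract does not give the construction, so the uniform background frequencies, the flat pseudocount, the ungapped scan, and the names `build_pssm` and `best_ungapped_hit` are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, background=None, pseudocount=1.0):
    """Build log-odds column scores from equal-length aligned sequences.

    Assumption: uniform background and a flat pseudocount; a real
    profile method would weight sequences and use better priors.
    """
    if background is None:
        background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    pssm = []
    for col in range(length):
        # Count residues in this column; gap characters are skipped.
        counts = Counter(s[col] for s in aligned if s[col] in background)
        total = sum(counts.values()) + pseudocount
        column = {}
        for residue, p in background.items():
            freq = (counts[residue] + pseudocount * p) / total
            column[residue] = math.log2(freq / p)
        pssm.append(column)
    return pssm


def best_ungapped_hit(pssm, subject):
    """Slide the matrix along subject; return (score, offset) of the best window."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for offset in range(len(subject) - width + 1):
        score = sum(pssm[i].get(subject[offset + i], 0.0) for i in range(width))
        best = max(best, (score, offset))
    return best


if __name__ == "__main__":
    profile = build_pssm(["ACDW", "ACEW", "ACDW"])  # toy alignment
    print(best_ungapped_hit(profile, "GGGACDWGGG"))
```

Each column scores a residue by how much more often it appears at that position than the background predicts, so conserved positions dominate the total. Iterating, in the sense the abstract describes, would mean rebuilding the matrix from the significant hits of one search and searching again.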



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
