Nucleic Acids Res. 1997 May 1; 25(9): 1665–1677.
PMCID: PMC146639

Extracting protein alignment models from the sequence database.


Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those conserved patterns, often subtle, that are essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

