Nucleic Acids Res. Oct 1, 1996; 24(19): 3836–3845.
PMCID: PMC146152

Searching databases of conserved sequence regions by aligning protein multiple-alignments.

Abstract

A general searching method for comparing multiple sequence alignments was developed to detect sequence relationships between conserved protein regions. Multiple alignments are treated as sequences of amino acid distributions and are aligned by comparing pairs of such distributions. Four different comparison measures were tested, and the Pearson correlation coefficient was chosen. The method is sensitive, detecting weak sequence relationships between protein families. Relationships are detected beyond the range of conventional sequence database searches, illustrating the potential usefulness of the method. The previously undetected relationship between flavoprotein subunits of two oxidoreductase families points to the potential active site in one of the families. The similarity between the bacterial RecA, DnaA and Rad51 protein families reveals a region in DnaA and Rad51 proteins likely to bind and unstack single-stranded DNA. Helix-turn-helix DNA-binding domains from diverse proteins are readily detected and shown to be similar to each other. Glycosylasparaginase and gamma-glutamyltransferase enzymes are found to be similar in their proteolytic cleavage sites. The method has been fully implemented on the World Wide Web at URL: http://blocks.fhcrc.org/blocks-bin/LAMAvsearch.
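The core idea in the abstract, treating each alignment column as an amino acid distribution and scoring pairs of columns with the Pearson correlation coefficient, can be illustrated with a minimal Python sketch. This is not the published implementation: the function names, the use of raw unweighted frequencies, the restriction to ungapped offsets, the summed-correlation score and the `min_overlap` parameter are all assumptions made here for illustration.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distributions(alignment):
    """Convert an ungapped multiple alignment (equal-length strings)
    into one amino acid frequency vector per column.
    Assumption: raw frequencies, no sequence weighting or pseudocounts."""
    n_seqs = len(alignment)
    columns = []
    for column in zip(*alignment):
        counts = Counter(column)
        columns.append([counts.get(aa, 0) / n_seqs for aa in AMINO_ACIDS])
    return columns


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


def best_ungapped_offset(cols_a, cols_b, min_overlap=3):
    """Slide one alignment along the other without gaps and return
    (score, offset, overlap) for the best placement, where column i of
    the first alignment is paired with column i - offset of the second.
    Assumption: the score is the plain sum of column-pair correlations."""
    best = None
    lowest = -(len(cols_b) - min_overlap)
    highest = len(cols_a) - min_overlap
    for offset in range(lowest, highest + 1):
        pairs = [
            (cols_a[i], cols_b[i - offset])
            for i in range(len(cols_a))
            if 0 <= i - offset < len(cols_b)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(p, q) for p, q in pairs)
        if best is None or score > best[0]:
            best = (score, offset, len(pairs))
    return best


if __name__ == "__main__":
    # Two toy alignments that share a conserved G-K-[TS] segment.
    block_a = ["GKTLA", "GKSLA", "GKTIA"]
    block_b = ["AAGKT", "CAGKS", "DAGKT"]
    result = best_ungapped_offset(
        column_distributions(block_a), column_distributions(block_b)
    )
    print(result)
```

In the toy example the shared segment starts at column 0 of the first alignment and column 2 of the second, so the best placement is found at offset -2 with three overlapping columns. A realistic search would also need sequence weighting and a significance estimate for the score, neither of which is sketched here.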



Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
