Protein Sci. Mar 1994; 3(3): 482–492.
PMCID: PMC2142695

Modular arrangement of proteins as inferred from analysis of homology.


The structure of many proteins consists of a combination of discrete modules that have been shuffled during evolution. Such modules can frequently be recognized from the analysis of homology. Here we present a systematic analysis of the modular organization of all sequenced proteins. To achieve this we have developed an automatic method to identify protein domains from sequence comparisons. Homologous domains can then be clustered into consistent families. The method was applied to all 21,098 nonfragment protein sequences in SWISS-PROT 21.0, which was automatically reorganized into a comprehensive protein domain database, ProDom. We have constructed multiple sequence alignments for each domain family in ProDom, from which consensus sequences were generated. These nonredundant domain consensuses are useful for fast homology searches. Domain organization in ProDom is exemplified for proteins of the phosphoenolpyruvate:sugar phosphotransferase system (PEP:PTS) and for bacterial 2-component regulators. We provide 2 examples of previously unrecognized domain arrangements discovered with the help of ProDom.
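The abstract does not spell out the algorithm, so the following is only a minimal sketch of two of the steps it names: clustering homologous domains into families and deriving a consensus from an aligned family. It assumes that homology between domains has already been established elsewhere (for example by pairwise sequence comparison) and is given as a list of domain-identifier pairs, and that each family is already aligned. Single-linkage grouping and a majority-rule consensus are stand-in techniques chosen for illustration, not necessarily those used to build ProDom. The function names and the example identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_domains(homologous_pairs):
    """Group domain ids into families by single linkage (union-find).

    `homologous_pairs` is an iterable of (id_a, id_b) tuples; how homology
    was decided is outside this sketch (assumption).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in homologous_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = defaultdict(set)
    for domain in list(parent):
        families[find(domain)].add(domain)
    return sorted(sorted(members) for members in families.values())


def consensus(aligned_sequences, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns whose most common symbol is a gap are dropped.
    """
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    residues = []
    for column in zip(*aligned_sequences):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            residues.append(symbol)
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical domain ids and a toy alignment, for illustration only.
    print(cluster_domains([("A", "B"), ("B", "C"), ("D", "E")]))
    print(consensus(["MKV-L", "MKIAL", "MRV-L"]))
```

Single linkage merges families transitively, so one spurious homology call can fuse two unrelated families. A real pipeline would need a consistency check of the kind the abstract alludes to with "consistent families".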



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society

