Protein Sci. Mar 1995; 4(3): 506–520.
PMCID: PMC2143076

An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins.


With a growing number of structures available in the Brookhaven Protein Data Bank, automatic methods for domain identification are required for the construction of databases. Domains are considered to be clusters of secondary structure elements. Thus, helices and strands are first clustered using inter-secondary-structure distances between Cα positions, and dendrograms based on this distance measure are used to identify domains. Individual domains are recognized by a disjoint factor, which enables their automatic identification and classification as disjoint, interacting, or conjoint domains. Application to a database of 83 protein families and 18 unique structures shows that the approach delineates domain boundaries effectively and identifies those proteins that can be considered a single domain. A quantitative estimate of the interaction between domains is also proposed. The database of protein domains is a useful tool for understanding protein folding, for recognizing protein folds, and for understanding structure-activity relationships.
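The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact distance measure, the linkage rule, or the definition of the disjoint factor, so the code below assumes the mean pairwise Cα distance between two elements and average-linkage agglomerative clustering, and it does not attempt the disjoint factor. The function names (`sse_distance`, `cluster_sses`, `cut_into`) and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def sse_distance(a, b):
    """Mean pairwise C-alpha distance between two elements (assumed measure)."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def cluster_sses(sses):
    """Average-linkage agglomerative clustering of secondary structure elements.

    Returns the merges as (left_members, right_members, distance) in the
    order they occur, which is the information a dendrogram encodes.
    """
    # Distances between the original elements, keyed by (i, j) with i < j.
    base = {
        (i, j): sse_distance(sses[i], sses[j])
        for i, j in combinations(range(len(sses)), 2)
    }

    def linkage(c1, c2):
        # Average of the element-to-element distances across two clusters.
        total = sum(base[tuple(sorted((i, j)))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [(i,) for i in range(len(sses))]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


def cut_into(merges, n_items, k):
    """Replay the merges until only k clusters remain and return them."""
    clusters = {(i,) for i in range(n_items)}
    for c1, c2, _ in merges:
        if len(clusters) <= k:
            break
        clusters -= {c1, c2}
        clusters.add(tuple(sorted(c1 + c2)))
    return sorted(clusters)


# Toy input: four short elements, two near the origin and two about 30 A away.
# Each element is a list of (x, y, z) C-alpha coordinates in angstroms.
sses = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 5.0, 0.0), (1.5, 5.0, 0.0), (3.0, 5.0, 0.0)],
    [(30.0, 0.0, 0.0), (31.5, 0.0, 0.0), (33.0, 0.0, 0.0)],
    [(30.0, 5.0, 0.0), (31.5, 5.0, 0.0), (33.0, 5.0, 0.0)],
]
merges = cluster_sses(sses)
domains = cut_into(merges, len(sses), 2)
print(domains)
```

In this toy case the last merge happens at a much larger distance than the first two, which is the kind of gap in a dendrogram that suggests two separate domains. Where to cut the tree, and how to grade the result as disjoint, interacting, or conjoint, is what the paper's disjoint factor decides; that criterion is left out here because the abstract does not define it.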



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society

