Protein Sci. Feb 1998; 7(2): 445–456.
PMCID: PMC2143933

Comprehensive assessment of automatic structural alignment against a manual standard, the scop classification of proteins.


We apply a simple method for aligning protein sequences on the basis of 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment that minimizes coordinate differences. Because of its simplicity, our method can be readily modified to take into account additional features of protein structure, such as the orientation of side chains or the location-dependent cost of opening a gap. Augmented by such modifications, our basic method finds reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on which aspects of protein structure are involved (e.g., whether or not side-chain orientation must be considered for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We compared these alignments in detail with corresponding manual alignments culled from the literature and find good agreement (to within 95% for the core regions); the detailed comparison also highlights how particular structural features, such as certain strands, are problematic to align and give somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.
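The core loop described above, alternating dynamic programming and least-squares fitting until the alignment stops changing, can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes one C-alpha coordinate per residue, starts from a sequential (i to i) alignment, uses Horn's quaternion method for the rigid-body fit, runs a global Needleman-Wunsch pass with a linear gap penalty, and scores residue pairs with a distance-based similarity of the form M/(1 + d²/d0²) using illustrative constants (M = 20, d0² = 5 Å², gap = 10). All function names are hypothetical, and the side-chain orientation and location-dependent gap costs mentioned in the abstract are not modeled.

```python
import math

# Hypothetical sketch: iterate (1) least-squares superposition of the currently
# aligned C-alpha pairs and (2) dynamic programming on a distance-based score matrix.


def centroid(pts):
    n = len(pts)
    return [sum(p[k] for p in pts) / n for k in range(3)]


def top_eigenvector(N, iters=500):
    """Eigenvector of the largest eigenvalue of a symmetric 4x4 matrix (shifted power iteration)."""
    shift = math.sqrt(sum(v * v for row in N for v in row)) + 1e-12  # exceeds the spectral radius
    shifted = [[N[i][j] + (shift if i == j else 0.0) for j in range(4)] for i in range(4)]
    best, best_val = None, -math.inf
    for s in range(4):  # try every basis vector so at least one start overlaps the answer
        v = [1.0 if k == s else 0.0 for k in range(4)]
        for _ in range(iters):
            w = [sum(shifted[i][j] * v[j] for j in range(4)) for i in range(4)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        val = sum(v[i] * N[i][j] * v[j] for i in range(4) for j in range(4))  # Rayleigh quotient
        if val > best_val:
            best, best_val = v, val
    return best


def superpose(moving, target):
    """Least-squares rigid fit of `moving` onto `target` (Horn's quaternion method).
    Returns a function applying the fitted rotation and translation to a point."""
    cm, ct = centroid(moving), centroid(target)
    P = [[p[k] - cm[k] for k in range(3)] for p in moving]
    Q = [[q[k] - ct[k] for k in range(3)] for q in target]
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)] for a in range(3)]
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = S
    N = [[xx + yy + zz, yz - zy, zx - xz, xy - yx],
         [yz - zy, xx - yy - zz, xy + yx, zx + xz],
         [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
         [xy - yx, zx + xz, yz + zy, -xx - yy + zz]]
    w, x, y, z = top_eigenvector(N)  # unit quaternion of the optimal rotation
    R = [[w*w + x*x - y*y - z*z, 2 * (x*y - w*z), 2 * (x*z + w*y)],
         [2 * (x*y + w*z), w*w - x*x + y*y - z*z, 2 * (y*z - w*x)],
         [2 * (x*z - w*y), 2 * (y*z + w*x), w*w - x*x - y*y + z*z]]

    def apply(p):
        d = [p[k] - cm[k] for k in range(3)]
        return [sum(R[i][k] * d[k] for k in range(3)) + ct[i] for i in range(3)]
    return apply


def align_dp(score, gap):
    """Global (Needleman-Wunsch) alignment maximizing total score minus a linear gap cost."""
    n, m = len(score), len(score[0])
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for j in range(1, m + 1):
        F[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score[i - 1][j - 1],
                          F[i - 1][j] - gap, F[i][j - 1] - gap)
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # trace back one optimal path, collecting matched pairs
        if F[i][j] == F[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def structural_align(A, B, M=20.0, d0_sq=5.0, gap=10.0, max_cycles=20):
    """Alternate superposition and DP until the alignment stops changing (illustrative constants)."""
    pairs = [(i, i) for i in range(min(len(A), len(B)))]  # assumed starting register
    for _ in range(max_cycles):
        fit = superpose([B[j] for _, j in pairs], [A[i] for i, _ in pairs])
        Bt = [fit(p) for p in B]
        score = [[M / (1.0 + math.dist(a, b) ** 2 / d0_sq) for b in Bt] for a in A]
        new_pairs = align_dp(score, gap)
        if new_pairs == pairs:
            break
        pairs = new_pairs
    fit = superpose([B[j] for _, j in pairs], [A[i] for i, _ in pairs])  # refit on final alignment
    Bt = [fit(p) for p in B]
    rmsd = math.sqrt(sum(math.dist(A[i], Bt[j]) ** 2 for i, j in pairs) / len(pairs))
    return pairs, rmsd
```

Called as `structural_align(A, B)` with `A` and `B` given as lists of `[x, y, z]` coordinates, the sketch returns the aligned index pairs `(i, j)` and the RMSD over them. Because each cycle refits on the pairs found by the previous dynamic programming pass, a poor starting register can leave the loop in a local optimum, so the initial alignment is a design choice worth varying.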



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
