Protein Sci. Dec 1994; 3(12): 2366–2377.
PMCID: PMC2142768

Residue-residue contact substitution probabilities derived from aligned three-dimensional structures and the identification of common folds.


We report the derivation of scores, based on the analysis of residue-residue contact matrices from 443 three-dimensional structures aligned structurally as 96 families, that can be used to evaluate sequence-structure matches. Residue-residue contacts, and the more than 3 × 10^6 amino acid substitutions that take place between pairs of these contacts at aligned positions within each family of structures, have been tabulated and segregated according to the solvent accessibility of the residues involved. Contact maps within a family of structures are shown to be highly conserved (approximately 75%) even when the sequence identity approaches 10%. In a comparison involving a globin structure and the search of a sequence databank (> 21,000 sequences), the contact probability scores are shown to provide a very powerful secondary screen for the top-scoring sequence-structure matches, eliminating between 69% and 84% of the unrelated matches. The search of an aligned set of 2 globins against a sequence databank, followed by residue contact-based evaluation of the matches, locates all 618 globin sequences before the first non-globin match. From a single bacterial serine proteinase structure, the structural template approach coupled with residue-residue contact substitution data leads to the detection of the mammalian serine proteinase family among the top matches in the search of a sequence databank.
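As a rough illustration of what conservation of contact maps between aligned structures means, the sketch below builds a contact set for each of two toy structures and reports the fraction of contacts in the first that are also present at the aligned positions of the second. It is a minimal sketch under stated assumptions, not the procedure used in the article: the abstract does not give the contact definition, so the 8.0 Å distance cutoff, the use of a single coordinate per residue, the minimum sequence separation, and the function names `contact_map` and `contact_conservation` are all hypothetical choices made for the example.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=2):
    """Return the set of residue index pairs (i, j), i < j, within `cutoff`.

    `coords` holds one (x, y, z) point per residue. The cutoff and the
    minimum sequence separation are assumptions for this illustration.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def contact_conservation(contacts_a, contacts_b, alignment):
    """Fraction of contacts in A (between aligned residues) also found in B.

    `alignment` is a list of (index_in_a, index_in_b) pairs for
    structurally equivalent residues.
    """
    a_to_b = dict(alignment)
    # Keep only contacts whose two residues both have an aligned partner.
    mapped = [
        (a_to_b[i], a_to_b[j])
        for i, j in contacts_a
        if i in a_to_b and j in a_to_b
    ]
    if not mapped:
        return 0.0
    conserved = sum(
        1 for p, q in mapped if (min(p, q), max(p, q)) in contacts_b
    )
    return conserved / len(mapped)


# Toy coordinates (hypothetical, not taken from any real structure).
coords_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
coords_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]

contacts_a = contact_map(coords_a)
contacts_b = contact_map(coords_b)
conservation = contact_conservation(contacts_a, contacts_b, alignment)
```

In this toy case structure A has two contacts and only one of them survives in structure B, so `conservation` is 0.5. Averaging such a fraction over the pairs within a family is one plausible way to arrive at a figure like the approximately 75% quoted above, although the article's exact definition may differ.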



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society

