Modeling of globular proteins. A distance-based data search procedure for the construction of insertion/deletion regions and Pro→non-Pro mutations

J Mol Biol. 1990 Dec 20;216(4):991-1016. doi: 10.1016/S0022-2836(99)80016-3.

Abstract

A distance-based database search scheme is proposed for modeling Pro→non-Pro mutations and insertion/deletion regions of up to six residues in length in homologous globular proteins. In the first step, geometric descriptors (the number of residues involved and the target distances between the Cα atom positions adjacent to the "missing" segment) are chosen. In the second step, a database of high-resolution X-ray structures is scanned for segments with similar descriptors, and the selected segments are binned according to conformational type. In the third and fourth steps, the selected conformations are docked into the protein, and geometric and energetic criteria are used to determine their viability as segment models. The fifth step consists of an iteration scheme in which the geometric descriptors are redefined. This compensates for the use of a limited database and/or for a poor original protein model adjacent to the missing segment. The procedure has been tested on Pro→non-Pro mutations in the homologous proteins penicillopepsin and endothiapepsin, and on the insertion/deletion regions of the homologs penicillopepsin and endothiapepsin, trypsin and γ-chymotrypsin, and hen and human lysozyme. The test cases represent a wide variety of secondary structural elements (helix, sheet, turn and coil) and insertion/deletion lengths (0 to 4 residues). It is shown that 79% of the test cases are accurately modeled (within 0.54 Å root-mean-square (r.m.s.) deviation for main-chain atoms) using the proposed scheme. Failure of the scheme (main-chain atom r.m.s. deviations greater than 1.29 Å) in 21% of the cases appears to be related to the presence of infrequently observed conformations or locally unique folds of the target proteins with respect to the database (18% of the test cases); the remaining 3% are unexplained. Geometric and energetic criteria are able to discriminate between trial conformations that correspond to the X-ray structures and those that differ from them in 97% of the conformations generated by the distance-weighted database search scheme. The scheme is shown to be relatively insensitive to uncertainty in the template co-ordinates, since the geometric descriptors were taken from the homologous protein (r.m.s. deviations in the position of descriptors range from 0.18 to 1.35 Å for the accurately modeled test cases). It is demonstrated that the scheme can be used to correct local sequence misalignments.
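The database scan in the second step and the r.m.s. deviation used to judge accuracy can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single target distance between the two Cα atoms that flank the missing segment, whereas the abstract refers to several target distances. The function names, the `tolerance` parameter and its default value are hypothetical, and the r.m.s. deviation is computed without any prior superposition.

```python
import math


def scan_for_segments(ca_coords, n_residues, target_distance, tolerance=0.5):
    """Scan one database chain for candidate segments.

    ca_coords: list of (x, y, z) C-alpha coordinates, one per residue.
    n_residues: number of residues in the "missing" segment.
    target_distance: desired separation (in Angstrom) of the two flanking
        C-alpha atoms. Simplifying assumption: one distance only.
    tolerance: hypothetical acceptance window, in Angstrom.

    Returns a list of (start_index, distance) pairs, where start_index is
    the residue just before the candidate segment.
    """
    hits = []
    span = n_residues + 1  # index offset between the two flanking residues
    for start in range(len(ca_coords) - span):
        d = math.dist(ca_coords[start], ca_coords[start + span])
        if abs(d - target_distance) <= tolerance:
            hits.append((start, d))
    return hits


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists.

    No superposition is performed: both sets are assumed to be in the same
    frame already, as after docking a segment into the protein model.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy chain: six C-alpha atoms on a straight line, 3.8 Angstrom apart.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(6)]
    for start, d in scan_for_segments(chain, n_residues=2, target_distance=11.4):
        print(f"candidate after residue {start}: flanking distance {d:.2f}")
```

In a fuller treatment, each hit would then be binned by conformational type, docked into the target protein and screened with geometric and energetic criteria, as the third and fourth steps describe.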

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Aspartic Acid Endopeptidases
  • Chymotrypsin
  • Crystallography
  • Dipeptides
  • Hydrogen Bonding
  • Models, Molecular
  • Models, Structural
  • Molecular Sequence Data
  • Molecular Structure
  • Mutation
  • Proline / physiology*
  • Protein Conformation*
  • Proteins / chemistry*
  • Structure-Activity Relationship
  • Thermodynamics
  • Trypsin

Substances

  • Dipeptides
  • Proteins
  • Proline
  • Chymotrypsin
  • Trypsin
  • Aspartic Acid Endopeptidases
  • Endothia aspartic proteinase