Defining and computing optimum RMSD for gapped and weighted multiple-structure alignment

IEEE/ACM Trans Comput Biol Bioinform. 2008 Oct-Dec;5(4):525-33. doi: 10.1109/TCBB.2008.92.

Abstract

Pairwise structure alignment commonly uses root mean square deviation (RMSD) to measure the structural similarity, and methods for optimizing RMSD are well established. We extend RMSD to weighted RMSD for multiple structures. By using multiplicative weights, we show that weighted RMSD for all pairs is the same as weighted RMSD to an average of the structures. Thus, using RMSD or weighted RMSD implies that the average is a consensus structure. Although we show that in general, the two tasks of finding the optimal translations and rotations for minimizing weighted RMSD cannot be separated for multiple structures like they can for pairs, an inherent difficulty and a fact ignored by previous work, we develop a near-linear iterative algorithm to converge weighted RMSD to a local minimum. 10,000 experiments of gapped alignment done on each of 23 protein families from HOMSTRAD (where each structure starts with a random translation and rotation) converge rapidly to the same minimum. Finally we propose a heuristic method to iteratively remove the effect of outliers and find well-aligned positions that determine the structural conserved region by modeling B-factors and deviations from the average positions as weights and iteratively assigning higher weights to better aligned atoms.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Data Interpretation, Statistical
  • Least-Squares Analysis
  • Molecular Sequence Data
  • Pattern Recognition, Automated / methods*
  • Proteins / chemistry*
  • Proteins / ultrastructure*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins