Ferreting out correlations from trajectory data

J Chem Phys. 2011 Dec 14;135(22):225103. doi: 10.1063/1.3666007.

Abstract

Thermally driven materials characterized by complex energy landscapes, such as proteins, exhibit motions on a broad range of space and time scales. Principal component analysis (PCA) is often used to extract modes of motion from protein trajectory data that correspond to coherent, functional motions. In this work, two other methods, maximum covariance analysis (MCA) and canonical correlation analysis (CCA), are formulated in a way suited to the analysis of protein trajectory data. Both methods partition the coordinates used to describe the system into two sets (two measurement domains) and ask what correlations exist between them. MCA and CCA provide rotations of the original coordinate system that successively maximize the covariance (MCA) or correlation (CCA) between modes of each measurement domain under suitable constraints. We provide a common framework, based on the singular value decomposition of appropriate matrices, from which both MCA and CCA are derived. The differences between MCA and CCA, and their respective strengths and weaknesses, are discussed and illustrated. The application presented here examines the correlation between the backbone and side chains of the peptide met-enkephalin as it fluctuates between open conformations, found in solution, and closed conformations, appropriate to its receptor-bound state. PCA carried out in Cartesian coordinates is found to present difficulties, which motivates a formulation in terms of dihedral angles for the backbone atoms and selected atom distances for the side chains. These internal coordinates are a more reliable basis for all the methods explored here. MCA uncovers a correlation between combinations of several backbone dihedral angles and selected side chain atom distances of met-enkephalin. This analysis could be used to suggest which residues and dihedral angles to focus on in order to favor specific side chain conformers.
These methods could be applied to proteins with domains that, when they rearrange upon ligand binding, may have correlated functional motions or, for multi-subunit proteins, may exhibit correlated subunit motions.
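
The common SVD framework mentioned above can be illustrated with a minimal sketch. It assumes the standard textbook formulations (MCA as the SVD of the cross-covariance matrix between the two measurement domains, CCA as the SVD of the same matrix after whitening each domain), which may differ in detail from the formulation derived in the paper. The function name `mca_cca`, the helper `_inv_sqrt`, and the regularization floor `eps` are hypothetical choices made for this illustration.

```python
import numpy as np


def _inv_sqrt(sym, eps=1e-10):
    """Inverse square root of a symmetric positive semi-definite matrix.

    eps is an assumed floor on the eigenvalues to avoid division by zero.
    """
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, eps, None)
    return (vecs / np.sqrt(vals)) @ vecs.T


def mca_cca(x, y):
    """MCA and CCA between two measurement domains.

    x: array of shape (n_frames, p), e.g. backbone coordinates.
    y: array of shape (n_frames, q), e.g. side chain coordinates.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1)  # covariance within domain x
    cyy = y.T @ y / (n - 1)  # covariance within domain y
    cxy = x.T @ y / (n - 1)  # cross-covariance between domains

    # MCA: orthonormal mode pairs that successively maximize covariance.
    u, cov, vt = np.linalg.svd(cxy, full_matrices=False)

    # CCA: whiten each domain first, so the singular values are correlations.
    wx = _inv_sqrt(cxx)
    wy = _inv_sqrt(cyy)
    a, corr, bt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)

    return {
        "mca_x": u,
        "mca_y": vt.T,
        "covariances": cov,
        "cca_x": wx @ a,
        "cca_y": wy @ bt.T,
        "correlations": corr,
    }
```

Columns of `mca_x` and `mca_y` (or `cca_x` and `cca_y`) are the paired modes for the two domains, ordered by decreasing covariance (or correlation). The sketch treats every coordinate as an ordinary real variable; how periodic dihedral angles are prepared before covariances are formed is not stated in the abstract and is left out here.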