New measurement for correlation of co-evolution relationship of subsequences in protein

Interdiscip Sci. 2015 Feb 6. doi: 10.1007/s12539-014-0221-1. Online ahead of print.

Abstract

Many computational tools have been developed to measure co-evolution among protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent the multiple adjacent residues in one distinct region. In this paper, the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). An R value based on Pearson's correlation coefficient is then developed to measure the similarity between two MIMs. Multiple sequence alignments (MSAs) from a catalytic database, the Catalytic Site Atlas (CSA), are used for testing. The R value characterizes a specific class of residues. In contrast to individually co-evolved residue pairs, adjacent residues without high individual MI values are found because the co-evolution relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, for example in the catalytic environment.
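
The following is a minimal Python sketch of the idea described in the abstract, not the authors' implementation. It assumes that the MIM of a subsequence of length k is the k x k matrix of pairwise mutual information between the corresponding MSA columns, and that the R value is the Pearson correlation between the off-diagonal entries of two equally sized MIMs. The abstract does not specify the logarithm base, gap handling, or how matrix entries are paired, so those choices, together with the function names and the toy alignment, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    Each column is a string with one residue per sequence in the MSA.
    """
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_joint = count / n
        p_indep = (freq_a[a] / n) * (freq_b[b] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def mi_matrix(msa, start, length):
    """Assumed MIM: pairwise MI among `length` adjacent columns from `start`.

    The diagonal is left at 0.0 and is ignored by pearson_r below.
    """
    cols = ["".join(seq[i] for seq in msa) for i in range(start, start + length)]
    mim = [[0.0] * length for _ in range(length)]
    for i, j in combinations(range(length), 2):
        mim[i][j] = mim[j][i] = mutual_information(cols[i], cols[j])
    return mim


def pearson_r(mim_x, mim_y):
    """Pearson correlation between the upper triangles of two same-size MIMs."""
    k = len(mim_x)
    pairs = list(combinations(range(k), 2))
    xs = [mim_x[i][j] for i, j in pairs]
    ys = [mim_y[i][j] for i, j in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant matrix
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy alignment: columns 0-2 and columns 3-5 use different
# residues but share the same pattern of column-to-column dependence.
toy_msa = [
    "AACEEG",
    "AADEEH",
    "BBCFFG",
    "BBDFFH",
]

if __name__ == "__main__":
    mim_1 = mi_matrix(toy_msa, 0, 3)
    mim_2 = mi_matrix(toy_msa, 3, 3)
    print("R =", pearson_r(mim_1, mim_2))
```

In this toy case the two subsequences contain different residues, yet their MIMs have the same structure, which is the kind of similarity the R value is meant to capture. A real analysis would need many more sequences and a decision on how to treat gaps.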