- BMC Struct Biol
- v.9; 2009
- PMC2670300

# Analysing the origin of long-range interactions in proteins using lattice models

^{1}Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel

^{2}The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, 52900, Israel

Corresponding author.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## Abstract

### Background

Long-range communication is very common in proteins but the physical basis of this phenomenon remains unclear. In order to gain insight into this problem, we decided to explore whether long-range interactions exist in lattice models of proteins. Lattice models of proteins have been shown to capture some of the basic properties of real proteins and, thus, can be used for elucidating general principles of protein stability and folding.

### Results

Using a computational version of double-mutant cycle analysis, we show that long-range interactions emerge in lattice models even though they are not an input feature of them. The coupling energy of both short- and long-range pairwise interactions is found to become more positive (destabilizing) in a linear fashion with increasing 'contact-frequency', an entropic term that corresponds to the fraction of states in the conformational ensemble of the sequence in which the pair of residues is in contact. A mathematical derivation of the linear dependence of the coupling energy on 'contact-frequency' is provided.

### Conclusion

Our work shows how 'contact-frequency' should be taken into account in attempts to stabilize proteins by introducing (or stabilizing) contacts in the native state and/or through 'negative design' of non-native contacts.

## Background

There is a wealth of information that indicates that distant sites in proteins are often coupled to each other energetically. Evidence for such coupling initially emerged through studies of allosteric regulation of proteins [1] when it became clear that allosteric control is often achieved by ligand binding-induced conformational changes that are propagated from one ligand binding site to other distant sites. Later, it became possible to identify distant sites in proteins that are coupled to each other energetically by protein engineering through the use of the double-mutant cycle (DMC) method [for review see ref. [2]]. It has become clear from many such DMC studies that distant sites in proteins are often coupled to each other in a weak but significant manner [for review see ref. [3]]. More recently, it has become possible to demonstrate long-range coupling experimentally also by employing NMR methods [4]. Finally, computational methods have also indicated the presence of long-range communication in proteins. One class of computational methods is based on detection of co-evolving residues in multiple sequence alignment data. Such methods were originally developed in order to detect residues that are in physical contact [5,6] but, more recently, have been used to reveal long-range pathways of energetic connectivity in proteins [7-9]. Long-range communication in proteins has also been revealed in computational studies based on normal mode analysis and its coarse-grained versions in which correlations between fluctuations of distant residues are detected [10-13].

Despite the wealth of evidence indicating that long-range communication is extremely common in proteins, the physical basis of this phenomenon is still unclear. In addition, there are some uncertainties associated with many of the computational and experimental methods used to detect such long-range interactions. For example, it is not clear whether correlated mutations at distant positions reflect long-range coupling or common ancestry [14-17]. In the case of the DMC method, there is always a concern that the calculated coupling energy reflects a reorganization energy in one or more of the mutants in the cycle and not the true pairwise interaction energy [18]. For these reasons, we decided to explore whether long-range interactions exist in 2D and 3D lattice models of proteins even though such interactions are not an input feature of these models. Simple lattice models of proteins have been shown to capture some of the basic properties of real proteins and, although they ignore many important details, they have been used successfully for elucidating general principles of protein folding and stability [19-26]. Here, we show by invoking computational DMC analysis that long-range interactions are also common in lattice models of proteins. Hence, our results indicate that long-range communication in proteins may also occur as a result of interactions in the non-native states and not just *via* pathways by which information is transmitted through the native state structure as other computational methods suggest [7,12]. Our analysis also shows that the values of the coupling energies of both short- and long-range interactions have a linear dependence on their respective contact frequencies in the conformational ensemble.

### Theory

The energy of a sequence in a specific lattice conformation, E(C), is calculated by summing all the pairwise contact energies, e_{ij} (see Table 1), between neighboring lattice points, excluding consecutive residues in the sequence, as follows:

$E(C)=\sum _{i<j-1}{e}_{ij}\,\delta (|{r}_{i}-{r}_{j}|)\qquad (1)$

where |r_{i} - r_{j}| is the distance in lattice units between residues i and j that are separated in sequence by at least two residues and $\delta (x)=\begin{cases}1 & x=1\\ 0 & \text{otherwise}\end{cases}$. The free energy of folding, Δ*G*, of the native conformation of a sequence was calculated using [21]:

$\Delta G=-kT\ln \left(\frac{{P}_{N}}{1-{P}_{N}}\right)\qquad (2)$

where *P*_{N} is the probability that the chain is in its native state. This probability is given by: ${P}_{N}=\frac{{e}^{-E(N)/kT}}{Q}$, where $Q=\sum _{C\in Z}{e}^{-E(C)/kT}$ (Z is the ensemble of all possible conformations on the relevant lattice), *E*(N) is the energy of the native conformation, *T* is the temperature and *k* is the Boltzmann constant. Eq. (2) can be written as follows: $\Delta G=-kT\ln \left(\frac{{e}^{-E(N)/kT}}{Q-{e}^{-E(N)/kT}}\right)$. It, therefore, follows that:

$\Delta G=E(N)+kT\ln \left(Q-{e}^{-E(N)/kT}\right)\qquad (3)$

We designate the sum over all the non-native conformations by *Q*', where *Q*' = *Q* - e^{-E(N)/kT}.
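The quantities defined above can be sketched numerically for a short chain. The following is a minimal illustration, not the authors' code: the contact-energy table is hypothetical (Table 1 is not reproduced here), the 8-residue sequence is arbitrary, and the helper names `enumerate_conformations`, `energy` and `folding_free_energy` are assumptions made for this example. It enumerates every conformation on a 2D square lattice, evaluates E(C), Q and P_N, and computes ΔG from Eq. (2).

```python
import math
from itertools import combinations

KT = 1.0  # the Methods use kT = 1

# Hypothetical contact energies for illustration only (NOT the paper's Table 1).
CONTACT_ENERGY = {
    frozenset("H"): -1.0,   # H-H
    frozenset("+-"): -1.0,  # opposite charges
    frozenset("+"): 1.0,    # like charges (+,+)
    frozenset("-"): 1.0,    # like charges (-,-)
}


def enumerate_conformations(n):
    """Yield self-avoiding walks of n >= 2 residues on a square lattice,
    one representative per rotation/reflection class."""
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if not turned and dy == -1:
                continue  # the first turn must go "up" (removes mirror images)
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path, turned or dy != 0)
                path.pop()
    yield from extend([(0, 0), (1, 0)], False)  # first step fixed (removes rotations)


def energy(seq, conf):
    """E(C): sum of contact energies over non-consecutive lattice neighbours."""
    total = 0.0
    for i, j in combinations(range(len(seq)), 2):
        (xi, yi), (xj, yj) = conf[i], conf[j]
        if j - i > 1 and abs(xi - xj) + abs(yi - yj) == 1:
            total += CONTACT_ENERGY.get(frozenset((seq[i], seq[j])), 0.0)
    return total


def folding_free_energy(seq):
    """Return (dG, E(N), Q), taking the lowest-energy conformation as native.
    Uniqueness of the ground state is not checked in this sketch."""
    energies = [energy(seq, c) for c in enumerate_conformations(len(seq))]
    e_native = min(energies)
    q = sum(math.exp(-e / KT) for e in energies)
    p_native = math.exp(-e_native / KT) / q
    return -KT * math.log(p_native / (1.0 - p_native)), e_native, q


if __name__ == "__main__":
    dg, e_native, q = folding_free_energy("HPH+H-PH")
    print("dG =", dg, " E(N) =", e_native, " Q =", q)
```

Because ΔG is computed from P_N while Eq. (3) uses E(N) and Q', comparing the two values is a convenient consistency check on any implementation.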

The strength of a pairwise interaction can be estimated from DMC calculations or by computing the perturbation energy, ΔΔ*G*_{per }= Δ*G*_{wt }- Δ*G*_{m}, where Δ*G*_{wt }and Δ*G*_{m }are the respective free energies of the wild-type native conformation before and after a particular pairwise interaction is removed ('turned off') without affecting any other interactions. For simplicity, the derivation that follows is for this measure termed 'perturbation energy' and not for the coupling energy calculated from DMC that involves more algebraic terms (see Methods). It is important to note, however, that the perturbation energy of a pairwise interaction is almost equal to the coupling energy calculated from DMC for that interaction since in the DMC method the effects of the different mutations on other interactions tend to cancel out [18]. We show in the Results that our derivation holds for perturbation energies as well as for coupling energies that, in contrast with the perturbation energies, can be determined in experiments. The perturbation energy can be expressed, as follows:

$\Delta \Delta {G}_{\text{per}}={E}_{\text{c}}-kT\ln \left({Q}'_{\text{m}}/{Q}'_{\text{wt}}\right)\qquad (4)$

where E_{c} is the energy of the contact that was removed. It is convenient to partition the sum over all the non-native conformations of the mutant, Q'_{m}, into the sets C_{1} and C_{2} of conformations (|C_{1}| + |C_{2}| = N) in which the interaction being targeted is either absent or present, respectively, as follows: Q'_{m} = $\sum _{C\in {C}_{1}}{e}^{-E(C)/kT}+\sum _{C\in {C}_{2}}{e}^{-(E(C)-\lambda )/kT}$, where *λ* is the contact energy of the perturbed interaction (Table 1). The expression for Q'_{m} can be rewritten, as follows:

$\begin{array}{c}{\text{Q'}}_{\text{m}}={\displaystyle \sum _{\text{C}\in {\text{C}}_{\text{1}}}{\text{e}}^{-\text{E(C)/}kT}}+{\displaystyle \sum _{\text{C}\in {\text{C}}_{2}}{\text{e}}^{-\text{E(C)/}kT}}({\text{e}}^{\lambda /kT}+1-1)\\ ={\displaystyle \sum _{\text{i}=1}^{\text{N}}{\text{e}}^{-{\text{E}}_{\text{i}}/kT}}+{\displaystyle \sum _{\text{C}\in {\text{C}}_{2}}{\text{e}}^{-\text{E}(\text{C})/kT}}({\text{e}}^{\lambda /kT}-1)\end{array}$ .

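Since the first sum in the last expression is Q'_{wt}, Eq. (4) can, therefore, be rewritten as:

$\Delta \Delta {G}_{\text{per}}={E}_{\text{c}}-kT\ln \left(1+\frac{\sum _{C\in {C}_{2}}{e}^{-E(C)/kT}\left({e}^{\lambda /kT}-1\right)}{{Q}'_{\text{wt}}}\right)\qquad (5)$

First-order Taylor series expansion (ln(1+x) ≈ x for |*x*| ≪ 1) of Eq. (5) and multiplication of the resulting expression by $\frac{1}{Q\left|Z\right|}/\frac{1}{Q\left|Z\right|}$ (= 1) yields:

$\Delta \Delta {G}_{\text{per}}\approx {E}_{\text{c}}-kT\left({e}^{\lambda /kT}-1\right)\frac{\frac{1}{Q\left|Z\right|}\sum _{C\in {C}_{2}}{e}^{-E(C)/kT}}{\frac{1}{Q\left|Z\right|}{Q}'_{\text{wt}}}\qquad (6)$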

The Boltzmann weighted contact frequency, BWCF(i, j), is defined as: $({\displaystyle \sum _{\text{C}\in \text{Z}}\frac{{e}^{-\text{E}(\text{C})/\text{kT}}}{\text{Q}}}\delta ({\left|{\text{r}}_{\text{i}}-{\text{r}}_{\text{j}}\right|}_{\text{c}}))/\left|\text{Z}\right|$, where i and j are two positions in the sequence and each occurrence of a contact is multiplied by the Boltzmann weight of the conformation (C) in which it occurs. Hence, inspection of Eq. (6) shows that plots of the perturbation energy (or coupling energy) as a function of BWCF(i, j) are expected to be approximately linear with a slope that is a function of λ.
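As a rough numerical companion to this derivation, the sketch below computes, for one residue pair, the perturbation energy directly from its definition (ΔG_wt - ΔG_m with the pair's contact energy λ removed wherever the contact occurs), the same quantity from the closed form E_c - kT ln(Q'_m/Q'_wt) of Eq. (4), and the BWCF as defined above. It is an illustration under stated assumptions: the contact energies are hypothetical (not Table 1), the sequence is arbitrary, and `analyse_pair` and the other helper names are invented for this example.

```python
import math
from itertools import combinations

KT = 1.0  # the Methods use kT = 1

# Hypothetical contact energies for illustration only (NOT the paper's Table 1).
CONTACT_ENERGY = {
    frozenset("H"): -1.0,
    frozenset("+-"): -1.0,
    frozenset("+"): 1.0,
    frozenset("-"): 1.0,
}


def enumerate_conformations(n):
    """Self-avoiding walks of n >= 2 residues, one per symmetry class."""
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if not turned and dy == -1:
                continue  # first turn must go "up"
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path, turned or dy != 0)
                path.pop()
    yield from extend([(0, 0), (1, 0)], False)


def in_contact(conf, i, j):
    (xi, yi), (xj, yj) = conf[i], conf[j]
    return abs(i - j) > 1 and abs(xi - xj) + abs(yi - yj) == 1


def energy(seq, conf):
    return sum(CONTACT_ENERGY.get(frozenset((seq[i], seq[j])), 0.0)
               for i, j in combinations(range(len(seq)), 2)
               if in_contact(conf, i, j))


def analyse_pair(seq, i, j):
    """Return (ddG_per from the definition, ddG_per from Eq. (4), BWCF(i, j))."""
    confs = list(enumerate_conformations(len(seq)))
    e_wt = [energy(seq, c) for c in confs]
    has = [in_contact(c, i, j) for c in confs]
    lam = CONTACT_ENERGY.get(frozenset((seq[i], seq[j])), 0.0)
    # "Turn off" only the (i, j) interaction wherever the contact is present.
    e_m = [e - lam if h else e for e, h in zip(e_wt, has)]
    native = min(range(len(confs)), key=e_wt.__getitem__)  # wild-type native state

    def q_prime(energies):  # sum over the non-native conformations
        return sum(math.exp(-e / KT) for k, e in enumerate(energies) if k != native)

    qp_wt, qp_m = q_prime(e_wt), q_prime(e_m)
    ddg_direct = (e_wt[native] + KT * math.log(qp_wt)) - (e_m[native] + KT * math.log(qp_m))
    e_c = lam if has[native] else 0.0
    ddg_eq4 = e_c - KT * math.log(qp_m / qp_wt)
    q = sum(math.exp(-e / KT) for e in e_wt)
    bwcf = sum(math.exp(-e / KT) / q for e, h in zip(e_wt, has) if h) / len(confs)
    return ddg_direct, ddg_eq4, bwcf


if __name__ == "__main__":
    seq = "HPH+H-PH"
    for i, j in combinations(range(len(seq)), 2):
        if (j - i) % 2 == 1 and j - i > 1:
            print((i, j), analyse_pair(seq, i, j))
```

Plotting the first value against the third, grouped by λ, is one way to look for the approximately linear trend discussed in the text; with so short a chain the points per λ value are few, so the output should be read as a check of the bookkeeping rather than as a reproduction of the figures.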

## Results and discussion

DMCs have been used extensively to determine experimentally the strengths of various pairwise interactions in proteins [2]. Here, DMCs were invoked in order to evaluate, for the first time to the best of our knowledge, coupling energies between all possible pairs of positions in 2D and 3D lattice models of proteins (Figure 1). Evidence for correlations between distant sites in lattice models has been reported before in the context of protein aggregation [27]. The distributions of the values of the coupling energies for all possible pairs of positions in the different native states of 10 sequences with 16 residues on a 2D lattice with full enumeration and 10 sequences with 27 residues on a 3 × 3 × 3 3D lattice are shown in Figure 2a and 2b, respectively. It can be seen that the values of the coupling energies for pairs in contact are mostly negative whereas the values of the coupling energies for pairs that are not in contact are mostly (but not exclusively) positive and smaller in absolute terms. Pairs that are in contact in a given native conformation could, therefore, be identified with high confidence using this procedure.

**Figure 1. Scheme of a double-mutant cycle for a 2D lattice model protein**. Two residues, i and j, are mutated (the mutations are designated by B on a dark background) either singly or in combination. Δ*G*(i,j→B,j) and Δ*G*(i,B→B,B) are **...**

**Figure 2. Distributions of the values of the pairwise coupling energies for all possible pairs of positions in sequences with different native states on 2D and 3D lattices**. The values of the coupling energies for all possible pairs of positions in 10 sequences **...**

The fraction of conformations in the ensemble in which residues at two positions in a sequence are in contact is termed the 'contact frequency'. The 'contact frequency' is not defined for pairs of consecutive positions in a sequence since the interaction energy of such pairs is by definition zero (see Eq. (1)). Pairs of positions that are both even or both odd are also excluded since such pairs cannot be in contact on a square or cubic lattice and, thus, have a contact-frequency of zero. Therefore, only pairs of residues with non-zero values of contact-frequency are considered here. A more accurate measure of the frequency of a contact in a conformational ensemble is the 'Boltzmann weighted contact frequency', BWCF, where the occurrence of each contact is multiplied by the Boltzmann weight of the conformation (C) in which it is found (see Theory). In the Theory section it was shown that the strength of a pairwise interaction is expected to have a linear dependence on its BWCF. Such linear plots of different measures of the strength of pairwise interactions as a function of BWCF are depicted in Figure 3 for several representative examples of lattice models.
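The parity rule stated above can be checked by brute force. The sketch below, which is an illustration rather than the authors' code, counts for each non-consecutive pair the fraction of conformations of a short 2D chain in which the pair occupies neighbouring lattice points; the function names are assumptions made for this example. On a square lattice the parity of x + y alternates along the chain, so two residues can only be neighbours when their sequence separation is odd.

```python
from itertools import combinations


def enumerate_conformations(n):
    """Self-avoiding walks of n >= 2 residues on a square lattice,
    one representative per rotation/reflection class."""
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if not turned and dy == -1:
                continue  # first turn must go "up" (removes mirror images)
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path, turned or dy != 0)
                path.pop()
    yield from extend([(0, 0), (1, 0)], False)


def contact_frequencies(n):
    """Unweighted contact frequency of every non-consecutive pair (i, j)."""
    confs = list(enumerate_conformations(n))
    counts = {p: 0 for p in combinations(range(n), 2) if p[1] - p[0] > 1}
    for conf in confs:
        for i, j in counts:
            (xi, yi), (xj, yj) = conf[i], conf[j]
            if abs(xi - xj) + abs(yi - yj) == 1:
                counts[(i, j)] += 1
    return {pair: c / len(confs) for pair, c in counts.items()}


if __name__ == "__main__":
    for pair, freq in sorted(contact_frequencies(8).items()):
        print(pair, round(freq, 4))
```

Pairs whose separation is even come out with a frequency of exactly zero, which is why they are left out of the analysis.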

**Figure 3. Plots of different measures of the strength of pairwise interactions as a function of measures of contact frequency for several representative examples of lattice models**. In panels a and b, the average coupling energies, <ΔΔ*G*_{int} **...**

In the first example (Figure 3a and 3b), a set of sequences with a length, L, of 30 residues that have the same native structure was constructed (such structure-based sequence sets are designated SBSS) and the coupling energy was determined for every possible pair of positions in each sequence. The average value of the coupling energy for each pair of positions in the SBSS was then calculated in order to improve the signal-to-noise ratio. In this example, only conformations that fit into a 5 × 6 lattice were considered. It may be seen that a strong linear correlation is found between the average coupling energy for each pair of positions in the SBSS and the corresponding average BWCF index. This correlation holds for pairs of residues that form native contacts (Figure 3a, *r* = 0.78; *P*-value = 5.5 × 10^{-5}) and also, surprisingly, for pairs of residues that are not in contact in this particular native conformation (Figure 3b, *r* = 0.87; *P*-value = 1.3 × 10^{-55}). Such linear correlations (with average correlation coefficients of about 0.84 (± 0.05) for the non-contacting pairs and 0.62 (± 0.15) for the pairs in contact) were also found for SBSS that correspond to 8 other native conformations (i.e. 2 SBSS for sequences with L = 30 on a 5 × 6 lattice, 4 SBSS for sequences with L = 25 on a 5 × 5 lattice and 2 SBSS for sequences with L = 25 on a 5 × 6 lattice) when only the conformations that fit into the lattice were considered.

In the second example, the coupling (Figure 3c) and perturbation (Figure 3d) energies for all residue pairs not in contact in the native state of a sequence with L = 20 on a 2D lattice are plotted as a function of their BWCF. Here, values of the BWCF were calculated for the entire conformational ensemble (|Z| = 41,889,578) and not just for the relatively compact states as in Figure 3a and 3b. The color-coding designates the different contacts that have a given value of λ (Table 1). It may be seen (Figure 3d) that almost perfect correlations (*r* ≈ 1) are found between the perturbation energies and the BWCF for each given value of λ as predicted by Eq. (6). The correlations between the coupling energies and the BWCF for each given value of λ (except for λ = 0) are also excellent (Figure 3c, *r* > 0.97; *P*-value < 10^{-6}) but not as perfect as those in Figure 3d for the perturbation energies. Plots for residue pairs in contact in the native state are not shown since the number of such pairs is small and the correlations are, thus, not significant.

In the third example, the coupling (Figure 3e) and perturbation (Figure 3f) energies for all residue pairs not in contact in the native state of a sequence with L = 27 on a 3 × 3 × 3 lattice are plotted as a function of their BWCF. Here, too, almost perfect correlations (*r* ≈ 1) are found between the perturbation energies and the BWCF for each given value of λ (Figure 3f) whereas the correlations for the coupling energies (Figure 3e) are excellent (*r* = 0.92, 0.85, 0.92, 0.58 and 0.97 for λ values of -1.25, -1, -0.75, -0.25 and 1, respectively, with *P*-values < 2 × 10^{-3} except for λ = -1.25 where the number of data points, n, is small (n = 4)) but not as perfect as those in Figure 3f. In summary, therefore, the data depicted in Figure 3 for different types of lattice models (2D or 3D lattices with or without full enumeration of all the conformational states in the ensemble and for single sequences or averaged for a SBSS) support the general result described by Eq. (6) that the free energies of both direct (in contact in the native state) and indirect pairwise interactions are linearly dependent on their Boltzmann-weighted contact frequencies. It should be pointed out that only weak or no correlations are observed when pairwise energies taken directly from Table 1 are plotted against the BWCF, thereby providing further justification for the approach in this study that is based on the coupling or perturbation energies. The correlations in Figure 3 indicate that rare native contacts have more negative coupling energies than abundant native contacts. Likewise, rare non-contacting pairs have less positive coupling energies than abundant non-contacting pairs. Therefore, one may infer that native states can be stabilized by stabilizing contacts with low contact-frequency and destabilizing non-contacting pairs with a high contact-frequency.

Given that the interaction energy of a sequence in a specific lattice conformation is calculated by summing over all pairwise interactions between neighboring lattice points, it may seem surprising that non-direct interactions with significant positive coupling energies are found to exist (Figure 3). However, it has been pointed out that the strengths of pairwise interactions in the native state determined by DMC are always relative to the unfolded state [28]. Hence, the positive coupling energies observed here in the case of non-contacting pairs reflect, to a large extent, pairwise interactions in the non-native conformations in the ensemble. Surprisingly, however, positive coupling energies are also observed in the case of residue pairs such as P, H that have interaction energies of zero (Table 1) and, therefore, should not be coupled even when they are in contact in non-native conformations. These non-zero coupling energies arise owing to non-additivity in entropy calculations [29].

The correlations shown in Figure 3 can be understood more intuitively by considering several extreme cases and keeping in mind that the free energy of the native state is a function of both the energy of the native conformation and the energies of all the other non-native conformations in the ensemble (see Eq. (3)). For simplicity, the Boltzmann weights of the different states will be neglected in the discussion that follows and we will, therefore, refer to the contact-frequency (and not the BWCF) of residue pairs. The following four extreme cases of perturbations will be considered: (i) elimination of a native contact with a contact-frequency of 1/|Z|; (ii) elimination of a native contact with a contact-frequency that approaches one; (iii) elimination of a non-native contact with a contact-frequency of 1/|Z|; and (iv) elimination of a non-native contact with a contact-frequency that approaches one.

In the first case, a contact that exists only in the native state is perturbed and, therefore, only the energy of the native state is affected. Hence, the gap between the energy of the native conformation and the energies of the non-native conformations is reduced (Figure 4, case (i)). Such a perturbation reduces Δ*H* by the value of the contact energy, *E*_{c}, and has no effect on Δ*S* (which is a function of the sum, Q', over all the non-native states). The perturbation energy, ΔΔ*G*_{per}, in this case is, therefore, equal to *E*_{c}.

**Figure 4. Effects of different perturbations on the energy spectrum of the native state and the ensemble of non-native conformations**. The effects of four different extreme cases of perturbations are depicted. In case (i), a native contact with a contact-frequency **...**

In the second case, a contact that exists in both the native state and in most of the non-native states is perturbed and, therefore, the gap between the energy of the native conformation and the energies of the non-native conformations hardly changes (Figure 4, case (ii)). In this case, *Q*'_{m}/*Q*'_{wt} < 1 and the perturbation energy, ΔΔ*G*_{per} = *E*_{c} - *kT*ln(*Q*'_{m}/*Q*'_{wt}), therefore, increases (note that *E*_{c} is negative) in accordance with the plot in Figure 3a. Native contacts with a low contact-frequency, therefore, contribute more than those with a large contact-frequency to the gap between the energy of the native state and the energies of the non-native conformations, thereby explaining why they have more negative coupling energies (Figure 3a).

In the third case of a perturbation of a non-native contact with a low contact-frequency, it is clear that the energies of the native state and most of the non-native states do not change and, therefore, the energy gap also remains unchanged (Figure 4, case (iii)). In the fourth case of a perturbation of a non-native contact with a high contact-frequency, most of the non-native conformations are destabilized but the energy of the native state is not affected and the gap between the energy of the native conformation and the energies of the non-native conformations, therefore, becomes larger (Figure 4, case (iv)). In cases such as (iii) and (iv), when a pairwise interaction between residues that are not in contact in the native state is removed, there is no effect on Δ*H* and the perturbation energy is given by: ΔΔ*G*_{per} = - *kT*ln(*Q*'_{m}/*Q*'_{wt}). If the contact-frequency of the removed interaction is low (case (iii)), then *Q*'_{m} ≈ *Q*'_{wt} and the perturbation energy will be equal to approximately zero. If the contact-frequency of the removed interaction is high (case (iv)), then *Q*'_{m}/*Q*'_{wt} < 1 and the value of the perturbation energy will increase in accordance with the plots in Figure 3. Non-native contacts with a high contact-frequency, therefore, contribute more than those with a low contact-frequency to the gap between the energy of the native state and the energies of the non-native conformations, thereby explaining why they have more positive coupling energies (Figure 3). The effects shown schematically in Figure 4 almost always result in an increase of the energy of either the native state (case (i)), the non-native states (case (iv)) or both (case (ii)) since non-favorable pairwise interactions (Table 1) are rare given the amino acid composition we used.
It is clear, however, that protein evolution might favor non-favorable interactions in non-native conformations that would destabilize them relative to the native state. Such an evolutionary process termed 'negative design' [30-32] would be reflected in negative (favorable) coupling energies between residues that are not in contact in the native state.
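The four extreme cases can be put into numbers with a small helper. The sketch below rests on one rearrangement of the partition used in the Theory section: if f denotes the fraction of the non-native Boltzmann weight Q'_wt carried by conformations that contain the contact, then Q'_m/Q'_wt = 1 + f(e^{λ/kT} - 1), so that ΔΔG_per = E_c - kT ln(1 + f(e^{λ/kT} - 1)). The symbol f, the function name and the value λ = -1 are assumptions chosen for illustration; when Boltzmann weights are neglected, as in the discussion above, f plays the role of the contact-frequency.

```python
import math


def perturbation_energy(f, lam, native, kT=1.0):
    """ddG_per = E_c - kT ln(Q'_m / Q'_wt), with Q'_m / Q'_wt = 1 + f (e^(lam/kT) - 1).

    f      : fraction of the non-native Boltzmann weight in conformations
             that contain the contact (0 <= f <= 1)
    lam    : contact energy of the perturbed interaction
    native : True if the contact is present in the native state (E_c = lam)
    """
    e_c = lam if native else 0.0
    return e_c - kT * math.log(1.0 + f * (math.exp(lam / kT) - 1.0))


if __name__ == "__main__":
    lam = -1.0  # a favourable contact, chosen arbitrarily
    cases = {
        "(i)   native, rare":        (0.0, True),
        "(ii)  native, abundant":    (1.0, True),
        "(iii) non-native, rare":    (0.0, False),
        "(iv)  non-native, abundant": (1.0, False),
    }
    for label, (f, native) in cases.items():
        print(label, perturbation_energy(f, lam, native))
```

In the two limits the expression reduces to E_c (f = 0) and E_c - λ (f = 1), which for a favourable contact reproduces the ordering described for cases (i) to (iv): the perturbation energy becomes more positive as the contact becomes more abundant.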

How important is contact-frequency for protein stability? In order to obtain some insight into this question, we compared the stabilization achieved when optimizing a sequence for a particular native conformation using two different functions: (i) F_{1} (Eq. (8)) that minimizes the energy of native contacts and maximizes the energy of non-native contacts ('negative design'); and (ii) F_{2} (Eq. (9)) in which the contributions of native and non-native contacts are weighted by their contact-frequency. Both functions have an adjustable parameter, W_{c}, which determines the relative weight of the contributions of the native vs. non-native interactions to stability. It can be seen (Figure 5) that for sequences with L = 30 on a 5 × 6 lattice, greater stability is achieved when contact-frequency is taken into account across the entire range of W_{c} values. Similar results were obtained in cases of other lattice dimensions and sequence lengths when only the most compact conformations were considered. A more general scoring function will be needed for efficient design when the entire conformational space is considered.

## Conclusion

It is shown in this study that long-range pairwise interactions are also present in simple lattice models of proteins despite the fact that the interaction energy of a sequence in a specific conformation is based solely on direct interactions (Eq. (1)). Double-mutant cycle analysis of these lattice models and a mathematical analysis show that the strength of both direct and indirect native interactions increases (i.e. their coupling free energy becomes more negative) in a linear fashion with decreasing contact-frequency, which is an entropic term. Hence, proteins can be stabilized by introducing (or stabilizing) contacts in the native state with a low contact-frequency and removing (or destabilizing) contacts in non-native states with a high contact-frequency, as shown in Figure 5. Although manifestations of the latter strategy of 'negative design' have been recognized before [32], it has not been fully appreciated how the choice of interactions to be introduced (stabilized) or removed (destabilized) affects the extent of stabilization. Our findings are not dependent on sequence length and lattice dimensions that determine the conformational ensemble size and are, thus, likely to be relevant to the selection of folding pathways, folding rates and the design of real proteins. It may be possible to implement our findings using ensembles that are derived computationally (such as with COREX [33]) before experimentally characterized conformational ensembles become available. The new approach described here, which involves combining DMC analysis with lattice models, may also pave the way for a rigorous analysis of other complex aspects of protein behavior. For example, simulation of protein evolution by subjecting lattice models to rounds of mutagenesis followed by selection can be used to assess the contribution of correlated mutations at distant positions to protein folding, stability and allosteric communication.
Employing lattice models to address this issue has the distinct advantage that it renders possible separating between correlated mutations due to common ancestry and those due to biophysical factors. Such studies may reveal relationships between contact-frequency, correlated mutations and other properties of proteins such as contact-order [34].

## Methods

### The lattice model of proteins

2D or 3D lattice models that are similar to the one described by Jacob and Unger [35] were used. In brief, the protein sequence consists of an alphabet of five amino acids: hydrophobic (H), neutral polar (P), positively charged (+), negatively charged (-) and blank (B), which is used for mutations. The pairwise interaction energies (e_{ij}) are taken from Table 1 and reflect in a qualitative manner the strengths of interactions between different types of amino acids. Similar results were obtained using other contact interaction matrices. The energies of all possible conformations of a given sequence on a particular lattice were calculated and the conformation with the lowest energy, if a single such one exists, was considered as its native conformation. A value of 1 was used for *kT*. It is important to note that the size of the ensemble, |Z|, is determined by the lattice dimensions and the same conformation of a given sequence may, therefore, have different values of Δ*G* due to different lattice dimensions.

Sequences of length (L) 16, 20, 25 and 30 were used for the 2D models and sequences with L = 27 for the 3D models. In the case of sequences with L = 16 or 20, all the respective 802,075 and 41,889,578 non-symmetric conformations were enumerated. In the case of sequences with L = 25 or L = 30 where the total number of conformations is too large to enumerate, we considered only the conformations that could be fitted into 5 × 5 or 5 × 6 lattices. Likewise, only the conformations that could be fitted into a 3 × 3 × 3 lattice were considered in the case of the 3D lattice models for sequences with L = 27. The numbers of all compact non-symmetric conformations of sequences with L = 25 on 5 × 5 and 5 × 6 lattices are 1081 and 377,779, respectively. The numbers of all compact non-symmetric conformations of sequences with L = 30 on a 5 × 6 lattice and L = 27 on a 3 × 3 × 3 lattice are 6431 and 103,346, respectively. The sequences were generated by random rearrangements of L residues with compositions of 44% H, 31% P, 12.5% (+) and 12.5% (-) in the case of sequences with L = 16, 40% H, 28% P, 16% (+) and 16% (-) in the case of sequences with L = 25, 42% H, 30% P, 14% (+) and 14% (-) in the case of sequences with L = 30 and 40% H, 30% P, 15% (+) and 15% (-) in the case of sequences with L = 20 or 27 (these compositions correspond roughly to those in the PDB).
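The enumeration of non-symmetric conformations can be illustrated for short 2D chains. The sketch below is one possible way to pick a single representative from each set of rotation- and reflection-related self-avoiding walks (fix the first step along +x and require the first turn, if any, to point along +y); this convention and the function names are assumptions for the example, and the paper does not state how the symmetry reduction was implemented.

```python
def enumerate_conformations(n):
    """Yield self-avoiding walks of n >= 2 residues on a square lattice,
    one representative per rotation/reflection class."""
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if not turned and dy == -1:
                continue  # first turn must go "up": removes mirror images
            nxt = (x + dx, y + dy)
            if nxt not in path:  # self-avoidance
                path.append(nxt)
                yield from extend(path, turned or dy != 0)
                path.pop()
    # First step fixed along +x: removes the four rotations.
    yield from extend([(0, 0), (1, 0)], False)


def count_conformations(n):
    """Number of non-symmetric conformations of an n-residue chain."""
    return sum(1 for _ in enumerate_conformations(n))


if __name__ == "__main__":
    for n in range(2, 11):
        print(n, count_conformations(n))
```

With this convention the count grows quickly with chain length, and for L = 16 it should reproduce the 802,075 conformations quoted above, although a plain Python recursion of this kind becomes slow well before L = 20.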

### Generation of structure-based sequence sets (SBSS)

SBSS that contained more than 40 different sequences of the same length and with the same native conformation were generated. These SBSS have a mean sequence identity of only 0.29 to 0.34 since (as described above) the sequences were generated by random rearrangements and, thus, represent a random sample of sequence space. Nine different SBSS corresponding to different native conformations were examined.

### Calculation of coupling energies using double-mutant cycles

The strength of a pairwise interaction between residues i and j in the native conformation of a given sequence was evaluated by constructing a DMC that comprises the original wild-type sequence, two single mutants in which either residue i or residue j is replaced with the blank (B) residue and the corresponding double mutant in which both residues are replaced with this residue. The blank residue corresponds to alanine, which is usually chosen as a reference state in experimental DMC since it is assumed that (i) replacement by this residue tends, in general, to reduce structural perturbations upon mutation and that (ii) interactions between alanine at one position and any other type of residue at the second position are minimal. The coupling energy, ΔΔ*G*_{int}, which is a measure of the strength of the pairwise interaction between residues i and j, was calculated as follows:

$\Delta \Delta {G}_{\text{int}}=\Delta {G}_{\text{i,j}}-\Delta {G}_{\text{i,B}}-\Delta {G}_{\text{B,j}}+\Delta {G}_{\text{B,B}}\qquad (7)$

where Δ*G*_{i,j}, Δ*G*_{i,B}, Δ*G*_{B,j} and Δ*G*_{B,B} are the respective free energies of folding of the wild-type protein, the two single mutants and the double mutant that are calculated using Eq. (2). The coupling energy is equal to the difference in the free energies of two parallel processes in the cycle, Δ*G*(i,j→B,j) and Δ*G*(i,B→B,B), that correspond to the effect of mutating residue i (or j) when the other residue is present or absent, respectively (Figure 1). In these calculations, negative and positive coupling energies reflect interactions that stabilize or destabilize the native state, respectively. We implemented such an experiment for each given pair of positions so that a coupling energy could be calculated for every possible pair of positions in each sequence.
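A computational double-mutant cycle of this kind can be sketched for a short 2D chain as follows. This is an illustration under explicit assumptions, not the authors' implementation: the contact energies are hypothetical (not Table 1), the blank residue B is given zero interaction energy with everything, the wild-type native conformation is kept as the reference state for all four variants, and the function names are invented for the example.

```python
import math
from itertools import combinations

KT = 1.0  # the Methods use kT = 1

# Hypothetical contact energies (NOT the paper's Table 1); B interacts with nothing.
CONTACT_ENERGY = {
    frozenset("H"): -1.0,
    frozenset("+-"): -1.0,
    frozenset("+"): 1.0,
    frozenset("-"): 1.0,
}


def enumerate_conformations(n):
    """Self-avoiding walks of n >= 2 residues, one per symmetry class."""
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if not turned and dy == -1:
                continue  # first turn must go "up"
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path, turned or dy != 0)
                path.pop()
    yield from extend([(0, 0), (1, 0)], False)


def energy(seq, conf):
    total = 0.0
    for i, j in combinations(range(len(seq)), 2):
        (xi, yi), (xj, yj) = conf[i], conf[j]
        if j - i > 1 and abs(xi - xj) + abs(yi - yj) == 1:
            total += CONTACT_ENERGY.get(frozenset((seq[i], seq[j])), 0.0)
    return total


def folding_dg(seq, confs, native):
    """Eq. (2) written as -kT ln(w_N / (Q - w_N)) for a fixed native index."""
    weights = [math.exp(-energy(seq, c) / KT) for c in confs]
    return -KT * math.log(weights[native] / (sum(weights) - weights[native]))


def coupling_energy(seq, i, j):
    """ddG_int = dG(i,j) - dG(i,B) - dG(B,j) + dG(B,B)."""
    confs = list(enumerate_conformations(len(seq)))
    # Assumption: the wild-type ground state is the native state of all variants.
    native = min(range(len(confs)), key=lambda k: energy(seq, confs[k]))

    def blank(s, pos):
        return s[:pos] + "B" + s[pos + 1:]

    dg_ij = folding_dg(seq, confs, native)
    dg_Bj = folding_dg(blank(seq, i), confs, native)             # residue i mutated
    dg_iB = folding_dg(blank(seq, j), confs, native)             # residue j mutated
    dg_BB = folding_dg(blank(blank(seq, i), j), confs, native)   # both mutated
    return dg_ij - dg_iB - dg_Bj + dg_BB


if __name__ == "__main__":
    seq = "HPH+H-PH"
    for i, j in combinations(range(len(seq)), 2):
        print((i, j), coupling_energy(seq, i, j))
```

Each pair requires four partition-function evaluations over the same ensemble, so in practice the conformations and their contact lists would be computed once and reused across all pairs.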

### Calculation of perturbation energies

We also calculated a perturbation energy, ΔΔ*G*_{per} = Δ*G*_{wt} - Δ*G*_{m}, for every possible pair of positions in each sequence, where Δ*G*_{wt} and Δ*G*_{m} are the respective free energies of the wild-type native conformation before and after a particular pairwise interaction is 'turned off' without affecting any other interactions. Under ideal circumstances [18], the coupling energy, which can be determined experimentally or calculated as described above, provides a good estimate of the perturbation energy that can only be determined by computation.

### Contact frequency-based protein stabilization

Sequences with a specific native conformation were generated by a Monte Carlo (MC) process that maximizes one of two design scores, F_{1} or F_{2}, which ignore the contact-frequency or take it into account, respectively. The expressions for the scores are:

where W_{c} is the contact weight, N_{c} and N_{non} are the total numbers of contacts and non-contacts in the specific conformation, respectively, and f_{c} is the contact-frequency. The value of W_{c} was varied from 0.05 to 0.95. For each value of W_{c}, 100 designed sequences were generated in 10,000 MC steps and the average free energy of folding was then calculated.
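The general shape of such a score-maximizing MC design run can be sketched as follows. This is only an illustration under stated assumptions: the score is passed in as a function, and `toy_score` is a placeholder that is neither F_{1} nor F_{2}; the 20-letter alphabet, the single-site mutation move and the acceptance rule (accept any move that does not lower the score) are likewise assumptions, since the section does not specify them.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def design_sequence(score, length, n_steps=10_000, rng=None):
    """Maximize score(sequence) by single-site mutations.

    Assumed acceptance rule: keep a mutation if the score does not decrease.
    """
    rng = rng or random.Random()
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    best = score(seq)
    for _ in range(n_steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)
        new = score(seq)
        if new >= best:
            best = new
        else:
            seq[pos] = old  # reject the move and restore the residue
    return "".join(seq), best


# Placeholder score (hypothetical): number of positions matching a target.
TARGET = "ACDEFGHIKL"


def toy_score(seq):
    return sum(a == b for a, b in zip(seq, TARGET))


designed, final_score = design_sequence(toy_score, len(TARGET), n_steps=2000,
                                        rng=random.Random(0))
print(designed, final_score)
```

In an actual run the placeholder would be replaced by the chosen design score evaluated on the target conformation, and the loop would be repeated to obtain 100 designed sequences for each value of W_{c}.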

## Authors' contributions

ON carried out all the calculations. AH wrote the paper. All the authors analysed the data, helped draft the paper and read and approved the final manuscript.

## Acknowledgements

We thank Profs. Gilad Haran, Michael Levitt and John Moult for useful comments on an earlier draft of this paper and Etai Jacob for providing us with source codes for lattice model calculations. This work was supported by grant 1339/08 of the Israel Science Foundation to R.U. O.N.-B. was supported in part by a Fellowship from the Kahn Family Research Center for Systems Biology of the Human Cell and the Kimmelman Center for Macromolecular Assembly.

## References

1. Perutz MF. Mechanisms of co-operativity and allosteric regulation in proteins. Q Rev Biophys. 1989;22:139–236. doi: 10.1017/S0033583500003826.
2. Horovitz A. Double-mutant cycles: a powerful tool for analysing protein structure and function. Fold & Des. 1996;1:R121–R126. doi: 10.1016/S1359-0278(96)00056-9.
3. LiCata VJ, Ackers GK. Long-range, small magnitude nonadditivity of mutational effects in proteins. Biochemistry. 1995;34:3133–3139. doi: 10.1021/bi00010a001.
4. Clarkson MW, Gilmore SA, Edgell MH, Lee AL. Dynamic coupling and allosteric behavior in a nonallosteric protein. Biochemistry. 2006;45:7693–7699. doi: 10.1021/bi060652l.
5. Göbel U, Sander C, Schneider R, Valencia A. Correlated mutations and residue contacts in proteins. Proteins: Struct Funct Genet. 1994;18:309–317. doi: 10.1002/prot.340180402.
6. Neher E. How frequent are correlated changes in families of protein sequences? Proc Natl Acad Sci USA. 1994;91:98–102. doi: 10.1073/pnas.91.1.98.
7. Lockless SW, Ranganathan R. Evolutionarily conserved pathways of energetic connectivity in protein families. Science. 1999;286:295–299. doi: 10.1126/science.286.5438.295.
8. Kass I, Horovitz A. Mapping pathways of allosteric communication in GroEL by analysis of correlated mutations. Proteins: Struct Funct Genet. 2002;48:611–617. doi: 10.1002/prot.10180.
9. Dima RI, Thirumalai D. Determination of network of residues that regulate allostery in protein families using sequence analysis. Protein Sci. 2006;15:258–268. doi: 10.1110/ps.051767306.
10. Ichiye T, Karplus M. Collective motions in proteins: a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins: Struct Funct Genet. 1991;11:205–217. doi: 10.1002/prot.340110305.
11. Rod TH, Radkiewicz JL, Brooks CL III. Correlated motion and the effect of distal mutations in dihydrofolate reductase. Proc Natl Acad Sci USA. 2003;100:6980–6985. doi: 10.1073/pnas.1230801100.
12. Chennubhotla C, Rader AJ, Yang LW, Bahar I. Elastic network models for understanding biomolecular machinery: from enzymes to supramolecular assemblies. Phys Biol. 2005;2:S173–S180. doi: 10.1088/1478-3975/2/4/S12.
13. Ma J. Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure. 2005;13:373–380. doi: 10.1016/j.str.2005.02.002.
14. Pollock DD, Taylor WR, Goldman N. Coevolving protein residues: maximum likelihood identification and relationship to structure. J Mol Biol. 1999;287:187–198. doi: 10.1006/jmbi.1998.2601.
15. Wollenberg KR, Atchley WR. Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap. Proc Natl Acad Sci USA. 2000;97:3288–3291. doi: 10.1073/pnas.070154797.
16. Larson SM, Di Nardo AA, Davidson AR. Analysis of covariation in an SH3 domain sequence alignment: applications in tertiary contact prediction and the design of compensating hydrophobic core substitutions. J Mol Biol. 2000;303:433–446. doi: 10.1006/jmbi.2000.4146.
17. Noivirt O, Eisenstein M, Horovitz A. Detection and reduction of evolutionary noise in correlated mutation analysis. Protein Eng Des Sel. 2005;18:247–253. doi: 10.1093/protein/gzi029.
18. Serrano L, Horovitz A, Avron B, Bycroft M, Fersht AR. Estimating the contribution of engineered surface electrostatic interactions to protein stability by using double-mutant cycles. Biochemistry. 1990;29:9343–9352. doi: 10.1021/bi00492a006.
19. Sali A, Shakhnovich E, Karplus M. How does a protein fold? Nature. 1994;369:248–251. doi: 10.1038/369248a0.
20. Hinds DA, Levitt M. Exploring conformational space with a simple lattice model for protein structure. J Mol Biol. 1994;243:668–682. doi: 10.1016/0022-2836(94)90040-X.
21. Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, Chan HS. Principles of protein folding-a perspective from simple exact models. Protein Sci. 1995;4:561–602.
22. Onuchic JN, Socci ND, Luthey-Schulten Z, Wolynes PG. Protein folding funnels: the nature of the transition state ensemble. Fold & Des. 1996;1:441–450. doi: 10.1016/S1359-0278(96)00060-0.
23. Unger R, Moult J. Local interactions dominate folding in a simple protein model. J Mol Biol. 1996;259:988–994. doi: 10.1006/jmbi.1996.0375.
24. Govindarajan S, Goldstein RA. On the thermodynamic hypothesis of protein folding. Proc Natl Acad Sci USA. 1998;95:5545–5549. doi: 10.1073/pnas.95.10.5545.
25. Mirny L, Shakhnovich E. Protein folding theory: from lattice to all-atom models. Annu Rev Biophys Biomol Struct. 2001;30:361–396. doi: 10.1146/annurev.biophys.30.1.361.
26. Vendruscolo M, Mirny LA, Shakhnovich EI, Domany E. Comparison of two optimization methods to derive energy parameters for protein folding: perceptron and Z score. Proteins: Struct Funct Genet. 2000;41:192–201. doi: 10.1002/1097-0134(20001101)41:2<192::AID-PROT40>3.0.CO;2-3.
27. Bratko D, Blanch HW. Effect of secondary structure on protein aggregation: A replica exchange simulation study. J Chem Phys. 2003;118:5185–5194. doi: 10.1063/1.1546429.
28. Horovitz A, Fersht AR. Strategy for analysing the co-operativity of intramolecular interactions in peptides and proteins. J Mol Biol. 1990;214:613–617. doi: 10.1016/0022-2836(90)90275-Q.
29. Mark AE, van Gunsteren WF. Decomposition of the free energy of a system in terms of specific interactions. Implications for theoretical and experimental studies. J Mol Biol. 1994;240:167–176. doi: 10.1006/jmbi.1994.1430.
30. Hecht MH, Richardson JS, Richardson DC, Ogden RC. De novo design, expression, and characterization of Felix: a four-helix bundle protein of native-like sequence. Science. 1990;249:884–891. doi: 10.1126/science.2392678.
31. Hellinga HW. Rational protein design: combining theory and experiment. Proc Natl Acad Sci USA. 1997;94:10015–10017. doi: 10.1073/pnas.94.19.10015.
32. Berezovsky IN, Zeldovich KB, Shakhnovich EI. Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Comput Biol. 2007;3:498–507. doi: 10.1371/journal.pcbi.0030052.
33. Vertrees J, Barritt P, Whitten S, Hilser VJ. COREX/BEST server: a web browser-based program that calculates regional stability variations within protein structures. Bioinformatics. 2005;21:3318–3319. doi: 10.1093/bioinformatics/bti520.
34. Plaxco KW, Simons KT, Baker D. Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol. 1998;277:985–994. doi: 10.1006/jmbi.1998.1645.
35. Jacob E, Unger R. A tale of two tails: why are terminal residues of proteins exposed? Bioinformatics. 2007;23:e225–e230. doi: 10.1093/bioinformatics/btl318.
