• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of geneticsGeneticsCurrent IssueInformation for AuthorsEditorial BoardSubscribeSubmit a Manuscript
Genetics. Jan 2007; 175(1): 255–266.
PMCID: PMC1775007

Thermodynamics of Neutral Protein Evolution

Abstract

Naturally evolving proteins gradually accumulate mutations while continuing to fold to stable structures. This process of neutral evolution is an important mode of genetic change and forms the basis for the molecular clock. We present a mathematical theory that predicts the number of accumulated mutations, the index of dispersion, and the distribution of stabilities in an evolving protein population from knowledge of the stability effects (ΔΔG values) for single mutations. Our theory quantitatively describes how neutral evolution leads to marginally stable proteins and provides formulas for calculating how fluctuations in stability can overdisperse the molecular clock. It also shows that the structural influences on the rate of sequence evolution observed in earlier simulations can be calculated using just the single-mutation ΔΔG values. We consider both the case when the product of the population size and mutation rate is small and the case when this product is large, and show that in the latter case the proteins evolve excess mutational robustness that is manifested by extra stability and an increase in the rate of sequence evolution. All our theoretical predictions are confirmed by simulations with lattice proteins. Our work provides a mathematical foundation for understanding how protein biophysics shapes the process of evolution.

PROTEINS evolve largely through the slow accumulation of amino acid substitutions. Over evolutionary time, this process of sequence divergence creates homologous proteins that differ at the majority of their residues, yet still fold to similar structures that often perform conserved biochemical functions (Lesk and Chothia 1980). The maintenance of structure and function during sequence divergence suggests that much of protein evolution is neutral in the sense that observed sequence changes frequently do not alter a protein's ability to fold and adequately perform the biochemical function necessary to enable its host organism to survive. This comparative evidence for neutrality in protein evolution has been corroborated by experimental studies showing that the mutations separating diverged sequences often have no effect other than modest and additive changes to stability (Serrano et al. 1993) and that a large fraction of random mutations do not detectably alter a protein's structure or function (Shortle and Lin 1985; Pakula et al. 1986; Loeb et al. 1989; Guo et al. 2004; Bloom et al. 2005, 2006a). In this respect, it seems that protein evolution should be well described by Kimura's neutral theory of evolution, which holds that most genetic change is due to the stochastic fixation of neutral mutations (Kimura 1983). One of the key predictions of the neutral theory is that assuming a constant mutation rate, the number of mutations separating two proteins should be proportional to the time since their divergence (Kimura 1983). Indeed, the observation by Zuckerkandl and Pauling (1965) that proteins are “molecular clocks” that accumulate mutations at a roughly constant rate has long been taken as one of the strongest pieces of evidence supporting the neutral theory (Ohta and Kimura 1971).

However, mutations that are neutral with respect to a protein's capacity to perform its biological function often affect protein thermodynamics. The biological functions of most proteins depend on their ability to fold to thermodynamically stable native structures (Anfinsen 1973). Yet natural proteins are typically only marginally stable, with free energies of folding (ΔGf) between −5 and −15 kcal/mol (Fersht 1999). Most random mutations to proteins are destabilizing (Pakula et al. 1986; Matthews 1993; Godoy-Ruiz et al. 2004; Kumar et al. 2006), and their effects on stability (measured as ΔΔG, the ΔGf of the mutant protein minus the ΔGf of the wild-type protein) are frequently of the same magnitude as a protein's net stability. The impact of a mutation on a protein's function can therefore depend on the protein's stability: a moderately destabilizing mutation that is easily tolerated by a stable parent protein may completely disrupt the folding of a less stable parent. This effect of protein stability on mutational tolerance has been verified by experiments demonstrating that more stable protein variants are markedly more robust to random mutations (Bloom et al. 2005, 2006a).

The fact that mutations that are neutral with respect to direct selection for protein function can affect a protein's tolerance to subsequent mutations is not consistent with the simplest formulation of the neutral theory of evolution, which tends to assume that the fraction of mutations that is neutral remains constant in time. Kimura (1987) himself recognized the possibility that the neutrality might change, and Takahata (1987) mathematically treated the consequences of a “fluctuating neutral space.” In particular, Takahata showed that fluctuating neutrality could explain the observed overdispersion in the molecular clock (Cutler 2000b) (the tendency for the variance in the number of fixed mutations to exceed the expectation for the Poisson process predicted by the neutral theory) long considered troublesome for the neutral theory. However, further progress on this topic was stymied by the lack of a specific model for how or why protein neutrality might fluctuate.

More recently, researchers have preferred to describe neutral evolution using the concept of “neutral networks,” which are networks in the space of possible protein sequences in which each functional protein is linked to all other functional proteins that differ by only a single mutation (Smith 1970; Huynen et al. 1996; Govindarajan and Goldstein 1997; Bornberg-Bauer and Chan 1999; van Nimwegen et al. 1999; Tiana et al. 2000; Bastolla et al. 2002). A neutrally evolving protein population is then envisioned as moving on the neutral network, and the neutrality of the population may fluctuate if the nodes on the network differ in their connectivities. A general theoretical treatment of evolution on neutral networks by van Nimwegen et al. (1999) has shown that if the product of the population size and mutation rate is small then members of the population are equally likely to occupy any node, while if this product is large then the population will preferentially occupy highly connected nodes (see also Bornberg-Bauer and Chan 1999; Taverna and Goldstein 2002b; Xia and Levitt 2004). Simulations with simplified lattice models of proteins have attempted to provide insight into the specific features of protein neutral networks. These simulations have shown that lattice protein neutral networks are centered around highly connected nodes occupied by stable proteins (Bornberg-Bauer and Chan 1999; Broglia et al. 1999; Xia and Levitt 2004; Wingreen et al. 2004), a finding consistent with the experimental observation (Bloom et al. 2005, 2006a) that stable proteins are more mutationally robust. Lattice protein studies also suggest that protein structures differ in their “designabilities” (defined as the total number of sequences that fold into a structure) and that sequences that fold into more designable structures will neutrally evolve at a faster rate due to the increased size and connectivity of their neutral networks (Li et al. 1996; Govindarajan and Goldstein 1997; Chan and Bornberg-Bauer 2002; England and Shakhnovich 2003; Wingreen et al. 2004). Finally, simulations have demonstrated that fluctuations in neutrality as a protein population moves along its neutral network can lead to an overdispersion of the molecular clock (Bastolla et al. 2002), as originally suggested by Takahata (1987). However, an extension of these lattice protein simulations of evolution on neutral networks into a quantitative theory has been difficult because protein neutral networks are far too large to be computed for all but the simplest lattice models.

Here we present a mathematical treatment of neutral protein evolution that describes the evolutionary dynamics in terms of the ΔΔG values for single mutations, which are experimentally measurable. Our treatment is based on the experimentally verified (Bloom et al. 2005, 2006a) connection between protein stability and mutational robustness, as well as a few biophysically supported assumptions about ΔΔG values for random mutations. By linking a protein's tolerance to mutations with stability, we are able to quantitatively describe neutral evolution without a full description of the neutral network. We can then compute the average number of accumulated mutations, the average fraction of neutral mutations, the index of dispersion, and the distribution of stabilities in a neutrally evolving population solely from knowledge of the ΔΔG values for single mutations. In addition, we follow the formalism of van Nimwegen et al. (1999) to calculate all four of these properties in the limit when the product of the population size and mutation rate is much less than one and in the limit when this product is much greater than one. In demonstrating that these properties are different in these two limits, we show that the rate of fixation of neutral mutations can vary with population size in violation of one of the standard predictions of Kimura's neutral theory (Kimura 1987). Our work presents a unified view of neutral protein evolution that is grounded in measurable thermodynamic quantities.

MATERIALS AND METHODS

Lattice protein simulations:

We performed simulations with lattice proteins of L = 20 monomers of 20 types corresponding to the natural amino acids. The proteins could occupy any of the 41,889,578 possible compact or noncompact conformations on a two-dimensional lattice. The energy of a conformation C is the sum of the nonbonded nearest-neighbor interactions, equation M1, where equation M2 is one if residues i and j are nearest neighbors in conformation C and zero otherwise, and equation M3 is the interaction energy between residue types equation M4 and equation M5, given by Table 5 of Miyazawa and Jernigan (1985). We computed the stability of a conformation equation M6 as equation M7, where equation M8 is the partition sum, made tractable by noting that there are only 910,972 unique contact sets. All simulations were performed at a reduced temperature of T = 1.0.

We used adaptive walks to find sequences that folded into each of the three arbitrarily chosen conformations shown in Figure 2 with ΔGf ≤ 0, and then neutrally evolved these sequences for 104 generations with a population size of N = 100. Our evolutionary algorithm was as follows: at each generation we randomly chose a protein that folded to the parental structure with ΔGf ≤ 0 from the population and mutated each residue to some other randomly chosen residue with probability 5 × 10−4 and continued doing this until we had filled the new population with proteins. At the end of this equilibration evolution, we chose the most abundant sequence in the population as the starting point for further analysis and for the computation of the distribution of ΔΔG values for all 380 point mutations (sequences shown in Figure 2). In principle, computing the distribution of ΔΔG values over all sequences in the population rather than just the most abundant one should give a more accurate representation of the true form of this distribution, and indeed we found that doing this slightly increased the accuracy of the predictions shown in Figure 2. However, the resulting improvement in accuracy was small, since the approximate constancy of the ΔΔG distribution during neutral evolution (discussed below) means that the distribution computed over a single sequence is representative of that computed over all sequences in the population. Therefore, we chose to compute the ΔΔG distribution over just the most abundant sequence since this choice more closely tracks what would be experimentally feasible with real proteins. (It is experimentally tractable to compute ΔΔG values for a single protein, but would be unmanageable to do so for all proteins in a natural population.)

Figure 2.
The theory gives accurate predictions for the evolution of model lattice proteins. Each row of graphs corresponds to a different lattice protein. The left graphs show the starting protein and the distribution of ΔΔG values for all point ...

To collect data for the case when the product Nμ of the population size N and the per protein per generation mutation rate μ is equation M9, we first equilibrated 1000 replicates by evolving each of them with a population size of N = 10 and for 5000 generations starting with a clonal population of the initial sequence described above. The remainder of the evolutionary algorithm was as described above: the mutation rate stayed at 5 × 10−4 per residue per generation (corresponding to a per protein per generation mutation rate of μ = 10−2), and at each generation all proteins that folded to the target native structure with ΔGf ≤ 0 reproduced with equal probability. We then evolved each of these equilibrated populations for a further 5000 generations to collect data. We combined the data for all the folded proteins in the final populations of all the replicates to calculate the average number of mutations equation M10 after T generations, the corresponding index of dispersion RT, and the distribution of stabilities shown in Figure 2. If we instead simply randomly chose a single folded protein from the final population of each replicate, we obtained results that were identical within the precision shown in Figure 2. We emphasize that equation M11 and RT were computed by keeping track of the actual number of mutations that had occurred during the evolutionary history of each protein, not simply by counting the number of amino acid differences between the ancestral and the final sequences (the two quantities may differ if a single site undergoes multiple mutations, as discussed in more detail in later sections).

To generate the data for equation M12, we used the same procedure but with N = 105 and performed only 10 replicates. We again computed the statistics shown in Figure 2 by combining the data for all of the folded proteins in the final populations of all 10 replicates. Similar results were obtained if we instead computed equation M13 and RT over all of the folded proteins in the final population of a single replicate (average values of equation M14 were identical while the RT values of 1.03, 0.95, and 0.94 were extremely similar to those shown from top to bottom in Figure 2). This outcome is expected since the probability distributions for equation M15 evolve deterministically.

Lattice protein predictions:

The numerical predictions for the lattice proteins given in Figure 2 were computed by constructing the matrix W described in the first section of results with a bin size of b = 0.005 and truncating the matrix by assuming that no proteins would have stabilities < −5.0. For the case when equation M16, equation M17 was calculated using Equation 6 and RT was calculated using Equation 11. For equation M18, equation M19 was calculated using Equation 18 and RT was calculated using Equation 19.

RESULTS

Assumptions and mathematical background:

Here we describe the physical view of protein evolution that motivates our work. We begin with the basic observations that evolution selects for protein function and that most proteins must stably fold to function (Anfinsen 1973), meaning that protein stability is under evolutionary pressure only insofar as it must be sufficient to allow a protein to fold and function. In taking this view, we ignore those proteins (estimated at 10% of prokaryotic and 30% of eukaryotic proteins) that are intrinsically disordered (Uversky et al. 2005), as well as those rare proteins that are only kinetically stable (Jaswal et al. 2002). Natural selection for function requires a protein to fold with some minimal stability equation M20, since proteins that lack this minimal stability will be unable to reliably adopt their native structure and perform their biochemical task. A protein's extra stability beyond this minimal threshold is quantified as equation M21, meaning that all functional proteins must have equation M22 (more negative values of ΔGf indicate increased stability). We further assume that as long as equation M23, natural selection for protein function is indifferent to the exact amount of extra stability a protein possesses. This assumption is at odds with the persistent speculation that high stability inherently impairs protein function and so is selected against by evolution (DePristo et al. 2005; Somero 1995). But the circular argument most commonly advanced to support this speculation—that the observed marginal stability of natural proteins indicates that higher stability is detrimental to protein function—has now been contradicted both by experiments that have dramatically increased protein stability without sacrificing function (Serrano et al. 1993; Giver et al. 1998; van den Burg et al. 1998; Zhao and Arnold 1999) and by demonstrations that marginal stability is a simple consequence of the fact that most mutations are destabilizing (Arnold et al. 2001; Taverna and Goldstein 2002a; this work). There is a possibility, however, that certain regulatory proteins must be marginally stable to facilitate rapid degradation (Huntzicker et al. 2006). To summarize, current biochemical evidence supports our assumption that (with certain well-defined exceptions) the only requirement imposed on protein stability by natural selection for protein function is that stability must meet or surpass some minimal threshold (a protein must have equation M24).

A mutation to a protein changes its stability by an amount ΔΔG, and experimental measurements of ΔΔG values have shown that most mutations are destabilizing (have ΔΔG > 0) (Pakula et al. 1986; Matthews 1993; Godoy-Ruiz et al. 2004; Kumar et al. 2006). A mutation is neutral with respect to selection for stability if equation M25 since the mutant protein still satisfies the minimal stability threshold; otherwise the mutant does not stably fold and is culled by natural selection. Of course, mutations can also have specific effects on protein function (such as altering an enzyme's activity), but experiments have shown that such mutations are rare compared to the large number of mutations that affect stability (Shortle and Lin 1985; Pakula et al. 1986; Loeb et al. 1989; Bloom et al. 2006a). Mutations can also have effects unrelated to the functioning of the individual protein molecule: they can affect its propensity to aggregate (Chiti et al. 2000), alter its codon usage (Akashi 2003), change its mRNA stability (Chamary and Hurst 2005), affect the efficiency or accuracy of translation (Akashi 2003; Rocha and Danchin 2004), or change the fraction of mistranslated proteins that fold (Drummond et al. 2005). These higher-level effects are probably most apparent in the evolution of highly expressed proteins (Pal et al. 2001; Drummond et al. 2005). However, here we ignore such effects and assume that the evolutionary impact of a mutation is determined mostly by its effect on protein stability (an assumption in agreement with a recent bioinformatics analysis by Sanchez et al. 2006). The view we present therefore describes the impact of a mutation solely by its ΔΔG value and the equation M26 of the wild-type protein and is summarized graphically in Figure 1. We have previously used a similar view to successfully describe experimental protein mutagenesis results (Bloom et al. 2005, 2006a).

Figure 1.
A thermodynamic view of protein evolution. A mutant protein stably folds if and only if it possesses some minimal stability, equation M125 (in this case −5 kcal/mol). The stability of the wild-type protein is equation M126, meaning that it has equation M127 of extra stability. The ...

To use the view of Figure 1 to construct a useful description of neutral protein evolution, we make one major assumption: that the overall distribution of ΔΔG values for random mutations stays roughly constant as the protein sequence evolves. Actually, this assumption is stronger than is strictly needed for the mathematical theory presented below—the theory can be developed simply by assuming that all proteins with the same ΔGf have the same distribution of ΔΔG values (in this case the matrix elements Wij introduced below depend on j in addition to the difference ij). However, we make the stronger assumption that the ΔΔG distribution remains constant during sequence evolution, since we believe that this assumption is consistent with existing evidence. We emphasize that this assumption does not imply that we are arguing that the ΔΔG distribution is identical for every possible protein sequence. Clearly, any given structure has a most stable sequence (with all ΔΔG values positive), a least stable sequence (with all ΔΔG values negative), and a vast range of sequences in between. However, most of these sequences fall within a stability range that is never populated by evolution, since simulations (Taverna and Goldstein 2002a) and experiments (Davidson et al. 1995; Keefe and Szostak 2001) clearly show that the vast majority of protein sequences do not stably fold into any structure (meaning the least stable folded protein is still far more stable than the typical random sequence). Among the subset of sequences that do stably fold, the simple statistical reality that marginally stable sequences are far more abundant than highly stable sequences causes evolution to further confine itself mostly to sequences with stabilities far less than that of the most stable sequence (Arnold et al. 2001; Taverna and Goldstein 2002a; this work). This fact is amply demonstrated by engineering experiments that have greatly increased the stability of natural proteins without sacrificing any of their functional properties (Serrano et al. 1993; Giver et al. 1998; van den Burg et al. 1998; Zhao and Arnold 1999). Therefore, although the distribution of ΔΔG values certainly varies widely among all sequences, it is still reasonable to assume that it is relatively constant among those sequences visited by natural evolution. This assumption of a constant ΔΔG distribution among evolved sequences is explicitly supported by simulations (Broglia et al. 1999; Bloom et al. 2005, 2006a; Wilke et al. 2005; this work) and is consistent with the observation that the number of neighbors on a protein's neutral network is approximately determined by its stability (Bornberg-Bauer and Chan 1999; Xia and Levitt 2004). Furthermore, protein mutagenesis experiments indicate that the ΔΔG values for random mutations are usually additive (Wells 1990; Serrano et al. 1993), meaning that any given mutation to a protein of length L will alter only ≈1/L of the other ΔΔG values, leaving the ΔΔG distribution mostly unchanged. Finally, the assumption of a constant ΔΔG distribution has been shown to explain the experimentally observed exponential decline in the fraction of functional proteins with increasing numbers of mutations (Bloom et al. 2005). However, we acknowledge that at present the assumption of a roughly constant ΔΔG distribution among neutrally evolving proteins can be verified only for lattice proteins—for real proteins the most we can say is that it is consistent with existing experimental evidence.

We begin our mathematical treatment by conceptually dividing the continuous variable of protein stability into small discrete bins of width b. This discretization of stability allows us to treat mutations as moving a protein from one bin to another—the bins can be made arbitrarily small to eliminate any numerical effects of the binning. The stability of each folded protein in the evolving population (the folded proteins are all those with equation M27) can be described by specifying its stability bin. Specifically, a protein is in bin i if it has equation M28 between equation M29 and −ib, where i = 1, 2,  . Let Wij be the probability that a random mutation has a ΔΔG value such that it moves a protein's stability from bin j to bin i, where i and j both are in the range 1, 2,  . Then Wij is easily computed as the fraction of ΔΔG values between equation M30 and equation M31. Since Wij describes only transitions between folded proteins, and since we have assumed that a protein's mutational tolerance is determined by its stability, then the fraction of folded mutants (neutrality) of a protein in bin j is equation M32. Clearly, more stable proteins will have larger values of νj.

In the following two sections, we use the matrix W with elements Wij to calculate the distribution of stabilities in an evolving protein population of constant size N, the mean number of mutations equation M33 after T generations, the corresponding index of dispersion equation M34, and the average fraction of mutations equation M35 that do not destabilize the proteins past the minimal stability threshold. We assume that W is computed from the distribution of ΔΔG values for all random single-amino-acid mutations, although in principle it could be for any type of mutation. We also assume that the per-protein-per-generation mutation rate μ is small, so that at each generation a protein undergoes at most one mutation. Our calculations at first follow and then extend the theoretical treatment by van Nimwegen et al. (1999) of evolution on a neutral network. In particular, we follow their lead in separately treating the two limiting cases where the product Nμ of the population size and mutation rate is [double less-than sign]1 and [dbl greater-than sign]1. We emphasize that all of the equations derived in the following two sections depend only on the mutation rate μ, the number of generations T, and the matrix W, which can be computed from the single-mutant ΔΔG values. The population size N determines the applicable limiting case, but otherwise drops out of all final results.

Limit when equation M36:

When equation M37, the evolving population is usually clonal, since each mutation either is lost or goes to fixation before the next mutation occurs. If a mutation destabilizes a protein in the population beyond the stability cutoff, then it is immediately culled by natural selection. If a mutation does not destabilize a protein beyond the stability cutoff, it will be lost to genetic drift with probability equation M38 and go to fixation with probability 1/N (Kimura 1983). Since mutations occur rarely (equation M39), the loss or fixation of the mutant will occur before the next mutant appears in the population. The entire population therefore moves as one entity along its neutral network. The population can thus be described by the column vector equation M40, with element equation M41 giving the probability that the population is in stability bin i at time t.

If the population is initially in stability bin j, at each generation there is a probability NμWij that a protein experiences a mutation that changes its stability to bin i, and if such a mutation occurs, then there is a probability of 1/N that it is eventually fixed in the population. Therefore, at each generation there is a probability μWij that the population experiences a mutation that eventually causes it to move from stability bin j to bin i. If we define the matrix V so that the diagonal elements are given by Vii = νi and all other elements are zero, then p evolves according to

equation M42
(1)

where I is the identity matrix. Note that this equation treats lethal mutations (those that destabilize a protein beyond the cutoff) as immediately being lost to natural selection and so leaving the population in its original stability bin (hence the population accumulates a mutation with probability μV rather than probability μ). Equation 1 describes a Markov process with the nonnegative, irreducible, and acyclic transition matrix A = I − μV + μW, and so p approaches the unique stationary distribution po, satisfying

equation M43
(2)

This equation gives the expected distribution of protein stabilities solely in terms of the single-mutant ΔΔG values.

We now calculate the average number of mutations equation M44 that accumulate in an equilibrated population after T generations and the corresponding index of dispersion RT,o. We emphasize that equation M45 represents the average number of accumulated mutations during the course of the evolutionary process. When the number of accumulated mutations m is small compared to the length of the protein sequence L (equation M46), then m is just equal to the number of residues differing from those in the parent protein sequence (the Hamming distance). However, when m becomes substantial relative to L, m becomes larger than the Hamming distance since some sites will undergo multiple mutations (Jukes and Cantor 1969). In this case it is necessary to use a substitution model to infer m from the observed Hamming distance. In the treatment that follows, we calculate the expected value of m; application of these formulas to actual protein sequences requires use of one of the well-established statistical techniques for inferring m from the Hamming distance (Jukes and Cantor 1969; Goldman and Yang 1994). We begin the calculation of equation M47 by defining equation M48 to be the column vector with element i giving the probability that at time t the population has accumulated m mutations and is in stability bin i. The time evolution of equation M49 is given by

equation M50
(3)

The kth moment of the number of mutations at time t is

equation M51
(4)

where equation M52 is the unit row vector. We can write a recursive equation for equation M53 in the long-time limit (steady state) by multiplying both sides of Equation 3 by m, summing over m, and left multiplying by e to obtain

equation M54
(5)

where we have used the property eA = e, noted that in the long-time limit equation M55 and equation M56, and defined the average neutrality as equation M57 = eWpo = eVpo. Summing the recursion yields the steady-state value for the number of accumulated mutations,

equation M58
(6)

To calculate the index of dispersion equation M59, we need to find the second moment equation M60. In a fashion analogous to the construction of Equation 5, we can write a recursive expression for the long-time limit of equation M61 as

equation M62
(7)

where we have used the property (implicit in Equation 5) that in the long-time limit, equation M63. Summing the recursion yields the following value for the long-time limit,

equation M64
(8)

where we have made the substitution eWQWpo = equation M65 and noted that equation M66 since A is an irreducible, aperiodic, stochastic matrix (Ewens and Grant 2005). This yields a value for the index of dispersion in the long-time limit of

equation M67
(9)

The above equation is consistent with the generic equation for the index of dispersion given by Cutler (2000a,b), where we now give concrete expressions for the variables ρ and equation M68 in Cutler's formula in terms of measurable quantitites, namely ρ = μequation M69 and equation M70.

We can further simplify Equation 9 by performing spectral decompositions of A and Q. Let λ1,  , λK be the eigenvalues of VW, and let r1,  , rK and l1,  , lK be the corresponding right and left eigenvectors, normalized so that lirj = 1 if i = j and 0 otherwise. These eigenvectors are also eigenvectors of the irreducible, acyclic, stochastic A, and the corresponding eigenvalues are 1 − μλ1,  , 1 − μλK, with Perron–Frobenius theorems guaranteeing that one eigenvalue (chosen here to be 1 − μλ1) = 1 and all other eigenvalues have absolute values <1. Then r1 and l1 are right and left eigenvectors of Q with eigenvalue 1 (i.e., r1 = po and l1 = e), and all other eigenvalues of Q are 0. The spectral decompositions are therefore Q = r1l1 and equation M71. Inserting these spectral decompositions into Equation 9, we find for the index of dispersion a value of

equation M72
(10)

since equation M73 (Ewens and Grant 2005). In the limit of large T and small μ, the value of RT,o given by the above equation approaches the value

equation M74
(11)

where the μequation M75 term drops out because μ is small and the equation M76 term drops out because T is large and equation M77. This equation shows that RT,o approaches a constant value independent of T and μ. Although we could not prove that the value of RT,o given by Equation 11 is necessarily >1 (since some of the eigenvalues λi could be complex), in all of our simulations we observed RT,o > 1, suggesting that when equation M78, fluctuations in protein stability tend to overdisperse the molecular clock.

Limit when Nμ[dbl greater-than sign]1:

When equation M79, the population is spread across many nodes of the neutral network rather than converged on a single sequence (van Nimwegen et al. 1999). In this limit, we treat the evolutionary dynamics of the population deterministically (i.e., we assume an infinite population size) and describe the distribution of stabilities in the population by the column vector equation M80, with element equation M81 giving the fraction of proteins in the population at time t that have stabilities in bin i. At generation t, the fraction of mutated proteins that continue to fold is equation M82. These folded proteins reproduce, and, to maintain a constant population size, this reproduction must balance the removal of proteins by death, meaning that each folded sequence must produce an average of equation M83 offspring. The population therefore evolves according to

equation M84
(12)

After the population has evolved for a sufficient period of time, x approaches an equilibrium distribution of x. The corresponding equilibrium neutrality is equation M85 = eWx, and the equilibrium reproduction rate is equation M86, so

equation M87
(13)

This equation can be rewritten to show that x is the principal eigenvector of W,

equation M88
(14)

We note that equation M89 approximates the asymptotic neutrality for the decline in the fraction of folded proteins upon random mutagenesis (Bloom et al. 2005; Wilke et al. 2005).

We now determine the average number of accumulated mutations equation M90 and the corresponding index of dispersion RT,∞ by treating the forward evolutionary process. As described in the text immediately prior to Equation 3, our calculations describe the actual number of mutations accumulated during the evolutionary process, which may differ from the number of sequence differences relative to the ancestor if a single site undergoes multiple mutations. When equation M91, it is not a priori obvious that the average number of mutations present in the population is equivalent to the number of fixed substitutions along the line of descent. Therefore, in the appendix, we show that identical results are obtained by tracing a randomly chosen protein backward in time along its ancestor distribution, proving that the treatment we give below is mathematically equivalent to treating the time-reversed process. We define equation M92 as the column vector with element i giving the fraction of the population at time t that has accumulated m mutations and is in stability bin i. Once the population has reached the equilibrium distribution of stabilities, the time evolution of equation M93 is

equation M94
(15)

The recursion can be solved to obtain

equation M95
(16)

as can be verified by direct substitution. Since we are assuming that the population has equilibrated at time 0 and no mutations have accumulated at that time, equation M96 is x for m = 0 and 0 otherwise. Furthermore, x satisfies Equation 14, so multiplying Equation 16 by e yields

equation M97
(17)

where equation M98 gives the fraction of the population that has accumulated m mutations after t generations. The average number of accumulated mutations after T generations is the mean of this binomial distribution,

equation M99
(18)

Using the well-known result for the variance of the binomial distribution, we find that the index of dispersion is

equation M100
(19)

It is important to reiterate that the above equation was derived under the assumption that there is at most one mutation per sequence per generation. For realistic distributions of mutations (i.e., Poisson), this means that equation M101. In this regime, RT,∞ is close to 1.

Lattice protein simulations:

We tested our theory's predictions on the evolutionary dynamics of lattice proteins. Lattice proteins are simple protein models that are useful tools for studying protein folding and evolution (Chan and Bornberg-Bauer 2002). Our lattice proteins were chains of 20 amino acids that folded on a two-dimensional lattice. The energy of a lattice protein conformation was equal to the sum of the pairwise interactions between nonbonded amino acids (Miyazawa and Jernigan 1985). Each lattice protein has 41,889,578 possible conformations, and by summing over all of these conformations we could exactly determine the partition sum and calculate ΔGf. We set a minimal stability threshold for the lattice proteins of equation M102, meaning that we considered all proteins that folded to the target structure with ΔGf ≤ 0 to be folded and functional, while all proteins with ΔGf > 0 were considered to be nonfunctional. We note that this stability threshold is equivalent to requiring a lattice protein to spend at least half of its time in the target native structure at equilibrium. We began by generating lattice proteins that stably folded to each of the three different structures shown in Figure 2. For each of these three proteins, we determined the distribution of ΔΔG values for all 380 single mutations (these distributions are shown in Figure 2). These distributions were used to construct the matrix W and to predict the equilibrium distribution of stabilities, the average number of mutations, and the indexes of dispersion for both the equation M103 and the equation M104 cases, using the equations presented in the preceding sections.

To test the accuracy of these predictions, we then simulated evolving populations of the lattice proteins with a standard evolutionary algorithm using Wright–Fisher sampling. Briefly, the populations were held at a constant size of either N = 10 or N = 105. At each generation, a new population was created by choosing parents with equal probability from all folded proteins in the previous generation's population and copying these parents into the new population with a mutation rate of 5 × 10−4 mutations per residue per generation. Since the proteins have a length of 20 amino acids, this mutation rate corresponds to a per-protein-per-generation mutation rate of μ = 10−2. Therefore, the product Nμ is either 0.1 or 103, corresponding to equation M105 or equation M106, respectively. We emphasize that the lattice protein evolutionary algorithm is the same for both population sizes. When N = 10 the population naturally follows dynamics approximating those presented for equation M107, while when N = 105 it naturally follows dynamics approximating those presented for equation M108 (as evidenced by the excellent agreement of the predictions with the simulations). For N = 10, we performed 1000 replicates for each different structure. For N = 105, computational constraints limited us to 10 replicates for each structure (however, the evolutionary dynamics are nearly deterministic in this case, so all replicates yielded similar results). We note that during the simulations we recorded the number of mutations that actually accumulated rather than simply computing the number of differences (Hamming distance) from the original sequence.

Figure 2 shows the theoretical predictions and simulation results for each of the three structures. The theoretical predictions are in good agreement with the simulation results. Figure 2 clearly shows that when equation M109, the proteins tend to be more stable than when equation M110. This extra stability is a biophysical manifestation of the neutrally evolved mutational robustness predicted by van Nimwegen et al. (1999). This increase in stability leads to a substantial increase in the number of accumulated mutations. In accordance with the theoretical predictions, when equation M111 the index of dispersion is elevated above one by fluctuations in protein stability. Another clear result from the simulations is that proteins of different structure show markedly different distributions of stabilities and rates of sequence evolution due to the differences in their ΔΔG distributions. Overall, the simulations offer strong support for the validity of the theoretical predictions in the preceding sections.

DISCUSSION

We have presented a theory that offers quantitative predictions about the distribution of stabilities, the average number of fixed mutations, and the index of dispersion for an evolving protein population in terms of the ΔΔG values for individual mutations. We have demonstrated that these predictions are accurate for simple lattice proteins and have used existing biophysical evidence to argue that the basic theoretical assumptions should also be accurate for real proteins. Here, we give qualitative interpretations of the mathematical results and discuss their implications for our understanding of protein evolution.

One major result is to show that the effects of protein structure on the rate of sequence evolution can be quantitatively cast in terms of the ΔΔG values for single mutations. Numerous lattice protein simulations have shown that protein structure can dramatically affect the rate of sequence evolution, since structures that are more “designable” (encoded by more sequences) can evolve their sequences more rapidly (as can be seen in Figure 2) (Li et al. 1996; Govindarajan and Goldstein 1997; Bornberg-Bauer and Chan 1999; Tiana et al. 2000; Chan and Bornberg-Bauer 2002; Wingreen et al. 2004; Xia and Levitt 2004). Unfortunately, these simulations typically measure structural designability by enumerating a large number of lattice protein sequences, meaning that their findings cannot be extended to real proteins for which such extensive enumeration is impossible. However, recent theoretical work by England and Shakhnovich (2003) has made progress in connecting designability to observable structural properties, and a bioinformatics analysis based on this theoretical measure of designability indicates that structure indeed influences the evolutionary rate of real proteins (Bloom et al. 2006b). Our work provides a way to quantitatively relate the structural influences on protein evolution to experimentally measureable ΔΔG values, opening the door to further connecting structural designability and sequence evolution to laboratory stability measurements. Although thousands of ΔΔG values have been measured experimentally (Kumar et al. 2006), at present there are no large sets of measurements for truly random mutations to a single protein. When such sets of measurements become available, it should be possible to use them in conjunction with the theory that we have presented to predict the neutralities of real proteins with different structures.

A second important result is to show that protein evolutionary dynamics can depend on the product of population size and mutation rate, Nμ. When equation M112, the evolving protein population is polymorphic in stability and subject to frequent mutations, so the more stable (and thus more mutationally tolerant) proteins produce more folded offspring. In contrast, when equation M113, the population is usually monomorphic in stability and so all members of the population are equally likely to produce folded offspring. The general tendency for populations to neutrally evolve mutational robustness when equation M114 has previously been treated mathematically by van Nimwegen et al. (1999), and a variety of lattice protein simulations have noted the tendency of evolving protein populations to preferentially occupy highly connected neutral network nodes (Bornberg-Bauer and Chan 1999; Taverna and Goldstein 2002b; Xia and Levitt 2004). Our work shows that for proteins, in the limiting cases when equation M115 or equation M116, this process can be rigorously described by considering only protein stability, rather than requiring a full analysis of the neutral network (provided, as we have argued is likely to be the case, that the assumption of a roughly constant ΔΔG distribution holds for real proteins as well as it holds for our lattice proteins). In addition, we prove that the number of accumulated mutations depends on whether Nμ [double less-than sign] 1 or [dbl greater-than sign]1. This finding is at odds with the standard prediction (Kimura 1987) of Kimura's neutral theory that the rate of evolution is independent of population size. The reason for this discrepancy is that the standard neutral theory fails to account for the possibility that increasing the population size so that equation M117 can systematically increase the fraction of mutations that are neutral.

A third important contribution of our theory is to use the distribution of ΔΔG values for single mutations to predict the distribution of protein stabilities in an evolving population. Several researchers have pointed out that evolved proteins will be marginally stable simply because most mutations are destabilizing (Arnold et al. 2001; Taverna and Goldstein 2002a); we have described this process quantitatively. In addition, we have shown how the neutral evolution of mutational robustness when equation M118 will shift the proteins toward higher stabilities (as shown in Figure 2), although this increase in stability is limited by the counterbalancing pressure of predominantly destabilizing mutations. The formulas we provide can in principle be combined with experimentally measured ΔΔG values to predict the expected range of stabilities for evolved proteins.

Our work also weds Takahata's concept that fluctuating neutral spaces might overdisperse the molecular clock (Takahata 1987; Cutler 2000b; Bastolla et al. 2002) to a concrete description of how protein neutrality fluctuates during evolution. When equation M119, fluctuations in protein stability can cause an overdispersion in the number of accumulated substitutions that can be calculated from the single-mutant ΔΔG distribution. Furthermore, given our assumption of a roughly constant ΔΔG distribution, we show that the index of dispersion will approach a constant value that is independent of time or mutation rate, but will depend on whether equation M120 or equation M121. Previous simulations have indicated that overdispersion indeed depends on the population size (Bastolla et al. 2002; Wilke 2004)—we have explained this dependence by showing that stability-induced overdispersion does not occur when equation M122 since the population's distribution of stabilities equilibrates as it spreads across many sequences. Mathematically, the difference in the cases equation M123 and equation M124 is that, assuming the ΔΔG distribution remains relatively constant, when the population size is sufficiently large, the distribution of protein stabilities no longer fluctuates in a manner that influences the probability of a substitution (Equation 3 contains μV in the first term on the right side, while Equation 15 does not).

In summary, we have presented a mathematical theory of how thermodynamics shape neutral protein evolution. A major strength of our theory is that it makes quantitative predictions using single-mutant ΔΔG values, which can be experimentally measured. Our work also suggests how neutral and adaptive protein evolution may be coupled through protein thermodynamics. Protein stability represents an important hidden dimension in the evolution of new protein function, since extra stability that is itself neutral can allow a protein to tolerate mutations that confer new or improved functions (Bloom et al. 2006a). Our theory describes the dynamics of protein stability during neutral evolution—adaptive protein evolution is superimposed on these stability dynamics, with proteins most likely to acquire beneficial mutations when they are most stable.

Acknowledgments

We thank Frances H. Arnold for helpful comments and discussion. J.D.B. is supported by a Howard Hughes Medical Institute predoctoral fellowship. C.O.W. is supported by the National Institutes of Health grant AI 065960. A.R. is supported by the National Science Foundation grants CCF 0523643 and FIBR 0527023.

APPENDIX

Here we calculate the properties of the evolving population when equation M131 by analyzing the time-reversed process to compute the mean and variation in the number of mutations in a single randomly chosen protein over time. We show that the results so obtained are identical to those found in the main text, where we analyzed the forward-time process to compute the mean and variation in the number of mutations across the population of evolving proteins.

When equation M132, the population is now never converged to a single sequence, so it is not a priori obvious that the average number of mutations present in the population is equivalent to the expected number of fixed substitutions along the line of descent. In fact, in the limit of very large population sizes there may not even be a common line of descent in relevant time frames, since many new mutations will occur before any given mutation goes to fixation. In the main text we calculated the average number of mutations equation M133, a sequence in the population that has accumulated over the last T generations by treating the forward evolution of the population. Here we trace a randomly chosen protein in the population back in time and show that the average number of substitutions equation M134 that it has accumulated over the last T generations is equal to equation M135. We also show that indexes of dispersion of equation M136 and equation M137 have the same value of RT,∞.

To calculate equation M138, we first define a vector a giving the ancestor distribution (Hermisson et al. 2002): element i of equation M139 gives the probability that a randomly chosen sequence from the population at time T had a predecessor with stability in bin i at time Tt. The transition probabilities of equation M140 when the population is in equilibrium are the discrete time analogue of those computed by Hermisson et al. (2002). From Equation 15, it follows that the fraction of sequences in bin i at time t + 1 that had as their ancestor in the previous generation a sequence in bin j is equation M141. To obtain the probability that a sequence in bin i at time t + 1 had an ancestor in bin j, we have to divide this fraction by the total number of sequences in bin i at time t + 1. When the population is at equilibrium, αt = α and equation M142, where xi is the element from x. Hence, the probability that a sequence in bin i had an ancestor in bin j is equation M143, where we have defined

equation M144
(A1)

The time evolution of a is therefore

equation M145
(A2)

where the matrix H is defined by Equation A1. Equation A2 can be solved to show that the equilibrium value of a is a, satisfying

equation M146
(A3)

If we define equation M147 as the vector with element i giving the probability that a randomly chosen sequence at time T had a predecessor at time Tt in stability bin i and with s substitutions relative to the sequence at time T, then the time evolution for an equilibrated population is

equation M148
(A4)

We can solve Equations A4 and A3 in a manner analogous to the forward process to obtain

equation M149
(A5)

Again defining equation M150 as the probability of having accumulated s substitutions as one moves back t generations from time T, we obtain the binomial distribution

equation M151
(A6)

Comparison of Equation 17 and Equation A6 shows that they are identical. Therefore, all moments computed from the two distributions must be equal. In particular, this proves that equation M152 = equation M153 and that the corresponding indexes of dispersion have the same value of RT,∞ defined by Equation 19. This shows that when equation M154, we expect equivalent results regardless of whether we average over the number of mutations in all sequences present in the population or randomly choose a single sequence and trace back along its ancestor distribution.

References

  • Akashi, H., 2003. Translational selection and yeast proteome evolution. Genetics 164: 1291–1303. [PMC free article] [PubMed]
  • Anfinsen, C. B., 1973. Principles that govern the folding of protein chains. Science 181: 223–230. [PubMed]
  • Arnold, F. H., P. L. Wintrode, K. Miyazaki and A. Gershenson, 2001. How enzymes adapt: lessons from directed evolution. Trends Biochem. Sci. 26: 100–107. [PubMed]
  • Bastolla, U., M. Porto, H. E. Roman and M. Vendruscolo, 2002. Lack of self-averaging in neutral evolution of proteins. Phys. Rev. Lett. 89: 208101. [PubMed]
  • Bloom, J. D., J. J. Silberg, C. O. Wilke, D. A. Drummond, C. Adami et al., 2005. Thermodynamic prediction of protein neutrality. Proc. Natl. Acad. Sci. USA 102: 606–611. [PMC free article] [PubMed]
  • Bloom, J. D., S. T. Labthavikul, C. R. Otey and F. H. Arnold, 2006. a Protein stability promotes evolvability. Proc. Natl. Acad. Sci. USA 103: 5869–5874. [PMC free article] [PubMed]
  • Bloom, J. D., D. A. Drummond, F. H. Arnold and C. O. Wilke, 2006. b Structural determinants of the rate of protein evolution in yeast. Mol. Biol. Evol. 23: 1751–1761. [PubMed]
  • Bornberg-Bauer, E., and H. S. Chan, 1999. Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc. Natl. Acad. Sci. USA 96: 10689–10694. [PMC free article] [PubMed]
  • Broglia, R. A., G. Tiana, H. E. Roman, E. Vigezzi and E. Shakhnovich, 1999. Stability of designed proteins against mutations. Phys. Rev. Lett. 82: 4727–4730.
  • Chamary, J. V., and L. D. Hurst, 2005. Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 6: R75. [PMC free article] [PubMed]
  • Chan, H. S., and E. Bornberg-Bauer, 2002. Perspectives on protein evolution from simple exact models. Appl. Bioinformatics 1: 121–144. [PubMed]
  • Chiti, F., N. Taddei, M. Bucciantini, P. White, G. Ramponi et al., 2000. Mutational analysis of the propensity for amyloid formation by a globular protein. EMBO J. 19: 1441–1449. [PMC free article] [PubMed]
  • Cutler, D. J., 2000. a The index of dispersion of molecular evolution: slow fluctuations. J. Theor. Popul. Biol. 57: 177–186. [PubMed]
  • Cutler, D. J., 2000. b Understanding the overdispersed molecular clock. Genetics 154: 1403–1417. [PMC free article] [PubMed]
  • Davidson, A. R., K. J. Lumb and R. T. Sauer, 1995. Cooperatively folded proteins in random sequence libraries. Nat. Struct. Biol. 2: 856–864. [PubMed]
  • DePristo, M. A., D. M. Weinreich and D. L. Hartl, 2005. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat. Rev. Genet. 6: 678–687. [PubMed]
  • Drummond, D. A., J. D. Bloom, C. Adami, C. O. Wilke and F. H. Arnold, 2005. Why highly expressed proteins evolve slowly. Proc. Natl. Acad. Sci. USA 102: 14338–14343. [PMC free article] [PubMed]
  • England, J. L., and E. I. Shakhnovich, 2003. Structural determinant of protein designability. Phys. Rev. Lett. 90: 218101. [PubMed]
  • Ewens, W. J., and G. R. Grant, 2005. Statistical Methods in Bioinformatics, Ed. 2. Springer, New York.
  • Fersht, A. R., 1999. Structure and Mechanism in Protein Science.W. H. Freeman, New York.
  • Giver, L., A. Gershenson, P. O. Freskgard and F. H. Arnold, 1998. Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. USA 95: 12809–12813. [PMC free article] [PubMed]
  • Godoy-Ruiz, R., R. Perez-Jimenez, B. Ibarra-Molero and J. M. Sanchez-Ruiz, 2004. Relation between protein stability, evolution and structure as probed by carboxylic acid mutations. J. Mol. Biol. 336: 313–318. [PubMed]
  • Goldman, N., and Z. Yang, 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11: 725–736. [PubMed]
  • Govindarajan, S., and R. A. Goldstein, 1997. The foldability landscape of model proteins. Biopolymers 42: 427–438. [PubMed]
  • Guo, H. H., J. Choe and L. A. Loeb, 2004. Protein tolerance to random amino acid change. Proc. Natl. Acad. Sci. USA 101: 9205–9210. [PMC free article] [PubMed]
  • Hermisson, J., O. Redner, H. Wagner and E. Baake, 2002. Mutation-selection balance: ancestry, load, and maximum principle. Theor. Popul. Biol. 62: 9–46. [PubMed]
  • Huntzicker, E. G., I. S. Estay, H. Zhen, L. A. Lokteva, P. K. Jackson et al., 2006. Dual degradation signals control Gli protein stability and tumor formation. Genes Dev. 20: 276–281. [PMC free article] [PubMed]
  • Huynen, M. A., P. F. Stadler and W. Fontana, 1996. Smoothness within ruggedness: the role of neutrality in adaptation. Proc. Natl. Acad. Sci. USA 93: 397–401. [PMC free article] [PubMed]
  • Jaswal, S. S., J. L. Sohl, J. H. Davis and D. A. Agard, 2002. Energetic landscape of α-lytic protease optimizes longevity through kinetic stability. Nature 415: 343–347. [PubMed]
  • Jukes, T. H., and C. R. Cantor, 1969. Evolution of protein molecules, pp. 21–132 in Mammalian Protein Metabolism, Vol. III. Academic Press, New York.
  • Keefe, A. D., and J. W. Szostak, 2001. Functional proteins from a random-sequence library. Nature 410: 715–718. [PubMed]
  • Kimura, M., 1983. The Neutral Theory of Molecular Evolution.Cambridge University Press, Cambridge, UK.
  • Kimura, M., 1987. Molecular evolutionary clock and the neutral theory. J. Mol. Evol. 26: 24–33. [PubMed]
  • Kumar, M. D., K. A. Bava, M. M. Gromiha, P. Parabakaran, K. Kitajima et al., 2006. ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions. Nucleic Acids Res. 34: D204–D206. [PMC free article] [PubMed]
  • Lesk, A. M., and C. Chothia, 1980. How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J. Mol. Biol. 136: 225–270. [PubMed]
  • Li, H., R. Helling, C. Tang and N. Wingreen, 1996. Emergence of preferred structures in a simple model of protein folding. Science 273: 666–669. [PubMed]
  • Loeb, D. D., R. Swanstrom, L. Everitt, M. Manchester, S. E. Stamper et al., 1989. Complete mutagenesis of the HIV-1 protease. Nature 340: 397–400. [PubMed]
  • Matthews, B. W., 1993. Structural and genetic analysis of protein stability. Annu. Rev. Biochem. 62: 139–160. [PubMed]
  • Miyazawa, S., and R. L. Jernigan, 1985. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18: 534–552.
  • Ohta, T., and M. Kimura, 1971. On the constancy of the evolutionary rate of cistrons. J. Mol. Evol. 1: 18–25. [PubMed]
  • Pakula, A. A., V. B. Young and R. T. Sauer, 1986. Bacteriophage λ cro mutations: effects on activity and intracellular degradation. Proc. Natl. Acad. Sci. USA 83: 8829–8833. [PMC free article] [PubMed]
  • Pal, C., B. Papp and L. D. Hurst, 2001. Highly expressed genes in yeast evolve slowly. Genetics 158: 927–931. [PMC free article] [PubMed]
  • Rocha, E. P. C., and A. Danchin, 2004. An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol. Biol. Evol. 21: 108–116. [PubMed]
  • Sanchez, I. E., J. Tejero, C. Gomez-Moreno, M. Medina and L. Serrano, 2006. Point mutations in protein globular domains: contributions from function, stability, and misfolding. J. Mol. Biol. 363: 422–432. [PubMed]
  • Serrano, L., A. G. Day and A. R. Fersht, 1993. Step-wise mutation of barnase to binase: a procedure for engineering increased stability of proteins and an experimental analysis of the evolution of protein stability. J. Mol. Biol. 233: 305–312. [PubMed]
  • Shortle, D., and B. Lin, 1985. Genetic analysis of staphylococcal nuclease: identification of three intragenic “global” suppressors of nuclease-minus mutations. Genetics 110: 539–555. [PMC free article] [PubMed]
  • Smith, J. M., 1970. Natural selection and the concept of a protein space. Nature 225: 563–564. [PubMed]
  • Somero, G. N., 1995. Proteins and temperature. Annu. Rev. Physiol. 57: 43–68. [PubMed]
  • Takahata, N., 1987. On the overdispersed molecular clock. Genetics 116: 169–179. [PMC free article] [PubMed]
  • Taverna, D. M., and R. A. Goldstein, 2002. a Why are proteins marginally stable? Proteins 46: 105–109. [PubMed]
  • Taverna, D. M., and R. A. Goldstein, 2002. b Why are proteins so robust to site mutations? J. Mol. Biol. 315: 479–484. [PubMed]
  • Tiana, G., R. A. Broglia and E. I. Shakhnovich, 2000. Hiking in the energy landscape in sequence space: a bumpy road to good folders. Proteins 39: 244–251. [PubMed]
  • Uversky, V. N., C. J. Oldfield and A. K. Dunker, 2005. Showing your ID: intrinsic disorder as an ID for recognition, regulation and cell signaling. J. Mol. Recognit. 18: 343–384. [PubMed]
  • van den Burg, B., G. Vriend, O. R. Veltman, G. Venema and V. G. H. Eusink, 1998. Engineering an enzyme to resist boiling. Proc. Natl. Acad. Sci. USA 95: 2056–2060. [PMC free article] [PubMed]
  • van Nimwegen, E., J. P. Crutchfield and M. Huynen, 1999. Neutral evolution of mutational robustness. Proc. Natl. Acad. Sci. USA 96: 9716–9720. [PMC free article] [PubMed]
  • Wells, J. A., 1990. Additivity of mutational effects in proteins. Biochemistry 29: 8509–8517. [PubMed]
  • Wilke, C. O., 2004. Molecular clock in neutral protein evolution. BMC Genet. 5: 25. [PMC free article] [PubMed]
  • Wilke, C. O., J. D. Bloom, D. A. Drummond and A. Raval, 2005. Predicting the tolerance of proteins to random amino acid substitution. Biophys. J. 89: 3714–3720. [PMC free article] [PubMed]
  • Wingreen, N. S., H. Li and C. Tang, 2004. Designability and thermal stability of protein structures. Polymer 45: 699–705.
  • Xia, Y., and M. Levitt, 2004. Funnel-like organization in sequence space determines the distributions of protein stability and folding rate preferred by evolution. Proteins 55: 107–114. [PMC free article] [PubMed]
  • Zhao, H., and F. H. Arnold, 1999. Directed evolution converts subtilisin E into a functional equivalent of thermitase. Protein Eng. 12: 47–53. [PubMed]
  • Zuckerkandl, E., and L. Pauling, 1965. Evolutionary divergence and convergence in proteins, pp. 97–166 in Evolving Genes and Proteins.Academic Press, New York.

Articles from Genetics are provided here courtesy of Genetics Society of America

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

  • PubMed
    PubMed
    PubMed citations for these articles
  • Substance
    Substance
    PubChem Substance links

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...