NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
J Phys Chem B. Author manuscript; available in PMC Sep 2, 2009.
Published in final edited form as:
PMCID: PMC2736680
NIHMSID: NIHMS112367

One-Dimensional Barrier-Preserving Free-Energy Projections of β-sheet Miniprotein: New Insights into the Folding Process

Abstract

The conformational space of a 20-residue three-stranded antiparallel β-sheet peptide (double hairpin) was sampled by equilibrium folding/unfolding molecular dynamics simulations for a total of 20 µs. The resulting one-dimensional free-energy profiles (FEPs) provide a detailed description of the free-energy basins and barriers for the folding reaction. The similarity of the FEPs obtained using the probability of folding before unfolding (pfold) or the mean first passage time supports the robustness of the procedure. The folded state and the most populated free-energy basins in the denatured state are described by the one-dimensional FEPs, which avoid the overlap of states present in the usual one- or two-dimensional projections. Within the denatured state, a basin with fluctuating helical conformations and a heterogeneous entropic state are populated near the melting temperature at about 11% and 33%, respectively. Folding pathways from the helical basin or enthalpic traps (with only one of the two hairpins formed) reach the native state through the entropic state, which is on-pathway and is separated by a low barrier from the folded state. A simplified equilibrium kinetic network based on the FEPs shows the complexity of the folding reaction and indicates, as augmented by additional analyses, that the basins in the denatured state are connected primarily by the native state. The overall folding kinetics shows single-exponential behavior because barriers between the non-native basins and the folded state have similar heights.

Introduction

Protein and peptide folding from the very broad ensemble of denatured conformations to the well-defined native state is a very complex unimolecular reaction because of the many degrees of freedom of the system.1 The conformational transitions involved, like those of other chemical reactions, are governed by the free-energy surface.2 During the folding process, the loss of configurational entropy of the protein chain is approximately counterbalanced by the more favorable interactions among the protein atoms, modulated by the effect of the solvent. Thus, although the enthalpy and entropy of folding can be large at physiological temperatures, the free energy stabilizing the native state is normally only 10 kcal/mol or less, independent of the size of the protein.3 The loss of configurational entropy during folding is thought to be primarily responsible for the experimental activation barrier, observed in many so-called two-state proteins.4,5 Consequently, the major role played by the entropic contributions in protein folding, in contrast to most simple reactions,6 requires an analysis of the free-energy surface; that is, knowledge of the potential energy surface is not sufficient.6–10

Although a protein has many degrees of freedom (with the ϕ and ψ backbone dihedral angles of the amino acids being particularly important), the common way to investigate the free-energy surface is to display it as a function of a small (usually only one to two) number of order parameters. A commonly used coordinate is the fraction of native contacts, Q.11 Q appears to be a satisfactory approximate reaction coordinate for Go-model proteins12,13 because favorable interactions occur only between residues in contact in the folded state.14 On the other hand, for transferable potentials (e.g., those based on physicochemical principles, such as AMBER, CHARMM, and OPLS) or statistical potentials,15,16 Q is adequate only for the fully folded (Q = 1) state.17 For example, for a structured peptide simulated by a transferable force field, some conformations with Q ≈ 0.7 belong to the denatured state ensemble, and conformations with Q ≈ 0.3 belong to the folded state.18,19

The multidimensionality of the system makes the choice of order parameters for presenting the free-energy surface very important and often leads investigators to compare the results from several different sets. Moreover, the likelihood of hiding essential information concerning the free-energy surface by the commonly used projections has led to a search for alternative methods that are useful for studying protein folding, as well as other complex reactions. One approach that was introduced just 10 years ago is based on disconnectivity graphs20 (also see ref 21). The unprojected free-energy surface is represented by a disconnectivity graph calculated from an equilibrium folding trajectory with the minimum cut (mincut) or balanced minimum cut (bmincut) procedure.22 The idea of the method is to group the coordinate sets into free-energy minima, according not to the standard geometric characteristics, but rather to the equilibrium dynamics; that is, the trajectory is used to determine the populations of the states, which provide the relative free energies and the rates of the transition between the states, which yield the free-energy barriers. Application of the method to the β-hairpin of protein G demonstrated that the free-energy surface has multiple low free-energy basins in the denatured state, in addition to the native basin, results that complement the analysis of the experimental observation of two-state folding.23 The same simulation data as were used to reveal the complexity of the denatured state show a relatively smooth free-energy landscape when projected onto a few geometrical coordinates.22 This result demonstrates that, to obtain a projection that gives an accurate description of the essential aspects of the free-energy surface, different progress coordinates are required.

Projected free-energy surfaces are most useful if they preserve the barriers and minima in the order in which they are encountered during folding/unfolding events. Recently, a new progress coordinate that has some of the desired properties was introduced.24 It uses the (normalized) partition function of a given region as the progress coordinate and determines the free-energy barriers as a function of the coordinate by a method based on pfold, defined as the probability of reaching the folded state before an unfolded conformation.25 The result is a one-dimensional projected free-energy profile (FEP) that preserves the barriers between the free-energy basins; given the barriers, the minima can be determined.24 The method was applied to the β-hairpin of protein G, using root-mean-square deviation (rmsd) clustering, and the conclusions concerning the multiminimum character of the free-energy surface obtained from the disconnectivity graph analysis22 were confirmed.

It is of interest to apply the methodology just described to a system that is more complex than the β-hairpin. An excellent candidate is the β-sheet miniprotein, called Beta3s.26 Its structure corresponds to a three-stranded antiparallel β-sheet consisting of two β-hairpins.27 It has been shown to fold to the native structure determined by NMR spectroscopy27 in molecular dynamics simulations with a polar hydrogen molecular mechanics potential function modulated by a simple implicit solvent model.26 Because folding simulations of this system are very fast (for example, close to the melting temperature, the folding time is about 100 ns and requires about 24 h on an Athlon 1.7 GHz computer), many studies have been conducted to elucidate the folding mechanism. Two main folding pathways were observed: one begins with formation of the C-terminal β-hairpin, followed by association of the N-terminal strand on the preformed β-hairpin, and the other follows the symmetry-related pathway (first formation of N-terminal β-hairpin).26 Conformations in the denatured state of Beta3s were shown to contain a significant amount of non-native contacts,28 and the folding mechanism of Beta3s had a weak temperature dependence.29 Moreover, because of the very efficient implicit solvent model, multiple simulations of Beta3s and 32 single-point mutants (totaling 0.65 ms) were performed to directly calculate ϕ values30 from folding and unfolding rates extracted from equilibrium folding/unfolding trajectories.31 More recently, an earlier network analysis of Beta3s32 was extended to determine free-energy basins.19 Secondary structure was used to coarse-grain and label the conformations visited in the simulations. 
The folded state and the most populated free-energy basins in the denatured state were isolated by grouping conformations according to fast relaxation in equilibrium trajectories, a procedure called kinetic grouping analysis (KGA).19 The comparative application of KGA to Beta3s and a central-strand mutant thereof (W10V) revealed how a single-point mutation can alter the population of native and non-native basins, as well as the relative accessibility of parallel folding pathways.19 It was shown that only one parameter is required for grouping, namely, the commitment time τcommit, which is chosen as a typical relaxation time within the basins of the investigated system. Any two conformations are grouped into the same basin if they interconvert within the time τcommit with a probability, pcommit, of ≥0.5. In other words, two conformations are said to be separated by a short kinetic distance if the interconversion between them is fast, which implies that the free-energy barrier between them is low. Kinetic distance is used here, in analogy to the previously introduced term of kinetic closeness,11 to distinguish what is being described from structural distance.

In the present work, several different, but related, approaches for determining one-dimensional FEPs were applied to Beta3s and the W10V mutant using 20 µs of sampling for each peptide at 330 K, which is slightly above the melting temperature, to obtain adequate sampling of the native and denatured regions of the free-energy surface. The method bpfold, described in ref 24, is based on pfold. As in the balanced mincut method,22 which finds exact barriers separating individual basins, an extra node is introduced to represent the unfolded state. The extra node is connected to all nodes in the network with a capacity proportional to a Lagrange multiplier λ. For different values of λ, different partitions into two basins with pi < 0.5 and pi > 0.5 are obtained in the bpfold procedure. One approximation to bpfold, referred to as pfoldf (which stands for pfold fast), requires only one value of λ.24 Two new related approaches were used in this work for comparison. One is called pfoldt and is based on pfold(τcommit), which is defined as the probability of reaching the folded state within the time τcommit.33–35 In the second procedure, the mean first passage time (mfpt) to the native state is used; the procedure is called mfpt. The main difference from pfoldf is that the calculations of the progress variables, pfold(τcommit) and mfpt, depend only on the native node; that is, no extra node needs to be added to represent the unfolded state in these procedures. Conversely, only pfoldf is suitable for calculating the barrier between two existing nodes, because only one node can be specified in the mfpt and pfoldt procedures. Further, evaluation of the exact pfold(τcommit) values is computationally more expensive than pfold or mfpt calculations, as detailed in the Methods section and the Supporting Information. 
All three approaches applied here have in common that they encode the kinetic distance to the folded (or any other representative) state and are therefore expected to give similar results. Indeed, the one-dimensional FEPs were found to be very similar and to approximate the exact mincut barrier equally well, underlining the robustness of the methods. The significance of this result is discussed. Further, we compare the results using secondary structure clustering with those obtained with rmsd clustering. It is shown that the barriers from the former tend to be lower than those obtained with the latter; with a proper choice of the rmsd value used for clustering (here found to be 2.5 Å for all-atoms), the resulting Monte Carlo (MC) kinetics agrees with that calculated directly from the trajectories.

Interestingly, a helical state with a statistical weight of about 11% is identified by the three procedures as a free-energy basin separated by a barrier from the rest of the denatured ensemble. The implications of non-native secondary structure content in the denatured state of a β-sheet peptide are briefly discussed. Further, both KGA and one-dimensional FEPs reveal a large and heterogeneous entropic region (weight of 33%) that is separated by a barrier of less than kBT from the native state (weight of 35%). The single-exponential behavior of Beta3s folding is shown to be due to the similar free-energy barriers to exit from the non-native enthalpic traps (total population of about 20%) or from the helical basin (weight of 11%), which is primarily stabilized by its entropy.

Methods

Molecular Dynamics Simulations

All simulations and most of the analysis of the trajectories were performed with the program CHARMM;36 the rest of the analysis was done with the program WORDOM,37 which is particularly efficient in handling large sets of trajectories. The designed 20-residue peptide Beta3s27 (Thr1-Trp2-Ile3-Gln4-Asn5-Gly6-Ser7-Thr8-Lys9-Trp10-Tyr11-Gln12-Asn13-Gly14-Ser15-Thr16-Lys17-Ile18-Tyr19-Thr20) and its W10V mutant were modeled by explicitly considering all heavy atoms and the hydrogen atoms bound to nitrogen or oxygen atoms (PARAM19 force field38 with a default cutoff of 7.5 Å for the nonbonding interactions). A mean-field approximation based on the solvent-accessible surface (SAS) was used to describe the main effects of the aqueous solvent.39 It has been shown previously that this model for the solvated Beta3s peptide yields reversible folding at 330 K to the NMR conformation, irrespective of the starting structure; 23 of the 26 nuclear Overhauser effect (NOE) constraints are satisfied.26 Moreover, despite the neglect of collisions with water molecules (frictional effects) in the simulations with the implicit solvent model, the relative rates of folding for different secondary structural elements are comparable to the values observed experimentally; i.e., helices fold in about 1 ns,40 β-hairpins in about 10 ns,40 and triple-stranded β-sheets in about 100 ns,31 compared to experimental values of ~0.1,41 ~1,41 and ~10 µs,27 respectively. For Beta3s and the W10V mutant, 10 molecular dynamics runs of 2 µs each with different initial distributions of velocities were performed with the Berendsen thermostat (coupling constant of 5 ps) at 330 K, which is slightly above the melting temperature of Beta3s.29 A time step of 2 fs was used, and the coordinates were saved every 20 ps, for a total of $10^6$ snapshots for each system. This required three weeks on a 20-CPU cluster. 
Using explicit water simulations, it would have been much more time-consuming to obtain the 40 µs of simulation time required to sample a statistically significant number of equilibrium folding/unfolding transitions.

Coarse-Graining and Equilibrium Kinetic Network (EKN)

For the purpose of using the finite-time simulation data to obtain free-energy surfaces, it is necessary to coarse-grain the snapshots in some way because each conformation is visited only once; the trajectory, per se, is nothing but a long string of configurations. There are several meaningful methods for clustering individual coordinate sets from the trajectory to obtain coarse-grained conformations, and different approaches are likely to be most useful for different types of analysis. For a system such as Beta3s or a β-hairpin, rmsd and secondary structural coarse-graining are obvious possibilities.22,32,42 The coarse-graining used in this work is based on secondary structure strings.43 A “coarse-grained conformation” or node is a single string of secondary structure; for example, the most populated conformation of Beta3s, which corresponds to the native state, is -EEEESSEEEEEESSEEEE-.32 There are eight possible “letters” in the secondary structure “alphabet”: H, G, I, E, B, T, S, and -, standing for α-helix, 310-helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend, and unstructured, respectively.43 Because the N- and C-terminal residues are always assigned as unstructured,43 a 20-residue peptide can, in principle, assume $8^{18} \approx 10^{16}$ conformations. We note that there is no relation between the Hamming distance (number of different entries, i.e., letters, in two strings of equal length) and the kinetic distance (see also the Results section). The secondary-structure-based coarse-graining is used to permit comparison with earlier work.19 It has the advantage over approaches based on the rmsd of the atomic coordinates of being more efficient because it scales with the number of snapshots whereas rmsd is a pairwise measure. Also, each node is uniquely defined by its secondary structure string, which serves as a useful conformational “label”. 
However, as we show here, secondary-structure-based coarse-graining can, in some cases, lead to overlapping of regions that are distant in terms of rmsd (i.e., far from each other in configuration space), which can result in barriers for the basins that are too low; this is the case for Beta3s and is referred to as “pseudotunneling” hereafter.

The number of snapshots with a given secondary structure string i is called the weight of the node and is denoted as $w_i$. The statistical weight $\tilde{w}_i$ of a node is given by $\tilde{w}_i = w_i/N$, where $N = 10^6$ is the total number of snapshots. In the same way, the links, which are direct transitions sampled along the MD trajectory, are weighted by $n_{ji}$, defined as the number of times a snapshot in node i is followed by a snapshot in node j. As mentioned above, snapshots were saved every 20 ps, which is therefore the time interval of a direct transition. The resulting equilibrium kinetic network (EKN)24,44 is an undirected, weighted graph where the edge capacity from node j to node i in the network, $c_{ij}$, is proportional to the number of direct transitions from j to i at equilibrium. Detailed balance can be “imposed”, i.e., $c_{ij} = c_{ji} = (n_{ij} + n_{ji})/2$. The transition probabilities can then be calculated as $p_{ij} = c_{ij}/\sum_k c_{kj}$.
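As an illustration of the construction just described, the following Python sketch builds the statistical weights, the symmetrized capacities, and the transition probabilities from a single time-ordered sequence of node labels. The function name build_ekn and the dictionary layout are assumptions made for this example and are not taken from CHARMM or WORDOM. With several independent runs, the spurious transitions across run boundaries would have to be excluded, which is not done here.

```python
from collections import Counter, defaultdict


def build_ekn(labels):
    """Return (statistical weights, capacities c_ij, transition probabilities p_ij).

    labels: node label (e.g., secondary structure string) of each saved
    snapshot, in time order, from ONE trajectory (assumption of this sketch).
    """
    n_snapshots = len(labels)
    stat_weight = {node: w / n_snapshots for node, w in Counter(labels).items()}

    # n[(i, j)]: number of times a snapshot in node j is followed by one in node i.
    n = Counter(zip(labels[1:], labels[:-1]))

    # Impose detailed balance: c_ij = c_ji = (n_ij + n_ji) / 2.
    cap = defaultdict(dict)
    for (i, j), n_ij in n.items():
        c = 0.5 * (n_ij + n.get((j, i), 0))
        cap[i][j] = c
        cap[j][i] = c

    # p_ij = c_ij / sum_k c_kj: probability of a direct transition from j to i.
    prob = {}
    for j, row in cap.items():
        z_j = sum(row.values())
        for i, c_ji in row.items():
            prob[(i, j)] = c_ji / z_j  # c_ji equals c_ij after symmetrization
    return stat_weight, dict(cap), prob


# Toy trajectory with two nodes, "N" (native-like) and "U" (unfolded-like).
weights, cap, prob = build_ekn(["N", "N", "U", "N", "U", "U", "U", "N"])
```

Storing the capacities as a dictionary of dictionaries keeps the network sparse, which matters when most of the several hundred thousand nodes have only a few neighbors.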

For a node i in the EKN, the partition function is $Z_i = \sum_j c_{ij}$. If the nodes of the network are partitioned into two groups A and B, then $Z_A = \sum_{i \in A} Z_i$, $Z_B = \sum_{i \in B} Z_i$, $Z_{AB} = \sum_{i \in A, j \in B} c_{ij}$, and the free energy of the barrier between the two groups is $-kT \ln(Z_{AB}/Z)$, where Z is the partition function of the full network (Figure 1).

Figure 1
Schematic illustration of the one-dimensional FEP procedure using mfpt as the progress variable. Each of the four solid circles represents a free-energy basin, and the concentric dashed circles represent values of mfpt. For each value of mfptc between ...

One-Dimensional FEP

The pfoldf procedure to determine the one-dimensional FEP was published previously;24 two additional procedures, pfoldt and mfpt, are introduced here. Table 1 lists details of all of the procedures used to calculate FEPs in this study.

TABLE 1
Overview of the Four Procedures Discussed in the Text for Determining FEPsa

Pfoldf

Given the EKN and two nodes A and B, the pfold value of node i, $p_i$, is found as the solution of the equation $p_i = \sum_j p_{ji} p_j$ with boundary conditions $p_A = 1$ and $p_B = 0$ (A is considered to be the “native node” and B the “denatured node”). The system of equations can be solved efficiently numerically by iterative multiplication of the vector $p_j$ by the matrix $p_{ji}$.24
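A minimal sketch of such an iterative solution is given below. It assumes a symmetric dictionary of capacities and updates the values in place (a Gauss-Seidel sweep), which is a variant of the plain matrix-vector iteration mentioned above rather than the implementation of ref 24; the function name pfold_values is hypothetical.

```python
def pfold_values(cap, node_a, node_b, tol=1e-12, max_iter=100000):
    """Solve p_i = sum_j p_ji p_j with p_A = 1 and p_B = 0 by iteration.

    cap[i][j] is the symmetric capacity c_ij, so that the probability of a
    direct transition from i to j is p_ji = c_ji / Z_i with Z_i = sum_j c_ij.
    """
    p = {i: 0.5 for i in cap}          # arbitrary starting guess
    p[node_a], p[node_b] = 1.0, 0.0    # boundary conditions
    for _ in range(max_iter):
        largest_change = 0.0
        for i, row in cap.items():
            if i == node_a or i == node_b:
                continue               # boundary nodes stay fixed
            z_i = sum(row.values())
            new_p = sum(c * p[j] for j, c in row.items()) / z_i
            largest_change = max(largest_change, abs(new_p - p[i]))
            p[i] = new_p               # in-place update (Gauss-Seidel)
        if largest_change < tol:
            break
    return p


# Linear toy network A - M1 - M2 - B with equal capacities.
chain = {
    "A": {"M1": 1.0},
    "M1": {"A": 1.0, "M2": 1.0},
    "M2": {"M1": 1.0, "B": 1.0},
    "B": {"M2": 1.0},
}
p = pfold_values(chain, "A", "B")
```

On the unbiased chain, the committor decreases linearly from A to B, which gives a quick check that the boundary conditions are handled correctly.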

To determine the FEP relative to a chosen node A, node B is considered to be the representative node of everything not belonging to the basin of A. However, in many systems, a node such as B does not exist, because there are multiple basins and/or an entropic state that cannot be represented by a single node. Both occur in the peptides investigated in this study. Thus, as in the balanced minimum-cut procedure,22 an extra node B is introduced and connected to all nodes in the network with capacity λ w, where λ is a Lagrange multiplier. The pfold calculations are performed on the EKN with the extra node, and in the pfoldf procedure, the nodes are sorted according to their pfold values using only one value of λ. The assumption in this procedure is that the order of the nodes does not change for different values of λ.24 Each value pc between 0 and 1 can then be used to cut the network into set A containing all nodes with pfold > pc and set B containing the nodes with pfold < pc. For each cut, a point [x = ZA/Z, y = −kT ln(ZAB/Z)] of the FEP is obtained; ZA/Z is used as the progress coordinate, and ZAB is the number of EKN transitions between the two sets. Note that pfold is the progress variable used to divide the configuration space, with pfoldf evaluation based on this variable. Moreover, this procedure and those described below do not require any special treatment of low-population nodes (i.e., secondary structure strings with only one or a few snapshots) because they are automatically grouped on the same side of the barrier as the reference node A if they satisfy the condition pfold > pc.

Pfoldt

The extra node required by the pfoldf procedure is not necessary if pfold is calculated not between two representative nodes A and B, but rather with a commitment time τcommit, referred to as pfold(τcommit) and defined as the probability of reaching A within τcommit.33–35 The calculation of pfold(τcommit) values for all nodes in the EKN (with the initial boundary condition pA = 1) is more complex than that for pfoldf; details are given in the Supporting Information. Once pfold(τcommit) has been evaluated for all nodes, the procedure is the same as for pfoldf: The nodes are sorted according to pfold(τcommit) values and split into sets with pfold(τcommit) > pc and pfold(τcommit) < pc. For each pc between 0 and 1, the pair [ZA/Z, −kT ln(ZAB/Z)] is a point on the FEP. The choice of τcommit has to be long enough to assign nonzero pfold(τcommit) values to nodes that are kinetically very far from the node under consideration (i.e., the native node or any other node of interest) to resolve all other states. For this purpose, very long commitment times are appropriate. However, it is computationally most convenient to choose τcommit values as short as possible. Thus, for each basin, one starts with a short τcommit and increases it until the whole profile is covered, as illustrated in Figure S1 of the Supporting Information. For instance, 20 ns is long enough if pfold(τcommit) is calculated with respect to the native node, but τcommit = 200 ns is needed if the values are calculated with respect to a node in the helical region. All pfoldt profiles shown in Figure 3 below were produced with values in this range (20–200 ns). Typically, the upper limit of τcommit is on the order of the overall relaxation time (τfolding + τunfolding) of the system, which is about 200 ns for Beta3s.
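The algorithm actually used for pfold(τcommit) is given in the Supporting Information, which is not reproduced here. Purely as an illustration, and assuming Markovian jumps on the EKN, the probability of reaching A within a given number of saving intervals can be propagated step by step while A is kept absorbing. The function name pfold_within and the data layout are assumptions of this sketch.

```python
def pfold_within(cap, node_a, n_steps):
    """Probability of reaching node_a within n_steps direct transitions.

    Assumes Markovian dynamics on the network (an assumption of this
    sketch). cap[i][j] is the symmetric capacity c_ij; with a saving
    interval of 20 ps, tau_commit = 20 ns corresponds to n_steps = 1000.
    """
    z = {i: sum(row.values()) for i, row in cap.items()}
    # After zero steps, only node_a itself has "reached" node_a.
    q = {i: 0.0 for i in cap}
    q[node_a] = 1.0
    for _ in range(n_steps):
        new_q = {}
        for i, row in cap.items():
            if i == node_a:
                new_q[i] = 1.0  # absorbing boundary condition p_A = 1
            else:
                # One more step is allowed: average over the neighbors.
                new_q[i] = sum(c * q[j] for j, c in row.items()) / z[i]
        q = new_q
    return q


# Toy network: from M, each step reaches A with probability 1/4.
toy = {"A": {"A": 1.0, "M": 1.0}, "M": {"A": 1.0, "M": 3.0}}
q2 = pfold_within(toy, "A", 2)
```

In the toy network, the probability of reaching A from M within n steps is 1 − (3/4)^n, so the values approach 1 for every connected node as the number of steps grows; this is why the commitment time must be neither too short nor much longer than the overall relaxation time.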

Figure 3
One-dimensional FEPs calculated using the kinetic distance from individual basins of Beta3s. Notably, the barriers separating the reference state from the rest are almost identical for pfoldf with λ = 0.0001 (black), pfoldt (red), and mfpt (green), ...

Mfpt

Another variable used in this work to project the free energy is the mean first passage time (mfpt; see Figure 1) to node A (representative nodes of significantly populated basins were used in Figure 3, but any node can be used as a reference).45 Given the original EKN (i.e., without an extra node), the mfpt of node i is the solution of the equation $\mathrm{mfpt}_i = \Delta t + \sum_j p_{ji}\,\mathrm{mfpt}_j$ with initial boundary condition $\mathrm{mfpt}_A = 0$.46 The time step Δt corresponds to the saving frequency of 20 ps; that is, the mfpt of a node is defined as one time step plus the weighted average of the mfpt values of its adjacent nodes. In contrast to the other progress variables, the mfpt has an explicit time dependence through the occurrence of the time step in the equations. The resulting system of linear equations differs from that of pfoldf by the Δt constant and the boundary conditions; in pfoldf, the boundary conditions are $p_A = 1$ and $p_B = 0$, whereas there is only one condition, $p_A = 1$ or $\mathrm{mfpt}_A = 0$, for pfoldt or mfpt, respectively. Therefore, both pfoldf and mfpt equations can be solved with the same efficiency by iterative multiplication. Similarly to pfoldt, mfpt does not require an extra node, because mfpt is defined not between a pair of nodes, but only with respect to one selected node. To calculate the FEP, the nodes are sorted according to their mfpt values. For any mfptc between 0 and max(mfpt), a point [ZA/Z, −kT ln(ZAB/Z)] on the FEP can be calculated, where A is the set of all nodes with $\mathrm{mfpt}_i < \mathrm{mfpt}_c$ and B is the set of nodes with $\mathrm{mfpt}_i > \mathrm{mfpt}_c$ (Figure 1).
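The mfpt equation can be iterated in the same way as the pfold equation. The sketch below is one possible implementation under the assumption of a symmetric dictionary of capacities; the function name mean_first_passage_time is hypothetical, and for networks of realistic size a sparse direct solver may converge much faster than this simple sweep.

```python
def mean_first_passage_time(cap, node_a, dt=20.0, tol=1e-10, max_iter=1000000):
    """Solve mfpt_i = dt + sum_j p_ji mfpt_j with mfpt_A = 0 by iteration.

    cap[i][j] is the symmetric capacity c_ij and p_ji = c_ji / Z_i.
    dt is the saving interval (20 ps in the simulations described here),
    so the returned values are in the same unit as dt.
    """
    z = {i: sum(row.values()) for i, row in cap.items()}
    mfpt = {i: 0.0 for i in cap}       # starting guess; mfpt_A stays 0
    for _ in range(max_iter):
        largest_change = 0.0
        for i, row in cap.items():
            if i == node_a:
                continue               # boundary condition mfpt_A = 0
            new_t = dt + sum(c * mfpt[j] for j, c in row.items()) / z[i]
            largest_change = max(largest_change, abs(new_t - mfpt[i]))
            mfpt[i] = new_t
        if largest_change < tol:
            break
    return mfpt


# Toy network: from M, each 20-ps step reaches A with probability 1/4.
toy = {"A": {"A": 1.0, "M": 1.0}, "M": {"A": 1.0, "M": 3.0}}
t = mean_first_passage_time(toy, "A")
```

For the toy network, the equation reads mfpt_M = Δt + (3/4) mfpt_M, so the solution is 4Δt = 80 ps.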

Implementation

In practice, the procedure to calculate the one-dimensional FEP consists of four steps: (1) Detailed balance is imposed on the equilibrium kinetic network (EKN), i.e., cij = cji = (nij+nji)/2, where nij is the number of direct transitions (i.e., transitions between two MD snapshots separated by the time interval Δt, which is the inverse of the MD saving frequency) from node j to node i. The transition probabilities are then calculated as pij = cijk ckj. (2) The system of equations with appropriate boundary condition(s) is solved numerically. (3) Nodes are sorted according to increasing values of mfpt or decreasing values of pfoldf or pfoldt; for each value of the progress variable, the relative partition function ZA and the cut ZAB are calculated. (4) The individual points on the profile are evaluated as [x = ZA/Z, y = −kT ln(ZAB/Z)].
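Steps 3 and 4 can be sketched as a single sweep over the sorted nodes in which ZA and ZAB are updated incrementally, so that each link is visited only twice. This is an illustrative sketch rather than the code used in this work: the function name free_energy_profile and the data layout are assumptions, and nodes with identical values of the progress variable are simply cut in their sorted order.

```python
import math


def free_energy_profile(cap, progress, kT=1.0, descending=False):
    """Return the FEP points [Z_A / Z, -kT ln(Z_AB / Z)] along a progress variable.

    cap[i][j]: symmetric capacity c_ij (hypothetical data layout).
    progress[i]: mfpt (ascending) or pfold (use descending=True) of node i.
    """
    order = sorted(cap, key=lambda i: progress[i], reverse=descending)
    z = {i: sum(row.values()) for i, row in cap.items()}
    z_total = sum(z.values())
    in_a, z_a, z_ab = set(), 0.0, 0.0
    profile = []
    for node in order[:-1]:            # the last cut would leave B empty
        for j, c in cap[node].items():
            if j == node:
                continue               # self-links never cross a cut
            if j in in_a:
                z_ab -= c              # link is now inside A
            else:
                z_ab += c              # link now crosses the cut
        in_a.add(node)
        z_a += z[node]
        if z_ab > 0.0:                 # skip cuts with no crossing links
            profile.append((z_a / z_total, -kT * math.log(z_ab / z_total)))
    return profile


# Toy chain A - M - B with capacities 3 and 1, sorted by an mfpt-like variable.
chain = {"A": {"M": 3.0}, "M": {"A": 3.0, "B": 1.0}, "B": {"M": 1.0}}
fep = free_energy_profile(chain, {"A": 0.0, "M": 1.0, "B": 2.0})
```

In the toy chain, the second cut crosses the weaker link and therefore gives the higher barrier, −kT ln(1/8) at ZA/Z = 7/8.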

Identification of Basins

The kinetic grouping analysis (KGA) groups conformations according to fast relaxation at equilibrium.19 More explicitly, two coarse-grained conformations are grouped if, along the molecular dynamics trajectory, their snapshots interconvert in more than 50% of the cases within a commitment time τcommit, which represents a typical relaxation time within basins of the investigated system; the value used in this study was 1 ns. The basins obtained by KGA can be compared to those isolated from FEPs. To isolate a basin with a FEP, the unfolding profile from a node in that basin (usually its most visited node) is plotted, as shown in Figure 3. In practice, the procedure is the same as that used with the native basin as the reference, except that the native node is replaced by the new node. All nodes lying on the left of the cut at the first barrier correspond to the basin. Basins lying on the right of the first barrier are potentially overlapping (Figure 1), so each basin requires a separate unfolding profile.

Transition Disconnectivity Graph (TRDG)

The TRDG is a variant of the free-energy disconnectivity graph, which provides an unprojected representation of the free-energy surface.44 The partition function of the free-energy barrier separating states i and j, Zij, is equal to the value of the mincut between the states in the network, which can be calculated by the Ford-Fulkerson algorithm.47 After the mincuts (i.e., the free-energy barriers) between every pair of nodes have been calculated, which can be done with only n – 1 total mincuts for n nodes by using the Gomory-Hu algorithm,48 the TRDG is constructed to obtain a detailed representation of the free-energy surface. Following Becker and Karplus20 and using the relation Fij = −kT ln(Zij), one starts with the largest Zij value (smallest Fij value) and successively connects states in order of decreasing Zij (increasing Fij). The TRDG is useful for visualizing basins containing representative nodes (enthalpic basins), but basins that have no such nodes (entropic basins) are not visible. To resolve such basins, one has to use either the balanced mincut procedure22 or the free-energy profiles discussed above.
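For a single pair of states, the mincut value can be obtained from a maximum-flow calculation, because the two are equal by the max-flow min-cut theorem. The sketch below uses the breadth-first (Edmonds-Karp) variant of the Ford-Fulkerson method in plain Python; it is only an illustration for one pair of nodes, does not implement the Gomory-Hu construction, and its function names and data layout are assumptions.

```python
import math
from collections import deque


def min_cut_value(cap, source, sink):
    """Mincut Z_ij between two nodes of an undirected capacity network.

    cap[i][j] is the symmetric capacity c_ij (hypothetical data layout).
    """
    # Each undirected link contributes capacity c in both directions.
    residual = {i: {j: c for j, c in row.items() if j != i}
                for i, row in cap.items()}
    flow = 0.0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                # no path left: flow equals the mincut
        # Find the bottleneck capacity along the path, then push that flow.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


def barrier_free_energy(cap, i, j, kT=1.0):
    """F_ij = -kT ln(Z_ij), with Z_ij the mincut between i and j."""
    return -kT * math.log(min_cut_value(cap, i, j))


# Toy network with two parallel routes from A to B.
toy = {
    "A": {"M1": 3.0, "M2": 2.0},
    "M1": {"A": 3.0, "B": 1.0},
    "M2": {"A": 2.0, "B": 2.0},
    "B": {"M1": 1.0, "M2": 2.0},
}
z_ab = min_cut_value(toy, "A", "B")
```

In the toy network, the smallest cut separating A from B has capacity 3 (the weak M1-B link plus one of the two links of capacity 2 on the M2 route), so the barrier is −kT ln 3 in the unnormalized form used for the TRDG.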

Results

All analyses are based on a set of 10 2.5-µs equilibrium simulations at 330 K started from the folded state. The first 0.5 µs of each run was neglected so that a total simulation time of 20 µs was sampled for each of the two peptides (see Methods). The wild-type Beta3s peptide visited 262 433 conformations (unique strings of secondary structure) with a total of 534 383 direct transitions. The W10V mutant visited 245 032 conformations with a total of 476 721 direct transitions. In Beta3s (W10V), only 62 446 (56 118) conformations were visited more than once. The most populated conformation of Beta3s and its W10V mutant had statistical weights of 5.6% and 8.8%, respectively. It was the three-stranded antiparallel β-sheet with type II′ turns at residues 6–7 and 14–15 (secondary structure string -EEEESSEEEEEESSEEEE-), which corresponds to the native state determined by NMR spectroscopy.27 Totals of 120 and 105 folding events (i.e., visits to the native node) were observed for Beta3s and W10V, respectively, with an average folding time of about 0.1 µs for both peptides.

FEP of Beta3s and W10V

The pfoldf FEP, based on pfold values, was calculated by projecting the free energy on the relative partition function ZA/Z, which is a progress coordinate that increases monotonically with the distance from the reference state;24 see Methods. pfoldf finds approximate barriers between the reference state and the denatured state, whereas exact values can be obtained by the mincut procedure.22 An essential attribute of ZA/Z is that it takes into account all routes from the initial state to the final state without any prejudgment as to the geometric coordinates or pathways involved, so that the FEP is determined by an unbiased procedure. Figure 2 shows the results for both Beta3s and W10V. Three main regions are identified on the FEP of Beta3s when the most populated (native) node is used as reference: native state (ZA/Z < 0.35), denatured state with several enthalpic subbasins (0.35 ≤ ZA/Z ≤ 0.88), and helical basin (ZA/Z > 0.88). The pfoldf FEP procedure yields an accurate description of the reference basin (native state in Figure 2), whereas there can be some overlap between different basins after the first barrier (i.e., for ZA/Z > 0.35). Overlap occurs whenever nodes belonging to different basins have similar pfold values (Figure 1). Unfolding profiles (see Methods) are able to fully resolve all the basins. Such unfolding profiles from representative nodes in the subbasins of Figure 2 are plotted in Figure 3 to accurately characterize each basin by eliminating overlap with other regions. This makes possible the determination of accurate barriers between the respective basins and the exact population of each basin. In each unfolding profile of Figure 3, the exact value of the barrier, calculated by mincut,22 is indicated by an open circle. The plots show that pfoldf approximates the barriers very well for the Beta3s system. (For a description of the pfoldt and mfpt results, see below.) 
Notably, the most populated basins isolated by this unbiased procedure24 correspond to those in Table 1 of ref 19 and are quantitatively compared in Table 2 of the present work. Figure 4 qualitatively illustrates the nodes, the basins, and their connectivity in the conformational state network.32 The basins determined by pfoldf are discriminated by different colors and shapes in the network, where brown nodes belong to the entropic state. Note that the colors in the network (i.e., basins defined by pfoldf) are in good agreement with the network in Figure 1 of ref 19 for KGA results. Table 3 contains effective and free-energy values of the six basins and the entropic state. Clearly, there are low-enthalpy, low-entropy basins (native, Ns-or, Cs-or, and the two curls), as well as high-enthalpy, high-entropy basins (helical and entropic states); the origin of the high entropy of the helical basin is discussed below. Except for the helical basin, the agreement between the FEP procedures and the KGA19 is very good, and thus, the two approaches validate each other. Furthermore, essentially identical basins are isolated by either of the two procedures using secondary structure or all-atom 2.5-Å rmsd coarse-graining (as shown in Table S-I of the Supporting Information for pfoldf).

Figure 2
pfoldf-calculated FEP of Beta3s and its single-point mutant W10V, whose plot is shifted by −3 kcal/mol to avoid overlap of the curves. The progress coordinate is the relative partition function for different values of pc (see text and Table 1 ...
Figure 4
Conformational space network of Beta3s. Each node (i.e., conformation) of the network represents a secondary structure string. The surface of each node is proportional to its statistical weight, and only the 1430 nodes containing at least 40 snapshots ...
TABLE 2
Results of the KGA and pfoldf Procedure for the Most Populated Basins of Beta3sa
TABLE 3
Energetic and Entropic Contributions to Basin Stability

Although the Beta3s and W10V profiles are very similar, there are subtle differences between them. Importantly, the native state is less stable in Beta3s than in W10V (35% vs 39.5%), the helical basin is slightly more populated in Beta3s than in W10V (11.2% vs 8.8%), and there is a difference in the relative stability of nonhelical misfolded species such as Ns-or (N-terminal strand out of register and folded C-terminal hairpin, basin populations of 6.2% vs 3.6%) and Cs-or (C-terminal strand out of register and folded N-terminal hairpin, basin populations of 2.6% vs 4.9%). These statistical weights were calculated by defining the basins as described in the Methods section. Note that corresponding basins in the two systems can occur at different ZA/Z positions in the profiles of Figure 2. Also, the FEPs reveal basins that are visited in only one of the two peptides. Such basins are illustrated in white in Figure 4.

Figure 5 shows the negative logarithm of the probability of the first passage time (fpt) to the native node (i.e., the folding time). The plot has a minimum at about 100 ns, which corresponds to the folding time. The fast folding values correspond to configurations that start in the native state (fpt < 1 ns), and the slower folding ones are configurations that start in the denatured state (fpt > 1 ns). The fpt plot shows a very simple behavior, such as is expected for a two-state system, despite the multiminimum character of the free-energy surface.

Figure 5
Profile obtained using fpt as the progress variable and calculated as ΔG = −kBT ln[P(fpt)] on the bins, whose size increases exponentially (10 bins/decade) for better resolution of the different timescales.
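The profile of Figure 5 can be reproduced in outline as follows. This is a sketch under stated assumptions: `fpt_profile` is a hypothetical helper, P(fpt) is taken as the probability mass per bin (not a density), and the bin edges are placed on whole decades with 10 logarithmically spaced bins per decade as stated in the caption.

```python
import numpy as np

KB = 0.0019872  # Boltzmann constant in kcal/(mol K)

def fpt_profile(fpt_ns, temperature=330.0, bins_per_decade=10):
    """Return (bin centers, -kBT ln P) for first passage times in ns."""
    fpt_ns = np.asarray(fpt_ns, dtype=float)
    lo = np.floor(np.log10(fpt_ns.min()))   # first decade covered
    hi = np.ceil(np.log10(fpt_ns.max()))    # last decade covered
    n_bins = max(1, int(round((hi - lo) * bins_per_decade)))
    edges = np.logspace(lo, hi, n_bins + 1)  # exponentially growing bin size
    counts, _ = np.histogram(fpt_ns, bins=edges)
    prob = counts / counts.sum()
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers
    with np.errstate(divide="ignore"):
        delta_g = -KB * temperature * np.log(prob)  # empty bins give +inf
    return centers, delta_g
```

Logarithmic binning keeps the sub-nanosecond recrossing events and the slow folding events visible on the same axis, at the cost of making the profile depend on the chosen bin width.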

FEPs Calculated with pfoldt and mfpt

The pfoldf analysis of systems without a representative node in the denatured state is based on the introduction of an extra node24 that is linked with a small capacity λ (typically 0.01 or lower) to all nodes in the network. Here, two further progress variables are introduced to plot the FEPs: pfold evaluated with a commitment time, and the mean first passage time (mfpt) to the reference node. The respective procedures, called pfoldt and mfpt, have the advantage that no additional node needs to be introduced.

The pfoldt- and mfpt-calculated FEPs are in good agreement with those obtained by the pfoldf procedure (Figure 3). The pfoldf, pfoldt, and mfpt barriers separating significantly populated basins are almost identical for all procedures and both peptides. Further, pfoldt and mfpt, as well as pfoldf (see above), yield good approximations to the exact barriers (open circles in Figure 3). The six basins shown in Figure 3 share more than 99% of their conformations when isolated by the three methods (not shown), indicating that the results are robust. This suggests that the choice between pfoldf, pfoldt, and mfpt can be made according to convenience. The cut between two well-defined regions with representative nodes has to be calculated by pfoldf, because it is the only procedure of the three discussed here in which two input nodes are used, but pfoldt and mfpt are the more straightforward choices if only one representative node exists, such as when an unfolding profile is calculated. Solving the system of equations for the pfoldf and mfpt procedures is of the same complexity, because the only differences are the boundary conditions and the use of the time constant Δt that is added in the mfpt equations (see Methods). On the other hand, the pfoldt method requires some precalculations before the iterative solution of the equation can be performed and is therefore more complex (see Supporting Information). An application of the mfpt analysis is that it can be used as the progress coordinate (instead of ZA/Z) to obtain a FEP with all basins separated from the native one by a distance in time units. The relation between the profiles projected on ZA/Z and on mfpt is a nonlinear transformation x → mfpt[x(ZA/Z)], where x(ZA/Z) assigns nodes to each position on ZA/Z; that is, the mfpt that was originally used to rank on the ZA/Z axis is now directly assigned to the nodes. Figure 6 clearly shows mfpt values of individual basins. 
Interestingly, it also illustrates the origin of the single-exponential behavior (see Single-Exponential Kinetics of Folding), which is a spread of only a factor of 3 in the mfpt values (from about 60–80 ns for Ns-or and Cs-or to about 190–200 ns for Ch-curl1, Ch-curl2, and helical). However, as can be seen by comparing Figure 3 with Figure 6, there is more overlap of non-native basins using mfpt rather than ZA/Z as the progress coordinate.
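The statement that the pfold and mfpt equations differ only in their boundary conditions and in the added time constant Δt can be made concrete with a small sketch. It assumes a row-normalized transition matrix built from the node-to-node transition counts; the function names are hypothetical, and the actual equations used in this work are those given in the Methods.

```python
import numpy as np

def transition_matrix(counts):
    """Row-normalize a matrix of transition counts."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def mfpt_to(P, target, dt):
    """Solve m_i = dt + sum_j P_ij m_j with m_target = 0."""
    n = P.shape[0]
    free = np.array([i for i in range(n) if i != target])
    A = np.eye(len(free)) - P[np.ix_(free, free)]
    m = np.zeros(n)
    m[free] = np.linalg.solve(A, np.full(len(free), dt))
    return m

def pfold(P, folded, unfolded):
    """Solve p_i = sum_j P_ij p_j with p_folded = 1 and p_unfolded = 0."""
    n = P.shape[0]
    free = np.array([i for i in range(n) if i not in (folded, unfolded)])
    A = np.eye(len(free)) - P[np.ix_(free, free)]
    p = np.zeros(n)
    p[folded] = 1.0
    p[free] = np.linalg.solve(A, P[free, folded])
    return p
```

Both functions solve a linear system with the same matrix structure (identity minus a sub-block of the transition matrix). Only the right-hand side and the set of fixed nodes change, which is why the computational cost is comparable.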

Figure 6
Beta3s unfolding FEP calculated for the directed network (see Methods) using mfpt as a progress coordinate and a progress variable, which can be obtained by the transformation x → mfpt[x(ZA/Z)]. As in Figure 3, individual basins are colored according ...

Helical Basin

A previous study32 suggested that the denatured state ensemble of Beta3s is highly heterogeneous and includes enthalpic traps as well as conformations with partial helical structure; the latter form the helical basin. Notably, in the FEPs of Figure 2 and Figure 3, the entire “helical” region is identified and shown to be separated by a high barrier (at ZA/Z of about 0.88) from the rest of the denatured state, which extends from 0.35 to 0.88 (see above). The helical region shows the main difference with respect to the KGA results19 and is indicated in Figure 4. KGA correctly identified the two most populated free-energy subbasins (-HHHHHHHHHHHHS------ and --TT--HHHHHHHSS----- with populations of 1.9% and 1.6%, respectively) within the helical state of Beta3s (Figure 2). The commitment time of 1 ns used in the previous work19 was too short to group all helical structures into one basin, because the helical basin is divided into various subbasins separated by barriers, as can be seen in the helical unfolding profile of Figure 3. These barriers prevent the system from rapid equilibration between all helical structures. A larger commitment time of 5 ns, however, is able to identify the entire helical basin (Supporting Information of ref 19). These results show that the definition of a basin involves the choice of “resolution”. Both the commitment time of KGA and the height of the barrier in the FEP analysis, above which one considers a basin as separated, correspond to the “lens” with which the free-energy surface is analyzed. For each choice of a minimum barrier in the FEP procedures, there exists a commitment time for the determination of the corresponding basins with KGA. However, defining a minimum barrier height is more transparent than choosing a commitment time, which is initially extracted from the fpt plot of Figure 5 and then varied to obtain the desired resolution. In either case, there is some arbitrariness in defining a basin per se.

The helical state is the right-most basin in all five unfolding profiles from the nonhelical basins in Figure 3 (see also Figure S2 in the Supporting Information). This observation is consistent with the high barrier that has to be overcome to enter the helical basin from the rest of the conformational space, as the barriers generally appear in increasing order along the ZA/Z progress coordinate.

Interestingly, as shown in the thermodynamic analysis presented in Table 3, the helical basin has a high energy and is entropically stabilized. This is because the strings associated with the helical basin have unstructured residues that do not make hydrogen bonds. In the helical basin, 78.4% and 7.7% of the snapshots have more than 5 and 10 unstructured residues, respectively; that is, they belong to strings with more than 5 and 10 “-” letters. The corresponding percentage values for the entropic region are 79.7% and 15.2%, respectively. As a basis of comparison, the native state has only 2.9% and 0.003% of its snapshots in strings with more than 5 and 10 unstructured residues, respectively. Moreover, the numbers of different secondary structure strings in the helical, entropic, and native states are 57 134, 193 666, and 2672, respectively.
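The unstructured-residue statistics quoted above amount to counting “-” letters in each secondary structure string and weighting by the number of snapshots. A minimal sketch follows; the helper name and the snapshot counts are hypothetical, the first two strings are the helical subbasin strings quoted in the text, and the last two strings are made up for illustration.

```python
def unstructured_fractions(string_counts, thresholds=(5, 10), letter="-"):
    """Fraction of snapshots whose string has more than t unstructured letters.

    string_counts maps a secondary structure string to its snapshot count.
    """
    total = sum(string_counts.values())
    fractions = {}
    for t in thresholds:
        hits = sum(c for s, c in string_counts.items() if s.count(letter) > t)
        fractions[t] = hits / total
    return fractions

# Hypothetical snapshot counts, for illustration only.
example = {
    "-HHHHHHHHHHHHS------": 19,  # 7 unstructured residues
    "--TT--HHHHHHHSS-----": 16,  # 9 unstructured residues
    "----------HHHHS-----": 5,   # made-up string, 15 unstructured residues
    "-EEEESSEEEEEESSEEEE-": 60,  # made-up string, 2 unstructured residues
}
print(unstructured_fractions(example))
```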

Simplified Network (SEKN) and the Role of the Entropic State

In previous analyses of folding simulations, it has been found useful22,24 to construct a highly simplified network that shows only the main basins and their connectivity as a complement of the detailed conformational space network (Figure 4). Figure 7 shows such a network with the free-energy basins isolated by pfoldf. It includes the heterogeneous entropic state (dashed surface in Figure 7), which is made up of all conformations not belonging to any of the 10 basins that appear as a part of the network; the latter all have barriers higher than 0.3 kcal/mol and significant partition function (≥0.5%). The so-called entropic state, by contrast, is composed mainly of nodes that are visited only once, or at most a few times, so that it is not a “true” basin in the sense used for the other basins. Overall, at the chosen simulation temperature (330 K), the denatured state of Beta3s consists of the entropic state (populated at 33%), a helical basin (populated at 11%) and eight metastable enthalpic traps (populated at 0.5–6%). Although the free-energy basins were selected with the pfoldf procedure on the full EKN, the links in Figure 7 show the number of transitions, sampled along the molecular dynamics trajectory, between the most populated secondary structure string (bottom) of a given pfoldf basin i and the bottom of another basin j (or the same basin) through the entropic state only (i.e., without visiting another enthalpic basin). Each of the pfoldf profiles in Figure 7 was calculated using only the basin under consideration, the entropic state, and the native state while neglecting the other basins. In this way, barriers involved in the transitions between individual enthalpic basins and the native state are described accurately; see also the Supporting Information, section D. Note that the left-most profile in Figure 7 (i.e., pfoldf-calculated FEP from the entropic state) considers only the entropic and native states, which account for almost 70% of the total weight. 
All profiles show a barrier of only about 0.5 kcal/mol from the entropic state toward the native state, which is near the limit of what has been termed barrierless or downhill folding.49–54

Figure 7
Simplified equilibrium kinetic network (SEKN) of Beta3s free-energy basins. The circles, ellipse, rectangle, and diamond are the 10 most populated free-energy basins of Beta3s with their respective statistical weight as isolated by pfoldf. Colors and ...

The complexity of the SEKN, particularly for the denatured state, suggests that a detailed analysis would be useful to obtain a more complete understanding of the folding behavior. Figure 7 shows the number of “direct” transitions (i.e., without visiting the native state) between identified basins, which can be compared with the number of transitions from each of the basins to the native state. In most cases (the pair cyan and Ns-or is an exception), the direct transitions are rather rare compared to the number of transitions connecting each basin to the native state. However, the total number of transitions connecting the non-native basins to other non-native basins without passing through the native state is of the same order of magnitude as the number of transitions connecting each of the non-native basins to the native state.

If the equilibrium trajectories are followed, they make clear that direct transitions overwhelmingly go through the entropic basin from one of the defined basins to another. This is in accord with the results in Figure 7, which show that a long time is spent in the entropic basin in nearly all transitions. However, in most cases, the trajectories go through the native basin and can be diagrammed as (i → entropic → native → entropic → j), often spending a long time in the native basin and making repeated transitions (i ↔ entropic ↔ native ↔ entropic ↔ j). There is essentially full equilibration in the native basin with rapid sampling of different conformations; there are 2672 nodes (different secondary structure strings) in the native basin. Further, if one examines the content of native secondary structure in the last 0.2 ns before the trajectory exits from the folded to the entropic state, with the condition that it will next visit a certain enthalpic basin (e.g., Ns-or or Cs-or), a significant structural bias toward that basin is already present (see Figure 8). Thus, the fate of the trajectory is biased already in the native state. This analysis is in accord with the conclusion that the system does not stay in the entropic state long enough to equilibrate. The latter is due in part to the aforementioned low barrier between the entropic state and the native state (see the left-most profile of Figure 7) and in part to the presence of significant barriers between different parts of the entropic basin. This is shown in section E of the Supporting Information by the reduced pfoldf FEPs of pairs of non-native basins and the entropic state (e.g., Ns-or, entropic, and Cs-or in Figure S6a). However, not all basins of the denatured state are separated by barriers within the entropic state (one example is in Figure S6c for the Ns-or and the cyan basins). 
These observations explain the origin of the barriers in the entropic region at 0.3 ≤ ZA/Z ≤ 0.4 for the reduced FEPs (i.e., FEPs calculated taking into account only one basin and the entropic and native states) from the Cs-or or the helical basin in Figure 7. In other words, Figure 7 and Figure S6 are consistent because both show barriers in the entropic region mainly between Cs-or, Ns-or, and helical regions, but not among conformations with the C-terminal hairpin folded (i.e., Ns-or, Ch-curl1, Ch-curl2, cyan, and magenta basins in Figure 7).

Figure 8
Native secondary structure content of the trajectory in the folded state, 10 snapshots (0.2 ns) before the system leaves toward one of the two non-native enthalpic basins Ns-or or Cs-or. Notably, if the trajectory continues to Ns-or (Cs-or), the N-terminal ...

The number of folding transitions from non-native basins with a structured C-terminal hairpin is larger than the corresponding number from basins with the N-terminal hairpin formed (Cs-or). This observation is consistent with the analysis of folding transition state structures19 identified by a node- pfold = 0.5 criterion.34 The two main folding pathways of Beta3s are in agreement with the diffusion-collision model52 in which the folding process involves the encounter of marginally stable secondary structural elements.53 The SEKN also sheds light on the folding pathways from the helical state (Figure 7). Half of the transitions proceed via the entropic state, where the system spends considerable time, directly to the folded state. Less frequently, the trajectory visits Cs-or or Ns-or structures or even returns to the helical basin. We note that, even though the folding times from the Ns-or and Cs-or conformations are relatively large (110 and 70 ns, respectively) and the Ns-or and Cs-or conformations hardly interconvert, the difference between the most populated nodes of the three basins is only in one to three positions of the secondary structure string. This observation illustrates that a small structural change can result in a large kinetic distance. For instance, the structural change from folded to Cs-or, i.e., from two to three S letters at the second turn (see Table 2 for strings), involves a complete rearrangement of the side chains of the C-terminal strand.

For each of the basins identified in the SEKN, the analysis showed that the equilibration within the basin is fast, relative to transitions from or to the basin. However, this is not true for the entropic basin. The low barrier between the entropic state and the native state together with the high population of the former (about 33%) leads to fast transitions to the native state that prevent equilibration in the entropic basin. Therefore, plotting the entropic state as a single node would not only be misleading, but would also give an incorrect picture of the pathways. If the entropic state were replaced by one node, the native state would be connected to only the entropic node, and the picture would suggest that the entropic state acts as the hub. However, in the SEKN emerging from the pfoldf procedure, both the native and the entropic states can be considered to be hubs.

Transition Disconnectivity Graph (TRDG)

The TRDG44 of Beta3s (Figure 9) provides further evidence that the denatured state is heterogeneous and has several funnel-like basins with favorable effective energies as well as a helical basin. Disconnectivity graphs do not visualize the basins that lack representative nodes, i.e., the basins with high entropic contributions to the free energy. The split of the helical basin is consistent with the FEPs of Figure 2 and Figure 3 and explains the longer commitment time required in KGA to isolate the complete helical basin, as discussed in the Results section. An advantage of TRDG over one-dimensional FEPs and the SEKN is that it quantitatively depicts (mainly enthalpic) minima and barriers in a single plot.

Figure 9
Free-energy (transition) disconnectivity graph of Beta3s EKN obtained with secondary structure coarse-graining. Secondary structure strings of the most populated clusters in the basins are shown. The vertical axis shows the number of times the secondary ...

Single-Exponential Kinetics of Folding

The cumulative distribution of folding times of all configurations in the trajectory shows single-exponential behavior (Figure 10 top). This (apparently simple) behavior is consistent with the fact that the barriers to exit the individual basins are of similar heights in the FEPs from enthalpic traps or the helical basin (Figure 3 and Table 3). Together with the aforementioned low barrier from the entropic state to the native state, the similar barrier heights explain the single-exponential behavior. The similar free-energy barriers from Ns-or and Cs-or are likely to be a consequence of the high sequence identity (67%) between the N-terminal β-hairpin (residues 1–12) and the C-terminal β-hairpin (residues 9–20). On the other hand, there is no straightforward explanation for the similar barrier height from the helical basin.

Figure 10
Single-exponential behavior of folding. (Top) Points show the cumulative distribution of the first passage time to the folded state f(t) = ∫_t^∞ p(τ) dτ, where p is the probability distribution of the first passage time. All ...

Model rate calculations for a photoswitchable peptide have shown that, for an SEKN similar to the present one, a spread in the rates on the order of a factor of 9 is required to observe significant deviations from single-exponential behavior.54

Because the entropic state cannot be represented as a single node in the SEKN (because of slow relaxation), the distribution of folding times was obtained by simulating MC kinetics on the network rather than on the SEKN. An initial point was picked at random among nodes that do not belong to the native basin, with a probability proportional to the statistical weight, and MC simulations were performed until the system reached a node corresponding to the native basin. Figure 10 (bottom) shows the density function of the distribution obtained with 10^5 trajectories and a single-exponential distribution with the corresponding folding time. The curves are in reasonable agreement, which indicates that the kinetics can be approximately described as single-exponential.
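The MC procedure described in this paragraph can be sketched as a discrete-time random walk on the network. This is an illustrative sketch only: the function names are hypothetical, the row-stochastic transition matrix `P` and the node weights are assumed to be available from the network, and `dt` is the time per MC step (taken here as a free parameter).

```python
import numpy as np

def sample_folding_times(P, weights, native_nodes, dt, n_traj, rng):
    """Sample first passage times to the native basin by MC on the network."""
    P = np.asarray(P, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = P.shape[0]
    native = set(native_nodes)
    # Start only from non-native nodes, with probability proportional to weight.
    starts = np.array([i for i in range(n) if i not in native])
    start_p = weights[starts] / weights[starts].sum()
    times = np.empty(n_traj)
    for k in range(n_traj):
        node = int(rng.choice(starts, p=start_p))
        steps = 0
        while node not in native:
            node = int(rng.choice(n, p=P[node]))  # one MC step along a link
            steps += 1
        times[k] = steps * dt
    return times

def survival(times, t):
    """Fraction of trajectories that have not yet folded at time t."""
    return float(np.mean(np.asarray(times) > t))
```

For single-exponential kinetics, `survival(times, t)` should follow exp(−t/τ) with τ close to the mean of the sampled times; systematic deviations at short or long times would indicate additional phases.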

Interestingly, the present analysis indicates that somewhat deceptive single-exponential behavior can emerge from completely different energy landscapes. Examples are Beta3s, which has a kinetically partitioned denatured state and a hub-like native state, and the β-hairpin of protein G, which shows fast equilibration within a multibasin denatured state.22

Concluding Discussion

Considerable progress has been made recently in experimental investigations of protein folding. Particularly, for the small fast-folding proteins, which have been studied most,4 a two-state description (folded and unfolded) is adequate to describe the measurements. This means that little, if any, information concerning the details of the folding pathways is obtained, although mutation studies have been used to provide a coarse-grained description of the transition state.55,56 Also, experiments supplementing the kinetic measurements by probes sensitive to structural details6,54 have yielded some insights into the folding pathways, particularly when intermediates are present.57 However, none of the experimental studies provide a detailed description of the structures that contribute significantly to the ensembles that are sampled along the folding pathways. Although one might hope for such information from future experiments, as of now, the only way to approach this problem is by computer simulations. In recent years, aided by faster, often massively parallel computational resources, the first steps toward this aim have been realized. Mostly, the studies have been limited to peptides22,32,42 and a few miniproteins (e.g., Trp cage) that fold on the simulation (nanosecond) time scale. Unfolding simulations at high temperature have been interpreted in the folding direction,58 but with few exceptions,59 the unfolding reaction has been followed only once in the simulations. There are very few systems for which it has been possible to do the multiple folding simulations required to obtain statistically meaningful results for analysis. The Beta3s mini-protein, with an implicit solvent model, is one such system. The present analysis is based on equilibrium simulations 20 µs in length that show about 100 folding/unfolding events for a temperature (330 K) at which the native and denatured states are both significantly populated (35% native and 65% denatured).

One problem in using the simulation results is the difficulty of analyzing them to obtain an understanding of the folding reaction. The number of degrees of freedom (3 × 215 for a small system such as Beta3s in the polar hydrogen approximation, in which aliphatic and aromatic hydrogen atoms are not considered explicitly) makes a straightforward approach impossible, even though all of the details (hopefully representative of the actual folding process) are available from the trajectory. Use of the results requires a method for reducing the problem to one or only a few dimensions that are sufficient to describe the folding reaction in a meaningful way. The recently developed cut-based FEP procedures and complex network analyses have been shown to essentially solve this problem. They have demonstrated, among other conclusions, that the very simple picture of protein folding (e.g., one or at most two barriers between the denatured and native state), often obtained by projecting the free energy on an arbitrarily chosen progress variable(s), is not consistent with the complexity of the actual free-energy surface.18,22,32,42 Such complexity has spurred the development of more sophisticated computational procedures for determining free-energy basins and transitions among them. The essential element of both the pfold-/mfpt-based procedures for FEP calculation24 and kinetic grouping analysis19 is the identification of free-energy basins, not according to geometrical characteristics (such as the fraction of native contacts or rmsd from the folded structure) but rather according to the transitions that occur in folding/unfolding trajectories at equilibrium. From such an analysis, a meaningful one-dimensional projection of the free-energy surface (called FEP in this work) is obtained. It provides the basins on the surface and the barriers between them. 
Unlike the standard projections, which lead to overlap of the basins that smooth out the barriers and can make them disappear (as shown for the β-hairpin of protein G in ref 24), the progress variables used here do not result in such overlap if a valid clustering algorithm is used. The formulation requires a coarse-graining approach to group snapshots saved along equilibrium trajectories into nodes, so that adequate transition statistics can be obtained. Some evidence for the robustness upon changing the coarse-graining algorithm (based on rmsd or secondary structure string) was provided previously22,32 and is given in the Supporting Information, but this issue should be analyzed in more detail.

An important conclusion from the present study is that the pfold-/mfpt-based procedures24 find the same free-energy basins as the kinetic grouping approach described in a previous work.19 The very similar results for the free-energy surface, as well as the identification of the subtle differences between Beta3s and its W10V mutant, indicate that the two approaches are correct and complementary. One aspect of the pfold-/mfpt-based procedures is that they are able to separate a free-energy basin consisting of an ensemble of conformations with fluctuating helical (i.e., non-native secondary structure) content. The helical basin has a population of about 11%, and from this basin, the folded state is reached through a heterogeneous entropic state. The significant statistical weight of the helical basin is surprising if one considers that Beta3s is a peptide designed to assume a three-stranded antiparallel β-sheet fold. Because of its entropic stabilization, the helical state of Beta3s is expected to be less populated at lower temperature. Interestingly, an α-helix-rich kinetic intermediate in the refolding (by chemical denaturant dilution) of the β-sandwich protein src SH3 has been reported on the basis of circular dichroism, fluorescence, and X-ray solution scattering experiments.60 Moreover, at pH 3, a helical equilibrium intermediate of the A45G mutant of src SH3 has been observed, and evidence has been provided that it corresponds to a kinetic intermediate.61

The differences between Beta3s and its W10V mutant are small but relevant. There is a shift of the equilibrium in the helix/β-sheet statistical-weight ratio from 11/35 for the wild type to 9/40 for W10V. This indicates that even a single-point mutation can have an influence on the relative propensity of secondary structure formation that plays a critical role in diseases related to protein misfolding and aggregation.62–64
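To give a sense of scale for this shift, the population ratios can be converted into relative free energies with ΔG = −kBT ln(p_helix/p_native). The sketch below uses the rounded populations quoted in the text (11% and 35% for Beta3s, 9% and 40% for W10V) at the simulation temperature of 330 K. The helper name is hypothetical, and the conversion is a simple estimate that treats the two basins as well-defined states.

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol K)

def relative_free_energy(pop_a, pop_b, temperature=330.0):
    """Free energy of state a relative to state b, in kcal/mol."""
    return -KB * temperature * math.log(pop_a / pop_b)

dg_wild_type = relative_free_energy(11.0, 35.0)  # helical vs native, Beta3s
dg_w10v = relative_free_energy(9.0, 40.0)        # helical vs native, W10V
shift = dg_w10v - dg_wild_type                   # effect of the mutation
print(f"{dg_wild_type:.2f} {dg_w10v:.2f} {shift:.2f}")
```

Under these assumptions the mutation changes the helical-versus-native free-energy difference by roughly 0.2 kcal/mol, which is small compared with kBT at 330 K (about 0.66 kcal/mol) and consistent with the description of the differences as subtle.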

About one-third of the snapshots saved along the molecular dynamics trajectories belong to a heterogeneous entropic state that is visited during individual transitions between mainly enthalpic free-energy basins. There is no fast equilibration within the entropic state because of the low barrier of only around 0.5 kcal/mol toward the native state, but also because of barriers that split the entropic state. This explains the previously reported kinetically partitioned denatured state.19 As a consequence, the future development of the system depends strongly on the region where the system enters the entropic state or, equivalently, where it exits the native basin (Figure 8). Because the native and entropic states together make up almost 70% of the total weight, the free-energy surface of Beta3s seems to exhibit some of the features of a fast barrierless/low-barrier folder but with metastable enthalpic traps, each with a relatively low population and a total weight of about 20%, plus a helical region populated at about 11%.

The FEPs reveal that the barriers to exit the enthalpic traps have similar heights, and the SEKN shows that the times spent in the entropic state before reaching or after leaving the folded state are comparable. These results explain why the folding times from individual basins differ by no more than a factor of 3.19 Thus, in accord with a recent experimental analysis of a photoswitchable helical peptide,54 the single-exponential folding behavior originates from essentially equal folding times for multiple paths. This provides another scenario, different from that of the β-hairpin of protein G,22 by which the complexity of the folding reaction can be hidden from standard kinetic experiments.

Supporting Information Available:

Additional calculations, figures, and tables. This material is available free of charge via the Internet at http://pubs.acs.org.

Acknowledgment

We thank F. Rao, E. Guarnera, and E. Paci for contributions to the initial stages of this work. We thank M. Seeber for help with the program WORDOM. The simulations were performed on the Matterhorn cluster of the University of Zurich. This work was supported by a Swiss National Science Foundation grant to A.C., and the portion done at Harvard University was supported, in part, by a grant from the National Institutes of Health. S.K. was supported by the CHARMM Development Project. Procedures for calculating the FEPs have been introduced into WORDOM (http://www.biochem-caflisch.unizh.ch/wordom), and the program for plotting the TRDG (Figure 9) is available upon request.

References and Notes

1. Karplus M. J. Phys. Chem. B. 2000;104:11–27.
2. Frauenfelder H, Sligar SG, Wolynes PG. Science. 1991;254:1598–1603. [PubMed]
3. Privalov PL, Makhatadze GI. J. Mol. Biol. 1993;232:660–679. [PubMed]
4. Maxwell KL, Wildes D, Zarrine-Afsar A, De Los Rios MA, Brown AG, Friel CT, Hedberg L, Horng D, Bona J-C, Miller EJ, Vallée-Bélisle A, Main ERG, Bemporad F, Qiu L, Teilum K, Vu N-D, Edwards AM, Ruczinski L, Poulsen FM, Kragelund BB, Mchnick SW, Chiti F, Bai Y, Hagen SJ, Serrano L, Oliveberg M, Raleigh DP, Wittung-Stafshede P, Radford SE, Jackson SE, Sosnick TR, Marqusee S, Davidson AR, Plaxco KW. Protein Sci. 2005;14:602–616. [PMC free article] [PubMed]
5. Jackson SE. Fold. Des. 1998;3:R81–R91. [PubMed]
6. Dobson CM, Sali A, Karplus M. Angew. Chem., Int. Ed. 1998;37:869–893.
7. Dinner AR, Sali A, Smith LJ, Dobson CM, Karplus M. Trends Biochem. Sci. 2000;25:331–339. [PubMed]
8. Mirny LA, Shakhnovich EI. Ann. Rev. Biophys. Biomolec. Struc. 2001;30:361–396. [PubMed]
9. Daggett V, Fersht AR. Nature Rev. Mol. Cell Biol. 2003;4:497–502. [PubMed]
10. Wolynes PG. Phil. Trans. R. Soc. A. 2005;363:453–467. [PubMed]
11. Chan HS, Dill KA. Proteins: Structure, Function, and Bioinformatics. 1998;30:2–33. [PubMed]
12. Best R, Hummer G. Proc. Natl. Acad. Sci. USA. 2005;102:6732–6737. [PMC free article] [PubMed]
13. Palyanov AY, Krivov SV, Karplus M, Chekmarev SF. Phys. Chem. B. 2007;111:2675–2687. [PubMed]
14. Go N, Abe H. Biopolymers. 1981;20:991–1011. [PubMed]
15. Schueler-Furman O, Wang Ch, Bradley P, Misura K, Baker D. Science. 2005;310:638–642. [PubMed]
16. Kussell E, Shimada J, Shakhnovich EI. Proc. Natl. Acad. Sci. USA. 2002;99:5343–5348. [PMC free article] [PubMed]
17. Paci E, Vendruscolo M, Karplus M. Proteins: Structure, Function, and Bioinformatics. 2002;47:379–392. [PubMed]
18. Caflisch A. Curr. Opin. Struct. Biol. 2006;16:71–78. [PubMed]
19. Muff S, Caflisch A. Proteins: Structure, Function, and Bioinformatics. 2008;70:1185–1195. [PubMed]
20. Becker OM, Karplus M. J. Chem. Phys. 1997;106:1495–1517.
21. Wales DJ. Energy Landscapes. Cambridge, U.K.: Cambridge Univ. Press; 2003.
22. Krivov SV, Karplus M. Proc. Natl. Acad. Sci. USA. 2004;101:14766–14770. [PMC free article] [PubMed]
23. Munoz V, Thompson PA, Hofrichter J, Eaton WA. Nature. 1997;390:196–199. [PubMed]
24. Krivov SV, Karplus M. J. Phys. Chem. B. 2006;110:12689–12698. [PubMed]
25. Du R, Pande VS, Grosberg AY, Tanaka T, Shakhnovich EI. J. Chem. Phys. 1998;108:334–350.
26. Ferrara P, Caflisch A. Proc. Natl. Acad. Sci. USA. 2000;97:10780–10785. [PMC free article] [PubMed]
27. De Alba E, Santoro J, Rico M, Jiménez MA. Protein Sci. 1999;8:854–865. [PMC free article] [PubMed]
28. Rao F, Caflisch A. J. Chem. Phys. 2003;119:4035–4042.
29. Cavalli A, Ferrara P, Caflisch A. Proteins: Structure, Function, and Bioinformatics. 2002;47:305–314. [PubMed]
30. Serrano L, Matouschek A, Fersht AR. J. Mol. Biol. 1992;224:805–818. [PubMed]
31. Settanni G, Rao F, Caflisch A. Proc. Natl. Acad. Sci. USA. 2005;102:628–633. [PMC free article] [PubMed]
32. Rao F, Caflisch A. J. Mol. Biol. 2004;342:299–306. [PubMed]
33. Hubner IA, Shimada J, Shakhnovich EI. J. Mol. Biol. 2004;336:745–761. [PubMed]
34. Rao F, Settanni G, Guarnera E, Caflisch A. J. Chem. Phys. 2005;122:184901. [PubMed]
35. Snow CD, Rhee YM, Pande VS. Biophys. J. 2006;91:14–24. [PMC free article] [PubMed]
36. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M. J. Comput. Chem. 1983;4:187–217.
37. Seeber M, Cecchini M, Rao F, Settanni G, Caflisch A. Bioinformatics. 2007;23:2625–2627. [PubMed]
38. Neria E, Fischer S, Karplus M. J. Chem. Phys. 1996;105:1902–1921.
39. Ferrara P, Apostolakis J, Caflisch A. Proteins: Structure, Function, and Bioinformatics. 2002;46:24–33. [PubMed]
40. Ferrara P, Apostolakis J, Caflisch A. J. Phys. Chem. B. 2000;104:5000–5010.
41. Eaton WA, Munoz V, Hagen SJ, Jas GS, Lapidus LJ, Henry ER, Hofrichter J. Ann. Rev. Biophys. Biomolec. Struc. 2000;29:327–359. [PubMed]
42. Hubner IA, Deeds EJ, Shakhnovich EI. Proc. Natl. Acad. Sci. USA. 2006;103:17747–17752. [PMC free article] [PubMed]
43. Andersen CAF, Palmer AG, Brunak S, Rost B. Structure. 2002;10:174–184.
44. Krivov SV, Karplus M. J. Chem. Phys. 2002;117:10894–10903.
45. Park S, Sener MK, Lu D, Schulten K. J. Chem. Phys. 2003;119:1313–1319.
46. Apaydin M, Brutlag D, Guestrin C, Hsu D, Latombe J. In International Conference on Computational Molecular Biology (RECOMB) 2002.
47. Ford LR, Fulkerson DR. Canadian J. of Math. 1956;8:399–404.
48. Gomory RE, Hu TC. SIAM J. Applied Math. 1961;9:551–570.
49. Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG. Proteins: Structure, Function, and Bioinformatics. 1995;21:167–195. [PubMed]
50. Garcia-Mira MM, Sadqi M, Fischer N, Sanchez-Ruiz JM, Munoz V. Science. 2002;298:2191–2195. [PubMed]
51. Eaton WA, Munoz V, Thompson PA, Chan CK, Hofrichter J. Curr. Opin. Struct. Biol. 1997;7:10–14. [PubMed]
52. Karplus M, Weaver DL. Biopolymers. 1979;18:1421–1437.
53. Pappu RV, Weaver DL. Protein Sci. 1998;7:480–490. [PMC free article] [PubMed]
54. Ihalainen JA, Bredenbeck J, Pfister R, Helbing J, Woolley GA, Hamm P. Proc. Natl. Acad. Sci. USA. 2007;104:5383–5388. [PMC free article] [PubMed]
55. Matouschek A, Kellis JT, Jr, Serrano L, Fersht AR. Nature. 1989;340:122–126. [PubMed]
56. Vendruscolo M, Paci E, Dobson CM, Karplus M. Nature. 2001;409:641–645. [PubMed]
57. Radford SE, Dobson CM, Evans PA. Nature. 1992;358:302–307. [PubMed]
58. Fersht AR, Daggett V. Cell. 2002;108:573–582. [PubMed]
59. Day R, Daggett V. Proc. Natl. Acad. Sci. USA. 2005;102:13445–13450. [PMC free article] [PubMed]
60. Li J, Shinjo M, Matsumura Y, Morita M, Baker D, Ikeguchi M, Kihara H. Biochemistry. 2007;46:5072–5082. [PubMed]
61. Li J, Matsumura Y, Shinjo M, Kojima M, Kihara H. J. Mol. Biol. 2007:747–755. [PubMed]
62. Chiti F, Stefani M, Taddei N, Ramponi G, Dobson CM. Nature. 2003;424:805–808. [PubMed]
63. Tartaglia GG, Cavalli A, Pellarin R, Caflisch A. Protein Sci. 2004;13:1939–1941. [PMC free article] [PubMed]
64. Tartaglia GG, Cavalli A, Pellarin R, Caflisch A. Protein Sci. 2005;14:2723–2734. [PMC free article] [PubMed]