Proc Natl Acad Sci U S A. Dec 27, 2005; 102(52): 18914–18919.
Published online Dec 19, 2005. doi:  10.1073/pnas.0502181102
PMCID: PMC1323145

High-resolution protein folding with a transferable potential


A generalized computational method for folding proteins with a fully transferable potential and geometrically realistic all-atom model is presented and tested on seven helix bundle proteins. The protocol, which includes graph-theoretical analysis of the ensemble of resulting folded conformations, was systematically applied and consistently produced structure predictions of ≈3 Å without any knowledge of the native state. To measure and understand the significance of the results, extensive control simulations were conducted. Graph theoretic analysis provides a means for systematically identifying the native fold and provides physical insight, conceptually linking the results to modern theoretical views of protein folding. In addition to presenting a method for prediction of structure and folding mechanism, our model suggests that an accurate all-atom amino acid representation coupled with a physically reasonable atomic interaction potential and hydrogen bonding are essential features for a realistic protein model.

Keywords: Monte Carlo, all-atom model, folding simulation, structure prediction, graph theory

Protein folding is easy. Without effort, every living organism completes the process innumerable times. Unfortunately, modeling the process is notoriously difficult. Since Anfinsen's experiment (1), we have known that a protein's tertiary structure is defined by its primary sequence. However, the question of sequence-structure mapping remains unsolved. Although researchers in the field have risen to the challenge and continue to make incremental progress (2), a complete solution remains among the great outstanding problems in computational biology. The problem has two aspects. First, given a protein's amino acid sequence, can one reliably predict its tertiary structure? Second, can one accurately understand and describe at a detailed atomic level the physical process by which a protein reaches its native conformation and the dynamics of the folded conformation?

Approaches to the protein-folding problem fall into two major categories: bioinformatics methods that attempt to model the structure of a protein primarily through homology to known structures, and methods that rely on modeling the physical process by which the polymer chain attains its native conformation. Although homology-based approaches have generally yielded more accurate structure predictions and are more readily applied to larger proteins (3), they do not provide physical insight into the folding or conformational dynamics of proteins. The “holy grail” of the protein-folding community thus remains a computationally efficient model that both accurately predicts structure and provides physical insight into the folding and function of any protein given only its amino acid sequence.

Over the past decade, various models have been applied to protein folding and structure prediction. In an important study, the 36 residue villin headpiece fragment was folded to ≈4–5 Å from the native structure, demonstrating that the dream of ab initio protein folding is becoming a reality (4). Other highly successful methods (5) combine sequence and structural homology with incremental physical model-building for structure prediction. Many detailed physical studies use computationally intensive molecular dynamics simulations with complex potentials such as CHARMM and AMBER. Although they provide a measure of physical insight, these models have proven too computationally demanding to apply to any but the smallest proteins and, in such cases, usually produce results similar to simpler models. Additionally, the hundreds of extended simulations that would be necessary to create an ensemble picture of the folding process are beyond the reach of such models. This fact is especially important in the context of the “new view” of protein folding as an ensemble process (6). With the profound success of lattice models and Go-type energy potentials in studying and understanding protein folding (7–9), we know that simple models can effectively abstract many of the essential features of protein folding. We thus are encouraged that a similarly fundamental model, one that represents the basic physics of folding, accurately represents the protein structure (in real, not lattice, space), and is not dependent on any a priori knowledge of the native fold, may provide a solution to the folding problem.

Many studies have been devoted to computationally modeling the Staphylococcus protein A and similar proteins, using everything from Go (10) potentials to empirical (3, 11–13) and other potentials (14–18). Simulations have used Monte Carlo (MC) (3, 14), molecular dynamics (12, 13, 17), discrete molecular dynamics (10), and conformational space annealing (11). Protein chain representations have varied from Cα (10) and other reduced atom (11, 15, 17) to all-atom (3, 12, 13, 18) models, with results similar to the best predictions of smaller proteins in CASP (2). In most studies, the reported minimum rms deviation (rmsd) conformation was not identifiable by energy (3, 4, 11–15, 17–20).

In comparing these studies, it is critical to understand whether the potential is derived from an explicit (although approximate) representation of the physical forces (like CHARMM and AMBER) or as a knowledge-based potential (KBP). Statistical KBPs assume a form for multibody interactions and determine the parameters from a database, whereas optimized potentials optimize the parameters for a training set of proteins in an attempt to make the native state the global energy minimum. A danger of KBPs is that, without separation of the test and training sets, there is no demonstrated reason to believe that the potentials are usefully transferable. The present work demonstrates that a statistical approach can produce a successfully transferable all-atom potential, suitable for ab initio folding simulations, as other careful studies have also shown with optimization (17, 21) and physical approaches (12, 13, 20, 22).

Here, we present a previously undescribed model for high-resolution all-atom protein folding and demonstrate its efficacy on seven proteins. The model combines a realistic all-atom representation with a simple, fully transferable, contact potential and hydrogen-bonding (H-bonding) function and is propagated by means of MC dynamics. An advantage of this method's computational efficiency is that it allows for hundreds of fully independent simulations, yielding representative statistics that allow one to test ensemble kinetics and thermodynamics. The simulation requires no knowledge of the protein's native structures and may be applied in a systematic prescribed manner to any amino acid sequence. Although hierarchical clustering has previously been used in protein structure prediction (23–25), we employ a different graph-theoretic approach that retains topological features of the relationships between members of a cluster and allows us to interpret the results in the context of landscape theories of folding, overcome noise in the potential, and identify high-resolution structure predictions from simulation.

The resulting predictions are significant by the criteria of the protein-folding community (2) and by estimates based on studies of structural homology (26). We also show that the results are meaningful when compared with a set of control simulations. Although conclusions regarding folding mechanism may be made from this model, we limit the present discussion to demonstrating the feasibility of folding and interpreting the graph-theoretic analysis in terms of landscape theory. The successes of the model, which was not optimized or parameterized to any specific protein or training set, are derived from a realistic representation of the topological effects of folding: chain connectivity and side-chain packing combined with a simple, physically reasonable two-body potential that governs specific collapse and a generic H-bonding potential that ensures secondary structure formation.

Model and Methods

Protein Representation and Dynamics. Simulations used structures from the Protein Data Bank (PDB). For randomized sequence controls, all-atom models were built from the FASTA (http://fasta.bioch.virginia.edu) sequence by using Swiss-PdbViewer (www.expasy.org/spdbv). Nonhydrogen atoms are explicitly modeled as impenetrable hard spheres. The move set includes global and localized backbone moves and side-chain torsions. Bond length and connectivity, as well as excluded volume, are always maintained. The move set and MC simulation have been described in detail (27) and have been shown to behave ergodically and satisfy detailed balance (27, 28). Although the thermodynamics and kinetics of the μ-potential have not been calibrated in detail, cooperative two-state (single-exponential) folding and unfolding behavior is observed in all test proteins.

Form and Derivation of the Potential. We have adapted and modified a transferable, knowledge-based pairwise contact potential from earlier work (14) that has also been used in a hybrid potential to introduce physically realistic interaction in simulations of SH3 (29) of the form:

equation M1

where A and B are two interacting atoms, NAB and ÑAB are the number of AB pairs in contact and not in contact in the database, and μ is a parameter balancing attraction and repulsion. As is clear from the above equation, the potential becomes the Go potential at the limit where the number of atom types goes to the number of atoms in a single protein. The (nonoptimized) value of μ (0.9979) was chosen such that ⟨EAB⟩ = 0. To calculate NAB and ÑAB, we use a database (30) of 103 proteins (see Data Set 1, which is published as supporting information on the PNAS web site) with <25% sequence homology that are >50 and <200 residues and define a set of transferable atom types where backbone atoms are typed as peptide N, Cα, carbonyl C, and O regardless of residue. Each side-chain atom has its own type, with the exception of those atoms related by symmetry (methyl group carbons in Val, for example), yielding a total of 84 atom types (see Table 6, which is published as supporting information on the PNAS web site). Importantly, all proteins in our test set and their homologues are excluded from the database used to compute the potential. These two-body interactions are represented by a square well potential, where atoms A and B with hard-sphere radii r separated by some distance D are in contact if 0.75(rA + rB) < D < 1.8(rA + rB). (Potential available from authors upon request.)
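The square-well contact rule and the way a table of pair energies enters the total can be illustrated with a minimal Python sketch. The contact window 0.75(rA + rB) < D < 1.8(rA + rB) is taken from the text. The functional form in `contact_energy` is an assumption: the equation image is not reproduced in this text, and the form shown is simply one expression that is bounded between −1 and 1 and reduces to a Go-like potential when each pair is either always or never in contact. All function names, the atom tuples, and the toy energy table are hypothetical.

```python
import math
from itertools import combinations


def in_contact(r_a, r_b, d):
    """Square-well contact test from the text: 0.75(rA+rB) < D < 1.8(rA+rB)."""
    s = r_a + r_b
    return 0.75 * s < d < 1.8 * s


def contact_energy(n_contact, n_noncontact, mu=0.9979):
    """ASSUMED form of E_AB (the source equation is not reproduced here).

    Returns -1 when a pair type is always in contact in the database and
    +1 when it never is; mu weights attraction against repulsion.
    """
    attract = mu * n_contact
    repel = (1.0 - mu) * n_noncontact
    return (repel - attract) / (repel + attract)


def total_pair_energy(atoms, e_table):
    """Sum table energies over all atom pairs that are in contact.

    atoms   : list of (atom_type, radius, (x, y, z)) tuples (hypothetical layout)
    e_table : dict mapping a sorted (type_a, type_b) tuple to an energy
    Bonded-neighbor exclusions are omitted for brevity.
    """
    total = 0.0
    for (ta, ra, xa), (tb, rb, xb) in combinations(atoms, 2):
        if in_contact(ra, rb, math.dist(xa, xb)):
            total += e_table[tuple(sorted((ta, tb)))]
    return total


# Toy usage with made-up atom types, radii, and energies.
toy_atoms = [("C", 1.0, (0.0, 0.0, 0.0)),
             ("C", 1.0, (2.0, 0.0, 0.0)),
             ("O", 1.0, (10.0, 0.0, 0.0))]
toy_table = {("C", "C"): -0.5, ("C", "O"): 0.3}
toy_total = total_pair_energy(toy_atoms, toy_table)
```

Only the two carbon atoms fall inside the contact window in this toy case, so only their table entry contributes.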

In addition to the above pairwise interaction potential, we consider a backbone H-bonding (EHB; see Fig. 3, which is published as supporting information on the PNAS web site) function to ensure proper secondary structure formation. The relative strength of H-bonding and pairwise interaction is controlled by α, which balances the forces of polymer elongation and collapse.

equation M2

To perform effective simulations, the relative energy scale between EAB and EHB must be set by α. When α is very high, the total energy is dominated by H-bonding and extended helix conformations are formed. When α is too low, hydrophobic interactions, which are the strongest among EAB, are overwhelmingly represented, leading to collapsed conformations with a well-packed hydrophobic core but without secondary structure. At extreme (α ~ 0 or 1) values, protein behavior is sensitive to this parameter. However, it is possible to systematically identify an appropriate value of α by beginning with a high value and annealing until the majority of structures collapse to globular conformations (see Fig. 4, which is published as supporting information on the PNAS web site). The values of α that induce collapse are 0.92 for PDB ID codes 1BDD, 1BA5, 1ENH, and 1GUU and 0.89 for 1GAB and 1GJS. Further tuning does not improve the results, and lowering α significantly beyond this point worsens the structures. As previously observed by other research groups (15), attempts at parameter optimization (in our case μ or α) did not improve the results.

Simulation Protocol. First, the native PDB structure is unfolded for 10^6 random (without energy but maintaining excluded volume) MC steps at T = 1,000 to create a random, fully unfolded (extended and without correlations to native φ and ψ angles) starting conformation. Each folding simulation is initiated from an independent random conformation and propagated at T = 1.75 (≈Tf) for 10^8 MC steps. Next, the minimum energy conformation from the folding simulation is annealed from T = 1.75 to 0 for 5 × 10^7 MC steps to improve side-chain packing. The minimum energy conformation from this refinement simulation is then collected as the structure prediction. The above protocol was repeated 400 times for each protein to create an ensemble of predictions.
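The two-stage schedule described above (a fixed-temperature folding run followed by annealing of the minimum energy conformation) can be sketched in Python on a toy system. This is an illustration under stated assumptions, not the authors' code: the Metropolis acceptance rule is the standard one, the linear shape of the annealing ramp is an assumption (the text gives only the end points), and `toy_energy`, `toy_propose`, and the step counts are hypothetical stand-ins for the all-atom energy function, move set, and the 10^8 and 5 × 10^7 steps of the real protocol.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Standard Metropolis criterion; at T = 0 only non-uphill moves pass."""
    if delta_e <= 0.0:
        return True
    if temperature <= 0.0:
        return False
    return rng.random() < math.exp(-delta_e / temperature)


def run_mc(state, energy, propose, n_steps, t_start, t_end, rng):
    """Run n_steps of MC and return the lowest-energy state seen.

    Temperature moves linearly from t_start to t_end (linear ramp is an
    assumption); equal values give a fixed-temperature run.
    """
    e = energy(state)
    best_state, best_e = state, e
    for step in range(n_steps):
        frac = step / max(n_steps - 1, 1)
        temperature = t_start + (t_end - t_start) * frac
        candidate = propose(state, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e, temperature, rng):
            state, e = candidate, e_candidate
            if e < best_e:
                best_state, best_e = state, e
    return best_state, best_e


def fold_and_refine(start, energy, propose, n_fold, n_refine, seed=0):
    """Fold at T = 1.75, then anneal the minimum-energy state from 1.75 to 0."""
    rng = random.Random(seed)
    folded, _ = run_mc(start, energy, propose, n_fold, 1.75, 1.75, rng)
    return run_mc(folded, energy, propose, n_refine, 1.75, 0.0, rng)


# Toy stand-ins: a 1D coordinate with a quadratic energy well.
def toy_energy(x):
    return (x - 2.0) ** 2


def toy_propose(x, rng):
    return x + rng.uniform(-0.5, 0.5)


prediction, e_min = fold_and_refine(10.0, toy_energy, toy_propose, 2000, 1000)
```

Repeating `fold_and_refine` with different seeds would give the ensemble of independent predictions that the clustering analysis operates on.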

Test Proteins. Seven independently folding domains of short length were selected to test the folding model (see Table 7, which is published as supporting information on the PNAS web site). Three have the albumin-binding (“up-down-up” three-helix bundle) topology: Staphylococcus aureus protein A, Ig-binding B domain (PDB ID code 1BDD); Escherichia coli albumin-binding domain surface protein (PDB ID code 1GAB); and Streptococcus IgG binding protein G (PDB ID code 1GJS). Three have the DNA-binding (“helix-turn-helix” or homeodomain-type) topology: the DNA binding domain of human telomeric protein HTRF1 (PDB ID code 1BA5), an engrailed homeodomain-DNA complex (PDB ID code 1ENH), and c-Myb R1 protooncogene (PDB ID code 1GUU). An FF domain protein (PDB ID code 1UZC), with a more complex four-helix topology (one helix of which is a 3₁₀ helix), also was included. 1ENH and 1GUU were compared with x-ray structures, whereas all others were compared with the NMR structures. We calculate rmsd for the helix and turn regions (ignoring long, disordered regions at the termini), corresponding to F6-A60 in 1BDD, N9-A53 in 1GAB, D16-A62 in 1GJS, L7-L53 in 1BA5, F8-I56 in 1ENH, T44-L86 in 1GUU, and K14-T69 in 1UZC. Limitation of the present work to shorter proteins is, at least in part, due to the sampling restrictions presented by a fixed-temperature MC search that, unlike replica exchange, may be more suitable to studying the folding mechanism.

Graph-Theoretical Analysis. The rmsd in Cα coordinates is computed for all pairs of the lowest-energy structures obtained from 400 independent simulations of each protein. A graph is then created from these comparisons by considering each minimum energy structure as a node and connecting any two nodes that exhibit a rmsd less than a particular cutoff (r). The clusters in this graph are defined as any set of nodes where a path exists in the graph between any two members of that set. These disjoint clusters are obtained by using a standard depth-first search algorithm. At any given value of r, the giant component (GC) of the graph is defined as the largest disjoint cluster. As has been observed in many systems, this GC undergoes a transition as a function of r (see Fig. 5, which is published as supporting information on the PNAS web site). We analyze the GC for each protein at the midpoint of the transition, the cutoff r at which half of the total structures are contained within the largest cluster.
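The clustering procedure described above can be sketched in a few lines of Python. The sketch assumes the pairwise Cα rmsd values are already available as a symmetric matrix; the function names are hypothetical, and locating the transition midpoint by scanning a user-supplied list of increasing cutoffs is an assumption about how the midpoint might be found in practice.

```python
def build_graph(rmsd, cutoff):
    """Adjacency lists: nodes i and j are linked when rmsd[i][j] < cutoff."""
    n = len(rmsd)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rmsd[i][j] < cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def connected_components(adj):
    """Disjoint clusters found with an iterative depth-first search."""
    seen = [False] * len(adj)
    components = []
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def giant_component(rmsd, cutoff):
    """Largest disjoint cluster of the graph built at this cutoff."""
    return max(connected_components(build_graph(rmsd, cutoff)), key=len)


def midpoint_cutoff(rmsd, cutoffs):
    """First cutoff (scanning in increasing order) at which the giant
    component holds at least half of all structures; None if never."""
    for r in sorted(cutoffs):
        if 2 * len(giant_component(rmsd, r)) >= len(rmsd):
            return r
    return None


# Toy matrix: structures 0, 1, 2 form a chain; structure 3 is an outlier.
toy_rmsd = [[0.0, 1.0, 3.0, 10.0],
            [1.0, 0.0, 1.0, 10.0],
            [3.0, 1.0, 0.0, 10.0],
            [10.0, 10.0, 10.0, 0.0]]
gc = giant_component(toy_rmsd, 2.0)
r_mid = midpoint_cutoff(toy_rmsd, [0.5, 2.0, 5.0])
```

Note that structures 0 and 2 end up in the same cluster even though their mutual rmsd exceeds the cutoff, because a path through structure 1 connects them; this is the path-based cluster definition given in the text.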

Results and Discussion

Separating Native Folds from Misfolds. From examining the 400 independent trajectories for each protein, it is clear that the native state is well sampled. Most minimum energy conformations fall in the 2–6 Å range. Examining all minimum energy structures (Table 1), it is clear that along with native folds, there are a number of low energy decoys. Because of the approximate nature of our energy function, this result is not surprising. Previous computational studies also resulted in the minimum of various energy functions not corresponding exactly to the native state (3, 4, 11–15, 19, 20). Many of the decoys we observe involve undocked helices or poorly formed secondary structure. These misfolds usually exhibit higher energy than near-native structures and are thus easily identifiable. There is a second class of misfolds that are protein-like and are broadly describable as “mirror images” of the native structure. These misfolds represent a more difficult case because they have energies comparable with native-like structures despite high (8–10 Å) rmsd from the native state. Other researchers also have noted the presence of mirror misfolds, because a three-helix bundle exhibits twofold topologic degeneracy (3, 11, 15).

Table 1.
The GC presents a significant enrichment of the data, eliminating misfolds and reducing the average rmsd while retaining the best predictions

Given that our energy function cannot a priori distinguish these low-energy decoys from native conformations, we must rely on other objective analyses to identify native conformations. To accomplish this goal, we employ a graph-theoretic clustering procedure. The largest cluster in this graph, the GC, undergoes a sharp transition as r, the structural cutoff used to construct the graph, becomes more stringent. At the midpoint of the transition many of the decoy structures are excluded from the GC as evidenced by the decrease in average rmsd in the GC compared with the entire ensemble of structures at the midpoint of the transition in the GC (Table 1). In most cases the mirror misfold structures are excluded from the GC at this point in the transition. A representative graph (see Fig. 6, which is published as supporting information on the PNAS web site) for 1BDD at the midpoint in the transition is shown in Fig. 2. As evidenced by these graphs, in the predominance of cases no set of decoys obtained from these simulations forms a cluster that is as large and coherent as the native-like structures in the GC. This finding indicates that the predominant structural class sampled by our simulations and identified via clustering is the native basin.

Fig. 2.
The relationship between graph-theoretic analysis and landscape theory. Proteins (designed heteropolymers) exhibit a deep, pronounced minimum (large, dense native cluster), but the landscape is rugged with low energy traps (other small and disjoint clusters). ...

The two exceptions to the above observation are 1GUU and 1ENH. In the case of 1GUU mirror misfolds remain in the GC at the midpoint of the transition. Visual inspection of the 1GUU graph at the transition (Fig. 6) clearly reveals that the GC in this graph consists of two distinct, dense clusters that are connected to one another by only a few edges. At a lower cutoff these two clusters break into a near-native cluster and a mirror misfold cluster (Table 2). In the case of 1ENH the misfolds form a coherent cluster separate from, but of almost the same size as, the GC (Fig. 6 and Table 2). In both of these cases our method cannot identify which represents the native cluster. Graph-theoretic analysis allows us to identify this problem when it does occur, however, providing an objective measure of the degeneracy in our sampling of structural space.

Table 2.
Comparison of native and mirror misfold clusters

Identifying the Native Fold. Clustering the results of independent folding simulations improves the quality of the prediction by enriching the representation of the well-sampled native state and excluding the disparate misfolds. However, the size of the GC may be quite large (200 of 400 configurations at the half point of the transition). This finding raises the question of how the best models from the GC can be reliably chosen using objective, quantifiable criteria. Although ranking predictions by energy provides reasonable results, it clearly fails in such cases as 1GAB (Table 3). The success of clustering in eliminating misfolds suggests that some topological features of the graphs may serve to identify the native state. We hypothesize that the most native conformations, if properly sampled, should be the most connected within the GC because a higher population of similar structures results in more connections between those conformations. We find that clustered conformations exhibit a general relationship between the number of neighbors a node exhibits (called k, the degree of the node) and the rmsd from the native state (see Fig. 7, which is published as supporting information on the PNAS web site). Although conformations with low rmsd from the native may exhibit a low degree, most nodes of high degree are among the most native conformations observed in our simulations. When the three highest-degree nodes from the GC are chosen, they consistently include some of the highest-quality structures from the entire simulation. This approach is only possible because the graph-theoretic approach we employ preserves topological information about the relationship between nodes in a cluster.
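Selecting predictions by node degree can be sketched as follows, in Python. The sketch assumes a precomputed pairwise rmsd matrix and a list of giant-component members; the function names and the tie-breaking rule (lower index first) are hypothetical choices that the text does not specify.

```python
def node_degrees(rmsd, cutoff, members):
    """Degree k of each member: number of other members within the cutoff."""
    return {
        i: sum(1 for j in members if j != i and rmsd[i][j] < cutoff)
        for i in members
    }


def top_degree_nodes(rmsd, cutoff, members, n_top=3):
    """Return the n_top members with the highest degree k.

    Ties are broken by index, which is an arbitrary choice for this sketch.
    """
    k = node_degrees(rmsd, cutoff, members)
    return sorted(members, key=lambda i: (-k[i], i))[:n_top]


# Toy matrix: structure 0 is close to every other structure.
toy_rmsd = [[0.0, 1.0, 1.0, 1.0],
            [1.0, 0.0, 1.0, 3.0],
            [1.0, 1.0, 0.0, 3.0],
            [1.0, 3.0, 3.0, 0.0]]
toy_members = [0, 1, 2, 3]
toy_k = node_degrees(toy_rmsd, 2.0, toy_members)
toy_top = top_degree_nodes(toy_rmsd, 2.0, toy_members)
```

Because the ranking uses only the pairwise comparison between simulated structures, it needs no reference to the native state, which is the property the text emphasizes.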

Table 3.
rmsd of structure predictions from the GC by E and k

The superposition of the top k prediction and native state for each protein is presented in Fig. 1. These predictions are obtained from a generalized, fully transferable potential, and the graph-theoretic approach represents a completely consistent, objective analysis that requires no knowledge of the native state. The resulting predictions, three at ≈3 Å rmsd, three at 4 Å, and a 5-Å “fold prediction” in the worst case, demonstrate the effectiveness of our potential and the utility of the clustering method. As discussed in the preceding section, the 1GUU GC contains native and misfolds at the half point and splits into two clusters at lower cutoffs. Applying the same highest-k criteria to these clusters provides a very good (3 Å) structure prediction in one of the two cases. In every test case the GC at half transition contains the native fold. In every test, the native fold is always among the highest three k-conformations. Graph-theoretic analysis thus provides a means for enriching the native fold by means of clustering, choosing a high-quality prediction using k, and gauging the reliability of that prediction by identifying the misfold problem when it occurs.

Fig. 1.
Top k predictions from Table 3, superimposed on the native conformation and colored from N (blue) to C (red) termini.

Controls and the Meaning of the Results. From the application of our protocol to seven real proteins, it is clear that sequences designed by evolution to fold do so in our model. However, several potential concerns should be addressed through control simulations. First, we show that the μ-potential contains useful and meaningful information by demonstrating that it does not behave like a random potential. Second, we confirm that the μ-potential predictions are meaningful in that the model is not constructed to generically fold every sequence into a helix-bundle topology. Last, we verify that the predicted topologies result from the interplay between the H-bonding and pair contact potential.

In all control simulations (Table 4), 1BDD is selected to represent albumin-binding domains, and 1ENH is chosen to represent DNA-binding domains. The protocol for 200 folding simulations and clustering analysis for each control are identical to those for folding the test proteins. The μ-potential has interaction energies that range from –1 to 1 and have an average of 0. For comparison, we produce a random potential with the same statistical properties and apply it, along with the same H-bond potential, to the folding of the 1BDD and 1ENH sequences. In 1BDD, this potential results in an average of 16.72-Å rmsd from the native state, and for 1ENH the average rmsd of the resulting structures is 16.40 Å. In both cases, the structure nearest to native is ≈11 Å and, upon clustering, the GC does not improve any of the predictions. None of the folding runs produced a three-helix bundle topology even remotely similar to the native conformation. Clearly, the μ-potential is nonrandom and contains information that discriminates the native protein structure.
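The text does not say how the random potential was generated. One simple way to obtain a random table with exactly the same range and average as a given one is to shuffle the existing energies among the atom-type pairs, sketched below in Python. This is an assumption offered for illustration only; the function name and the toy table are hypothetical.

```python
import random


def shuffled_potential(e_table, seed=None):
    """Randomly reassign the existing pair energies to atom-type pairs.

    The multiset of values is unchanged, so the minimum, maximum, and mean
    of the table are preserved while the type-specific information is lost.
    """
    rng = random.Random(seed)
    keys = sorted(e_table)
    values = [e_table[key] for key in keys]
    rng.shuffle(values)
    return dict(zip(keys, values))


# Toy table with made-up atom types and energies.
toy_table = {("C", "C"): -0.8, ("C", "N"): 0.1, ("C", "O"): 0.4,
             ("N", "N"): 0.6, ("N", "O"): -0.5, ("O", "O"): 0.2}
toy_random = shuffled_potential(toy_table, seed=42)
```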

Table 4.
Cα rmsd results from controls of 1ENH and 1BDD

To demonstrate that the model is not contrived to produce helix bundle proteins, we show that if the protocol is applied to some arbitrary random protein-like sequence it does not produce a helix-bundle conformation. The FASTA sequences for 1BDD and 1ENH were each subjected to 6,000 randomizing permutations (see Fig. 8, which is published as supporting information on the PNAS web site). When the resulting sequence was checked by using BLAST against the nonredundant National Center for Biotechnology Information database of protein sequences (31), no known sequence homologues were found. For each of the two proteins, a new amino acid chain of the same amino acid composition and length as the real protein, but different sequence, then was constructed by using Swiss-PdbViewer. From the results it is clear that the 200 control runs result in structures that are much worse than the prediction runs and that no improvement comes from clustering. Within the GC, no helix-bundle topology resembling the native conformation of either protein is identified by energy or clustering (Table 4). Although no conformations resembling the WT are produced, the GC and “prediction” represent a cohesive set of similar structures. Clearly, this model is not contrived to turn any sequence into a helix bundle: a random collapsed false prediction should exhibit a rmsd of ≈10 Å, highlighting the significance of the ≈3-Å predictions.
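A randomized-sequence control of this kind can be sketched in Python. The sketch interprets each "randomizing permutation" as a swap of two randomly chosen positions, which is an assumption; the function name and the example sequence are hypothetical (the sequence is made up and is not that of 1BDD or 1ENH).

```python
import random
from collections import Counter


def randomize_sequence(sequence, n_swaps=6000, seed=None):
    """Scramble a sequence by repeated random pairwise position swaps.

    Swapping preserves both the length and the amino acid composition,
    so only the order of residues changes.
    """
    rng = random.Random(seed)
    residues = list(sequence)
    for _ in range(n_swaps):
        i = rng.randrange(len(residues))
        j = rng.randrange(len(residues))
        residues[i], residues[j] = residues[j], residues[i]
    return "".join(residues)


# Made-up one-letter sequence used purely for illustration.
toy_sequence = "MKTAYIAKQRLEDVLNGS"
toy_scrambled = randomize_sequence(toy_sequence, seed=7)
```

A scrambled sequence produced this way would still need the homology check described in the text before being used as a control.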

Simulations run with only the μ-potential result in collapsed structures with a compact hydrophobic core, but without secondary structure, that are ≈10-Å rmsd from the native state. Moreover, clustering analysis fails to identify a dominant GC, even at high r cutoffs. When the conformations do cluster, they form a large number of small clusters, indicating great structural diversity. If simulations are run with only H-bonding, then the amino acid chain becomes a single extended helix, with a 19.43- and 21.30-Å rmsd for 1ENH and 1BDD, respectively. Neither EAB nor EHB alone can identify native conformations. Although an atomic interaction term is necessary for collapse, secondary structure formation functions to limit the conformational space available to an amino acid chain, reducing the conformational space necessary to search for the global minimum and contributing to the presence of a cohesive, well-defined ensemble of conformations in a protein's folded state. It was recently suggested (32, 33) that compaction, chain geometry, and excluded volume ensure protein-like conformations. The above simulations indicate that these factors alone are not able to produce protein-like conformations. We find that necessary conditions for a protein model to achieve a realistic tertiary structure include a geometrically and spatially realistic side chain and backbone representation, an accurate representation of H-bonding, and a potential that represents specific hydrophobic and other atom–atom interaction in a physically appropriate manner (34).

From the clustering of each control (Table 4) it is clear that the resulting “predictions” are nonrandom but do not resemble helix bundles. We observe primarily compact conformations that may have helices and turns, with the obvious exception of using only the H-bonding potential, which produced a single extended helix. It is not surprising to observe a small number of collapsed conformations resembling helix bundles in the randomized sequence controls, some as close as 4 Å from native. However, these conformations cannot be identified by energy or any graph-theoretical criteria introduced in this work. Several studies have based claims of successful folding, at least in part, on the presence of low rmsd conformations (12, 22) in a subset of trajectories. However, from these controls we see that it is possible to sample helix-bundle conformations by randomly collapsing a random sequence heteropolymer chain, with the caveat that the resulting native-like structure exists only in a tiny fraction of runs and cannot be identified by clustering or energy. For a protein-folding simulation, only identification of low rmsd conformations by quantitative criteria independent of knowledge of the native state constitutes a successful description of a physical sequence-dependent folding event.


These results represent significant progress and promise for understanding protein folding and structure prediction in ab initio high-resolution simulations. Systematically applying a quantitative method that uses an energy function along with clustering to identify the prediction consistently identifies structures in the 3-Å rmsd range for most proteins (P ≈ 10^−7); even the <6-Å (P ≈ 10^−5) (26) structures are relatively good, indicating that Table 3 (excluding mirror images) presents a set of meaningful predictions. Clustering works in concert with the ensemble of energy-based predictions to reliably eliminate decoys and identify the free-energy minimum structures that closely resemble the native state. Graph-theoretic analysis provides an indication of the quality of the results, identifies the most native folds and misfolds, and provides a conceptually useful link to interpretation of the results in the context of physical theories of folding.

Fig. 2 shows the graph for all structures at the transition in the GC for 1BDD folding simulations and randomized sequence control. It is clear from this figure, and it has been shown analytically (35, 36), that random and designed heteropolymers exhibit different behaviors dictated by their energy landscapes. Whereas the designed polymer largely populates a distinct and deep minimum, the random polymer inhabits multiple, energetically similar, but structurally unrelated, states. This view from heteropolymer theory explains why clustering of conformations improves predictive power of the method: native-like states feature multiple interactions that work in concert to provide a minimum energy structure in the native state, consistent with the “principle of minimal frustration” (37). Because our potential is approximate, the energy landscape has features of both designed and random heteropolymers. Apparently the native basin of attraction is “broad,” containing many structurally related conformations, in contrast to spurious minima, characteristic of random heteropolymers, that may be deep but contain only a few structures. A graph is a topological entity and can serve to analyze and conceptualize the multidimensional protein-folding energy landscape (23, 38) without reliance on spatial coordinates and, as such, has potential for representing energy landscapes in a limited set of order parameters. These data show that a high k in a graph corresponds to a high density of states, or energy minimum, on the protein-folding energy landscape. In addition to understanding energy landscape topology at the minimum (structure prediction), we anticipate many applications for this model in understanding global landscape topology (folding).

We have demonstrated that fully atomistic simulations, using a single protein representation from beginning to end, have the ability to fold multiple proteins to their native states with a single transferable potential that is not trained or optimized on the test proteins or decoy sets and relies on absolutely no information of the native structure. Comparison of our results with contemporary studies is given in Table 5. Controls show that simply collapsing an amino acid chain with an attractive pair potential is insufficient to fold proteins; the problem of secondary structure also must be addressed. Likewise, we have demonstrated that our potential contains enough specific information to identify the native structures of proteins, without such bias that it introduces false-positive predictions; it does not turn any sequence into a helix-bundle protein. The model works because it accurately captures protein geometry and side-chain packing, H-bonding, and secondary structure formation and presents a physically reasonable pairwise potential for compaction. It is both encouraging and intellectually satisfying that simple physical models reliably represent many of the aspects of protein folding and that graph-theoretic analysis conceptually links the results of folding simulations to energy landscape theory.

Table 5.
Comparison of representative, contemporary models for protein folding

Supplementary Material

Supporting Information:


We thank C. Brian Roland for stimulating discussion and careful reading of the manuscript. I.A.H. and E.J.D. are supported by the Howard Hughes Medical Institute. This work was supported by National Institutes of Health Grant GM52126.


Author contributions: I.A.H. and E.I.S. designed research; I.A.H. and E.J.D. performed research; I.A.H. and E.J.D. contributed new reagents/analytic tools; I.A.H., E.J.D., and E.I.S. analyzed data; and I.A.H., E.J.D., and E.I.S. wrote the paper.

Conflict of interest statement: No conflicts declared.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: GC, Giant Component; MC, Monte Carlo; PDB, Protein Data Bank; rmsd, rms deviation.


1. Anfinsen, C. B. (1973) Science 181, 223–230.
2. Venclovas, C., Zemla, A., Fidelis, K. & Moult, J. (2003) Proteins 53, Suppl. 6, 585–595.
3. Vila, J. A., Ripoll, D. R. & Scheraga, H. A. (2003) Proc. Natl. Acad. Sci. USA 100, 14812–14816.
4. Duan, Y. & Kollman, P. A. (1998) Science 282, 740–744.
5. Bradley, P., Chivian, D., Meiler, J., Misura, K. M., Rohl, C. A., Schief, W. R., Wedemeyer, W. J., Schueler-Furman, O., Murphy, P., Schonbrun, J., et al. (2003) Proteins 53, Suppl. 6, 457–468.
6. Pande, V. S., Grosberg, A., Tanaka, T. & Rokhsar, D. S. (1998) Curr. Opin. Struct. Biol. 8, 68–79.
7. Shakhnovich, E. I. (1997) Curr. Opin. Struct. Biol. 7, 29–40.
8. Takada, S. (1999) Proc. Natl. Acad. Sci. USA 96, 11698–11700.
9. Onuchic, J. N. & Wolynes, P. G. (2004) Curr. Opin. Struct. Biol. 14, 70–75.
10. Zhou, Y. & Karplus, M. (1999) Nature 401, 400–403.
11. Lee, J., Liwo, A. & Scheraga, H. A. (1999) Proc. Natl. Acad. Sci. USA 96, 2025–2030.
12. Jang, S., Kim, E., Shin, S. & Pak, Y. (2003) J. Am. Chem. Soc. 125, 14841–14846.
13. Garcia, A. E. & Onuchic, J. N. (2003) Proc. Natl. Acad. Sci. USA 100, 13898–13903.
14. Kussell, E., Shimada, J. & Shakhnovich, E. I. (2002) Proc. Natl. Acad. Sci. USA 99, 5343–5348.
15. Favrin, G., Irback, A. & Wallin, S. (2002) Proteins 47, 99–105.
16. Liwo, A., Khalili, M. & Scheraga, H. A. (2005) Proc. Natl. Acad. Sci. USA 102, 2362–2367.
17. Fujitsuka, Y., Takada, S., Luthey-Schulten, Z. A. & Wolynes, P. G. (2004) Proteins 54, 88–103.
18. Herges, T. & Wenzel, W. (2004) Biophys. J. 87, 3100–3109.
19. Liwo, A., Lee, J., Ripoll, D. R., Pillardy, J. & Scheraga, H. A. (1999) Proc. Natl. Acad. Sci. USA 96, 5482–5485.
20. Takada, S. (2001) Proteins 42, 85–98.
21. Papoian, G. A., Ulander, J., Eastwood, M. P., Luthey-Schulten, Z. & Wolynes, P. G. (2004) Proc. Natl. Acad. Sci. USA 101, 3352–3357.
22. Zagrovic, B., Snow, C. D., Shirts, M. R. & Pande, V. S. (2002) J. Mol. Biol. 323, 927–937.
23. Shortle, D., Simons, K. T. & Baker, D. (1998) Proc. Natl. Acad. Sci. USA 95, 11158–11162.
24. Zhang, Y. & Skolnick, J. (2004) J. Comput. Chem. 25, 865–871.
25. Bonneau, R., Strauss, C. E. & Baker, D. (2001) Proteins 43, 1–11.
26. Reva, B. A., Finkelstein, A. V. & Skolnick, J. (1998) Folding Des. 3, 141–147.
27. Shimada, J., Kussell, E. L. & Shakhnovich, E. I. (2001) J. Mol. Biol. 308, 79–95.
28. Shimada, J. & Shakhnovich, E. I. (2002) Proc. Natl. Acad. Sci. USA 99, 11175–11180.
29. Hubner, I. A., Edmonds, K. A. & Shakhnovich, E. I. (2005) J. Mol. Biol. 349, 424–434.
30. Mirny, L. A. & Shakhnovich, E. I. (1996) J. Mol. Biol. 264, 1164–1179.
31. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997) Nucleic Acids Res. 25, 3389–3402.
32. Banavar, J. R., Maritan, A., Micheletti, C. & Trovato, A. (2002) Proteins 47, 315–322.
33. Banavar, J. R. & Maritan, A. (2003) Rev. Mod. Phys. 75, 23.
34. Hubner, I. A. & Shakhnovich, E. (2005) Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top. 72, 022901.
35. Ramanathan, S. & Shakhnovich, E. (1994) Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top. 50, 1303–1312.
36. Shakhnovich, E. I. & Gutin, A. M. (1989) Biophys. Chem. 34, 187–199.
37. Bryngelson, J. D. & Wolynes, P. G. (1987) Proc. Natl. Acad. Sci. USA 84, 7524–7528.
38. Rao, F. & Caflisch, A. (2004) J. Mol. Biol. 342, 299–306.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences