Proc Natl Acad Sci U S A. Dec 7, 1999; 96(25): 14258–14263.

A physical basis for protein secondary structure


A physical theory of protein secondary structure is proposed and tested by performing exceedingly simple Monte Carlo simulations. In essence, secondary structure propensities are predominantly a consequence of two competing local effects, one favoring hydrogen bond formation in helices and turns, the other opposing the attendant reduction in sidechain conformational entropy on helix and turn formation. These sequence specific biases are densely dispersed throughout the unfolded polypeptide chain, where they serve to preorganize the folding process and largely, but imperfectly, anticipate the native secondary structure.

Elements of secondary structure—α-helix, β-sheet, and tight turns—are ubiquitous in proteins (1). What is the physical reason for their pervasive occurrence? Do these patterns arise as a direct consequence of formative interactions within the elements themselves (i.e., locally determined), or are they an indirect consequence of longer range interactions (i.e., globally determined)?

Surprisingly, the field lacks a simple physicochemical theory of secondary structure in peptides and proteins (2, 3). Instead, prediction methods tend to be based on statistical likelihoods (4) or, more recently, on neural nets (5). Alternating patterns of hydrophilic and hydrophobic residues have been noted in amphipathic helices and strands (6, 7), but the interactions they engender are exerted primarily within folded proteins and fail to explain the appearance of corresponding structures in isolated peptides. Statistical mechanical treatments (see, e.g., ref. 8) of secondary structure can be effective (9) but require numerous adjustable, empirical parameters. Surely, the absence of a simple physical theory of secondary structure has contributed to the continuing suspicion that none exists.

Yet, numerous experiments on the kinetics of protein folding show that native-like secondary structure elements form early and rapidly, before substantial tertiary organization. Still, such elements might be statistical accidents that play little or no role in guiding subsequent folding events.

Here, we propose a physical theory for secondary structure based on sterics and local interactions. Our findings demonstrate that local, intrinsic, sequence-dependent biases to be in helix, strand, and turns are densely dispersed throughout the polypeptide chain and are unlikely to be merely accidental (2, 10). At root, these biases are grounded in sterics (11), the most important organizing factor in protein conformation (12). Work in this area began with Sasisekharan (13) and Ramachandran (14), who showed that the conformational space available to amino acids is highly restricted. All residues except glycine and proline are largely constrained to occupy either of two mainchain regions. In one, the polypeptide chain is contracted; in the other, it is extended. Apart from these two, remaining alternatives are disfavored because of steric interference.

In essence, secondary structure bias is largely a consequence of the balance between two opposing local forces that govern the position of equilibrium between these two mainchain states. The competing forces are attractive local interactions vs. sidechain conformational restriction. The former is enthalpic and favorable; the latter is entropic and unfavorable. Contracted conformations are compatible with local hydrogen bonds—both mainchain–mainchain and mainchain–side chain—but the bulky backbone can interfere with sidechain flexibility. Steric interference between mainchain and side chains is relieved in extended conformations, but hydrogen bonds are sacrificed in this state. In some cases, short polar side chains can compensate for loss of conformational freedom by forming hydrogen bonds to the backbone. The equilibrium between these two states—contracted and extended—is sequence-specific because sidechains differ in their steric characteristics and ability to form hydrogen bonds (15–17). Glycine and proline add further complexity to this picture because their backbone geometry differs from that of the other 18 residues, but no additional principles need be invoked.

This physical explanation is applicable to both repetitive and nonrepetitive secondary structure. In repetitive structures—helix and strand—the energetic “tug-of-war” is largely between sidechain conformational entropy and mainchain hydrogen bonding. In nonrepetitive structure—tight turns (18)—the peptide chain is contracted, similar to a single turn of helix, and sidechains may clash with the bulky backbone, but stabilizing sidechain-to-mainchain hydrogen bonds can provide energetic compensation.

Driven primarily by sterics and local hydrogen bonds, these secondary structure biases are expected to emerge in the unfolded state and to preorganize all subsequent folding events. Segments with strong biases are poised to form persisting structure, especially when fortified by additional stabilizing interactions.

We test these ideas by performing short Monte Carlo simulations using linus (19) for a diverse set of experimentally interesting proteins. Computer simulations are an especially effective tool in this regard because, unlike actual experiments, only interactions of interest are included; all others can be eliminated. As described below, we find that sterics and local interactions are sufficient to engender pronounced conformational biases that largely, but imperfectly, anticipate the native secondary structure of the protein.

Methods

Protein conformational space is explored by using a conventional Metropolis Monte Carlo procedure (20). Initially, the starting conformation, C, is set to an extended chain. Progressing from the amino to the carboxy terminus, successive residues, taken three at a time, are perturbed at random, using a predefined move set, to produce a trial conformation, C′. Next, C′ is evaluated: if free of steric clash and if application of the Metropolis criterion leads to acceptance, C is set to C′. Otherwise, C′ is rejected and C is retained. A “cycle” is said to be completed when the chain has been traversed from one end to the other, using this procedure. On completion of every cycle, the structure is saved. All proteins were simulated three times, 1,000 cycles per simulation. Additional details are given below.
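To make the control flow of one cycle concrete, the procedure can be sketched in Python as follows. This is a minimal illustration, not the linus implementation: the energy, move, and clash functions are hypothetical placeholders supplied by the caller, energy is taken as lower-is-better (for example, the negative of the summed scoring-function rewards), and the three-residue window is assumed to advance one residue at a time.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, rng):
    """Metropolis criterion: accept downhill moves; accept uphill ones with
    probability exp(-delta_energy / temperature)."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def run_cycle(conformation, energy_fn, perturb_fn, has_clash_fn, temperature, rng):
    """One cycle: traverse the chain N -> C, perturbing three residues at a time.

    energy_fn, perturb_fn and has_clash_fn are hypothetical stand-ins for the
    scoring function, move set and steric check; energy is lower-is-better.
    """
    current = list(conformation)
    current_energy = energy_fn(current)
    # Assumption: the three-residue window slides forward one residue per step.
    for start in range(len(current) - 2):
        trial = list(current)
        trial[start:start + 3] = perturb_fn(current[start:start + 3], rng)
        if has_clash_fn(trial):
            continue  # steric clash: reject C' and retain C
        trial_energy = energy_fn(trial)
        if metropolis_accept(trial_energy - current_energy, temperature, rng):
            current, current_energy = trial, trial_energy
    return current


def simulate(initial, n_cycles, energy_fn, perturb_fn, has_clash_fn,
             temperature=1.0, seed=0):
    """Run n_cycles cycles from `initial`, saving the structure after each."""
    rng = random.Random(seed)
    saved = []
    conf = list(initial)
    for _ in range(n_cycles):
        conf = run_cycle(conf, energy_fn, perturb_fn, has_clash_fn, temperature, rng)
        saved.append(conf)
    return saved
```

Calling `simulate(extended_chain, 1000, ...)` would return the 1,000 saved conformers used for bias extraction; `extended_chain`, like the three callables, is a hypothetical input.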

Chain Geometry.

Each residue, except glycine, is represented by alanine: specifically, four backbone atoms (N, Cα, C′, O) and the β-carbon (Cβ). Also, each residue, except glycine and alanine, has either one or two side chain pseudoatoms, depending on whether the side chain is β-branched. In particular, valine, threonine, and isoleucine have two additional side chain atoms; others have only one. All relevant geometric parameters for each amino acid are given in Table 5, published as supplemental data on the PNAS web site, www.pnas.org.
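For bookkeeping, the reduced representation amounts to a per-residue atom list. The sketch below is illustrative only: SC1 and SC2 are hypothetical pseudoatom names, three-letter residue codes are assumed as input, and the geometric parameters of Table 5 are not included.

```python
BACKBONE = ("N", "CA", "C", "O")       # four backbone atoms (C' written as "C")
BETA_BRANCHED = {"VAL", "THR", "ILE"}  # residues with two side-chain pseudoatoms


def model_atoms(residue):
    """Atom names in the reduced model for a three-letter residue code.
    SC1/SC2 are hypothetical labels for the side-chain pseudoatoms."""
    residue = residue.upper()
    atoms = list(BACKBONE)
    if residue == "GLY":
        return atoms             # glycine: backbone only
    atoms.append("CB")           # every other residue carries a beta-carbon
    if residue == "ALA":
        return atoms
    atoms.append("SC1")          # one pseudoatom for all remaining residues
    if residue in BETA_BRANCHED:
        atoms.append("SC2")      # a second one for beta-branched residues
    return atoms
```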

Scoring Function.

The scoring function used in the Metropolis criterion consists of four terms, one repulsive and three attractive: steric clashes are penalized and hydrogen bonds, hydrophobic contacts, and salt bridges are all rewarded. To preclude nonlocal effects, attractive forces are limited to nearby chain neighbors. Specifically, the three attractive terms are evaluated only between amino acids separated by no more than five residues in sequence. These four terms are now described explicitly.

Electron clouds of atoms are not allowed to overlap. Accordingly, all conformations with a steric clash are rejected. Atomic radii are given in the supplemental data.
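The structure of the scoring function, outright rejection on any steric clash and rewards summed only over sequence-local residue pairs, can be sketched as follows. The clash test and the per-pair reward functions are hypothetical callables supplied by the caller, and each reward function is assumed to handle pair directionality (e.g., which residue is the H-bond donor) itself.

```python
from itertools import combinations

MAX_SEPARATION = 5  # attractive terms act only between residues <= 5 apart


def local_pairs(n_residues, max_sep=MAX_SEPARATION):
    """Residue index pairs (i, j), i < j, close enough in sequence for attractive terms."""
    return [(i, j) for i, j in combinations(range(n_residues), 2) if j - i <= max_sep]


def total_score(conformation, has_clash, pair_terms):
    """None if the conformation has any steric clash (it is rejected outright);
    otherwise the summed rewards of all attractive terms over local pairs.

    has_clash: hypothetical callable(conformation) -> bool.
    pair_terms: hypothetical callables(conformation, i, j) -> float, e.g. the
        H-bond, hydrophobic-contact and salt-bridge rewards.
    """
    if has_clash(conformation):
        return None
    return sum(term(conformation, i, j)
               for i, j in local_pairs(len(conformation))
               for term in pair_terms)
```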

An H-bond of maximal strength (0.5 units) is assigned to residues i and j when the distance between the amide nitrogen of i and the carbonyl oxygen of j is ≤3.5 Å and the out-of-plane dihedral O(j) − N(i) − CA(i) − C(i − 1) is > 140°. This score scales linearly to 0.0 as the distance between donor and acceptor increases from 3.5 to 5.0 Å. All backbone amide nitrogens (except proline) are considered H-bond donors, and all backbone carbonyl oxygens are considered H-bond acceptors. The side chains of Ser, Thr, Asn, Asp, Gln, and Glu are also considered H-bond acceptors, with a maximal score of 1.0 unit. Two additional restrictions apply: (i) a donor and acceptor must be at least three residues apart in sequence, and (ii) no donor can participate in more than one H-bond.
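The H-bond term can be sketched as two pieces: a per-pair score and the one-H-bond-per-donor restriction. In this Python illustration, comparing the absolute value of the dihedral against the 140° cutoff and keeping each donor's best-scoring acceptor are assumptions not stated in the text; the 3-to-5 residue window combines the minimum separation given here with the five-residue locality limit on attractive terms, and excluding proline donors is left to the caller.

```python
MIN_SEP, MAX_SEP = 3, 5  # donor/acceptor sequence separation allowed here


def hbond_score(distance, dihedral, sidechain_acceptor=False):
    """Score one donor N(i) / acceptor O(j) pair.

    distance: N(i)...O(j) distance in angstroms.
    dihedral: O(j)-N(i)-CA(i)-C(i-1) in degrees, in (-180, 180]. The text
        requires > 140 deg; comparing the absolute value is an assumption.
    """
    if abs(dihedral) <= 140.0:
        return 0.0
    max_score = 1.0 if sidechain_acceptor else 0.5
    if distance <= 3.5:
        return max_score
    if distance >= 5.0:
        return 0.0
    # Linear decay from max_score at 3.5 A to 0.0 at 5.0 A.
    return max_score * (5.0 - distance) / 1.5


def assign_hbonds(candidates):
    """Keep at most one H-bond per donor.

    candidates: iterable of (donor_index, acceptor_index, score). Keeping the
    highest-scoring acceptor for each donor is an assumed selection rule.
    Pairs outside MIN_SEP..MAX_SEP residues apart are discarded.
    """
    best = {}
    for donor, acceptor, score in candidates:
        if not MIN_SEP <= abs(donor - acceptor) <= MAX_SEP or score <= 0.0:
            continue
        if donor not in best or score > best[donor][1]:
            best[donor] = (acceptor, score)
    return best
```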

A hydrophobic contact is assigned between side chain carbon atoms i and j of two residues when

$$\mathrm{distance}(i, j) \le \mathrm{radius}_i + \mathrm{radius}_j + 1.4\,\text{\AA}$$

where $\mathrm{radius}_x$ is the contact radius of atom x. The maximal value is realized when the two atoms are in contact, and it scales linearly to zero as the separation distance increases to 1.4 Å. The maximal value is 0.5 units when both residues are hydrophobic (Cys, Ile, Leu, Met, Phe, Trp, Val), 0.25 units when one residue is hydrophobic and the other is amphipathic (Ala, His, Thr, Tyr), and 0.0 units for all other combinations.

A salt bridge is assigned to contacts between oppositely charged groups (namely, Arg or Lys with Glu or Asp), with a maximal strength of 0.5 units that scales linearly to 0.0 over a separation interval of 1.4 Å.
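The hydrophobic and salt-bridge terms share one functional form, a maximal reward at contact that falls linearly to zero over 1.4 Å beyond contact, and differ only in which residue pairs qualify and in the maximal value. A minimal sketch, assuming the caller supplies atom contact radii (the actual values are in the supplemental data) and that the reward stays at its maximum for any separation at or below contact; the residue-set names are illustrative.

```python
HYDROPHOBIC = {"CYS", "ILE", "LEU", "MET", "PHE", "TRP", "VAL"}
AMPHIPATHIC = {"ALA", "HIS", "THR", "TYR"}
CATIONIC = {"ARG", "LYS"}
ANIONIC = {"GLU", "ASP"}
RAMP = 1.4  # angstroms beyond contact over which the reward decays to zero


def linear_contact(distance, radius_i, radius_j, max_value, ramp=RAMP):
    """Full reward at contact (d <= r_i + r_j), linear decay to 0 at r_i + r_j + ramp."""
    contact = radius_i + radius_j
    if distance <= contact:
        return max_value
    if distance >= contact + ramp:
        return 0.0
    return max_value * (contact + ramp - distance) / ramp


def hydrophobic_max(res_a, res_b):
    """Maximal hydrophobic-contact reward for a residue pair."""
    a_h, b_h = res_a in HYDROPHOBIC, res_b in HYDROPHOBIC
    if a_h and b_h:
        return 0.5
    if (a_h and res_b in AMPHIPATHIC) or (b_h and res_a in AMPHIPATHIC):
        return 0.25
    return 0.0


def salt_bridge_max(res_a, res_b):
    """Maximal salt-bridge reward: only Arg/Lys paired with Glu/Asp."""
    opposite = ((res_a in CATIONIC and res_b in ANIONIC)
                or (res_a in ANIONIC and res_b in CATIONIC))
    return 0.5 if opposite else 0.0
```

A pair's reward is then, for example, `linear_contact(d, r_i, r_j, hydrophobic_max(a, b))`.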

Move Set.

linus uses a “smart” move set in which three consecutive residues are perturbed simultaneously. Initially, a move consists of choosing one of four equiprobable categories (19) at random: α-helix, β-strand, β-turn, and random coil. Side chain torsion values are chosen at random in the range [0°, 359°]. Both β-turn and random coil moves have multiple subcategories. Four β-turn types are included: types I, I′, II, and II′. A β-turn move defines the conformation of two consecutive residues uniquely, with the third residue set to a randomly chosen value. Specifically, a three-residue sequence i-j-k would have either i-j or j-k set to a β-turn conformation, with k or i, respectively, chosen randomly, resulting in eight possibilities.
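A move proposal under this scheme can be sketched as follows. The (φ, ψ) values are textbook approximations for each conformation, not the values used by linus, and the coil branch simply draws uniform random angles in place of the unspecified coil subcategories; both are assumptions. Side-chain torsions are drawn in [0°, 359°] as described, one per residue for simplicity.

```python
import random

# Hypothetical canonical (phi, psi) values in degrees (textbook approximations).
HELIX = (-60.0, -40.0)
STRAND = (-120.0, 130.0)
TURNS = {  # (residue i+1, residue i+2) backbone angles for each turn type
    "I": ((-60.0, -30.0), (-90.0, 0.0)),
    "I'": ((60.0, 30.0), (90.0, 0.0)),
    "II": ((-60.0, 120.0), (80.0, 0.0)),
    "II'": ((60.0, -120.0), (-80.0, 0.0)),
}


def random_phi_psi(rng):
    # Assumption: unconstrained residues draw uniformly; sterics reject bad picks.
    return (rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))


def propose_move(rng):
    """Propose backbone and side-chain angles for three consecutive residues i, j, k."""
    category = rng.choice(["helix", "strand", "turn", "coil"])  # equiprobable
    turn = None
    if category == "helix":
        angles = [HELIX] * 3
    elif category == "strand":
        angles = [STRAND] * 3
    elif category == "turn":
        turn_type = rng.choice(sorted(TURNS))
        placement = rng.choice(["i-j", "j-k"])  # 4 types x 2 placements = 8 options
        first, second = TURNS[turn_type]
        if placement == "i-j":
            angles = [first, second, random_phi_psi(rng)]  # k chosen randomly
        else:
            angles = [random_phi_psi(rng), first, second]  # i chosen randomly
        turn = (turn_type, placement)
    else:
        angles = [random_phi_psi(rng) for _ in range(3)]
    chis = [rng.uniform(0.0, 359.0) for _ in range(3)]  # side-chain torsions
    return {"category": category, "turn": turn, "phi_psi": angles, "chi": chis}
```

Because the four categories are equiprobable and the turn type and placement are chosen uniformly, each of the eight β-turn possibilities is proposed with probability 1/32 per move.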

To extract biases, secondary structure is assigned for all 1,000 saved conformers in a simulation, using the procedure outlined below. This ensemble is evaluated, and for every residue the fraction of conformers in each of the four secondary structures is determined. This fraction is a statistical weight, the probability that the given residue will adopt one of the four secondary structures: helix, strand, turn, or coil. We note in passing that an earlier version of linus enforced biases by “freezing” the chain, an undesirable strategy that abolished reversibility. The current protocol, which uses linus-evolved biases as sample weights, does not suffer from this deficiency.
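Extracting biases is a per-residue tally over the saved ensemble. A minimal sketch, assuming each conformer has already been converted to a list of four-class labels (turn subtypes merged into "turn"):

```python
from collections import Counter

CLASSES = ("helix", "strand", "turn", "coil")


def residue_biases(assignments):
    """Per-residue fraction of saved conformers in each secondary structure class.

    assignments: one label list per saved conformer, all the same length, with
    labels already collapsed to the four classes.
    """
    n_conf = len(assignments)
    biases = []
    for r in range(len(assignments[0])):
        counts = Counter(conf[r] for conf in assignments)
        biases.append({c: counts[c] / n_conf for c in CLASSES})
    return biases
```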

Secondary Structure Assignment.

Secondary structure is assigned to protein conformations based solely on backbone torsion angles; hydrogen bonding considerations are excluded deliberately. Our assignment criteria are suited to simulations in which only sequentially local interactions between residues are allowed, a restriction that precludes formation of β-sheet or other H-bonded interactions between sequentially distant residues. If an H-bond-based method, such as dssp (21), were used to assign secondary structure, then β-strands would evade detection.

Backbone conformation space is partitioned into 36 coarse-grained bins, each represented by a letter code (Table 1). Initially, φ, ψ, and ω values for each residue are computed and mapped to the closest letter code. Conformation codes are then mapped into a secondary structure class. Three codes (M, O, R) belong to two classes; 28 codes belong to no class. Secondary structure classes are S = {A, F, G, L, M, R}; H = {O}; T = {J, O, P}; T′ = {j, o, p}; U = {M, R}; and U′ = {m, r}.

Table 1
Partition of backbone conformational space into coarse-grained bins

Progressing along the sequence, conformation codes for each triple of consecutive residues, ⟨j, j + 1, j + 2⟩, are used to classify the central residue, j + 1, into the first applicable category satisfying one of the following definitions:

Helix: j, j + 1, and j + 2 ∈ H ⇒ j + 1 = helix
Strand: j, j + 1, and j + 2 ∈ S ⇒ j + 1 = strand
Type I turn: j + 1 ∈ T and j + 2 ∈ T ⇒ j + 1 = Type I turn (residue i + 1); j ∈ T and j + 1 ∈ T ⇒ j + 1 = Type I turn (residue i + 2)
Type I′ turn: j + 1 ∈ T′ and j + 2 ∈ T ⇒ j + 1 = Type I′ turn (residue i + 1); j ∈ T′ and j + 1 ∈ T ⇒ j + 1 = Type I′ turn (residue i + 2)
Type II turn: j + 1 ∈ U and j + 2 ∈ T ⇒ j + 1 = Type II turn (residue i + 1); j ∈ U and j + 1 ∈ T ⇒ j + 1 = Type II turn (residue i + 2)
Type II′ turn: j + 1 ∈ U′ and j + 2 ∈ T ⇒ j + 1 = Type II′ turn (residue i + 1); j ∈ U′ and j + 1 ∈ T ⇒ j + 1 = Type II′ turn (residue i + 2)
Coil: none of the above ⇒ j + 1 = coil

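Because the assignment rules are applied in a fixed order, with the first match winning, they translate directly into code. The Python sketch below encodes the class sets and rules as given in this section (including class T for the partner residue in every turn rule); the "X" code in the usage is a placeholder for any code outside all classes, and labeling the two terminal residues, which lack a complete triple, as coil is an assumption.

```python
# Class membership of letter codes, as listed in the text.
CLASSES = {
    "S": set("AFGLMR"),
    "H": set("O"),
    "T": set("JOP"),
    "T'": set("jop"),
    "U": set("MR"),
    "U'": set("mr"),
}

# Turn rules in the order listed: (class of the turn-defining residue, label).
# In every rule the partner residue is in class T, as transcribed.
TURN_RULES = [("T", "I"), ("T'", "I'"), ("U", "II"), ("U'", "II'")]


def classify_central(a, b, c):
    """Classify residue j+1 from the codes of residues j (a), j+1 (b), j+2 (c),
    returning the first applicable category."""
    if all(x in CLASSES["H"] for x in (a, b, c)):
        return "helix"
    if all(x in CLASSES["S"] for x in (a, b, c)):
        return "strand"
    for first, label in TURN_RULES:
        if b in CLASSES[first] and c in CLASSES["T"]:
            return f"turn {label} (i+1)"
        if a in CLASSES[first] and b in CLASSES["T"]:
            return f"turn {label} (i+2)"
    return "coil"


def assign(codes):
    """Assign every residue of a letter-code string; the two chain ends
    default to coil here (an assumption)."""
    labels = ["coil"] * len(codes)
    for j in range(len(codes) - 2):
        labels[j + 1] = classify_central(codes[j], codes[j + 1], codes[j + 2])
    return labels
```

For example, `assign("XJPX")` labels the two central residues as positions i + 1 and i + 2 of a Type I turn.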
Results

The simulation protocol described in Methods has been applied to dozens of proteins, with a similar degree of success in all cases. Twelve molecules were selected for presentation here, based on their perceived interest to the experimental folding community: (i) chymotrypsin inhibitor [3ci2], (ii) intestinal fatty acid binding protein [1ifb], (iii) phage lysozyme [2lzm], (iv) myoglobin [1mbo], (v) myohemerythrin [2hmq], (vi) plastocyanin [6pcy], (vii) protein G [1gb1], (viii) ribonuclease A [7rsa], (ix) ribonuclease S-peptide, (x) ribonuclease H [2rn2], (xi) staphylococcal nuclease [1stg], and (xii) ubiquitin [1ubq]. Protein Data Bank ID codes (22) are given in square brackets. In every case, three sets of simulations were performed, each with uniform sample weights. Little variation was seen in the final sample weights among the three sets. Accordingly, the weights from all three were averaged for presentation (see Fig. 2, published as supplemental data on the PNAS web site, www.pnas.org). In each protein, local biases extracted from simulations suggest the actual secondary structure, though imperfectly.

We seek to compare these simulations to corresponding experimental data. Given the nature of the simulations—local interactions and sterics—perhaps the ideal data for comparison would be the population that emerges in the dead time of most experiments, an elusive quantity. Fragment studies are also revealing, when available. Equilibrium folding studies of partially folded states are useful as well.

Of course, comparison with the native structure is irresistible. Detailed comparisons are given in Table 2. For each secondary structure element in every protein, Table 2 lists the fraction of conformers in helix, strand, turn, and coil. In Table 3, the standard errors for native segments computed from 10 independent simulations are shown for two proteins, myoglobin and GB1. The examples represent worst-case and typical-case linus simulations, respectively; in either case, standard errors are slight.

Table 2
Population statistics for each secondary structure element
Table 3
Standard error of the bias toward native secondary structure in 10 independent simulations of myoglobin and GB1

Fig. 1 summarizes these data for the 36 helices, 63 strands, and 74 turns in the total set of proteins. In our simulations, sequences corresponding to actual helices have helical biases that range between 4 and 78%. With one exception, all such sequences populate helical conformers in at least 10% of the ensemble, and half of the sequences populate helical conformers in at least 35% of the ensemble. Sequences corresponding to actual strands have even stronger biases, ranging between 15 and 93%. All but four populate strand conformers in at least one-third of the ensemble. Sequences corresponding to actual turns have turn biases that range between 1 and 38%. Although weaker than both helices and strands, all but eight populate turn conformers in at least 10% of the ensemble.

Figure 1
Histogram of data from Table 2. The statistical bias toward native secondary structure in helix (red), strand (green), turn (blue), and generalized turn (black) for all segments in Table 2 is parceled into bins, with statistical ...

Often, the sum of turn and helix weights is high, indicating a contracted conformation, although not specifically a β-turn or α-helix. In fact, there is only a slight difference in conformation between a turn of helix and a Type I or Type III peptide chain turn. Accordingly, Fig. 1 also plots generalized turns, defined as the sum of contracted conformations (i.e., helix + turn biases). Sequences corresponding to actual turns have generalized turn biases ranging between 2 and 76%; with one exception, all exceed 10%, and all but 12 exceed 25%.
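Table 2 reports segment-level fractions, but how residue values are combined into a segment value is not stated here. The sketch below therefore assumes a simple average over the residues of a segment (inclusive indices) before forming the generalized-turn weight, helix + turn.

```python
def segment_bias(residue_biases, start, end):
    """Average per-residue class fractions over residues start..end (inclusive).
    Averaging is an assumption about how segment values are formed."""
    n = end - start + 1
    classes = residue_biases[start].keys()
    return {c: sum(residue_biases[r][c] for r in range(start, end + 1)) / n
            for c in classes}


def generalized_turn(bias):
    """Weight of contracted conformations: helix + turn biases."""
    return bias["helix"] + bias["turn"]
```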

Fig. 1 and Tables 2 and 3 demonstrate that a pronounced bias toward the native conformation is detectable in almost every element of secondary structure, despite the simplicity of these simulations and the absence of all long-range attractive interactions. To be sure, the native structure does not necessarily have the highest weights in every case. Segments in which either helix or strand bias toward a non-native conformation exceeds that of the native conformation are annotated with an asterisk in Tables 2 and 3. In this regard, it is important to emphasize that these simulations should not be viewed as a secondary structure prediction algorithm. Rather, they are only intended to test our physical explanation for secondary structure formation based on sterics and short-range attractive interactions, particularly hydrogen bonding. As seen in Fig. 1, a substantial bias toward the native conformation is present in almost every case. It can happen that segments with locally high helix or strand weights undergo a conformational transition when longer range interactions are included, but this issue is not addressed here.

Chymotrypsin Inhibitor.

Chymotrypsin inhibitor has been studied extensively by Fersht and coworkers (23), who find that the only region with structure before the transition state is near the helix N terminus (namely, residue 16). The simulations reveal such a bias, along with other features of the native protein.

Intestinal Fatty Acid Binding Protein.

Consistent with NMR studies (24), biases for the second helix are weak. However, residues 67–73, a β-strand in the folded protein, have a clear helix/turn bias in the simulations, and, to our knowledge, no other experimental data are available about this site.

T4 Phage Lysozyme.

Using pulsed hydrogen exchange, Lu and Dahlquist (25) find that helices A and E, together with the N-terminal β-sheet, form an early folding intermediate. Although not the most prominent simulated bias, helix E is readily apparent, as is the N-terminal β-sheet. Biases for helix A exhibit considerable turn/helix weights. This N-terminal helix belongs to the C-terminal domain (26), but our simulations are too local to include contributions from such interactions. Both helices D and H have high strand weights in the simulations; neither appears to be involved in formation of the early intermediate (25).

Myoglobin.

The structure of apomyoglobin has been studied extensively by NMR (27). In equilibrium studies, Wright and coworkers (28) characterized progressively folded states of the molecule. In their hierarchic picture of the folding dynamics, helices A, D, and H are the first to emerge; all have clear helical biases in simulation. In contrast, helical bias is conspicuously absent in the region of the G helix. A peptide fragment corresponding to this region was studied experimentally by Waltho et al., who found “little propensity for helix formation in aqueous solution” (ref. 29, p. 6346).

Myohemerythrin.

This four-helix bundle protein was studied by Dyson et al. (30), who synthesized peptide fragments that cover the molecule and analyzed their conformational preferences by NMR. Fragments corresponding to the native helices exhibit clear preferences for helix-like conformations, which are more pronounced in the A and D helices, and less pronounced in the B and C helices. Simulated biases show the opposite tendency: regions corresponding to the B and C helices have higher helical weights than those corresponding to the A and D helices.

Plastocyanin.

The native structure is a Greek key β-barrel. Barrel staves bracketed by turns are well delineated by the biases, despite a complete absence of interstrand hydrogen bonds, which are precluded by our simulation protocol. The region of non-native turn/helix bias surrounding residue 60 was observed in NMR experiments of Dyson et al. (31), who studied the conformational preferences of peptide fragments that cover the molecule. They noted conspicuous “prepartitioning of the conformational space sampled by the polypeptide backbone” (ref. 31, p. 819) in these isolated peptides.

Protein G B1 Domain.

Fragment studies of Blanco and Serrano (32) confirm a tendency to populate native-like conformations in peptides corresponding to both the initial and final β-hairpins and the central helix. Simulation biases also reflect these tendencies.

Ribonuclease A and S-Peptide.

Ribonuclease S-peptide (33), residues 1–20, is the progenitor of all peptide fragment studies, and the stop signal for the N-terminal helix (residues 3–13) is known to be preserved in the isolated peptide (34). In our simulation, a bias toward helix spans the first two helices but continues through the interconnecting nonhelical region. Puzzled by this result, we simulated S-peptide in isolation; the stop signal is apparent in this case, as shown in Fig. 2 in the supplemental data.

Ribonuclease H.

Summarizing multiple kinetic and equilibrium experiments, Chamberlain and Marqusee (35) find a self-consistent hierarchic folding pathway for the molecule in which helices A and D fold first and are then augmented by helix B and β-strand 4. Each of these regions has pronounced, native-like biases. In fact, the only discrepant region between the native structure and the simulated biases is around residues 78–82, corresponding to an irregular kink between helices B and C.

Staphylococcal Nuclease.

Wang and Shortle (36) synthesized several fragments, one of them corresponding to residues 92–99, which overlap residues 87–93, a β-strand in the x-ray structure with significant helical weights in the simulation (see supplemental data). Unfortunately, no conclusion can be drawn because the region of overlap is slight and the synthesized fragment has a residue substitution (I92G).

Ubiquitin.

Fragment studies of Cox et al. (37) using CD and NMR show a marked tendency toward native-like structure in the molecule’s N-terminal half but not in the C-terminal half. Notably, the N-terminal β-hairpin (residues 1–17) can be detected in the A-state. In another study, Muñoz and Serrano (9) synthesized a fragment (residues 62–76) that includes the final strand of β-sheet (residues 65–71) and found it to have modest (≈8%), non-native helical content by CD. Both studies are consistent with the simulation biases.

Our simulations include additional details not presented here. Among them, regions with high turn weights can be assigned to specific turn types (38) from their backbone dihedral angles. To better understand the physical basis for turns, a separate series of host-guest turn simulations was conducted (see Fig. 3 in the supplemental data).

Turn Simulations.

A 14-residue host sequence (Val₅-Ala-Pro-Gly-Ala-Val₅) with a central turn-forming sequence (namely, Pro-Gly) was simulated by using the protocol described in Methods. Six guest residues were introduced at position six to probe residue-specific effects: Asp, Asn, Ser, Leu, Glu, and Thr. Relative to the alanyl host, Ser, Asp, Asn, and Leu increase the turn propensity of the Pro-Gly sequence whereas Glu and Thr decrease the turn propensity. For Ser, Asp, and Asn, the preferred turn conformation is Type I or III, either of which enables the guest residue sidechain to form a stabilizing hydrogen bond with the backbone amide of Gly (i + 2) and/or Ala (i + 3). For Leu, Ala, and Glu, which lack side chain to mainchain hydrogen bonds, the preferred turn conformation is Type II. Thr does not show a marked preference. In the case of Leu, a hydrophobic contact (in lieu of an H-bond) can be made with Ala (i + 3) or Val (i + 5). Details are summarized in Table 4 and in the supplemental material.
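The host–guest design can be written out explicitly. This sketch assumes that "position six" counts from 1 at the N terminus, so the guest replaces the alanine immediately preceding Pro-Gly and occupies turn position i; that reading matches the text's references to Gly (i + 2), Ala (i + 3), and Val (i + 5). The helper names are hypothetical.

```python
# Host: Val5-Ala-Pro-Gly-Ala-Val5 (14 residues); Pro-Gly is the turn-forming pair.
HOST = ["Val"] * 5 + ["Ala", "Pro", "Gly", "Ala"] + ["Val"] * 5
GUESTS = ["Asp", "Asn", "Ser", "Leu", "Glu", "Thr"]
GUEST_POSITION = 6  # 1-based; the residue preceding Pro-Gly (turn position i)


def host_guest(guest):
    """Return a copy of the host sequence with `guest` substituted at position six."""
    seq = list(HOST)
    seq[GUEST_POSITION - 1] = guest
    return seq


def turn_context(seq):
    """Label residues i..i+5 relative to the guest at turn position i."""
    i = GUEST_POSITION - 1
    return {(f"i+{k}" if k else "i"): seq[i + k] for k in range(6)}
```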

Table 4
Population statistics for host–guest turn simulations

These simulated turn preferences are consistent with the usual turn-formers, namely, Asp, Asn, and Ser (38, 39), and they arise for understandable physical reasons (e.g., hydrogen bonding). linus simulations are sensitive enough to distinguish between Asp, which forms sidechain-backbone H-bonds readily, and Glu, which fails to do so. The simulations also show that even a nonturn former, e.g., Leu, can nonetheless stabilize a turn by using a hydrophobic interaction.

Discussion

Our central purpose in this paper has been to demonstrate that pronounced biases toward protein secondary structure are present in natural protein sequences, that these biases have a discernible physical basis, and that their existence begs reinterpretation of current folding models. Unlike more sophisticated simulations that use a comprehensive potential function—e.g., ref. 40—the biases evident in Tables 2 and 3 are a consequence of sterics and local interactions; longer range interactions were suppressed in the simulation protocol. In every case, these biases largely, albeit imperfectly, anticipate the observed secondary structure of the folded molecule. In several cases in which the linus-evolved biases differ from native secondary structure and in which data describing early folding intermediates are available, the simulations are consistent with these experimental data (e.g., myoglobin, plastocyanin, and ubiquitin).

There has been considerable debate in the literature about whether secondary structure formation is an early folding event (2). The simulations shown here—together with dozens of others that were conducted but not presented—confirm that sterically driven segments of nascent secondary structure can emerge in the unfolded state and preorganize all subsequent folding events.

If these simulations reproduced early folding events reliably, then chain regions with a strong bias toward the “wrong” secondary structure could signal the presence of a non-native intermediate. This need not be true for discrepancies involving weak biases, which may simply have lacked ample opportunity to develop. However, a strong bias toward a discrepant contracted conformation—such as bias toward helix in a known β-strand—would indicate the presence (though not the stability) of an early, non-native intermediate; examples include the non-native helices in intestinal fatty acid-binding protein and plastocyanin, described in the previous section, or those in β-lactoglobulin, described in the review by Baldwin and Rose (3).

Conformational biases arise for several reasons, but the primary factor involves steric interplay between the α- and β-regions of the φ,ψ map. The α-region (near φ = −60°, ψ = −40°) is compatible with the formation of local hydrogen bonds, but in this contracted state, sidechains tend to clash with local backbone, resulting in unfavorable conformational restriction. The price of restriction is measured as loss of sidechain conformational entropy (11, 41). As that price mounts, chain segments are driven toward the remaining alternative, the β-region (near φ = −120°, ψ = +130°), an extended conformation in which steric clash between sidechain and backbone is relieved.

In this physical context, β-strand is appropriately regarded as authentic secondary structure, even in the absence of a hydrogen-bonded partner strand. Accordingly, β-sheet, composed of two or more H-bonded β-strands, is more appropriately classified as tertiary structure, in that it involves the spatial organization of multiple β-strands, which are often removed from each other in sequence. This distinction—or the lack of it—has spawned continuing confusion about suitable procedures to identify secondary structure from atomic coordinates (42) and motivated our own approach (in Methods), which is based solely on dihedral angles, not hydrogen bonding.

The conformational biases were extracted from Monte Carlo simulations in which all moves are weighted equally. As such, these values almost certainly underestimate the true bias in the protein. A better estimate could have been obtained by using the extracted biases as weights in another round of simulation. In fact, our simulations are typically run by using just such a protocol. However, the simpler protocol was adopted here deliberately because nothing more complicated than that is needed to demonstrate the existence of sharply differentiated, broadly dispersed chain bias.
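The reweighting step mentioned here, in which extracted biases replace the equal move weights in a further round of simulation, can be sketched as a weighted choice of move category. This is an illustration under assumptions: the per-residue bias dictionary follows the format of the extraction step, and the `floor` parameter is a hypothetical addition that keeps every category selectable (and thus moves reversible) even when a bias is zero.

```python
import random

CATEGORIES = ("helix", "strand", "turn", "coil")


def weighted_category(bias, rng, floor=0.0):
    """Choose a move category using extracted biases as sample weights instead
    of equal 1/4 weights. `floor` is a hypothetical lower bound on each weight."""
    weights = [max(bias[c], floor) for c in CATEGORIES]
    return rng.choices(CATEGORIES, weights=weights, k=1)[0]
```

With `floor=0.0`, a category whose extracted bias is zero can never be proposed again, which is one reason a small floor may be preferable.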

Many proteins are found to adopt molten globule intermediates (43) at low pH, a state having substantial secondary structure but lacking in specific tertiary interactions. In this regard, the existence of nascent secondary structure segments, as described here, anticipates such states. Sterically driven biases are expected to manifest themselves under essentially all folding conditions, and they would become independently observable whenever specific conditions can be found that destabilize the native protein (relative to the unfolded form) but not some intermediate form.

Conformational Entropy and Protein Folding.

Anfinsen proposed that proteins attain their native state by folding to a global minimum of Gibbs free energy (44). Typically, this hypothesis has been interpreted to mean that the native conformation of individual molecules also corresponds to a global minimum in internal energy because a fully folded protein will have lost its conformational entropy, or almost so. Thus, conformational entropy is thought to play an insignificant role in the thermodynamics of protein folding. Specifically, the ratio of the Boltzmann-weighted populations of any two states x and y, $(g_y/g_x)\,e^{-(U_y - U_x)/kT}$ (where k = the Boltzmann constant and T = absolute temperature), is thought to depend predominantly on their energy difference, $U_y - U_x$, and not on the ratio of their degeneracies, $g_y/g_x$. In contrast, the work presented here reaches the conclusion that conformational entropy, reflected in the degeneracy, is the main factor that discriminates between the two energetically degenerate ground states, α and β, and, in so doing, preorganizes the protein.
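As a worked illustration of the Boltzmann population ratio in the preceding paragraph, the following sketch evaluates it in molar units, so k is replaced by the gas constant R ≈ 1.987 × 10⁻³ kcal mol⁻¹ K⁻¹. When the two states are energetically degenerate, as argued for α and β, the exponential factor equals 1 and the ratio reduces to the degeneracy ratio g_y/g_x.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K); energies are per mole


def population_ratio(g_x, g_y, u_x, u_y, temperature):
    """Population of state y relative to state x:
    (g_y / g_x) * exp(-(U_y - U_x) / (R T)), with U in kcal/mol and T in K."""
    return (g_y / g_x) * math.exp(-(u_y - u_x) / (R_KCAL * temperature))
```

For example, with equal energies and a threefold larger degeneracy for y, `population_ratio(1, 3, 0.0, 0.0, 298.0)` gives 3.0.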

The Levinthal Paradox.

The issue of secondary structure bias is intimately related to the Levinthal paradox, which argues that a folding protein does not explore conformational hyperspace freely; otherwise, it would encounter an insoluble search problem (45). For Levinthal, this insight was not a paradox at all, but a convincing demonstration that some intrinsic constraint limits the effective size of conformational space. In this view, proteins solve the “multiple minimum problem” not by an extensive search that identifies the deepest minimum but by a limited search that avoids false minima. The existence of intrinsic bias resolves this paradox by prejudicing the ensemble of available folding trajectories toward the native minimum (46). Thus, a folding protein need not discriminate among an astronomical number of conformations because intrinsic bias “steers” the molecule toward a high degree of preorganization.

“Protein Micelles.”

The prevalence of native-like, stable subdomains (47, 48) in proteins is an expected consequence of intrinsic chain bias. Segments with strong biases are poised to form persisting structure, especially when fortified by additional stabilizing interactions. In this context, it is important to distinguish between stability and specificity (49). Stability is associated with the equilibrium between folded and unfolded forms in a cooperative, two-state folding process. Specificity is associated with conformational particulars of a given folded form (e.g., why does the lysozyme sequence adopt the lysozyme fold and not, for example, the ribonuclease fold?). If the protein’s conformational specificity is established primarily by built-in bias, as this paper has attempted to demonstrate, then stabilizing interactions can be quite nonspecific. Like folding up a carpenter’s rule, the preorganized segments and their interconnecting turns constrain the folding process, which can then be exerted via nonspecific driving forces, such as solvent-squeezing and hydrophobic burial. Thus, a chain segment long enough to adopt conformations with protein-like surface-to-volume ratios (i.e., approximately 35 residues or more) (50, 51), and that spans several elements of impending secondary structure with protein-like sequence composition, would be sufficient to engender a stable subdomain. In this view, such subdomains are merely “polypeptide micelles” with an intrinsic chain bias. Indeed, many examples in the literature are consistent with this interpretation (52–55).

Supplementary Material



We are indebted to our colleagues—L. Mario Amzel, Robert L. Baldwin, Trevor P. Creamer, Eaton E. Lattman, Venkatesh Murthy, and Rohit Pappu—for many good discussions, to the referees for substantive suggestions, and to grants from the National Institutes of Health and the Mathers Foundation for support.


1. Richardson J S. Adv Protein Chem. 1981;34:168–340.
2. Baldwin R L, Rose G D. Trends Biochem Sci. 1999;24:26–33.
3. Baldwin R L, Rose G D. Trends Biochem Sci. 1999;24:77–83.
4. Fasman G. The Development of the Prediction of Protein Structure. New York: Plenum; 1989.
5. Rost B, Sander C. Proteins Struct Funct Genet. 1994;19:55–72.
6. Eisenberg D, Weiss R M, Terwilliger T C. Proc Natl Acad Sci USA. 1984;81:140–144.
7. Kamtekar S, Schiffer J M, Xiong H, Babik J M, Hecht M H. Science. 1993;262:1680–1685.
8. Zimm B H, Bragg J K. J Chem Phys. 1959;31:526–535.
9. Muñoz V, Serrano L. Nat Struct Biol. 1994;1:399–409.
10. Aurora R, Creamer T P, Srinivasan R, Rose G D. J Biol Chem. 1997;272:1413–1416.
11. Creamer T P, Rose G D. Proc Natl Acad Sci USA. 1992;89:5937–5941.
12. Richards F M. Annu Rev Biophys Bioeng. 1977;6:151–176.
13. Sasisekharan V. In: Stereochemical Criteria for Polypeptide and Protein Structures. Ramanathan N, editor. New York: Wiley; 1962. pp. 39–78.
14. Ramachandran G N, Sasisekharan V. Adv Protein Chem. 1968;23:283–438.
15. Creamer T P, Rose G D. Proteins Struct Funct Genet. 1994;19:85–97.
16. Creamer T P, Rose G D. Protein Sci. 1995;4:1305–1314.
17. Lee K H, Xie D, Amzel L M. Proteins Struct Funct Genet. 1994;20:68–84.
18. Venkatachalam C M. Biopolymers. 1968;6:1425–1436.
19. Srinivasan R, Rose G D. Proteins Struct Funct Genet. 1995;22:81–99.
20. Metropolis N, Rosenbluth A W, Rosenbluth M N, Teller A H, Teller E. J Chem Phys. 1953;21:1087–1092.
21. Kabsch W, Sander C. Biopolymers. 1983;22:2577–2637.
22. Bernstein F C, Koetzle T G, Williams G J B, Meyer E F, Jr, Brice M D, Rogers J R, Kennard O, Shimanouchi T, Tasumi M. J Mol Biol. 1977;112:535–542.
23. Itzhaki L S, Otzen D E, Fersht A R. J Mol Biol. 1995;254:260–288.
24. Hodsdon M E, Toner J J, Cistola D P. J Biomol NMR. 1995;6:198–210.
25. Lu J, Dahlquist F W. Biochemistry. 1992;31:4749–4756.
26. Llinas M, Marqusee S. Protein Sci. 1998;7:96–104.
27. Cocco M J, Lecomte J T J. Protein Sci. 1994;3:267–281.
28. Eliezer D, Yao J, Wright P E. Nat Struct Biol. 1998;5:148–155.
29. Waltho J P, Feher V A, Merutka G, Dyson H J, Wright P E. Biochemistry. 1993;32:6337–6347.
30. Dyson H J, Merutka G, Waltho J P, Lerner R A, Wright P E. J Mol Biol. 1992;226:795–817.
31. Dyson H J, Sayre J R, Merutka G, Shin H C, Lerner R A, Wright P E. J Mol Biol. 1992;226:819–835.
32. Blanco F J, Serrano L. Eur J Biochem. 1995;230:634–649.
33. Richards F M. Proc Natl Acad Sci USA. 1958;44:162–166.
34. Kim P S, Baldwin R L. Nature (London) 1984;307:329–334.
35. Chamberlain A K, Marqusee S. Structure (London) 1997;5:859–863.
36. Wang Y, Shortle D. Fold Des. 1997;2:93–100.
37. Cox J P L, Evans P A, Packman L C, Williams D H, Woolfson D N. J Mol Biol. 1993;234:483–492.
38. Rose G D, Gierasch L M, Smith J A. Adv Protein Chem. 1985;37:1–109.
39. Chou P Y, Fasman G D. Biophys J. 1979;26:367–383.
40. Schaefer M, Bartels C, Karplus M. J Mol Biol. 1998;284:835–848.
41. Lee K H, Xie D, Freire E, Amzel L M. Proteins Struct Funct Genet. 1994;20:68–84.
42. King S M, Johnson W C. Proteins. 1999;35:313–320.
43. Kuwajima K. Proteins Struct Funct Genet. 1989;6:87–103.
44. Anfinsen C B. Science. 1973;181:223–230.
45. Levinthal C. In: Mössbauer Spectroscopy in Biological Systems. Debrunner P, Tsibris J C M, Münck E, editors. Urbana, IL: Univ. of Illinois Press; 1969. pp. 22–24.
46. Zwanzig R, Szabo A, Bagchi B. Proc Natl Acad Sci USA. 1992;89:20–22.
47. Crippen G M. J Mol Biol. 1978;126:315–332.
48. Rose G D. J Mol Biol. 1979;134:447–470.
49. Lattman E E, Rose G D. Proc Natl Acad Sci USA. 1993;90:439–441.
50. Zehfus M H, Rose G D. Biochemistry. 1986;25:5759–5765.
51. Rose G D, Wetlaufer D B. Nature (London) 1977;268:769–770.
52. Gutte B, Däumigen M, Wittschieber E. Nature (London) 1979;281:650–655.
53. Oas T G, Kim P S. Nature (London) 1988;336:42–48.
54. Constans A J, Mayer M R, Sukits S F, Lecomte J T. Protein Sci. 1998;7:1983–1993.
55. Chamberlain A K, Fischer K F, Reardon D, Handel T M, Marqusee S. Protein Sci. 1999; in press.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences