Biophys J. Nov 1997; 73(5): 2393–2403.
PMCID: PMC1181141

How are model protein structures distributed in sequence space?


The sequence-to-structure maps for all uniquely folding sequences of short hydrophobic-polar (HP) model proteins on a square lattice are analyzed to investigate aspects considered relevant to evolution. Ranking structures by their frequencies reveals a few very frequent structures and many rare ones. The distribution can be described empirically by a generalized Zipf's law. All structures are relatively compact, yet the most compact ones are rare. Most sequences folding to the same structure belong to "neutral nets." These graphs in sequence space are connected by point mutations and centered around prototype sequences, which tolerate the largest number (up to 55%) of neutral mutations. Profiles have been derived from these homologous sequences. Frequent structures conserve only their hydrophobic cores, whereas rare ones are sensitive to surface mutations as well. Shape space covering, i.e., the ability to transform any structure into most others with few point mutations, is very unlikely. It is concluded that many characteristic features of the sequence-to-structure map of real proteins, such as the dominance of a few folds, can be explained by the simple HP model. In analogy to protein families, nets are dense and well separated in sequence space. Potential implications for better understanding the evolution of proteins and applications to improving database searches are discussed.
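The kind of exhaustive analysis described above can be illustrated with a minimal Python sketch. It is not the article's code. It assumes the common HP energy convention of -1 for every contact between two H residues that are lattice neighbors but not chain neighbors, and it treats each symmetry-distinct conformation as one structure. All function names (`conformations`, `contacts`, `unique_folds`, `rank_structures`, `neutral_fraction`) are hypothetical.

```python
from collections import Counter
from itertools import product

STEPS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def conformations(n):
    """Self-avoiding walks of n >= 2 beads, one per rotation/reflection class."""
    found = []

    def extend(path, turned):
        if len(path) == n:
            found.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # site already occupied
            # While the walk is still straight, forbid the downward turn
            # so that only one of each mirror-image pair is generated.
            if not turned and (dx, dy) not in ((1, 0), (0, 1)):
                continue
            extend(path + [nxt], turned or dy != 0)

    # Fixing the first step along +x removes the four rotations.
    extend([(0, 0), (1, 0)], False)
    return found


def contacts(conf):
    """Pairs (i, j) of beads that are lattice neighbors but not chain neighbors."""
    pos = {p: i for i, p in enumerate(conf)}
    pairs = []
    for i, (x, y) in enumerate(conf):
        for dx, dy in ((1, 0), (0, 1)):  # look right and up: each pair once
            j = pos.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return pairs


def unique_folds(n):
    """Map each uniquely folding HP sequence to the index of its ground state."""
    maps = [contacts(c) for c in conformations(n)]
    fold = {}
    for letters in product("HP", repeat=n):
        seq = "".join(letters)
        # Assumed energy: -1 per H-H contact.
        energies = [-sum(seq[i] == "H" == seq[j] for i, j in m) for m in maps]
        best = min(energies)
        if energies.count(best) == 1:  # non-degenerate ground state
            fold[seq] = energies.index(best)
    return fold


def rank_structures(fold):
    """Structures sorted by how many sequences fold to them, most frequent first."""
    return Counter(fold.values()).most_common()


def neutral_fraction(seq, fold):
    """Fraction of single H<->P point mutations that keep the same structure."""
    target = fold[seq]
    mutants = (
        seq[:k] + ("P" if seq[k] == "H" else "H") + seq[k + 1:]
        for k in range(len(seq))
    )
    return sum(fold.get(m) == target for m in mutants) / len(seq)
```

For chains of four beads, only the sequences with H at both ends fold uniquely under these assumptions (into the square-shaped conformation), and `neutral_fraction("HPPH", unique_folds(4))` evaluates to 0.5. Longer chains grow expensive quickly, since both the number of sequences and the number of conformations increase exponentially with chain length.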



Articles from Biophysical Journal are provided here courtesy of The Biophysical Society

