What is the protein design alphabet?

Proteins. 2004 Mar 1;54(4):622-8. doi: 10.1002/prot.10633.

Abstract

Selecting a protein sequence that corresponds to a specific three-dimensional protein structure is known as the protein design problem. One principal bottleneck in solving this problem is our lack of knowledge of precise atomic interactions. Using a simple model of amino acid interactions, we identify three factors that are crucial for solving the protein design problem. Among these factors is the protein alphabet: a set of sequence elements that encodes protein structure. Our model predicts that alphabet size is independent of protein length, suggesting the possibility of designing a protein of arbitrary length with the natural protein alphabet. We also find that protein alphabet size is governed by protein structural properties and the energetic properties of the protein alphabet units. We discover that the average number of amino acid types used in proteins is lower than expected if amino acids were chosen randomly with naturally occurring frequencies. We propose three possible scenarios that account for this amino acid underusage in proteins. These scenarios suggest the possibility that amino acids themselves might not constitute the alphabet of natural proteins.
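
The random baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model. It assumes residues are drawn independently, in which case the expected number of distinct amino acid types in a sequence of length L is the sum over types of 1 - (1 - p_i)^L. The function names are hypothetical, and the uniform frequencies are a placeholder assumption; measured natural frequencies would be substituted in practice.

```python
# Sketch of a random-composition baseline (assumption: independent draws).
# Function names and the uniform frequency table are illustrative only.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def expected_distinct_types(freqs, length):
    """Expected number of distinct types seen in `length` independent draws.

    `freqs` maps each type to a non-negative weight; weights are
    normalized to probabilities p_i, and the result is
    sum_i (1 - (1 - p_i) ** length).
    """
    total = sum(freqs.values())
    return sum(1.0 - (1.0 - w / total) ** length for w in freqs.values())


def observed_distinct_types(sequence):
    """Number of distinct residue types actually present in a sequence."""
    return len(set(sequence))


# Placeholder frequencies: every amino acid equally likely (an assumption,
# not measured data).
uniform = {aa: 1.0 for aa in AMINO_ACIDS}
```

Comparing `observed_distinct_types(seq)` with `expected_distinct_types(freqs, len(seq))` for a given sequence indicates whether that sequence uses fewer types than this simple baseline would predict.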

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / analysis*
  • Amino Acids / metabolism
  • Drug Design
  • Hydrophobic and Hydrophilic Interactions
  • Protein Conformation
  • Protein Engineering*
  • Protein Folding*
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Structure-Activity Relationship
  • Thermodynamics

Substances

  • Amino Acids
  • Proteins