Molecular Cell Biology, 4th edition
Harvey Lodish (1), Arnold Berk (2), Lawrence Zipursky (2), Paul Matsudaira (3), David Baltimore (4), and James Darnell (5)
(1) Whitehead Institute for Biomedical Research and Massachusetts Institute of Technology
(2) Molecular Biology Institute, University of California, Los Angeles
(3) Howard Hughes Medical Institute, School of Medicine, University of California, Los Angeles
(4) California Institute of Technology (Caltech)
(5) Rockefeller University, New York
W. H. Freeman, 2000. ISBN 0-7167-3136-3
Keywords: cell biology, molecular biology

3.1 Hierarchical Structure of Proteins

Proteins are designed to bind every conceivable molecule—from simple ions to large complex molecules like fats, sugars, nucleic acids, and other proteins. They catalyze an extraordinary range of chemical reactions, provide structural rigidity to the cell, control flow of material through membranes, regulate the concentrations of metabolites, act as sensors and switches, cause motion, and control gene function. The three-dimensional structures of proteins have evolved to carry out these functions efficiently and under precise control. The spatial organization of proteins, their shape in three dimensions, is a key to understanding how they work.

One of the major areas of biological research today is how proteins, constructed from only 20 different amino acids, carry out the incredible array of diverse tasks that they do. Unlike the intricate branched structure of carbohydrates, proteins are single, unbranched chains of amino acid monomers. The unique shape of proteins arises from noncovalent interactions between regions in the linear sequence of amino acids. Only when a protein is in its correct three-dimensional structure, or conformation, is it able to function efficiently. A key concept in understanding how proteins work is that function is derived from three-dimensional structure, and three-dimensional structure is specified by amino acid sequence.

The Amino Acids Composing Proteins Differ Only in Their Side Chains

Figure 3-1. Amino acids, the monomeric units that link together to form proteins, have a common structure.

The α carbon atom (green) of each amino acid is bonded to four different chemical groups and thus is asymmetric. The side chain, or R group (red), is unique to each amino acid. The diversity of natural proteins reflects different linear combinations of the 20 naturally occurring amino acids. The short peptide shown here, containing only four amino acids, has 20⁴, or 160,000, possible sequences.
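The sequence count quoted in this legend follows from choosing one of 20 amino acids independently at each position. A minimal sketch of that arithmetic is given here; the function name is only illustrative.

```python
# Count the distinct linear sequences of a given length that can be built
# from the 20 common amino acids (each position is chosen independently).
AMINO_ACID_COUNT = 20


def possible_sequences(length: int) -> int:
    """Return 20**length, the number of distinct sequences of that length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return AMINO_ACID_COUNT ** length


# A four-residue peptide, as in Figure 3-1.
print(possible_sequences(4))  # 160000
```

Because the count grows exponentially with length, even short peptides have far more possible sequences than could ever be enumerated by hand.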

Amino acids are the monomeric building blocks of proteins. The α carbon atom (Cα) of amino acids, which is adjacent to the carboxyl group, is bonded to four different chemical groups: an amino (NH₂) group, a carboxyl (COOH) group, a hydrogen (H) atom, and one variable group, called a side chain or R group (Figure 3-1). All 20 different amino acids have this same general structure, but their side-chain groups vary in size, shape, charge, hydrophobicity, and reactivity.

Figure 3-2. The structures of the 20 common amino acids grouped into three categories: hydrophilic, hydrophobic, and special amino acids.

The side chain determines the characteristic properties of each amino acid. Shown are the zwitterion forms, which exist at the pH of the cytosol. In parentheses are the three-letter and one-letter abbreviations for each amino acid.

The amino acids can be considered the alphabet in which linear proteins are “written.” Students of biology must be familiar with the special properties of each letter of this alphabet, which are determined by the side chain. Amino acids can be classified into a few distinct categories based primarily on their solubility in water, which is influenced by the polarity of their side chains (Figure 3-2). Amino acids with polar side groups tend to be on the surface of proteins; by interacting with water, they make proteins soluble in aqueous solutions. In contrast, amino acids with nonpolar side groups avoid water and aggregate to form the water-insoluble core of proteins. The polarity of amino acid side chains thus is one of the forces responsible for shaping the final three-dimensional structure of proteins.

Hydrophilic, or water-soluble, amino acids have ionized or polar side chains. At neutral pH, arginine and lysine are positively charged; aspartic acid and glutamic acid are negatively charged and exist as aspartate and glutamate. These four amino acids are the prime contributors to the overall charge of a protein. A fifth amino acid, histidine, has an imidazole side chain, which has a pKa of 6.8, the pH of the cytoplasm. As a result, small shifts of cellular pH will change the charge of histidine side chains:

[Graphic not reproduced: protonated and unprotonated forms of the histidine side chain]

The activities of many proteins are modulated by pH through protonation of histidine side chains. Asparagine and glutamine are uncharged but have polar amide groups with extensive hydrogen-bonding capacities. Similarly, serine and threonine are uncharged but have polar hydroxyl groups, which also participate in hydrogen bonds with other polar molecules. Because the charged and polar amino acids are hydrophilic, they are usually found at the surface of a water-soluble protein, where they not only contribute to the solubility of the protein in water but also form binding sites for charged molecules.
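The sensitivity of histidine to small pH shifts can be made concrete with a short calculation. The sketch below uses the Henderson-Hasselbalch relation, which is a standard result brought in here rather than something stated in the text, together with the pKa of 6.8 quoted above. It treats the imidazole group as an isolated titratable group, which is an idealization; the pKa of a histidine inside a folded protein can differ.

```python
def fraction_protonated(ph: float, pka: float = 6.8) -> float:
    """Fraction of an acid group that is protonated at a given pH.

    Uses the Henderson-Hasselbalch relation (an assumption of this sketch):
        [protonated] / [total] = 1 / (1 + 10**(pH - pKa))
    The default pKa of 6.8 is the value quoted for the histidine side chain.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Half a pH unit on either side of the pKa changes the charged fraction markedly.
for ph in (6.3, 6.8, 7.3):
    print(f"pH {ph:.1f}: {fraction_protonated(ph):.2f} of histidine side chains protonated")
```

At a pH equal to the pKa, half of the side chains carry a positive charge; a shift of 0.5 pH unit moves that fraction to roughly three-quarters or one-quarter.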

Hydrophobic amino acids have aliphatic side chains, which are insoluble or only slightly soluble in water. The side chains of alanine, valine, leucine, isoleucine, and methionine consist entirely of hydrocarbons, except for the sulfur atom in methionine, and all are nonpolar. Phenylalanine, tyrosine, and tryptophan have large bulky aromatic side groups. As explained in Chapter 2, hydrophobic molecules avoid water by coalescing into an oily or waxy droplet. The same forces cause hydrophobic amino acids to pack in the interior of proteins, away from the aqueous environment. Later in this chapter, we will see in detail how hydrophobic residues line the surface of membrane proteins that reside in the hydrophobic environment of the lipid bilayer.

Lastly, cysteine, glycine, and proline exhibit special roles in proteins because of the unique properties of their side chains. The side chain of cysteine contains a reactive sulfhydryl group (-SH), which can oxidize to form a disulfide bond (-S-S-) to a second cysteine:

[Graphic not reproduced: two cysteine sulfhydryl groups oxidizing to form a disulfide bond]

Regions within a protein chain or in separate chains sometimes are cross-linked covalently through disulfide bonds. Although disulfide bonds are rare in intracellular proteins, they are commonly found in extracellular proteins, where they help maintain the native, folded structure. The smallest amino acid, glycine, has a single hydrogen atom as its R group. Its small size allows it to fit into tight spaces. Unlike any of the other common amino acids, proline has a cyclic ring that is produced by formation of a covalent bond between its R group and the amino group on Cα. Proline is very rigid, and its presence creates a fixed kink in a protein chain. Proline and glycine are sometimes found at points on a protein’s surface where the chain loops back into the protein.
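The three categories described above can be written down as a simple lookup. The sketch below groups the standard one-letter codes exactly as the text does (hydrophilic, hydrophobic, special) and also tallies the side chains that are charged at neutral pH. The function names and the example sequence are hypothetical, and histidine is left out of the charge count because its charge depends on pH.

```python
# One-letter codes grouped as in the text.
HYDROPHILIC = set("RKDEHNQST")  # Arg, Lys, Asp, Glu, His, Asn, Gln, Ser, Thr
HYDROPHOBIC = set("AVLIMFYW")   # Ala, Val, Leu, Ile, Met, Phe, Tyr, Trp
SPECIAL = set("CGP")            # Cys, Gly, Pro


def classify_residues(sequence: str) -> dict:
    """Count how many residues of a sequence fall into each category."""
    counts = {"hydrophilic": 0, "hydrophobic": 0, "special": 0}
    for residue in sequence.upper():
        if residue in HYDROPHILIC:
            counts["hydrophilic"] += 1
        elif residue in HYDROPHOBIC:
            counts["hydrophobic"] += 1
        elif residue in SPECIAL:
            counts["special"] += 1
        else:
            raise ValueError(f"unknown residue: {residue!r}")
    return counts


def net_side_chain_charge(sequence: str) -> int:
    """Net side-chain charge at neutral pH: (Arg + Lys) minus (Asp + Glu).

    Histidine and the chain termini are ignored in this simplified tally.
    """
    seq = sequence.upper()
    positive = seq.count("R") + seq.count("K")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


example = "MKRDEGPLC"  # hypothetical nine-residue sequence
print(classify_residues(example))
print(net_side_chain_charge(example))
```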

The 6225 known and predicted proteins encoded by the yeast genome have an average molecular weight (MW) of 52,728 and contain, on average, 466 amino acid residues. Assuming that these average values represent a “typical” eukaryotic protein, then the average molecular weight of amino acids is 113, taking their average relative abundance in proteins into account. This is a useful number to remember, as we can use it to estimate the number of residues from the molecular weight of a protein or vice versa. Some amino acids are more abundant in proteins than other amino acids. Cysteine, tryptophan, and methionine are rare amino acids; together they constitute approximately 5 percent of the amino acids in a protein. Four amino acids—leucine, serine, lysine, and glutamic acid—are the most abundant amino acids, totaling 32 percent of all the amino acid residues in a typical protein. However, the amino acid composition of proteins can vary widely from these values. For example, as discussed in later sections, proteins that reside in the lipid bilayer are enriched in hydrophobic amino acids.
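The rule of thumb in this paragraph is easy to apply directly. The following sketch reproduces the average residue weight from the yeast figures and uses the rounded value of 113 to convert between molecular weight and residue count; the function names are illustrative, and the results are estimates only, since real compositions vary.

```python
AVERAGE_RESIDUE_MW = 113  # rounded average residue weight quoted in the text


def estimate_residues(molecular_weight: float) -> int:
    """Estimate the number of residues in a protein from its molecular weight."""
    return round(molecular_weight / AVERAGE_RESIDUE_MW)


def estimate_molecular_weight(residue_count: int) -> int:
    """Estimate a protein's molecular weight from its number of residues."""
    return residue_count * AVERAGE_RESIDUE_MW


# Average yeast protein: MW 52,728 spread over 466 residues.
print(f"{52728 / 466:.2f}")            # about 113 per residue
print(estimate_residues(10_000))       # residues in a 10,000-MW protein
print(estimate_molecular_weight(466))  # MW of a 466-residue protein
```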

Peptide Bonds Connect Amino Acids into Linear Chains

Figure 3-3. The peptide bond.

(a) A condensation reaction between two amino acids forms the peptide bond, which links all the adjacent residues in a protein chain. (b) Side-chain groups (R) extend from the backbone of a protein chain, in which the amino N, α carbon, carbonyl carbon sequence is repeated throughout.

Nature has evolved a single chemical linkage, the peptide bond, to connect amino acids into a linear, unbranched chain. The peptide bond is formed by a condensation reaction between the amino group of one amino acid and the carboxyl group of another (Figure 3-3a). The repeated amide N, Cα, and carbonyl C atoms of each amino acid residue form the backbone of a protein molecule from which the various side-chain groups project. As a consequence of the peptide linkage, the backbone has polarity, since all the amino groups lie to the same side of the Cα atoms. This leaves at opposite ends of the chain a free (unlinked) amino group (the N-terminus) and a free carboxyl group (the C-terminus). A protein chain is conventionally depicted with its N-terminal amino acid on the left and its C-terminal amino acid on the right (Figure 3-3b).
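Because each peptide bond forms by condensation, a chain of n residues contains n - 1 peptide bonds and has lost n - 1 water molecules relative to the free amino acids. The sketch below illustrates this bookkeeping for a few residues. The free amino acid masses and the mass of water are standard values supplied for the example, not figures from the text, and only three amino acids are included to keep the table short.

```python
WATER_MASS = 18.015  # mass of one water molecule, in daltons (supplied value)

# Masses of the free amino acids, in daltons (supplied values; subset only).
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
}


def peptide_bond_count(sequence: str) -> int:
    """A linear chain of n residues is joined by n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide: sum of free amino acids minus one water per bond."""
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence.upper())
    return total - peptide_bond_count(sequence) * WATER_MASS


# Written N-terminus to C-terminus, following the convention in the text.
print(peptide_bond_count("GAS"))
print(round(peptide_mass("GA"), 2))
```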

Many terms are used to denote the chains formed by polymerization of amino acids. A short chain of amino acids linked by peptide bonds and having a defined sequence is a peptide; longer peptides are referred to as polypeptides. Peptides generally contain fewer than 20–30 amino acid residues, whereas polypeptides contain as many as 4000 residues. We reserve the term protein for a polypeptide (or a complex of polypeptides) that has a three-dimensional structure. It is implied that proteins and peptides represent natural products of a cell.

The size of a protein or a polypeptide is reported as its mass in daltons (a dalton is 1 atomic mass unit) or as its molecular weight (a dimensionless number). For example, a 10,000-MW protein has a mass of 10,000 daltons (Da), or 10 kilodaltons (kDa). In the last section of this chapter, we will discuss different methods for measuring the sizes and other physical characteristics of proteins.

Four Levels of Structure Determine the Shape of Proteins

Figure 3-4. Four levels of structure in hemagglutinin, which is a long multimeric molecule whose three identical subunits are each composed of two chains, HA1 and HA2.

(a) Primary structure is illustrated by the amino acid sequence of residues 68–195 of HA1. This region is used by influenza virus to bind to animal cells. The one-letter amino acid code is used. Secondary structure is represented diagrammatically beneath the sequence, showing regions of the polypeptide chain that are folded into α helices (light blue cylinders), β strands (light green arrows), and random coils (white strands). (b) Tertiary structure constitutes the folding of the helices and strands in each HA subunit into a compact structure that is 13.5 nm long and divided into two domains. The membrane-distal domain is folded into a globular conformation. The blue and green segments in this domain correspond to the sequence shown in part (a). The proximal domain, which lies adjacent to the viral membrane, has a stemlike conformation due to alignment of two long helices of HA2 (dark blue) with β strands in HA1. Short turns and longer loops, which usually lie at the surface of the molecule, connect the helices and strands in a given chain. (c) The quaternary structure comprises the three subunits of HA; the structure is stabilized by lateral interactions among the long helices (dark blue) in the subunit stems, forming a triple-stranded coiled-coil stalk. Each of the distal globular domains in trimeric hemagglutinin has a site (red) for binding sialic acid molecules on the surface of target cells. Like many membrane proteins, HA has several covalently bound carbohydrate (CHO) chains.

The structure of proteins commonly is described in terms of four hierarchical levels of organization. These levels are illustrated in Figure 3-4, which depicts the structure of hemagglutinin, a surface protein on the influenza virus. This protein binds to the surface of animal cells, including human cells, and is responsible for the infectivity of the flu virus.

The primary structure of a protein is the linear arrangement, or sequence, of amino acid residues that constitute the polypeptide chain.

Secondary structure refers to the localized organization of parts of a polypeptide chain, which can assume several different spatial arrangements. A single polypeptide may exhibit all types of secondary structure. Without any stabilizing interactions, a polypeptide assumes a random-coil structure. However, when stabilizing hydrogen bonds form between certain residues, the backbone folds periodically into one of two geometric arrangements: an α helix, which is a spiral, rodlike structure, or a β sheet, a planar structure composed of alignments of two or more β strands, which are relatively short, fully extended segments of the backbone. Finally, U-shaped four-residue segments stabilized by hydrogen bonds between their arms are called turns. They are located at the surfaces of proteins and redirect the polypeptide chain toward the interior. (These structures will be discussed in greater detail later.)

Tertiary structure, the next-higher level of structure, refers to the overall conformation of a polypeptide chain, that is, the three-dimensional arrangement of all the amino acid residues. In contrast to secondary structure, which is stabilized by hydrogen bonds, tertiary structure is stabilized by hydrophobic interactions between the nonpolar side chains and, in some proteins, by disulfide bonds. These stabilizing forces hold the α helices, β strands, turns, and random coils in a compact internal scaffold. Thus, a protein’s size and shape depend not only on its sequence but also on the number, size, and arrangement of its secondary structures. For proteins that consist of a single polypeptide chain, monomeric proteins, tertiary structure is the highest level of organization.

Multimeric proteins contain two or more polypeptide chains, or subunits, held together by noncovalent bonds. Quaternary structure describes the number (stoichiometry) and relative positions of the subunits in a multimeric protein. Hemagglutinin is a trimer of three identical subunits; other multimeric proteins can be composed of any number of identical or different subunits.

In a fashion similar to the hierarchy of structures that make up a protein, proteins themselves are part of a hierarchy of cellular structures. Proteins can associate into larger structures termed macromolecular assemblies. Examples of such macromolecular assemblies include the protein coat of a virus, a bundle of actin filaments, the nuclear pore complex, and other large submicroscopic objects. Macromolecular assemblies in turn combine with other cell biopolymers like lipids, carbohydrates, and nucleic acids to form complex cell organelles.

Graphic Representations of Proteins Highlight Different Features

Figure 3-5. Various graphic representations of the structure of Ras, a guanine nucleotide–binding protein.

Guanosine diphosphate, the substrate that is bound, is shown as a blue space-filling figure in parts (a)–(d). (a) The Cα trace of Ras, which highlights the course of the backbone. Evident from this view is how the polypeptide is packed into the smallest possible volume. (b) Ball-and-stick model of Ras showing the location of all atoms. (c) A schematic diagram of Ras showing how β strands (arrows) and α helices (cylinders) are organized in the protein. Note the turns and loops connecting pairs of helices and strands. (d) The water-accessible surface of Ras. Painted on the surface are regions of positive charge (blue) and negative charge (red). Here we see that the surface of a protein is not smooth but has lumps, bumps, and crevices. The molecular basis for specific binding interactions lies in the uneven distribution of charge over the surface of the protein. [Adapted from L. Tong et al., 1991, J. Mol. Biol. 217:503; courtesy of S. Choe.]

Different ways of depicting proteins convey different types of information. The simplest way to represent three-dimensional structure is to trace the course of the backbone atoms with a solid line (Figure 3-5a); the most complex model shows the location of every atom (Figure 3-5b; see also Figure 2-1a). The former shows the overall organization of the polypeptide chain without consideration of the amino acid side chains; the latter details the interactions among atoms that form the backbone and that stabilize the protein’s conformation. Even though both views are useful, the elements of secondary structure are not easily discerned in them.

Another type of representation uses common shorthand symbols for depicting secondary structure: cylinders for α helices, arrows for β strands, and a flexible stringlike form for parts of the backbone without any regular structure (Figure 3-5c). This type of representation emphasizes the organization of the secondary structure of a protein, and various combinations of secondary structures are easily seen.

However, none of these three ways of representing protein structure conveys much information about the protein surface, which is of interest because this is where other molecules bind to a protein. Computer analysis in which a water molecule is rolled around the surface of a protein can identify the atoms that are in contact with the watery environment. On this water-accessible surface, regions having a common chemical (hydrophobicity or hydrophilicity) and electrical (basic or acidic) character can be mapped. Such models show the texture of the protein surface and the distribution of charge, both of which are important parameters of binding sites (Figure 3-5d). This view represents a protein as seen by another molecule.

Secondary Structures Are Crucial Elements of Protein Architecture

In an average protein, 60 percent of the polypeptide chain exists as two regular secondary structures, α helices and β sheets; the remainder of the molecule is in random coils and turns. Thus, α helices and β sheets are the major internal supportive elements in proteins. In this section, we explore the forces that favor formation of secondary structures. In later sections, we examine how these structures can pack into larger arrays.

The α Helix

Figure 3-6. Model of the α helix.

The polypeptide backbone is folded into a spiral that is held in place by hydrogen bonds (black dots) between backbone oxygen atoms and hydrogen atoms. Note that all the hydrogen bonds have the same polarity. The outer surface of the helix is covered by the side-chain R groups.

Polypeptide segments can assume a regular spiral, or helical, conformation, called the α helix. In this secondary structure, the carbonyl oxygen of each peptide bond is hydrogen-bonded to the amide hydrogen of the amino acid four residues toward the C-terminus. This uniform arrangement of bonds confers a polarity on a helix because all the hydrogen-bond donors have the same orientation. The peptide backbone twists into a helix having 3.6 amino acids per turn (Figure 3-6). The stable arrangement of amino acids in the α helix holds the backbone as a rodlike cylinder from which the side chains point outward. The hydrophobic or hydrophilic quality of the helix is determined entirely by the side chains, because the polar groups of the peptide backbone are already involved in hydrogen bonding in the helix and thus are unable to affect its hydrophobicity or hydrophilicity.

Figure 3-7. Regions of an α helix may be amphipathic.

The five chains of cartilage oligomeric matrix protein associate into a coiled-coil fibrous domain through amphipathic α helices. Seen in cross section through a part of the domain, the hydrophobic residues (gray) face the interior, and the hydrophilic residues (yellow) line the surface. This arrangement of hydrophobic and hydrophilic residues is typical of proteins in an aqueous environment. [Courtesy of V. Malashkevich.]

In many α helices hydrophilic side chains extend from one side of the helix and hydrophobic side chains from the opposite side, making the overall structure amphipathic. In such helices the hydrophobic residues, although apparently randomly arranged, occur in a regular pattern (Figure 3-7). One way of visualizing this arrangement is to look down the center of an α helix and then project the amino acid residues onto the plane of the paper. The residues will appear as a wheel, and in the case of an amphipathic helix, the hydrophobic residues all lie on one side of the wheel and the hydrophilic ones on the other side.
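The wheel projection described above can be sketched numerically. With 3.6 residues per turn, each successive residue is rotated 100° around the helix axis. The code below assigns each residue its angle on the wheel and then averages the directions of the hydrophobic residues: a result near 1 means they cluster on one face, and a result near 0 means they are spread evenly around the helix. The clustering score, the function names, and both example sequences are hypothetical constructions for illustration, not a published scale.

```python
import math

DEGREES_PER_RESIDUE = 100.0  # 360 degrees per turn / 3.6 residues per turn
HYDROPHOBIC = set("AVLIMFYW")  # hydrophobic residues as grouped in the text


def wheel_angles(sequence: str) -> list:
    """Return (residue, angle in degrees) pairs for a helical-wheel projection."""
    return [
        (residue, (index * DEGREES_PER_RESIDUE) % 360.0)
        for index, residue in enumerate(sequence.upper())
    ]


def hydrophobic_clustering(sequence: str) -> float:
    """Length of the mean direction vector of the hydrophobic residues.

    Ranges from 0 (evenly spread around the wheel) to 1 (all on one spoke).
    """
    angles = [
        math.radians(angle)
        for residue, angle in wheel_angles(sequence)
        if residue in HYDROPHOBIC
    ]
    if not angles:
        return 0.0
    mean_x = sum(math.cos(a) for a in angles) / len(angles)
    mean_y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(mean_x, mean_y)


amphipathic = "LKEKLEKLLEKLKELLKE"  # hypothetical: leucines placed on one face
alternating = "LK" * 9               # hypothetical: leucines spread around the wheel

print(round(hydrophobic_clustering(amphipathic), 2))
print(round(hydrophobic_clustering(alternating), 2))
```

In the first sequence the leucines look irregularly spaced along the chain, yet they all fall within one half of the wheel, which is the point made in the text about apparently random but regular arrangements.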

Amphipathic α helices are important structural elements in fibrous proteins found in a watery environment. In a coiled-coil region of a protein, the hydrophobic surface of the α helix faces inward to form the hydrophobic core, and the hydrophilic surfaces face outward toward the surrounding fluid. This same orientation of surfaces is also found in most globular proteins. A crucial difference is that the hydrophobic interaction could be with a β strand, random coil, or another α helix. As we discuss later, amphipathic β strands line the walls of an ion channel in the cell membrane.

The β Sheet

Figure 3-8. β sheets.

(a) A simple two-stranded β sheet with antiparallel β strands. A sheet is stabilized by hydrogen bonds (black dots) between the β strands. The planarity of the peptide bond forces a β sheet to be pleated; hence, this structure is also called a β pleated sheet, or simply a pleated sheet. (b) Side view of a β sheet showing how the R groups protrude above and below the plane of the sheet. (c) Model of binding site in class I MHC (major histocompatibility complex) molecules, which are involved in graft rejection. A sheet comprising eight antiparallel β strands (green) forms the bottom of the binding cleft, which is lined by a pair of α helices (blue). A disulfide bond is shown as two connected yellow spheres. The MHC binding cleft is large enough to bind a peptide 8–10 residues long. [Part (b) adapted from C. Branden and J. Tooze, 1991, Introduction to Protein Structure, Garland.]

Another regular secondary structure, the β sheet, consists of laterally packed β strands. Each β strand is a short (5–8-residue), nearly fully extended polypeptide chain. Hydrogen bonding between backbone atoms in adjacent β strands, within either the same or different polypeptide chains, forms a β sheet (Figure 3-8a). Like α helices, β strands have a polarity defined by the orientation of the peptide bond. Therefore, in a pleated sheet, adjacent β strands can be oriented antiparallel or parallel with respect to each other. In both arrangements of the backbone, the side chains project from both faces of the sheet (Figure 3-8b).

In some proteins, β sheets form the floor of a binding pocket (Figure 3-8c). In many structural proteins, multiple layers of pleated sheets provide toughness. Silk fibers, for example, consist almost entirely of stacks of antiparallel β sheets. The fibers are flexible because the stacks of β sheets can slip over one another. However, they are also resistant to breakage because the peptide backbone is aligned parallel with the fiber axis.

Turns

Composed of three or four residues, turns are compact, U-shaped secondary structures stabilized by a hydrogen bond between their end residues. They are located on the surface of a protein, forming a sharp bend that redirects the polypeptide backbone back toward the interior. Glycine and proline are commonly present in turns. The lack of a large side chain in the case of glycine and the presence of a built-in bend in the case of proline allow the polypeptide backbone to fold into a tight U-shaped structure. Without turns, a protein would be large, extended, and loosely packed. A polypeptide backbone also may contain long bends, or loops. In contrast to turns, which exhibit a few defined structures, loops can be formed in many different ways.

Motifs Are Regular Combinations of Secondary Structures

Figure 3-9. Secondary-structure motifs.

(a) The coiled-coil motif (left) is characterized by two or more helices wound around one another. In some DNA-binding proteins, like c-Jun, a two-stranded coiled coil is responsible for dimerization (right). Each helix in a coiled coil has a repeated heptad sequence with a leucine or other hydrophobic residue (red) at positions 1 and 4, forming a hydrophobic stripe along the helix surface. The helices pair by binding along their hydrophobic stripes, as seen in both models displayed here, in which the hydrophobic side chains are shown in red. (b) The helix-loop-helix motif occurs in many calcium-binding proteins. Oxygen-containing R groups of residues in the loop form a ring around a Ca²⁺ ion. The 14-aa loop sequence (right) is rich in invariant hydrophilic residues. (c) The zinc-finger motif is present in many proteins that bind nucleic acids. A Zn²⁺ ion is held between a pair of β strands (green) and a single α helix (blue) by a pair of cysteine and histidine residues. In the 25-aa sequence of this motif the invariant cysteines usually occur at positions 3 and 6, and the invariant histidines at positions 20 and 24. [Part (a) courtesy of V. Malashkevich and S. Choe.]

Many proteins contain one or more motifs built from particular combinations of secondary structures. A motif is defined by a specific combination of secondary structures that has a particular topology and is organized into a characteristic three-dimensional structure. Three common motifs are depicted in Figure 3-9.

The coiled-coil motif comprises two, three, or four amphipathic α helices wrapped around one another. In this motif, hydrophobic side chains project like “knobs” from one helix and interdigitate into the gaps, or “holes,” between the hydrophobic side chains of the other helix along the contact surface. The subunits in some multimeric proteins and in rodlike fibers are held together by coiled-coil interactions. The Ca²⁺-binding helix-loop-helix motif is marked by the presence of certain hydrophilic residues at invariant positions in the loop. Oxygen atoms in the invariant residues coordinate the calcium ion. In another common motif, the zinc finger, three secondary structures (an α helix and two β strands with an antiparallel orientation) form a fingerlike bundle held together by a zinc ion. This motif is most commonly found in proteins that bind RNA or DNA.
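The heptad repeat that underlies the coiled coil, with hydrophobic residues at positions 1 and 4 of every seven, lends itself to a simple scan. The sketch below scores how well a sequence fits that pattern in each of the seven possible registers. It is a toy illustration under stated assumptions: the hydrophobic set follows the grouping used earlier in this section, the sequences are hypothetical, and real coiled-coil prediction methods use far richer statistics.

```python
HYDROPHOBIC = set("AVLIMFYW")  # hydrophobic residues as grouped in the text


def heptad_score(sequence: str, offset: int = 0) -> float:
    """Fraction of heptad positions 1 and 4 occupied by hydrophobic residues.

    `offset` is the index of the residue treated as position 1 of a heptad.
    """
    hits = 0
    total = 0
    for index, residue in enumerate(sequence.upper()):
        position = (index - offset) % 7 + 1  # position 1..7 within the heptad
        if position in (1, 4):
            total += 1
            if residue in HYDROPHOBIC:
                hits += 1
    return hits / total if total else 0.0


def best_register(sequence: str) -> int:
    """Return the offset (0..6) whose heptad framing gives the highest score."""
    return max(range(7), key=lambda offset: heptad_score(sequence, offset))


repeat = "LKELEEK" * 4      # hypothetical heptad with leucine at positions 1 and 4
shifted = "EK" + repeat     # same repeat, starting two residues later

print(heptad_score(repeat))
print(best_register(shifted))
```

Scanning all seven registers matters because a natural sequence rarely begins exactly at position 1 of a heptad.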

Additional motifs will be examined in discussions of other proteins. The presence of the same motif in different proteins with similar functions clearly indicates that during evolution these useful combinations of secondary structures have been conserved.

Structural and Functional Domains Are Modules of Tertiary Structure

The tertiary structure of large proteins is often subdivided into distinct globular or fibrous regions called domains. Structurally, a domain is a compactly folded region of polypeptide. For large proteins, domains can be recognized in structures determined by x-ray crystallography or in images captured by electron microscopy. These discrete regions are well distinguished or physically separated from other parts of the protein, but connected by the polypeptide chain. Hemagglutinin, for example, contains a globular domain and a fibrous domain (see Figure 3-4b).

A structural domain consists of 100–200 residues in various combinations of α helices, β sheets, turns, and random coils. Often a domain is characterized by some interesting structural feature, for example, an unusual abundance of a particular amino acid (a proline-rich domain, an acidic domain, a glycine-rich domain), sequences common to (conserved in) many proteins (SH3, or Src homology region 3), or a particular secondary-structure motif (zinc-finger motif in kringle domain).

Domains sometimes are defined in functional terms based on observations that the activity of a protein is localized to a small region along its length. For instance, a particular region or regions of a protein may be responsible for its catalytic activity (e.g., a kinase domain) or binding ability (e.g., a DNA-binding domain, membrane-binding domain). Functional domains often are identified experimentally by whittling down a protein to its smallest active fragment with the aid of proteases, enzymes that cleave the polypeptide backbone. Alternatively, the DNA encoding a protein can be subjected to mutagenesis, so that segments of the protein’s backbone are removed or changed (Chapter 7). The activity of the truncated or altered protein product synthesized from the mutated gene is then monitored.

The functional definition of a domain is less rigorous than a structural definition. However, if the three-dimensional structure of a protein has not been determined, identification of functional domains can provide useful information about the protein. Because the activity of a protein usually depends on a proper three-dimensional structure, a functional domain consists of at least one and often several structural domains.

The organization of tertiary structure into domains further illustrates the principle that complex molecules are built from simpler components. Like secondary-structure motifs, tertiary-structure domains are incorporated as modules into different proteins, thereby modifying their functional activities. The modular approach to protein architecture is particularly easy to recognize in large proteins, which tend to be a mosaic of different domains and thus can perform different functions simultaneously.

Figure 3-10. Schematic diagrams of various proteins, illustrating their modular nature

Epidermal growth factor (EGF) is generated by proteolytic cleavage of a precursor protein containing multiple EGF domains (orange). The EGF domain also occurs in Neu protein and in tissue plasminogen activator (TPA). Other domains, or modules, in these proteins include a chymotryptic domain (purple), an immunoglobulin domain (green), a fibronectin domain (yellow), a membrane-spanning domain (pink), and a kringle domain (blue). [Adapted from I. D. Campbell and P. Bork, 1993, Curr. Opin. Struc. Biol. 3:385.]

The epidermal growth factor (EGF) domain is one example of a module that is present in several proteins (Figure 3-10). EGF is a small soluble peptide hormone that binds to cells in the skin and connective tissue, causing them to divide. It is generated by proteolytic cleavage between repeated EGF domains in the EGF precursor protein, which is anchored in the cell membrane by a membrane-spanning domain. Six conserved cysteine residues form three pairs of disulfide bonds that hold EGF in its native conformation. The EGF domain also occurs in other proteins, including tissue plasminogen activator (TPA), a protease that is used to dissolve blood clots in heart attack victims; Neu protein, which is involved in embryonic differentiation; and Notch protein, a cell-adhesion molecule that glues cells together. Besides the EGF domain, these proteins contain additional domains found in other proteins. For example, TPA possesses a chymotryptic domain, a common feature in proteins that catalyze proteolysis.

Sequence Homology Suggests Functional and Evolutionary Relationships between Proteins

Figure 3-11. Models of the tertiary structures of the oxygen-carrier proteins myoglobin and hemoglobin based on x-ray crystallographic analysis

Note the similarity in the tertiary structures of myoglobin and the two α subunits (blue) and two β subunits (purple) of hemoglobin. The planar white (or gray) structure in the center of each polypeptide chain is the heme prosthetic group. [Myoglobin adapted from S. E. V. Phillips, 1980, J. Mol. Biol. 142:531; hemoglobin adapted from B. Shaanan, 1983, J. Mol. Biol. 171:31; courtesy of S. Choe.]

Early evidence supporting the key principle that the amino acid sequence of a protein determines its three-dimensional structure was obtained in the 1960s by Max Perutz. On comparing the structures of myoglobin and hemoglobin determined from x-ray crystallographic analysis, he immediately noted that the subunits of hemoglobin, a tetramer of two α and two β subunits, resembled myoglobin, a monomer (Figure 3-11). Although the sequences of the two proteins were unknown at the time, Perutz proposed that the similar arrangement of α helices in the two proteins is a consequence of their having similar amino acid sequences. Later sequencing of myoglobin and hemoglobin revealed that many identical or chemically similar residues occur in identical positions throughout the sequences of both proteins. The two proteins also exhibit similar functions: myoglobin is the oxygen-carrier protein in muscle, and hemoglobin the oxygen-carrier protein in blood. Most of the conserved residues hold the heme group in place or are responsible for maintaining the hydrophobic interior of the protein.

As data concerning protein sequences and three-dimensional structures accumulated, the concept that similar sequences fold into similar secondary and tertiary structures was confirmed. The propensity of each amino acid to occur in the various types of secondary structures has been calculated from the amino acid sequence of secondary structures extracted from databases of the three-dimensional structures of proteins. This tabulation of the folding information inherent in the sequence is now being used in attempts to predict the three-dimensional structure of various proteins from their amino acid sequences.
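The propensity calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any particular prediction program. It assumes that each protein is supplied as a pair of equal-length strings, one giving the residues and one giving a per-residue secondary-structure label (H for helix, E for strand, C for coil). The propensity is defined here as the fraction of a given residue found in a state divided by the fraction of all residues found in that state. The function name and the toy data are hypothetical.

```python
from collections import Counter


def propensities(sequences, structures):
    """Return {(residue, state): propensity} from paired sequence/structure strings.

    Propensity = (fraction of this residue found in the state)
                 / (fraction of all residues found in the state).
    Values above 1 mean the residue is over-represented in that state.
    """
    residue_counts = Counter()
    state_counts = Counter()
    pair_counts = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must have equal length")
        for aa, state in zip(seq, ss):
            residue_counts[aa] += 1
            state_counts[state] += 1
            pair_counts[(aa, state)] += 1

    total = sum(residue_counts.values())
    result = {}
    for (aa, state), n in pair_counts.items():
        frac_residue_in_state = n / residue_counts[aa]
        frac_all_in_state = state_counts[state] / total
        result[(aa, state)] = frac_residue_in_state / frac_all_in_state
    return result


# Toy data, invented for illustration only (not taken from any real structure).
toy_sequences = ["AAEELLKKGGPP", "VVIIYYGGAAEE"]
toy_structures = ["HHHHHHHHCCCC", "EEEEEECCHHHH"]

table = propensities(toy_sequences, toy_structures)
print(table[("A", "H")])  # alanine in helix for the toy data: 2.0
```

With a real structure database the same tabulation would be run over many thousands of residues, and pairs that never occur (absent from the returned dictionary) would need separate handling.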

In the classical taxonomy of the eighteenth and nineteenth centuries, organisms were classified according to their morphological similarities and differences. In the twentieth century, the molecular revolution in biology gave birth to “molecular” taxonomy: the classification of proteins based on similarities and differences in their amino acid sequences. This new taxonomy provides much information about protein function and evolutionary relationships. If the similarity between proteins from different organisms is significant over their entire sequence, then the proteins are homologs of one another, and they probably carry out similar functions. Sequence similarity also suggests an evolutionary relationship between proteins; that is, they evolved from a common ancestor. We can therefore describe homologous proteins as belonging to the same “family” and can trace their lineage from comparisons of sequences. Closely related proteins have the most similar sequences; distantly related proteins have only faintly similar sequences.
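One simple way to put a number on sequence similarity is percent identity between two sequences that have already been aligned. The Python sketch below assumes the alignment is given as two equal-length strings with "-" marking gaps, and it counts only columns in which both sequences have a residue. Other conventions for the denominator exist, so this is one reasonable choice rather than a standard. The function name and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns in which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Invented four-residue example: three of four positions match.
print(percent_identity("ACDE", "ACDF"))  # 75.0
```

Percent identity treats every mismatch alike. Scoring schemes that give partial credit to chemically similar residues are a natural refinement but are not shown here.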

Figure 3-12. Evolutionary tree showing how the globin protein family arose, starting from the most primitive oxygen-binding proteins, leghemoglobins, in plants

Sequence comparisons have revealed that evolution of the globin proteins parallels the evolution of vertebrates. Major junctions occurred with the divergence of myoglobin from hemoglobin and the later divergence of hemoglobin into the α and β subunits. [Adapted from R. E. Dickerson and I. Geis, 1983, Hemoglobin: Structure, Function, Evolution, and Pathology, Benjamin-Cummings.]

The kinship among homologous proteins is most easily visualized from a tree diagram based on sequence analyses. For example, the amino acid sequences of hemoglobins from different species suggest that they evolved from an ancestral monomeric, oxygen-binding protein (Figure 3-12). Over time, this ancestral protein slowly changed, giving rise to myoglobin, which remained a monomeric protein, and to the α and β subunits, which evolved to associate into the tetrameric hemoglobin molecule. As the tree diagram in Figure 3-12 shows, evolution of the globin protein family parallels that of the vertebrates.
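To show how a tree can be derived from sequence comparisons, the Python sketch below applies average-linkage clustering (UPGMA) to a table of pairwise distances. UPGMA is only one simple tree-building method, chosen here for brevity, and is not necessarily the method behind Figure 3-12. The distance values are invented for illustration and are not measured globin distances. The function name and the nested-tuple output format are likewise assumptions.

```python
from itertools import combinations


def upgma(names, distances):
    """Build a tree as nested tuples by repeatedly merging the two closest clusters.

    names: list of leaf labels.
    distances: dict mapping (label_a, label_b) pairs to a pairwise distance.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves it contains
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        merged_size = sizes[a] + sizes[b]
        # Distance from the new cluster to each other cluster is the
        # size-weighted average of the distances from its two parts.
        for c in sizes:
            if c == a or c == b:
                continue
            d[frozenset((merged, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / merged_size
        del sizes[a], sizes[b]
        sizes[merged] = merged_size
    return next(iter(sizes))


# Hypothetical distances (arbitrary units), invented for this example.
toy_names = ["alpha", "beta", "myoglobin", "leghemoglobin"]
toy_distances = {
    ("alpha", "beta"): 2,
    ("alpha", "myoglobin"): 6,
    ("beta", "myoglobin"): 6,
    ("alpha", "leghemoglobin"): 10,
    ("beta", "leghemoglobin"): 10,
    ("myoglobin", "leghemoglobin"): 10,
}

tree = upgma(toy_names, toy_distances)
print(tree)  # ('leghemoglobin', ('myoglobin', ('alpha', 'beta')))
```

With these toy numbers the two most similar sequences are joined first and the most distant one joins last, mirroring the branching order described in the text. UPGMA assumes that sequences change at a roughly constant rate, which real protein families do not always satisfy.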

The power of such comparative analysis and identification of homologous proteins has expanded substantially in recent years by use of the base sequences in an organism’s genome to deduce the amino acid sequences of the encoded proteins. As discussed in Chapter 7, this approach permits “sequencing” of proteins that are difficult to purify in significant amounts.
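Deducing a protein sequence from a base sequence amounts to reading the DNA three bases at a time and looking up each codon in the genetic code. The Python sketch below is a minimal illustration under several simplifying assumptions: the input is the coding strand, the standard genetic code applies, there are no introns, and translation starts at the first ATG. The function name and the short example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first ATG to the first stop codon (or the end of the sequence).

    Returns one-letter amino acid codes. Raises KeyError if a codon contains
    a character other than A, C, G, or T.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Short invented sequence: ATG GTG CTG TCT TAA
print(translate("ATGGTGCTGTCTTAA"))  # MVLS
```

Real genome annotation must also locate the correct reading frame on either strand and, in eukaryotes, remove introns before translation, none of which this sketch attempts.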

SUMMARY
