Proteins are designed to bind every conceivable molecule—from simple ions to large
complex molecules like fats, sugars, nucleic acids, and other proteins. They catalyze an
extraordinary range of chemical reactions, provide structural rigidity to the cell, control flow
of material through membranes, regulate the concentrations of metabolites, act as sensors and
switches, cause motion, and control gene function. The three-dimensional structures of proteins
have evolved to carry out these functions efficiently and under precise control. The
spatial organization of proteins, their shape in three dimensions, is a key to
understanding how they work.
One of the major areas of biological research today is how proteins, constructed from only 20
different amino acids, carry out the incredible array of diverse tasks that they
do. Unlike the intricate branched structure of carbohydrates, proteins are single, unbranched
chains of amino acid monomers. The unique shape of proteins arises from noncovalent interactions
between regions in the linear sequence of amino acids. Only when a protein is in its correct
three-dimensional structure, or conformation, is it able to function efficiently. A
key concept in understanding how proteins work is that function is derived from
three-dimensional structure, and three-dimensional structure is specified by amino acid
sequence.
The Amino Acids Composing Proteins Differ Only in Their Side Chains
Figure 3-1
.
Amino acids, the monomeric units that link together to form proteins, have a common
structure
The α carbon atom (green) of each amino acid is bonded to four different
chemical groups and thus is asymmetric. The side chain, or R group (red), is unique to each
amino acid. The diversity of natural proteins reflects different linear combinations of the
20 naturally occurring amino acids. The short peptide shown here, containing only four amino
acids, has 20^4 (20 × 20 × 20 × 20), or 160,000, possible sequences.
Amino acids are the monomeric building blocks of proteins. The α carbon atom (Cα) of
amino acids, which is adjacent to the carboxyl group, is bonded to four different chemical
groups: an amino (NH2) group, a carboxyl (COOH) group, a hydrogen (H) atom, and one variable
group, called a side chain or R group. All 20 different amino acids have this same general
structure, but their side-chain groups vary in size, shape, charge, hydrophobicity, and
reactivity.
Figure 3-2
.
The structures of the 20 common amino acids grouped into three categories:
hydrophilic, hydrophobic, and special amino acids
The side chain determines the characteristic properties of each amino acid. Shown are the
zwitterion forms, which exist at the pH of the cytosol. In parentheses are the three-letter
and one-letter abbreviations for each amino acid.
The amino acids can be considered the alphabet in which linear proteins are
“written.” Students of biology must be familiar with the special properties
of each letter of this alphabet, which are determined by the side chain. Amino acids can be
classified into a few distinct categories based primarily on their solubility in water, which
is influenced by the polarity of their side chains. Amino acids with polar side groups tend to
be on the surface of proteins; by interacting with water, they make proteins soluble in
aqueous solutions. In contrast, amino acids with nonpolar side groups avoid water and
aggregate to form the water-insoluble core of proteins. The polarity of amino acid side chains
thus is one of the forces responsible for shaping the final three-dimensional structure of
proteins.
Hydrophilic, or water-soluble, amino acids have
ionized or polar side chains. At neutral pH, arginine and
lysine are positively charged; aspartic acid and
glutamic acid are negatively charged and exist as aspartate and glutamate.
These four amino acids are the prime contributors to the overall charge of a protein. A fifth
amino acid, histidine, has an imidazole side chain, which has a
pKa of 6.8, the pH of the cytoplasm. As a result, small shifts of
cellular pH will change the charge of histidine side chains.

The activities of many proteins are modulated by pH through protonation of histidine side
chains. Asparagine and glutamine are uncharged but have polar
amide groups with extensive hydrogen-bonding capacities. Similarly, serine and
threonine are uncharged but have polar hydroxyl groups, which also
participate in hydrogen bonds with other polar molecules. Because the charged and polar amino
acids are hydrophilic, they are usually found at the surface of a water-soluble protein, where
they not only contribute to the solubility of the protein in water but also form binding sites
for charged molecules.
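The pH sensitivity of histidine described above can be made concrete with the Henderson-Hasselbalch relation. The sketch below is only an illustration: it uses the pKa of 6.8 quoted in the text, while the function name `protonated_fraction` and the sample pH values are assumptions chosen for the example.

```python
def protonated_fraction(ph: float, pka: float = 6.8) -> float:
    """Fraction of side chains in the protonated (positively charged) form.

    Henderson-Hasselbalch: pH = pKa + log10([base] / [acid]), which rearranges to
    [acid] / ([acid] + [base]) = 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative pH values half a unit below, at, and above the pKa.
for ph in (6.3, 6.8, 7.3):
    print(f"pH {ph:.1f}: fraction protonated = {protonated_fraction(ph):.2f}")
```

Under these assumptions, a shift of only half a pH unit on either side of the pKa moves the protonated fraction from about three-quarters to about one-quarter, which is why small changes in cellular pH can alter the charge of histidine side chains.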
Hydrophobic amino acids have aliphatic side
chains, which are insoluble or only slightly soluble in water. The side chains of
alanine, valine, leucine, isoleucine, and methionine consist
entirely of hydrocarbons, except for the sulfur atom in methionine, and all are nonpolar.
Phenylalanine, tyrosine, and tryptophan have large bulky
aromatic side groups. As explained in Chapter 2,
hydrophobic molecules avoid water by coalescing into an oily or waxy droplet. The same forces
cause hydrophobic amino acids to pack in the interior of proteins, away from the aqueous
environment. Later in this chapter, we will see in detail how hydrophobic residues line the
surface of membrane proteins that reside in the hydrophobic environment of
the lipid bilayer.
Lastly, cysteine, glycine, and proline exhibit special
roles in proteins because of the unique properties of their side chains. The side chain of
cysteine contains a reactive sulfhydryl group (-SH), which can oxidize to form a disulfide
bond (-S-S-) to a second cysteine.

Regions within a protein chain or in separate chains sometimes are cross-linked covalently
through disulfide bonds. Although disulfide bonds are rare in intracellular proteins, they are
commonly found in extracellular proteins, where they help maintain the native, folded
structure. The smallest amino acid, glycine, has a single hydrogen atom as its R group. Its
small size allows it to fit into tight spaces. Unlike any of the other common amino acids,
proline has a cyclic ring that is produced by formation of a covalent bond between its R group
and the amino group on Cα. Proline is very rigid, and its presence creates
a fixed kink in a protein chain. Proline and glycine are sometimes found at points on a
protein’s surface where the chain loops back into the protein.
The 6225 known and predicted proteins encoded by the yeast genome have an average molecular
weight (MW) of 52,728 and contain, on average, 466 amino acid residues. Assuming that these
average values represent a “typical” eukaryotic protein, then the average
molecular weight of amino acids is 113, taking their average relative abundance in proteins
into account. This is a useful number to remember, as we can use it to estimate the number of
residues from the molecular weight of a protein or vice versa. Some amino acids are more
abundant in proteins than other amino acids. Cysteine, tryptophan, and methionine are rare
amino acids; together they constitute approximately 5 percent of the amino acids in a protein.
Four amino acids—leucine, serine, lysine, and glutamic acid—are the most
abundant amino acids, totaling 32 percent of all the amino acid residues in a typical protein.
However, the amino acid composition of proteins can vary widely from these values. For example,
as discussed in later sections, proteins that reside in the lipid bilayer are enriched in
hydrophobic amino acids.
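The rule of thumb above, about 113 per residue, can be applied in either direction. The following is a minimal sketch using only the numbers quoted in the text; the function names are hypothetical, and the estimates are approximate because real proteins deviate from the average composition.

```python
AVG_RESIDUE_MW = 113  # average residue molecular weight quoted in the text


def estimate_residues(protein_mw: float) -> int:
    """Approximate number of residues in a protein of the given molecular weight."""
    return round(protein_mw / AVG_RESIDUE_MW)


def estimate_mw(n_residues: int) -> int:
    """Approximate molecular weight of a protein with the given number of residues."""
    return n_residues * AVG_RESIDUE_MW


# The yeast averages from the text: 52,728 MW over 466 residues.
print(f"average per residue: {52_728 / 466:.2f}")
print("residues estimated for a 52,728-MW protein:", estimate_residues(52_728))
print("MW estimated for a 466-residue protein:", estimate_mw(466))
```

Because 113 is itself a rounded value, the estimate for the average yeast protein lands within one residue of the quoted 466 rather than exactly on it.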
Peptide Bonds Connect Amino Acids into Linear Chains
Figure 3-3
.
The peptide bond
(a) A condensation reaction between two amino acids forms the peptide bond, which links
all the adjacent residues in a protein chain. (b) Side-chain groups (R) extend from the
backbone of a protein chain, in which the amino N, α carbon, carbonyl carbon
sequence is repeated throughout.
Nature has evolved a single chemical linkage, the peptide bond, to connect amino acids into a
linear, unbranched chain. The peptide bond is formed by a condensation reaction between the
amino group of one amino acid and the carboxyl group of another. The repeated amide N,
Cα, and carbonyl C atoms of each amino acid residue form the backbone of a protein molecule
from which the various side-chain groups project. As a consequence of the peptide linkage, the
backbone has polarity, since all the amino groups lie to the same side of the Cα atoms. This
leaves at opposite ends of the chain a free (unlinked) amino group (the N-terminus) and a free
carboxyl group (the C-terminus). A protein chain is conventionally depicted with its
N-terminal amino acid on the left and its C-terminal amino acid on the right.
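Because each peptide bond is formed by condensation, one water molecule is lost per bond, so a chain of n residues weighs less than the sum of its n free amino acids by (n - 1) waters. The sketch below illustrates this bookkeeping. The rounded molecular weights for glycine, alanine, serine, and water are standard values supplied here as assumptions (they are not given in the text), and the function name is hypothetical.

```python
WATER_MW = 18.02  # rounded molecular weight of water (assumed value)

# Rounded molecular weights of three free amino acids (assumed values).
FREE_AMINO_ACID_MW = {"G": 75.07, "A": 89.09, "S": 105.09}


def peptide_mw(sequence: str) -> float:
    """Molecular weight of a linear peptide written in one-letter code.

    Each of the (n - 1) peptide bonds releases one water molecule.
    """
    if not sequence:
        return 0.0
    total = sum(FREE_AMINO_ACID_MW[aa] for aa in sequence)
    return total - (len(sequence) - 1) * WATER_MW


# A dipeptide written N-terminus to C-terminus: glycine, then alanine.
print(f"Gly-Ala: {peptide_mw('GA'):.2f}")
```

The table is deliberately limited to three residues; extending it to all 20 amino acids would follow the same pattern.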
Many terms are used to denote the chains formed by polymerization of amino acids. A short
chain of amino acids linked by peptide bonds and having a defined sequence is a peptide; longer peptides are referred to as polypeptides. Peptides generally contain fewer than
20–30 amino acid residues, whereas polypeptides contain as many as 4000 residues. We
reserve the term protein for a polypeptide (or a
complex of polypeptides) that has a three-dimensional structure. It is implied that proteins and
peptides represent natural products of a cell.
The size of a protein or a polypeptide is reported as its mass in daltons (a dalton is 1 atomic mass unit) or as its molecular weight (a
dimensionless number). For example, a 10,000-MW protein has a mass of 10,000 daltons (Da), or
10 kilodaltons (kDa). In the last section of this chapter, we will discuss different methods
for measuring the sizes and other physical characteristics of proteins.
Four Levels of Structure Determine the Shape of Proteins
Figure 3-4
.
Four levels of structure in hemagglutinin, which is a long multimeric molecule whose
three identical subunits are each composed of two chains, HA1 and
HA2
(a) Primary structure is illustrated by the amino acid sequence of residues
68–195 of HA1. This region is used by influenza virus to
bind to animal cells. The one-letter amino acid code is used. Secondary structure is
represented diagrammatically beneath the sequence, showing regions of the polypeptide chain
that are folded into α helices (light blue cylinders), β strands (light
green arrows), and random coils (white strands). (b) Tertiary structure constitutes the
folding of the helices and strands in each HA subunit into a compact structure that is 13.5
nm long and divided into two domains. The membrane-distal domain is folded into a globular
conformation. The blue and green segments in this domain correspond to the sequence shown in
part (a). The proximal domain, which lies adjacent to the viral membrane, has a stemlike
conformation due to alignment of two long helices of HA2 (dark blue) with
β strands in HA1. Short turns and longer loops, which usually lie at the
surface of the molecule, connect the helices and strands in a given chain. (c) The
quaternary structure comprises the three subunits of HA; the structure is stabilized by
lateral interactions among the long helices (dark blue) in the subunit stems, forming a
triple-stranded coiled-coil stalk. Each of the distal globular domains in trimeric
hemagglutinin has a site (red) for binding sialic acid molecules on the surface of target
cells. Like many membrane proteins, HA has several covalently bound carbohydrate (CHO)
chains.
The structure of proteins commonly is described in terms of four hierarchical levels of
organization. These levels are illustrated by the structure of hemagglutinin, a surface
protein on the influenza virus. This protein binds to the surface of animal cells, including
human cells, and is responsible for the infectivity of the flu virus.
The primary structure of a protein is the linear
arrangement, or sequence, of amino acid residues that constitute the
polypeptide chain.
Secondary structure refers to the localized
organization of parts of a polypeptide chain, which can assume several different spatial
arrangements. A single polypeptide may exhibit all types of secondary structure. Without any
stabilizing interactions, a polypeptide assumes a random-coil structure.
However, when stabilizing hydrogen bonds form between certain residues, the backbone folds
periodically into one of two geometric arrangements: an α helix, which
is a spiral, rodlike structure, or a β sheet, a planar structure composed
of alignments of two or more β strands, which are relatively short,
fully extended segments of the backbone. Finally, U-shaped four-residue segments stabilized by
hydrogen bonds between their arms are called turns. They are located at the
surfaces of proteins and redirect the polypeptide chain toward the interior. (These structures
will be discussed in greater detail later.)
Tertiary structure, the next-higher level of
structure, refers to the overall conformation of a polypeptide chain, that is, the
three-dimensional arrangement of all the amino acid residues. In contrast to secondary
structure, which is stabilized by hydrogen bonds, tertiary structure is stabilized by
hydrophobic interactions between the nonpolar side chains and, in some proteins, by disulfide
bonds. These stabilizing forces hold the α helices, β strands, turns, and
random coils in a compact internal scaffold. Thus, a protein’s size and shape are
dependent not only on its sequence but also on the number, size, and arrangement of its
secondary structures. For proteins that consist of a single polypeptide chain, monomeric proteins, tertiary structure is the highest
level of organization.
Multimeric proteins contain two or more
polypeptide chains, or subunits, held together by noncovalent bonds. Quaternary structure describes the number
(stoichiometry) and relative positions of the subunits in a multimeric protein. Hemagglutinin
is a trimer of three identical subunits; other multimeric proteins can be composed of any
number of identical or different subunits.
In a fashion similar to the hierarchy of structures that make up a protein, proteins
themselves are part of a hierarchy of cellular structures. Proteins can associate into larger
structures termed macromolecular assemblies. Examples of such macromolecular
assemblies include the protein coat of a virus, a bundle of actin filaments, the nuclear pore
complex, and other large submicroscopic objects. Macromolecular assemblies in turn combine with
other cell biopolymers like lipids, carbohydrates, and nucleic acids to form complex cell
organelles.
Graphic Representations of Proteins Highlight Different Features
Figure 3-5
.
Various graphic representations of the structure of Ras, a guanine
nucleotide–binding protein
Guanosine diphosphate, the substrate that is bound, is shown as a blue space-filling
figure in parts (a)–(d). (a) The Cα trace of Ras, which
highlights the course of the backbone. Evident from this view is how the polypeptide is
packed into the smallest possible volume. (b) Ball-and-stick model of Ras showing the
location of all atoms. (c) A schematic diagram of Ras showing how β strands
(arrows) and α helices (cylinders) are organized in the protein. Note the turns
and loops connecting pairs of helices and strands. (d) The water-accessible surface of Ras.
Painted on the surface are regions of positive charge (blue) and negative charge (red). Here
we see that the surface of a protein is not smooth but has lumps, bumps, and crevices. The
molecular basis for specific binding interactions lies in the uneven distribution of charge
over the surface of the protein. [Adapted from L. Tong et al., 1991, J. Mol. Biol.
217:503; courtesy of S. Choe.]
Different ways of depicting proteins convey different types of information. The simplest way
to represent three-dimensional structure is to trace the course of the backbone atoms with a
solid line; the most complex model shows the location of every atom (see also
Figure 2-1a). The former shows the overall organization of the polypeptide chain without
consideration of the amino acid side chains; the latter details the interactions among atoms
that form the backbone and that stabilize the protein’s conformation. Even though both views
are useful, the elements of secondary structure are not easily discerned in them.
Another type of representation uses common shorthand symbols for depicting secondary
structure: cylinders for α helices, arrows for β strands, and a flexible
stringlike form for parts of the backbone without any regular structure. This type of
representation emphasizes the organization of the secondary structure of a protein, and
various combinations of secondary structures are easily seen.
However, none of these three ways of representing protein structure conveys much information
about the protein surface, which is of interest because this is where other molecules bind to
a protein. Computer analysis in which a water molecule is rolled around the surface of a
protein can identify the atoms that are in contact with the watery environment. On this
water-accessible surface, regions having a common chemical (hydrophobicity or hydrophilicity)
and electrical (basic or acidic) character can be mapped. Such models show the texture of the
protein surface and the distribution of charge, both of which are important parameters of
binding sites. This view represents a protein as seen by another molecule.
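The idea of rolling a water-sized probe over a protein can be approximated numerically. The sketch below is not the procedure used to produce the figure. It swaps in a simple point-sampling scheme of the Shrake-Rupley type: test points are placed on a sphere around each atom, enlarged by the probe radius, and only points that do not fall inside a neighboring enlarged sphere are counted. The probe radius of 1.4 Å, the atom radii, and the coordinates are all assumptions for illustration.

```python
import math

PROBE_RADIUS = 1.4  # assumed radius of a water-sized probe, in angstroms


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * k
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def accessible_area(atoms, n_points=200):
    """Per-atom water-accessible area for atoms given as (x, y, z, radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        reach = ri + PROBE_RADIUS  # distance of the probe center from the atom center
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + reach * ux, yi + reach * uy, zi + reach * uz
            # A test point is buried if the probe center would overlap another atom.
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                < (rj + PROBE_RADIUS) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * reach ** 2 * exposed / n_points)
    return areas


# Two hypothetical atoms 1.5 angstroms apart partly shield each other.
print(accessible_area([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]))
```

The all-pairs loop keeps the sketch short but scales poorly; a real protein with thousands of atoms would need a neighbor search to be practical.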
Secondary Structures Are Crucial Elements of Protein Architecture
In an average protein, 60 percent of the polypeptide chain exists as two regular secondary
structures, α helices and β sheets; the remainder of the molecule is in
random coils and turns. Thus, α helices and β sheets are the major internal
supportive elements in proteins. In this section, we explore the forces that favor formation of
secondary structures. In later sections, we examine how these structures can pack into larger
arrays.
The α Helix
Figure 3-6
.
Model of the α helix
The polypeptide backbone is folded into a spiral that is held in place by hydrogen bonds
(black dots) between backbone oxygen atoms and hydrogen atoms. Note that all the hydrogen
bonds have the same polarity. The outer surface of the helix is covered by the side-chain R
groups.
Polypeptide segments can assume a regular spiral, or helical, conformation, called the
α helix. In this secondary structure, the carbonyl oxygen of each peptide bond is
hydrogen-bonded to the amide hydrogen of the amino acid four residues toward the C-terminus.
This uniform arrangement of bonds confers a polarity on a helix because all the hydrogen-bond
donors have the same orientation. The peptide backbone twists into a helix having 3.6 amino
acids per turn. The stable arrangement of amino acids in the α helix holds the backbone as a
rodlike cylinder from which the side chains point outward. The hydrophobic or hydrophilic
quality of the helix is determined entirely by the side chains, because the polar groups of
the peptide backbone are already involved in hydrogen bonding in the helix and thus are unable
to affect its hydrophobicity or hydrophilicity.
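The regular bonding pattern just described, in which the carbonyl oxygen of residue i is bonded to the amide hydrogen of residue i + 4, can be enumerated directly. The following is a minimal sketch with hypothetical function names, assuming an ideal helix and 1-based numbering from the N-terminus.

```python
RESIDUES_PER_TURN = 3.6  # value quoted in the text


def helix_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """(acceptor, donor) residue pairs in an ideal alpha helix.

    The carbonyl oxygen of residue i accepts a hydrogen bond from the
    amide hydrogen of residue i + 4 (1-based numbering from the N-terminus).
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


def helix_turns(n_residues: int) -> float:
    """Number of helical turns spanned by the given number of residues."""
    return n_residues / RESIDUES_PER_TURN


print(helix_hbond_pairs(10))
print(f"18 residues span {helix_turns(18):.1f} turns")
```

Note that the first four amide hydrogens and the last four carbonyl oxygens of an isolated helix have no partner within the helix under this scheme.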
Figure 3-7
.
Regions of an α helix may be amphipathic
The five chains of cartilage oligomeric matrix protein associate into a coiled-coil
fibrous domain through amphipathic α helices. Seen in cross section through a
part of the domain, the hydrophobic residues (gray) face the interior, and the hydrophilic
residues (yellow) line the surface. This arrangement of hydrophobic and hydrophilic
residues is typical of proteins in an aqueous environment. [Courtesy of V.
Malashkevich.]
In many α helices hydrophilic side chains extend from one side of the helix and
hydrophobic side chains from the opposite side, making the overall structure
amphipathic. In such helices the hydrophobic residues, although apparently randomly arranged,
occur in a regular pattern. One way of visualizing this arrangement is to look down the
center of an α helix and then project the amino acid residues onto the plane of the
paper. The residues will appear as a wheel, and in the case of an amphipathic helix, the
hydrophobic residues all lie on one side of the wheel and the hydrophilic ones on the other
side.
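The wheel projection described above follows from the helix geometry: with 3.6 residues per turn, each residue is rotated 100° from the previous one. The sketch below computes those angles and a crude measure of how one-sided the hydrophobic residues are. The hydrophobic set follows the grouping given earlier in this section. The scoring function `amphipathic_score` (+1 for hydrophobic, -1 for everything else) and the test sequence are hypothetical simplifications, not an established hydrophobicity scale.

```python
import math

# One-letter codes for the hydrophobic group named earlier in this section.
HYDROPHOBIC = set("AVLIMFYW")
DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn


def wheel_angles(sequence):
    """Angle of each residue (degrees) when viewed down the helix axis."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360.0) for i, aa in enumerate(sequence)]


def amphipathic_score(sequence):
    """Length of the mean residue vector, from 0 (even mix) to 1 (fully one-sided).

    Hydrophobic residues count as +1 and all others as -1 (a simplification).
    """
    if not sequence:
        return 0.0
    x = y = 0.0
    for aa, angle in wheel_angles(sequence):
        weight = 1.0 if aa in HYDROPHOBIC else -1.0
        x += weight * math.cos(math.radians(angle))
        y += weight * math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)


# Hypothetical 18-residue sequence with leucines on one face and lysines on the other.
print(wheel_angles("LLKKL"))
print(f"{amphipathic_score('LLKKLLKKLKKLLKKLLK'):.2f}")
print(f"{amphipathic_score('L' * 18):.2f}")
```

A sequence in which the hydrophobic residues cluster on one face scores well above zero, whereas a uniformly hydrophobic helix scores near zero because its residues are spread evenly around the wheel.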
Amphipathic α helices are important structural elements in fibrous proteins found
in a watery environment. In a coiled-coil region of a protein, the hydrophobic surface of the
α helix faces inward to form the hydrophobic core, and the hydrophilic surfaces face
outward toward the surrounding fluid. This same orientation of surfaces is also found in most
globular proteins. A crucial difference is that the hydrophobic interaction could be with a
β strand, random coil, or another α helix. As we discuss later, amphipathic
β strands line the walls of an ion channel in the cell membrane.
The β Sheet
Figure 3-8
.
β SHEETS
(a) A simple two-stranded β sheet with antiparallel β strands. A sheet
is stabilized by hydrogen bonds (black dots) between the β strands. The planarity
of the peptide bond forces a β sheet to be pleated; hence, this structure is also
called a β pleated sheet, or simply a pleated
sheet. (b) Side view of a β sheet showing how the R groups protrude
above and below the plane of the sheet. (c) Model of binding site in class I MHC (major
histocompatibility complex) molecules, which are involved in graft rejection. A sheet
comprising eight antiparallel β strands (green) forms the bottom of the binding
cleft, which is lined by a pair of α helices (blue). A disulfide bond is shown as
two connected yellow spheres. The MHC binding cleft is large enough to bind a peptide
8–10 residues long. [Part (b) adapted from C. Branden and J. Tooze, 1991,
Introduction to Protein Structure, Garland.]
Another regular secondary structure, the β sheet, consists of laterally packed
β strands. Each β strand is a short (5–8-residue), nearly fully
extended polypeptide chain. Hydrogen bonding between backbone atoms in adjacent β
strands, within either the same or different polypeptide chains, forms a β sheet.
Like α helices, β strands have a polarity defined by the orientation of the
peptide bond. Therefore, in a pleated sheet, adjacent β strands can be oriented
antiparallel or parallel with respect to each other. In both arrangements of the backbone, the
side chains project from both faces of the sheet.
In some proteins, β sheets form the floor of a binding pocket. In many structural
proteins, multiple layers of pleated sheets provide toughness. Silk fibers, for example,
consist almost entirely of stacks of antiparallel β sheets. The fibers are flexible
because the stacks of β sheets can slip over one another. However, they are also
resistant to breakage because the peptide backbone is aligned parallel with the fiber axis.
Turns
Composed of three or four residues, turns are compact, U-shaped secondary structures
stabilized by a hydrogen bond between their end residues. They are located on the surface of a
protein, forming a sharp bend that redirects the polypeptide backbone back toward the
interior. Glycine and proline are commonly present in turns. The lack of a large side chain in
the case of glycine and the presence of a built-in bend in the case of proline allow the
polypeptide backbone to fold into a tight U-shaped structure. Without turns, a protein would
be large, extended, and loosely packed. A polypeptide backbone also may contain long bends, or
loops. In contrast to turns, which exhibit a few defined structures, loops
can be formed in many different ways.
Motifs Are Regular Combinations of Secondary Structures
Figure 3-9
.
Secondary-structure motifs
(a) The coiled-coil motif (left) is characterized by two or more helices
wound around one another. In some DNA-binding proteins, like c-Jun, a two-stranded coiled
coil is responsible for dimerization (right). Each helix in a coiled coil
has a repeated heptad sequence with a leucine or other hydrophobic residue (red) at
positions 1 and 4, forming a hydrophobic stripe along the helix surface. The helices pair by
binding along their hydrophobic stripes, as seen in both models displayed here, in which the
hydrophobic side chains are shown in red. (b) The helix-loop-helix motif occurs in many
calcium-binding proteins. Oxygen-containing R groups of residues in the loop form a ring
around a Ca2+ ion. The 14-aa loop sequence (right) is rich in invariant
hydrophilic residues. (c) The zinc-finger motif is present in many proteins that bind
nucleic acids. A Zn2+ ion is held between a pair of β strands (green) and a single
α helix (blue) by a pair of cysteine and histidine residues. In the 25-aa sequence of
this motif the invariant cysteines usually occur at positions 3 and 6, and the invariant
histidines at positions 20 and 24. [Part (a) courtesy of V. Malashkevich and S. Choe.]
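The positional rules quoted in this caption (hydrophobic residues at heptad positions 1 and 4; cysteines at positions 3 and 6 and histidines at positions 20 and 24 of a 25-residue zinc finger) lend themselves to simple sequence checks. The sketch below is only illustrative. The function names and test sequences are hypothetical, the heptad is assumed to start at the first residue, and real motifs "usually" but not always follow these positions.

```python
# One-letter codes for the hydrophobic group named earlier in this section.
HYDROPHOBIC = set("AVLIMFYW")


def heptad_positions(sequence):
    """Pair each residue with its heptad position (1-7), starting at residue 1."""
    return [(aa, i % 7 + 1) for i, aa in enumerate(sequence)]


def follows_heptad_pattern(sequence):
    """True if every residue at heptad position 1 or 4 is hydrophobic."""
    return all(
        aa in HYDROPHOBIC
        for aa, position in heptad_positions(sequence)
        if position in (1, 4)
    )


def matches_zinc_finger_positions(sequence):
    """True if a 25-residue sequence has Cys at 3 and 6 and His at 20 and 24."""
    if len(sequence) != 25:
        return False
    return (
        sequence[2] == "C"
        and sequence[5] == "C"
        and sequence[19] == "H"
        and sequence[23] == "H"
    )


# Hypothetical sequences written in one-letter code.
print(follows_heptad_pattern("LKELEDKLEELKSK"))
print(matches_zinc_finger_positions("AAC" + "AA" + "C" + "A" * 13 + "H" + "AAA" + "H" + "A"))
```

Checks of this kind only flag candidate motifs; whether a segment actually folds into a coiled coil or zinc finger depends on its three-dimensional context.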
Many proteins contain one or more motifs built from particular combinations of
secondary structures. A motif is defined by a specific combination of
secondary structures that has a particular topology and is organized into a
characteristic three-dimensional structure. Three common motifs are the coiled coil, the
helix-loop-helix, and the zinc finger.
The coiled-coil motif comprises two, three, or
four amphipathic α helices wrapped around one another. In this motif, hydrophobic
side chains project like “knobs” from one helix and interdigitate into the
gaps, or “holes,” between the hydrophobic side chains of the other helix
along the contact surface. The subunits in some multimeric proteins and in rodlike fibers are
held together by coiled-coil interactions. The Ca2+-binding helix-loop-helix motif is marked by the presence of
certain hydrophilic residues at invariant positions in the loop. Oxygen atoms in the invariant
residues bind a calcium ion through hydrogen bonds. In another common motif, the zinc finger, three secondary structures—an
α helix and two β strands with an antiparallel orientation—form a
fingerlike bundle held together by a zinc ion. This motif is most commonly found in proteins
that bind RNA or DNA.
Additional motifs will be examined in discussions of other proteins. The presence of the same
motif in different proteins with similar functions clearly indicates that during evolution
these useful combinations of secondary structures have been conserved.
Structural and Functional Domains Are Modules of Tertiary Structure
The tertiary structure of large proteins is often subdivided into distinct globular or
fibrous regions called domains. Structurally, a domain is a compactly folded region of
polypeptide. For large proteins, domains can be recognized in structures determined by
x-ray crystallography or in images captured by electron microscopy. These discrete regions
are well distinguished or physically separated from other parts of the protein, but connected
by the polypeptide chain. Hemagglutinin, for example, contains a globular domain and a
fibrous domain.
A structural domain consists of 100–200 residues in various combinations of
α helices, β sheets, turns, and random coils. Often a domain is
characterized by some interesting structural feature, for example, an unusual abundance of a
particular amino acid (a proline-rich domain, an acidic domain, a glycine-rich domain),
sequences common to (conserved in) many proteins (SH3, or Src homology region 3), or a
particular secondary-structure motif (zinc-finger motif in kringle domain).
Domains sometimes are defined in functional terms based on observations that the activity of
a protein is localized to a small region along its length. For instance, a particular region or
regions of a protein may be responsible for its catalytic activity (e.g., a kinase domain) or
binding ability (e.g., a DNA-binding domain, membrane-binding domain). Functional domains often
are identified experimentally by whittling down a protein to its smallest active fragment with
the aid of proteases, enzymes that cleave the polypeptide backbone. Alternatively, the DNA
encoding a protein can be subjected to mutagenesis, so that segments of the protein’s
backbone are removed or changed (Chapter 7). The
activity of the truncated or altered protein product synthesized from the mutated gene is then
monitored.
The functional definition of a domain is less rigorous than a structural definition. However,
if the three-dimensional structure of a protein has not been determined, identification of
functional domains can provide useful information about the protein. Because the activity of a
protein usually depends on a proper three-dimensional structure, a functional domain consists
of at least one and often several structural domains.
The organization of tertiary structure into domains further illustrates the principle that
complex molecules are built from simpler components. Like secondary-structure motifs,
tertiary-structure domains are incorporated as modules into different proteins, thereby
modifying their functional activities. The modular approach to protein architecture is
particularly easy to recognize in large proteins, which tend to be a mosaic of different
domains and thus can perform different functions simultaneously.
Figure 3-10
.
Schematic diagrams of various proteins, illustrating their modular nature
Epidermal growth factor (EGF) is generated by proteolytic cleavage of a precursor protein
containing multiple EGF domains (orange). The EGF domain also occurs in Neu protein and in
tissue plasminogen activator (TPA). Other domains, or modules, in these proteins include a
chymotryptic domain (purple), an immunoglobulin domain (green), a fibronectin domain
(yellow), a membrane-spanning domain (pink), and a kringle domain (blue). [Adapted from I.
D. Campbell and P. Bork, 1993, Curr. Opin. Struc. Biol.
3:385.]
The epidermal growth factor (EGF) domain is one example of a module that is present in
several proteins. EGF is a small soluble peptide hormone that binds to cells in the skin and
connective tissue, causing them to divide. It is generated by proteolytic cleavage between
repeated EGF domains in the EGF precursor protein, which is anchored in the cell membrane by a
membrane-spanning domain. Six conserved cysteine residues form three pairs of disulfide bonds
that hold EGF in its native conformation. The EGF domain also occurs in other proteins,
including tissue plasminogen activator (TPA), a protease that is used to dissolve blood clots
in heart attack victims; Neu protein, which is involved in embryonic differentiation; and
Notch protein, a cell-adhesion molecule that glues cells together. Besides the EGF domain,
these proteins contain additional domains found in other proteins. For example, TPA possesses
a chymotryptic domain, a common feature in proteins that catalyze proteolysis.
Sequence Homology Suggests Functional and Evolutionary Relationships between Proteins
Figure 3-11
.
Models of the tertiary structures of the oxygen-carrier proteins myoglobin and
hemoglobin based on x-ray crystallographic analysis
Note the similarity in the tertiary structures of myoglobin and the two α
subunits (blue) and two β subunits (purple) of hemoglobin. The planar white (or
gray) structure in the center of each polypeptide chain is the heme prosthetic group.
[Myoglobin adapted from S. E. V. Phillips, 1980, J. Mol. Biol. 142:531; hemoglobin
adapted from B. Shaanan, 1983, J. Mol. Biol. 171:31; courtesy of S. Choe.]
Early evidence supporting the key principle that the amino acid sequence of a protein
determines its three-dimensional structure was obtained in the 1960s by Max Perutz. On
comparing the structures of myoglobin and hemoglobin determined from x-ray crystallographic
analysis, he immediately noted that the subunits of hemoglobin, a tetramer of two α
and two β subunits, resembled myoglobin, a monomer. Although the sequences of the two
proteins were unknown at the time, Perutz proposed that the similar arrangement of α
helices in the two proteins is a consequence of their having similar amino acid sequences.
Later sequencing of myoglobin and hemoglobin revealed that many identical or chemically
similar residues occur in identical positions throughout the sequences of both proteins. The
two proteins also exhibit similar functions: myoglobin is the oxygen-carrier protein in
muscle, and hemoglobin the oxygen-carrier protein in blood. Most of the conserved residues
hold the heme group in place or are responsible for maintaining the hydrophobic interior of
the protein.
As data concerning protein sequences and three-dimensional structures accumulated, the
concept that similar sequences fold into similar secondary and tertiary structures was
confirmed. The propensity of each amino acid to occur in the various types of secondary
structures has been calculated from the amino acid sequence of secondary structures extracted
from databases of the three-dimensional structures of proteins. This tabulation of the folding
information inherent in the sequence is now being used in attempts to predict the
three-dimensional structure of various proteins from their amino acid sequences.
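The propensity calculation described above can be sketched in a few lines of Python. This is a
minimal illustration, not the procedure used by any particular prediction method: the function
name, the one-letter state codes (H for helix, E for strand, C for everything else), and the two
toy annotated sequences are all assumptions made up for the example. Real tabulations are built
from large collections of experimentally determined structures.

```python
from collections import Counter

def secondary_structure_propensity(examples):
    """Return {(residue, state): propensity} from (sequence, states) pairs.

    propensity = (frequency of residue within a state) / (overall frequency of residue)
    Values above 1 mean the residue is over-represented in that state.
    """
    residue_counts = Counter()   # residue -> total occurrences
    state_counts = Counter()     # state -> total positions in that state
    pair_counts = Counter()      # (residue, state) -> occurrences
    for sequence, states in examples:
        if len(sequence) != len(states):
            raise ValueError("sequence and state strings must have equal length")
        for residue, state in zip(sequence, states):
            residue_counts[residue] += 1
            state_counts[state] += 1
            pair_counts[(residue, state)] += 1

    total = sum(residue_counts.values())
    propensity = {}
    for (residue, state), n in pair_counts.items():
        freq_in_state = n / state_counts[state]
        freq_overall = residue_counts[residue] / total
        propensity[(residue, state)] = freq_in_state / freq_overall
    return propensity

# Hypothetical, hand-made annotations (not taken from any real structure).
toy_examples = [
    ("AEELLKG", "HHHHHHC"),
    ("VIVTGPA", "EEEECCH"),
]
table = secondary_structure_propensity(toy_examples)
```

In this toy data set alanine appears only in helical positions, so its helix propensity comes out
above 1, and no strand propensity is recorded for it at all. With so few residues the numbers
carry no biological meaning; they only show how the ratio is formed.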
In the classical taxonomy of the eighteenth and nineteenth centuries, organisms were
classified according to their morphological similarities and differences. In the twentieth century, the
molecular revolution in biology has given birth to “molecular” taxonomy:
the classification of proteins based on similarities and differences in their amino acid
sequences. This new taxonomy provides much information about protein function and evolutionary
relationships. If the similarity between proteins from different organisms is significant over
their entire sequence, then the proteins are homologs of one another, and they probably carry
out similar functions. Sequence similarity also suggests an evolutionary relationship between
proteins; that is, they evolved from a common ancestor. We can therefore describe homologous
proteins as belonging to the same “family” and can trace their lineage from
comparisons of sequences. Closely related proteins have the most similar sequences; distantly
related proteins have only faintly similar sequences.
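One simple way to put a number on "similar sequences" is percent identity over an alignment. The
sketch below assumes the two sequences have already been aligned to the same length, with "-"
marking gaps; the function name and the example strings are hypothetical. Several conventions
exist for the denominator, and this version counts only columns in which neither sequence has a
gap.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# Made-up aligned fragments, for illustration only.
identity = percent_identity("MKTAYIAK", "MKSAY-AK")
```

Here seven columns are compared and six match. Identity alone does not establish homology: the
text's criterion is that similarity be significant over the entire sequence, which in practice
calls for a statistical assessment that this sketch does not attempt.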
Figure 3-12. Evolutionary tree showing how the globin protein family arose, starting from the
most primitive oxygen-binding proteins, leghemoglobins, in plants
Sequence comparisons have revealed that evolution of the globin proteins parallels the
evolution of vertebrates. Major junctions occurred with the divergence of myoglobin from
hemoglobin and the later divergence of hemoglobin into the α and β
subunits. [Adapted from R. E. Dickerson and I. Geis, 1983, Hemoglobin: Structure,
Function, Evolution, and Pathology, Benjamin-Cummings.]
The kinship among homologous proteins is most easily visualized from a tree diagram based on
sequence analyses. For example, the amino acid sequences of hemoglobins from different species
suggest that they evolved from an ancestral monomeric, oxygen-binding protein (Figure 3-12).
Over time, this ancestral protein slowly changed, giving rise to myoglobin, which remained a
monomeric protein, and to the α and β subunits, which evolved to associate into the
tetrameric hemoglobin molecule. As the tree diagram in Figure 3-12 shows, evolution of the
globin protein family parallels that of the vertebrates.
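A tree of this kind can be derived from pairwise sequence differences. The following sketch
groups sequences by average-linkage clustering (the UPGMA idea), which is one common approach and
not necessarily the one used to draw the globin tree. The function names and the four short
sequences are hypothetical and were invented for the example; the sequences are assumed to be
aligned already.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def average_linkage_tree(sequences):
    """Cluster named sequences into a nested-tuple tree (UPGMA-style)."""
    # Each cluster is (tree, member_names); start with one leaf per sequence.
    clusters = [(name, [name]) for name in sequences]

    def cluster_distance(c1, c2):
        # Mean pairwise distance between the members of the two clusters.
        pairs = [(m1, m2) for m1 in c1[1] for m2 in c2[1]]
        return sum(p_distance(sequences[m1], sequences[m2]) for m1, m2 in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Invented aligned sequences; the names carry no biological meaning.
toy_sequences = {
    "seq_a": "MVLSPADK",
    "seq_b": "MVLSAADK",
    "seq_c": "MVHLTPEE",
    "seq_d": "MGLSDGEW",
}
tree = average_linkage_tree(toy_sequences)
```

The two most similar sequences (seq_a and seq_b, differing at one position) are joined first, and
more distant sequences attach closer to the root, mirroring the statement that closely related
proteins have the most similar sequences. Average linkage assumes roughly constant rates of
change along all branches, which real protein families do not always satisfy.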
The power of such comparative analysis and identification of homologous proteins has expanded
substantially in recent years by use of the base sequences in an organism’s genome to
deduce the amino acid sequences of the encoded proteins. As discussed in Chapter 7, this approach permits
“sequencing” of proteins that are difficult to purify in significant
amounts.
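Deducing a protein sequence from a base sequence amounts to reading the coding strand three bases
at a time and looking up each codon. The sketch below uses the standard genetic code and assumes
the input is an uninterrupted coding sequence that starts in the correct reading frame and
contains only the letters A, C, G, and T; the function name and the short example sequence are
hypothetical. Locating the coding regions within a genome is a separate problem that is not
addressed here.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G at each
# codon position (third position varying fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna):
    """Translate an in-frame coding DNA sequence, stopping at the first stop codon."""
    dna = coding_dna.upper()
    protein = []
    # Step through complete codons; any trailing one or two bases are ignored.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # raises KeyError on non-ACGT input
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Made-up coding sequence: ATG GTG CTG TCT TAA
peptide = translate("ATGGTGCTGTCTTAA")
```

The result is given in one-letter amino acid codes. Some organisms and organelles use variant
genetic codes, so the table would need to be adjusted for those cases.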
SUMMARY
- A protein is a linear polymer of amino acids linked together
by peptide bonds. Various, mostly noncovalent, interactions between amino acids in the linear
sequence stabilize a specific folded three-dimensional structure (conformation) for each
protein.
- The 20 different amino acids found in natural proteins are
conveniently grouped into three categories based on the nature of their side (R) groups:
hydrophilic amino acids, with a charged or a polar but uncharged R group; hydrophobic amino
acids, with an aliphatic or a bulky aromatic R group; and amino acids with a special group,
consisting of cysteine, glycine, and proline.
- The α helix, β strand and sheet, and turn
are the most prevalent elements of protein secondary structure, which is stabilized by
hydrogen bonds between atoms of the peptide backbone. Certain combinations of secondary
structures give rise to different motifs, which are found in a variety of proteins and often
are associated with specific functions.
- Protein tertiary structure results from hydrophobic
interactions and disulfide bonds that stabilize folding of the secondary structure into a
compact overall arrangement, or conformation. Large proteins often contain distinct domains,
independently folded regions of tertiary structure with characteristic structural and/or
functional properties.
- Quaternary structure encompasses the number and organization
of subunits in multimeric proteins.
- The sequence of a protein determines its three-dimensional
structure, which determines its function. In short, function is derived from structure;
structure is derived from sequence.
- Homologous proteins, which have similar sequences,
structures, and functions, most likely evolved from a common ancestor.