Molecular Recognition
Processes 1
Introduction
Figure 3-1. The size of protein molecules compared to some other cell components.
The ribosome is an important macromolecular
assembly composed of about 60 protein and RNA molecules.
Table 3-1
Approximate Chemical Compositions of a Typical Bacterium and a Typical Mammalian Cell
| Component | Bacterium (% of total cell weight) | Mammalian cell (% of total cell weight) |
|---|---|---|
| H2O | 70 | 70 |
| Inorganic ions (Na+, K+, Mg2+, Ca2+, Cl-, etc.) | 1 | 1 |
| Miscellaneous small metabolites | 3 | 3 |
| Proteins | 15 | 18 |
| RNA | 6 | 1.1 |
| DNA | 1 | 0.25 |
| Phospholipids | 2 | 3 |
| Other lipids | - | 2 |
| Polysaccharides | 2 | 2 |
| Total cell volume | 2 x 10^-12 cm^3 | 4 x 10^-9 cm^3 |
| Relative cell volume | 1 | 2000 |
Macromolecules typically have molecular weights between about 10,000 and
1 million and are intermediate in size between the organic molecules of the
cell discussed in Chapter 2 and the large macromolecular assemblies and
organelles that will be discussed in subsequent chapters (Figure 3-1). One small
molecule, water, constitutes 70% of the total mass of a cell; nearly all of the remaining
cell mass is due to macromolecules (Table 3-1).
As described in Chapter 2, a macromolecule is assembled from low-molecular-weight subunits that are repeatedly added to one end to form a
long, chainlike polymer. Usually only one family of subunits is used to construct
each chain: amino acids are linked to other amino acids to form proteins,
nucleotides are linked to other nucleotides to form nucleic acids, and sugars are linked
to other sugars to form polysaccharides. Because the precise sequence of
subunits is crucial to the function of a macromolecule, its biosynthesis requires
mechanisms to ensure that the correct subunit goes into the polymer at each
position in the chain.
The Specific Interactions of a Macromolecule Depend
on Weak, Noncovalent Bonds 2
A macromolecular chain is held together by covalent bonds, which are strong enough to preserve the sequence of subunits for long periods of time.
Although the sequence of subunits determines the information content of a
macromolecule, utilizing that information depends largely on much weaker, noncovalent bonds. These weak bonds form between different parts of the same
macromolecule and between different macromolecules. They therefore play a major
part in determining both the three-dimensional structure of macromolecular
chains and how these structures interact with one another.
The noncovalent bonds encountered in biological molecules are usually
classified into three types: ionic bonds, hydrogen
bonds, and van der Waals attractions. Another important weak force is created by the three-dimensional
structure of water, which forces exposed hydrophobic groups together in order
to minimize their disruptive effect on the hydrogen-bonded network of water
molecules (see
Panel 2-1, pp. 48-49). This expulsion from the aqueous solution
generates what is sometimes thought of as a fourth kind of weak, noncovalent
bond. These four types of weak attractive forces are the subject of
Panel 3-1, pages
92-93.
Table 3-2
Covalent and Noncovalent Chemical Bonds
| Bond type | Length (nm) | Strength in vacuum (kcal/mole) | Strength in water (kcal/mole) |
|---|---|---|---|
| Covalent | 0.15 | 90 | 90 |
| Ionic | 0.25 | 80 | 3 |
| Hydrogen | 0.30 | 4 | 1 |
| van der Waals attraction (per atom) | 0.35 | 0.1 | 0.1 |
Figure 3-2. Comparative energies of some important molecular events in cells.
Note that energy is displayed
on a logarithmic scale.
Figure 3-3. Noncovalent bonds.
How weak bonds mediate recognition between macromolecules.
In an aqueous environment each noncovalent bond is 30 to 300 times
weaker than the typical covalent bonds that hold biological molecules together
(Table 3-2) and only slightly stronger than the average energy of thermal collisions
at 37°C (Figure 3-2). A single noncovalent bond - unlike a single covalent
bond - is therefore too weak to withstand the thermal motions that tend to pull
molecules apart. Large numbers of noncovalent bonds are needed to hold two
molecular surfaces together, and these can form between two surfaces only
when large numbers of atoms on the surfaces are precisely matched to each other
(Figure 3-3). The exacting requirements for matching account for the specificity
of biological recognition, such as occurs between an enzyme and its substrate.
Figure 3-4. Steric limitations on the bond angles in a polypeptide chain.
(A) Each amino acid contributes
three bonds (colored red) to its polypeptide chain. The peptide
bond is planar (gray shading) and does not permit rotation.
By contrast, rotation can occur about the
Cα-C bond, whose angle of rotation is called psi
(ψ), and about the N-Cα bond, whose angle of rotation is called phi
(φ). The R group denotes an amino acid side chain. (B) The conformation of the
main-chain atoms in a protein is determined by one pair of phi and
psi angles for each amino acid; because of steric collisions
within each amino acid, most pairs of phi and psi angles do not
occur. In this so-called Ramachandran plot, each dot represents
an observed pair of angles in a protein. (B, from J.
Richardson, Adv. Prot. Chem. 34:174-175, 1981.)
As explained at the top of
Panel 3-1, atoms behave almost as if they
were hard spheres with a definite radius (their van der Waals radius). The
requirement that no two atoms overlap limits the possible bond angles in a polypeptide
chain (Figure 3-4). These and other steric interactions severely constrain the
number of three-dimensional arrangements of atoms (or
conformations) that are possible. Nevertheless, a long flexible chain such as a protein can still fold in
an enormous number of ways. Each conformation will have a different set of
weak intrachain interactions, and it is the total strength of these interactions that
determines which conformations will form.
Most proteins in a cell fold stably in only one way: during the course of
evolution the sequence of amino acid subunits in each protein has been selected
so that one conformation is able to form many more favorable intrachain
interactions than any other.
A Helix Is a Common Structural Motif in Biological Structures Made from Repeated
Subunits 3
Figure 3-5. A helix will form when a series of subunits bind to each other in a regular way.
In the foreground the interaction between two
subunits is shown; behind it are the helices that result. These helices have
two (A), three (B), and six (C and D) subunits per turn. At the top,
the arrangement of subunits has been photographed from directly above
the helix. Note that the helix in (D) has a wider path than that in (C).
Figure 3-6. Comparison of a left-handed and a right-handed helix.
As a reference, it is useful to
remember that standard screws, which insert when turned clockwise, are
right-handed. Note that a helix preserves the same handedness when it
is turned upside down.
Biological structures are often formed by linking subunits that are very
similar to each other - such as amino acids or nucleotides - into a long, repetitive
chain. If all the subunits are identical, neighboring subunits in the chain will often
fit together in only one way, adjusting their relative positions so as to minimize
the free energy of the contact between them. In this case, each subunit will be
positioned in exactly the same way in relation to its neighboring subunits, so
that subunit 3 will fit onto subunit 2 in the same way that subunit 2 fits onto
subunit 1, and so on. Because it is very rare for subunits to join up in a straight line,
this arrangement will generally result in a helix - a regular structure that
resembles a spiral staircase, as illustrated in Figure 3-5. Depending on the twist of the
staircase, a helix is said to be either right-handed or left-handed (Figure 3-6).
Handedness is not affected by turning the helix upside down, but it is reversed if
the helix is reflected in a mirror.
Helices occur commonly in biological structures, whether the subunits
are small molecules that are covalently linked together (as in DNA) or large
protein molecules that are linked by noncovalent forces (as in actin filaments). This is
not surprising. A helix is an unexceptional structure, generated simply by
placing many similar subunits next to each other, each in the same strictly repeated
relationship to the one before.
Diffusion Is the First Step to Molecular
Recognition 4
Figure 3-7. A random walk.
Molecules in solution move in a random fashion due to the
continual buffeting they receive in collisions with other molecules. This
movement allows small molecules to diffuse from one part of the cell to another in
a surprisingly short time: such molecules will generally diffuse
across a typical animal cell in less than a second.
Before two molecules can bind to each other, they must come into close
contact. This is achieved by the thermal motions that cause molecules to wander, or
diffuse, from their starting positions. As the molecules in a liquid rapidly collide
and bounce off one another, an individual molecule moves first one way and
then another, its path constituting a "random walk" (Figure 3-7). The average
distance that each type of molecule travels from its starting point is proportional to
the square root of the time involved: that is, if it takes a particular molecule 1
second on average to go 1 µm, it will go 2 µm in 4 seconds, 10
µm in 100 seconds, and so on. Diffusion is therefore an efficient way for molecules to move
limited distances but an inefficient way for molecules to move long distances.
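The square-root relationship can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the reference pair (1 µm in 1 second) is the example used in the text, not a measured value for any particular molecule.

```python
import math


def mean_distance_um(time_s, ref_distance_um=1.0, ref_time_s=1.0):
    """Average distance from the start after time_s seconds.

    Distance scales with the square root of elapsed time, anchored to a
    reference pair (ref_distance_um reached in ref_time_s).
    """
    return ref_distance_um * math.sqrt(time_s / ref_time_s)


def time_to_travel_s(distance_um, ref_distance_um=1.0, ref_time_s=1.0):
    """Inverse relation: the time needed grows with the square of the distance."""
    return ref_time_s * (distance_um / ref_distance_um) ** 2


for t in (1, 4, 100):
    print(f"{t:>4} s -> {mean_distance_um(t):.0f} um")

# Going 10 times farther takes 100 times longer.
print(time_to_travel_s(10.0), "s to average 10 um")
```

The inverse function makes the inefficiency over long distances explicit: each tenfold increase in distance costs a hundredfold increase in time.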
Figure 3-8. Macromolecules in the cell cytoplasm.
The drawing is approximately to scale
and emphasizes the crowding in the cytoplasm. Only the
macromolecules are shown: RNAs are shown in blue, ribosomes in green, and proteins
in red. Macromolecules diffuse relatively slowly in the cytoplasm because
they interact with many other macromolecules; small molecules,
by contrast, diffuse nearly as rapidly as they do in water. (Adapted from
D.S. Goodsell, Trends in Biochem. Sci. 16:203-206, 1991.)
Experiments performed by injecting fluorescent dyes and other
labeled molecules into cells show that the diffusion of small molecules through the
cytoplasm is nearly as rapid as it is in water. A molecule the size of ATP, for
example, requires only about 0.2 second to diffuse an average distance of 10
µm - the diameter of a small animal cell. Large macromolecules, however, move
much more slowly. Not only is their diffusion rate intrinsically slower, but their
movement is retarded by frequent collisions with many other macromolecules that
are held in place by molecular associations in the cytoplasm (Figure 3-8).
Thermal Motions Bring Molecules Together
and Then Pull Them Apart 4
Encounters between two macromolecules or between a macromolecule and
a small molecule occur randomly through simple diffusion. An encounter may
lead immediately to the formation of a complex between the two molecules, in
which case the rate of complex formation is said to be
diffusion-limited. Alternatively, the rate of complex formation may be slower, requiring some adjustment of
the structure of one or both molecules before the interacting surfaces can fit
together, so that most often the two colliding molecules will bounce off each other
without sticking. In either case once the two interacting surfaces have come
sufficiently close together, they will form multiple weak bonds with each other
that persist until random thermal motions cause the molecules to dissociate again
(see Figure 3-3).
In general, the stronger the binding of the molecules in the complex,
the slower their rate of dissociation. At one extreme the total energy of the
bonds formed is negligible compared with that of thermal motion, and the two
molecules dissociate as rapidly as they came together. At the other extreme the
total bond energy is so high that dissociation rarely occurs. Strong
interactions occur in cells whenever a biological function requires that two
macromolecules remain tightly associated for a long time - for example, when a gene
regulatory protein binds to DNA to turn off a gene. Weaker interactions occur when
the function demands a rapid change in the structure of a complex - for
example, when two interacting proteins change partners during the movements of a
protein machine.
The Equilibrium Constant Is a Measure of the Strength
of an Interaction Between Two Molecules 5
Figure 3-9. The principle of equilibrium.
The equilibrium between molecules A and B and the complex AB
is maintained by a balance between the two opposing reactions shown in (1) and (2). As shown in (3), the ratio of the
rate constants for the association and the dissociation reactions is equal to the equilibrium constant (K) for the reaction. Molecules A and B must collide in order to react, and the rate in reaction (2) is therefore proportional to the product
of their individual concentrations. As a result, the product [A] x [B] appears in the final expression for K, where [ ] indicates concentration.
As traditionally defined, the concentrations of products appear in the numerator and the concentrations
of reactants appear in the denominator of the equation for an equilibrium constant. Thus the equilibrium constant in
(3) is that for the association reaction A + B → AB. For simple binding interactions this constant is called the affinity constant or association
constant (in units of liters per mole); the larger the value of the association constant
(K_a), the stronger is the binding between A and B. The reciprocal of K_a is the dissociation
constant (in units of moles per liter); the smaller the value of the dissociation constant
(K_d), the stronger is the binding between A and B.
The precise strength of the bonding between two molecules is a useful index
of the specificity of their interaction. To illustrate how the binding strength is
measured, let us consider a reaction in which molecule A binds to molecule B.
The reaction will proceed until it reaches an equilibrium
point, at which the rates of formation and dissociation are equal (Figure 3-9). The concentrations of A, B,
and the complex AB at this point can be used to determine an
equilibrium constant (K) for the reaction, as explained in Figure 3-9. This constant is
sometimes termed the affinity constant and is commonly employed as a measure of
the strength of binding between two molecules: the stronger the binding, the
larger is the value of the affinity constant.
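As a rough illustration of how an affinity constant is obtained from equilibrium concentrations, the sketch below computes K = [AB] / ([A][B]) and its reciprocal. The function names and the micromolar concentrations are hypothetical values chosen for the example. The last function assumes simple one-to-one binding, for which the fraction of A in complex follows from the definition of K as K[B] / (1 + K[B]).

```python
def association_constant(conc_a, conc_b, conc_ab):
    """K = [AB] / ([A][B]); equilibrium concentrations in moles per liter.

    The result has units of liters per mole.
    """
    return conc_ab / (conc_a * conc_b)


def dissociation_constant(k_assoc):
    """The dissociation constant is the reciprocal of K (moles per liter)."""
    return 1.0 / k_assoc


def fraction_of_a_bound(free_b, k_assoc):
    """[AB] / ([A] + [AB]) for one-to-one binding, given the free [B]."""
    x = k_assoc * free_b
    return x / (1.0 + x)


# Hypothetical equilibrium mixture: 1 micromolar each of A, B, and AB.
k = association_constant(1e-6, 1e-6, 1e-6)
print(f"association constant: {k:.3g} liters/mole")
print(f"dissociation constant: {dissociation_constant(k):.3g} moles/liter")
print(f"fraction of A bound: {fraction_of_a_bound(1e-6, k):.2f}")
```

When the free concentration of B equals the dissociation constant, half of A is in the complex, which is one common way of reading a dissociation constant.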
Table 3-3
The Relationship Between Free-Energy Differences and Equilibrium Constants
The equilibrium constant of a reaction in which two molecules bind to
each other is related directly to the standard free-energy change for the binding
(ΔG°) by the equation described in Table 3-3. The table also lists the
ΔG° values corresponding to a range of K values. Affinity constants for simple binding
interactions in biological systems often range between 10^3 and 10^12 liters/mole;
this corresponds to binding energies in the range 4-17 kcal/mole, which could
arise from 4 to 17 average hydrogen bonds.
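The quoted energy range can be reproduced with the standard thermodynamic relation ΔG° = -RT ln K. The sketch below assumes that relation, a temperature of 37°C (310.15 K), and the gas constant expressed in kcal per mole per kelvin; the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mole*K)


def standard_free_energy(k_assoc, temp_k=310.15):
    """Standard free-energy change (kcal/mole) for an equilibrium constant K.

    Uses delta G = -R * T * ln(K); a negative value means binding is favored.
    """
    return -R_KCAL * temp_k * math.log(k_assoc)


for k in (1e3, 1e6, 1e9, 1e12):
    print(f"K = {k:.0e} liters/mole -> {standard_free_energy(k):6.1f} kcal/mole")
```

Under these assumptions the two ends of the range come out near -4 and -17 kcal/mole, matching the 4-17 kcal/mole of binding energy cited above.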
Atoms and Molecules Move Very Rapidly 6
The chemical reactions in a cell occur at amazingly fast rates. A typical
enzyme molecule, for example, will catalyze on the order of 1000 reactions per
second, and rates of more than 10^6 reactions per second are achieved by some
enzymes. Since each reaction requires a separate encounter between an enzyme and
a substrate molecule, such rates are possible only because the molecules are
moving so rapidly. Molecular motions can be classified broadly into three kinds:
(1) the movement of a molecule from one place to another (translational
motion), (2) the rapid back-and-forth movement of covalently linked atoms with
respect to one another (vibrations), and (3) rotations. All of these motions are
important in bringing the surfaces of interacting molecules together.
The rates of molecular motions can be measured by a variety of
spectroscopic techniques. These indicate that a large globular protein is constantly
tumbling, rotating about its axis about a million times per second. The rates of
diffusional encounters due to translational movements are proportional to the
concentration of the diffusing molecule. If ATP is present at its typical intracellular
concentration of about 1 mM, for instance, each site on a protein molecule will be
bombarded by about 10^6 random collisions with ATP molecules per second; for
an ATP concentration tenfold lower, the number of collisions would drop to
10^5 per second, and so on.
Once two molecules have collided and are in the correct relative
orientation, a chemical reaction can occur between them extremely rapidly. When one
appreciates how quickly molecules move and react, the observed rates of
enzymatic catalysis do not seem so amazing.
Molecular Recognition Processes Can Never Be
Perfect 7
All molecules possess energy - the kinetic energy of their translational
movements, vibrations, and rotations and the potential energy stored in their
electron distributions. Through molecular collisions this energy is randomly
distributed to all of the atoms present, so that most atoms will have energy levels close to
the average, with only a small proportion possessing very high energy. Although
the favored conformations or states for a molecule will be those of lowest free
energy (see p. 75), states of higher energy occur through unusually violent
collisions. Given the temperature, it is possible to calculate the probability that an atom
or a molecule will be in a particular energy state (see
Table 3-3). The probability
of a high-energy state becomes smaller relative to a low-energy state as the
difference in free energy between the two increases. It reaches zero, however,
only when this energy difference becomes infinite.
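A short calculation shows why the probability never quite reaches zero. The sketch assumes the standard Boltzmann relation, in which the population of a higher-energy state relative to a lower one is exp(-ΔG/RT); the function name and the sample energy differences are illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mole*K)


def relative_population(delta_g_kcal, temp_k=310.15):
    """Population of a high-energy state relative to a low-energy state.

    delta_g_kcal is the free-energy difference (high minus low) in kcal/mole.
    """
    return math.exp(-delta_g_kcal / (R_KCAL * temp_k))


for delta_g in (0.0, 1.0, 4.0, 10.0, 50.0):
    print(f"{delta_g:5.1f} kcal/mole -> {relative_population(delta_g):.3g}")
```

The ratio falls off steeply as the energy difference grows, but it remains positive for any finite difference.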
Because of the random factor in molecular interactions, minor "side
reactions" are bound to occur occasionally. As a consequence, a cell
continually makes errors. Even reactions that are very energetically unfavorable will
take place occasionally. Two atoms joined to each other by a covalent bond, for
example, will eventually be subjected to an especially energetic collision and
fall apart. Similarly, the specificity of an enzyme for its substrate cannot be
absolute because the recognition of one molecule as distinct from another can never
be perfect. Mistakes could be avoided completely only if the cell could
evolve mechanisms with infinite energy differences between alternatives. Since this
is not possible, cells are forced to tolerate a certain level of failure and have
instead evolved a variety of repair reactions to correct those errors that are the
most damaging.
On the other hand, errors are essential to life as we know it. If it were not
for occasional mistakes in the maintenance of DNA sequences, evolution could
not occur.
Summary
The sequence of subunits in a macromolecule contains information that
determines the three-dimensional contours of its surface. These contours in turn govern the
recognition between one molecule and another, or between different parts of the
same molecule, by means of weak, noncovalent bonds. The attractive forces are of
four types: ionic bonds, van der Waals attractions, hydrogen bonds, and an
interaction between nonpolar groups caused by their hydrophobic expulsion from water.
Two molecules will recognize each other by a process in which they meet by random
diffusion, stick together for a while, and then dissociate. The strength of this
interaction is generally expressed in terms of an equilibrium constant. Since the only way to
make recognition infallible is to make the energy of binding infinitely large, living
cells constantly make errors; those that are intolerable are corrected by specific
repair processes.
Nucleic Acids 8
Genes Are Made of DNA 9
It has been obvious for as long as humans have sown crops or raised animals
that each seed or fertilized egg must contain a hidden plan, or design, for the
development of the organism. In modern times the science of genetics grew up
around the premise of invisible information-containing elements, called
genes, that are distributed to each daughter cell when a cell divides. Therefore, before
dividing, a cell has to make a copy of its genes in order to give a complete set to
each daughter cell. The genes in the sperm and egg cells carry the hereditary
information from one generation to the next.
The inheritance of biological characteristics must involve patterns of
atoms that follow the laws of physics and chemistry: in other words, genes must
be formed from molecules. At first the nature of these molecules was hard to
imagine. What kind of molecule could be stored in a cell and direct the activities
of a developing organism and also be capable of accurate and almost
unlimited replication?
By the end of the nineteenth century biologists had recognized that the
carriers of inherited information were the chromosomes that become visible in the nucleus as a cell begins to divide. But the evidence that the deoxyribonucleic
acid (DNA) in these chromosomes is the substance of which genes are made
came only much later, from studies on bacteria. In 1944 it was shown that adding
purified DNA from one strain of bacteria to a second, slightly different
bacterial strain conferred heritable properties characteristic of the first strain upon
the second. Because it had been commonly believed that only proteins have
enough conformational complexity to carry genetic information, this discovery came
as a surprise, and it was not generally accepted until the early 1950s. Today the
idea that DNA carries genetic information in its long chain of nucleotides is so
fundamental to biological thought that it is sometimes difficult to realize the
enormous intellectual gap that it filled.
DNA Molecules Consist of Two Long Chains Held
Together by Complementary Base Pairs 10
The difficulty that geneticists had in accepting DNA as the substance of genes
is understandable, considering the simplicity of its chemistry. A DNA chain is
a long, unbranched polymer composed of only four types of subunits. These
are the deoxyribonucleotides containing the bases adenine (A), cytosine (C),
guanine (G), and thymine (T). The nucleotides are linked together by covalent
phosphodiester bonds that join the 5' carbon of one deoxyribose group to the
3' carbon of the next (see Panel 2-6, pp. 58-59). The four kinds of bases are attached to
this repetitive sugar-phosphate chain almost like four kinds of beads strung on
a necklace.
How can a long chain of nucleotides encode the instructions for an
organism or even a cell? And how can these messages be copied from one
generation of cells to the next? The answers lie in the structure of the DNA molecule.
Early in the 1950s x-ray diffraction analyses of specimens of DNA pulled
into fibers suggested that the DNA molecule is a helical polymer composed of
two strands. The helical structure of DNA was not surprising since, as we have
seen, a helix will often form if each of the neighboring subunits in a polymer is
regularly oriented. But the finding that DNA is two-stranded was of crucial
significance. It provided the clue that led, in 1953, to the construction of a model
that fitted the observed x-ray diffraction pattern and thereby solved the puzzle of
DNA structure and function.
Figure 3-10. The DNA double helix.
(A) A short section of the helix viewed from its side. Four
complementary base pairs are shown. The bases are shown in
green, while the deoxyribose sugars are
blue. (B) The helix viewed from an end. Note that the two
DNA strands run in opposite directions and that each base pair is held together
by either two or three hydrogen bonds (see also
Panel 3-2, pp. 100-101).
An essential feature of the model was that all of the bases of the DNA
molecule are on the inside of the double helix, with the sugar phosphates on
the outside. This demands that the bases on one strand be extremely close to
those on the other, and the fit proposed required specific
base-pairing between a large purine base (A or G, each of which has a double ring) on one chain and a
smaller pyrimidine base (T or C, each of which has a single ring) on the other chain
(Figure 3-10).
Both evidence from earlier biochemical experiments and conclusions
derived from model building suggested that complementary base
pairs (also called
Watson-Crick base pairs) form between A and T and between G and C.
Biochemical analyses of DNA preparations from different species had shown that,
although the nucleotide composition of DNA varies a great deal (for example, from
13% A residues to 36% A residues in the DNA of different types of bacteria), there
is a general rule that quantitatively [G] = [C] and [A] = [T]. Model building
revealed that the numbers of effective hydrogen bonds that could be formed between
G and C or between A and T were greater than for any other combinations
(see
Panel 3-2, pp. 100-101). The double-helical model for DNA thus neatly
explained the quantitative biochemistry.
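The way base-pairing forces [A] = [T] and [G] = [C] can be seen by building a double helix from any single strand and counting bases. In the sketch below the example sequence is arbitrary and the function names are hypothetical.

```python
from collections import Counter

# Complementary (Watson-Crick) partners.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand):
    """Return the base-paired partner strand.

    The strand is reversed because the two strands run in opposite directions.
    """
    return "".join(PAIR[base] for base in reversed(strand))


def duplex_composition(strand):
    """Fraction of each base in the double helix formed by strand and its partner."""
    counts = Counter(strand + partner_strand(strand))
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


single = "AAGCATTTAAAT"  # arbitrary example, rich in A and T
print(Counter(single))             # a single strand need not obey the rule
print(duplex_composition(single))  # the duplex always does
```

The overall A content can still vary widely from one sequence to another, as it does between species; only the equalities between paired bases are fixed.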
The Structure of DNA Provides an Explanation
for Heredity 11
A gene carries biological information in a form that must be precisely copied
and transmitted from each cell to all of its progeny. The implications of the
discovery of the DNA double helix were profound because the structure
immediately suggested how information transfer could be accomplished. Since each
strand contains a nucleotide sequence that is exactly complementary to the
nucleotide sequence of its partner strand, both strands actually carry the same genetic
information. If we designate the two strands A and
A', strand A can serve as a mold or template for making a new strand A', while strand
A' can serve in the same way to make a new strand A. Thus genetic information can be copied by a process
in which strand A separates from strand A' and each separated strand then
serves as a template for the production of a new complementary partner strand.
As a direct consequence of the base-pairing mechanism, it becomes
evident that DNA carries information by means of the linear sequence of its
nucleotides. Each nucleotide - A, C, T, or G - can be considered a letter in a four-letter
alphabet that is used to write out biological messages in a linear "ticker-tape"
form. Organisms differ because their respective DNA molecules carry different
nucleotide sequences and therefore different biological messages.
Figure 3-11. The DNA sequence of the human β-globin gene.
The gene encodes one of the two subunits of the hemoglobin molecule, which
carries oxygen in the blood. Only one of the two DNA strands is shown
(the "coding strand"), since the other strand has a precisely
complementary sequence. The sequence should be read from left to right in successive
lines down the page, as if it were normal English text.
Since the number of possible sequences in a DNA chain n nucleotides long is 4^n,
the biological variety that could in principle be generated using even
a modest length of DNA is enormous. A typical animal cell contains a meter of
DNA (3 x 10^9 nucleotides). Written in a linear alphabet of four letters, an
unusually small human gene would occupy a quarter of a page of text (Figure 3-11),
while the genetic information carried in a human cell would fill a book of more
than 500,000 pages.
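The numbers in this paragraph can be checked directly. The sketch below counts the possible sequences for a few chain lengths and derives the page density implied by the book comparison; the figure of 6000 letters per page is an inference from the text's own numbers (3 x 10^9 nucleotides in 500,000 pages), not a stated value.

```python
def possible_sequences(n):
    """Number of distinct DNA sequences n nucleotides long (four choices per position)."""
    return 4 ** n


for n in (1, 2, 10, 100):
    count = possible_sequences(n)
    print(f"n = {n:>3}: {len(str(count))}-digit number of sequences")

GENOME_NT = 3 * 10**9   # nucleotides in a typical animal cell, from the text
BOOK_PAGES = 500_000    # pages in the hypothetical book, from the text

letters_per_page = GENOME_NT // BOOK_PAGES
print(letters_per_page, "letters per page implied")
print(letters_per_page // 4, "nucleotides in a quarter-page gene")
```

Even a chain of only 100 nucleotides allows more sequences than can be written in 60 decimal digits.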
Figure 3-12. DNA synthesis.
The addition of a deoxyribonucleotide
to the 3' end of a polynucleotide chain is the fundamental reaction by
which DNA is synthesized. As shown, base-pairing between this
incoming deoxyribonucleotide and an existing strand of DNA (the template strand) guides the formation of a new
strand of DNA with a complementary nucleotide sequence.
Although the principle underlying gene replication is both elegant
and simple, the actual machinery by which this copying is carried out in the cell
is complicated and involves a complex of proteins that form a "replication
machine." The fundamental reaction is that shown in Figure 3-12, in which
the enzyme DNA polymerase catalyzes the addition of a deoxyribonucleotide to
the 3' end of a DNA chain. Each nucleotide added to the chain is a
deoxyribonucleoside triphosphate; the release of pyrophosphate from this activated
nucleotide and its subsequent hydrolysis provide the energy for the
DNA replication reaction and make it effectively irreversible.
Figure 3-13. The semiconservative replication of DNA.
In each round of replication each of the two strands of DNA is used as a template for
the formation of a complementary DNA strand. The original strands
therefore remain intact through many cell generations.
Replication of the DNA helix begins with the local separation of its
two complementary DNA strands. Each strand then acts as a template for the
formation of a new DNA molecule by the sequential addition of
deoxyribonucleoside triphosphates. The nucleotide to be added at each step is selected by a
process that requires it to form a complementary base pair with the next nucleotide
in the parental template strand, thereby generating a new DNA strand that
is complementary in sequence to the template strand (see Figure 3-12).
Eventually, the genetic information is duplicated in its entirety, so that two complete
DNA double helices are formed, each identical in nucleotide sequence to the
parental DNA helix that served as the template. Since each daughter DNA
molecule ends up with one of the original strands plus one newly synthesized strand,
the mechanism of DNA replication is said to be semiconservative (Figure 3-13).
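Semiconservative replication can be mimicked with a toy model in which each strand carries a label recording the generation in which it was made. This is a bookkeeping sketch only: the sequence is arbitrary, the names are hypothetical, and none of the real replication machinery is represented.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def copy_from_template(template):
    """Build the complementary strand, read in the opposite direction."""
    return "".join(PAIR[base] for base in reversed(template))


def replicate(duplexes, generation):
    """Replicate every duplex once.

    A duplex is a pair of strands; a strand is (sequence, generation_made).
    Each old strand is kept and paired with a newly made partner.
    """
    daughters = []
    for duplex in duplexes:
        for template in duplex:
            new_strand = (copy_from_template(template[0]), generation)
            daughters.append((template, new_strand))
    return daughters


parent_seq = "ATGGTGCAC"  # arbitrary example sequence
population = [((parent_seq, 0), (copy_from_template(parent_seq), 0))]
for generation in (1, 2, 3):
    population = replicate(population, generation)

# Count the duplexes that still contain one of the two original strands.
with_original = sum(
    1 for duplex in population if any(made == 0 for _, made in duplex)
)
print(len(population), "duplexes,", with_original, "contain an original strand")
```

However many rounds are run, exactly two duplexes carry an original strand, and every duplex has the same pair of sequences as the parent.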
Errors in DNA Replication Cause Mutations 12
One of the most impressive features of DNA replication is its accuracy.
Several proofreading mechanisms are used to eliminate incorrectly positioned
nucleotides; as a result, the sequence of nucleotides in a DNA molecule is copied
with fewer than one mistake in 10^9 nucleotides added. Very rarely, however, the
replication machinery skips or adds a few nucleotides, or puts a T where it
should have put a C, or an A instead of a G. Any change of this kind in the DNA
sequence constitutes a genetic mistake, called a
mutation, which will be copied in all future cell generations since "wrong" DNA sequences are copied as faithfully
as "correct" ones. The consequence of such an error can be great, for even a
single nucleotide change can have important effects on the cell, depending on
where the mutation has occurred.
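To get a feel for what this accuracy means per cell division, the error rate can be multiplied by the amount of DNA copied. The sketch below treats the quoted rate of one mistake in 10^9 nucleotides as an upper bound and uses the genome size of 3 x 10^9 nucleotides given earlier in the chapter; the function name is hypothetical.

```python
def expected_errors(nucleotides_copied, error_rate_per_nt=1e-9):
    """Expected number of copying mistakes, given an error rate per nucleotide."""
    return nucleotides_copied * error_rate_per_nt


GENOME_NT = 3 * 10**9  # nucleotides in a typical animal cell, from the text

upper_bound = expected_errors(GENOME_NT)
print(f"fewer than about {upper_bound:.0f} mistakes per genome copied")
```

On these figures, copying an entire animal genome introduces only a handful of changes at most.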
Geneticists demonstrated conclusively in the early 1940s that genes
specify the structure of individual proteins. Thus a mutation in a gene, caused by
an alteration in its DNA sequence, may lead to the inactivation of a crucial
protein and result in cell death, in which case the mutation will be lost. On the
other hand, a mutation may be silent and not affect the function of the protein.
Very rarely, a mutation will create a gene with an improved or novel useful
function. In this case organisms carrying the mutation will have an advantage, and
the mutated gene may eventually replace the original gene in the population
through natural selection.
The Nucleotide Sequence of a Gene Determines the
Amino Acid Sequence of a Protein 13
DNA is relatively inert chemically. The information it contains is expressed
indirectly via other molecules: DNA directs the synthesis of specific RNA and
protein molecules, which in turn determine the cell's chemical and physical
properties.
Figure 3-14. The amino acid sequence of bovine insulin.
Insulin is a very small protein that consists
of two polypeptide chains, one 21 and the other 30 amino acid residues
long. Each chain has a unique, genetically determined sequence of amino
acids. The one-letter symbols used to specify amino acids are those listed
in Panel 2-5, pages 56-57; the
S-S bonds shown in red are disulfide bonds between cysteine residues.
The protein is made initially as a single long polypeptide chain (encoded by
a single gene) that is subsequently cleaved to give the two chains.
At about the time that biophysicists were analyzing the
three-dimensional structure of DNA by x-ray diffraction, biochemists were intensively studying
the chemical structure of proteins. It was already known that proteins are chains
of amino acids joined together by sequential peptide linkages; but it was only in
the early 1950s, when the small protein
insulin was sequenced (Figure 3-14), that
it was discovered that each type of protein consists of a unique sequence of
amino acids. Just as solving the structure of DNA was seminal in understanding
the molecular basis of genetics and heredity, so sequencing insulin provided a
key to understanding the structure and function of proteins. If insulin had a
definite, genetically determined sequence, then presumably so did every other
protein. It seemed reasonable to suppose, moreover, that the properties of a
protein would depend on the precise order in which its constituent amino acids are
arranged.
Both DNA and protein are composed of a linear sequence of subunits;
eventually, the analysis of the proteins made by mutant genes demonstrated that
the two sequences are co-linear - that is, the nucleotides in DNA are arranged in
an order corresponding to the order of the amino acids in the protein they
specify. It became evident that the DNA sequence contains a coded specification of
the protein sequence. The central question in molecular biology then became
how a cell translates a nucleotide sequence in DNA into an amino acid sequence
in a protein.
Portions of DNA Sequence Are Copied into RNA
Molecules That Guide Protein Synthesis 14
The synthesis of proteins involves copying specific regions of DNA (the
genes)
into polynucleotides of a chemically and functionally different type known as
ribonucleic acid, or RNA. RNA, like DNA, is composed of a linear sequence
of nucleotides, but it has two small chemical differences: (1) the
sugar-phosphate backbone of RNA contains ribose instead of a deoxyribose sugar and (2) the
base thymine (T) is replaced by uracil (U), a very closely related base that likewise
pairs with A (see
Panel 3-2, pp. 100-101).
RNA retains all of the information of the DNA sequence from which it
was copied, as well as the base-pairing properties of DNA. Molecules of RNA are
synthesized by a process known as DNA
transcription, which is similar to DNA replication in that one of the two strands of DNA acts as a template on which
the base-pairing abilities of incoming nucleotides are tested. When a good match
is achieved with the DNA template, a ribonucleotide is incorporated as a
covalently bonded unit. In this way the growing RNA chain is elongated one nucleotide
at a time.
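The template-directed copying described above can be sketched in a few lines of Python. This is only an illustration of the base-pairing rule, not a model of the enzyme that carries out transcription; the function name `transcribe` and the short template sequence are hypothetical.

```python
# Pairing rule used in transcription: each base on the DNA template selects
# the complementary ribonucleotide (an A in DNA pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (written 5'->3') copied from a DNA template strand
    that is supplied in the 3'->5' direction."""
    rna = []
    for base in template_3_to_5.upper():
        rna.append(DNA_TO_RNA[base])  # the chain grows one nucleotide at a time
    return "".join(rna)

template = "TACCACCTC"        # hypothetical template strand, read 3'->5'
print(transcribe(template))   # AUGGUGGAG
```

Because the template is read 3'->5', the RNA comes out already written 5'->3', so no reversal step is needed in this sketch.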
Figure 3-15. The transfer of information from DNA to protein.
The transfer proceeds by means of
an RNA intermediate called messenger RNA (mRNA). In procaryotic cells
the process is simpler than in eucaryotic cells. In eucaryotes the coding
regions of the DNA (in the exons, shown in color) are separated by
noncoding regions (the introns). As
indicated, these introns must be removed by an enzymatically catalyzed
RNA-splicing reaction to form the mRNA.
DNA transcription differs from DNA replication in a number of ways.
The RNA product, for example, does not remain as a strand annealed to DNA.
Just behind the region where the ribonucleotides are being added, the original
DNA helix re-forms and releases the RNA chain. Thus RNA molecules are
single-stranded. Moreover, RNA molecules are relatively short compared to DNA
molecules since they are copied from a limited region of the DNA - enough to
make one or a few proteins (Figure 3-15). RNA transcripts that direct the synthesis
of protein molecules are called messenger RNA
(mRNA) molecules, while other RNA transcripts serve as
transfer RNAs (tRNAs) or form the RNA components
of ribosomes (rRNA) or smaller ribonucleoprotein particles.
The amount of RNA made from a particular region of DNA is controlled
by gene regulatory proteins that bind to specific sites on DNA close to the
coding sequences of a gene. In any cell at any given time, some genes are used to
make RNA in very large quantities while other genes are not transcribed at all. For
an active gene thousands of RNA transcripts can be made from the same DNA
segment in each cell generation. Because each mRNA molecule can be
translated into many thousands of copies of a polypeptide chain, the information
contained in a small region of DNA can direct the synthesis of millions of copies of a
specific protein. The protein fibroin, for example, is the major component of silk.
In each silk gland cell a single fibroin gene makes
10⁴ copies of mRNA, each of which directs the synthesis of
10⁵ molecules of fibroin - producing a total of
10⁹ molecules of fibroin in just 4 days.
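The two-stage amplification in the fibroin example is simple multiplication, as the short calculation below shows. The average rate at the end is derived here from the figures quoted in the text and is not stated in the source; the variable names are illustrative.

```python
# Figures quoted in the text for one silk gland cell.
mrna_per_gene = 10**4       # mRNA copies made from the single fibroin gene
protein_per_mrna = 10**5    # fibroin molecules made from each mRNA
days = 4

total_fibroin = mrna_per_gene * protein_per_mrna
print(total_fibroin)        # 1000000000, i.e. 10**9

# Derived average output, assuming production is spread evenly over 4 days.
seconds = days * 24 * 3600
per_second = total_fibroin / seconds
print(round(per_second))    # roughly 2900 molecules per second per cell
```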
Eucaryotic RNA Molecules Are Spliced to Remove
Intron Sequences 15
In bacterial cells most proteins are encoded by a single uninterrupted stretch
of DNA sequence that is copied without alteration to produce an mRNA
molecule. In 1977 molecular biologists were astonished by the discovery that most
eucaryotic genes have their coding sequences (called
exons) interrupted by noncoding sequences (called
introns). To produce a protein, the entire length of the
gene, including both its introns and its exons, is first transcribed into a very large
RNA molecule - the
primary transcript. Before this RNA molecule leaves the
nucleus, a complex of RNA-processing enzymes removes all of the intron
sequences, thereby producing a much shorter RNA molecule. After this RNA-processing
step, called RNA splicing, has been completed, the RNA molecule moves to the
cytoplasm as an mRNA molecule that directs the synthesis of a particular protein
(see Figure 3-15).
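As a rough illustration of what RNA splicing accomplishes, the sketch below joins the exon segments of a toy primary transcript and discards everything between them. The sequence, the exon coordinates, and the function name `splice` are hypothetical, and nothing here models how the processing enzymes actually recognize intron boundaries.

```python
def splice(primary_transcript, exons):
    """Join exon segments in order, dropping the introns between them.
    Each exon is a (start, end) pair, 0-based and end-exclusive."""
    return "".join(primary_transcript[start:end] for start, end in exons)

# Hypothetical toy transcript: exons in upper case, the intron in lower case.
primary = "AUGGUG" + "guaaguuuucag" + "GAGUAA"
exon_coords = [(0, 6), (18, 24)]

mrna = splice(primary, exon_coords)
print(mrna)                       # AUGGUGGAGUAA
print(len(primary), len(mrna))    # 24 12
```

Passing a different list of exon coordinates for the same transcript would yield a different mRNA, which is the essence of alternative splicing.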
This seemingly wasteful mode of information transfer in eucaryotes is
presumed to have evolved because it makes protein synthesis much more
versatile. The primary RNA transcripts of some genes, for example, can be spliced in
various ways to produce different mRNAs, depending on the cell type or stage
of development. This allows different proteins to be produced from the same
gene. Moreover, because the presence of numerous introns facilitates genetic
recombination events between exons, this type of gene arrangement is likely to
have been profoundly important in the early evolutionary history of genes,
speeding up the process whereby organisms evolve new proteins from parts of
preexisting ones instead of evolving totally new amino acid sequences.
Sequences of Nucleotides in mRNA Are "Read" in Sets
of Three and Translated into Amino Acids 16
Figure 3-16. The genetic code.
Sets of three nucleotides (codons) in
an mRNA molecule are translated into amino acids in the course of
protein synthesis according to the rules shown. The codons GUG and
GAG, for example, are translated into valine and glutamic acid, respectively.
Note that those codons with U or C as the second nucleotide tend to specify
the more hydrophobic amino acids (compare with Panel 2-5, pp. 56-57).
The rules by which the nucleotide sequence of a gene is translated into the
amino acid sequence of a protein, the so-called genetic
code, were deciphered in the early 1960s. The sequence of nucleotides in the mRNA molecule that acts as
an intermediate was found to be read in serial order in groups of three. Each
triplet of nucleotides, called a codon, specifies one amino acid. Since RNA is
a linear polymer of four different nucleotides, there are
4³ = 64 possible codon triplets (remember that it is the
sequence of nucleotides in the triplet that is
important). However, only 20 different amino acids are commonly found in
proteins, so that most amino acids are specified by several codons; that is,
the genetic code is
degenerate. The code (shown in Figure 3-16) has been highly
conserved during evolution: with a few minor exceptions, it is the same in
organisms as diverse as bacteria, plants, and humans.
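The counting argument above can be checked directly. The sketch below builds the standard codon table from a compact 64-letter string of one-letter amino acid codes (with `*` marking a stop codon) and counts how many codons specify each amino acid. The compact encoding and the variable names are conveniences chosen for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG,
# UCU, ... GGG. '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

print(len(CODON_TABLE))                          # 64 = 4**3
print(CODON_TABLE["GUG"], CODON_TABLE["GAG"])    # V E (valine, glutamic acid)

codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
print(len(codons_per_amino_acid))                # 20 amino acids
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
```

With 61 codons shared among 20 amino acids, most amino acids are necessarily specified by more than one codon, which is what is meant by a degenerate code.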
Figure 3-17. The three possible reading frames in protein synthesis.
In the process of translating
a nucleotide sequence (blue) into an amino acid sequence
(green), the sequence of nucleotides in an
mRNA molecule is read from the 5' to the
3' end in sequential sets of three nucleotides. In principle,
therefore, the same RNA sequence can specify three completely different amino
acid sequences, depending on the "reading frame."
In principle, each RNA sequence can be translated in any one of three
different
reading frames depending on where the decoding process begins
(Figure 3-17). In almost every case only one of these reading frames will produce a
functional protein. Since there are no punctuation signals except at the beginning
and end of the RNA message, the reading frame is set at the initiation of the
translation process and is maintained thereafter.
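To make the idea of a reading frame concrete, the following sketch splits one RNA sequence into codons starting at each of the three possible offsets. The sequence is a hypothetical example and `reading_frames` is an illustrative helper, not an established routine.

```python
def reading_frames(rna):
    """Return the codons read from an RNA sequence in each of the three
    possible frames (offsets 0, 1, and 2); incomplete triplets are dropped."""
    frames = []
    for offset in range(3):
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        frames.append(codons)
    return frames

rna = "AUGGUGGAGUAA"   # hypothetical message, written 5'->3'
for offset, codons in enumerate(reading_frames(rna)):
    print(offset, codons)
# 0 ['AUG', 'GUG', 'GAG', 'UAA']
# 1 ['UGG', 'UGG', 'AGU']
# 2 ['GGU', 'GGA', 'GUA']
```

The same twelve nucleotides give three unrelated sets of codons, so the point at which decoding begins fixes the entire message.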
tRNA Molecules Match Amino Acids
to Groups of Nucleotides 17
The codons in an mRNA molecule do not directly recognize the amino acids
they specify in the way that an enzyme recognizes a substrate. The
translation of mRNA into protein depends on "adaptor" molecules that recognize both
an amino acid and a group of three nucleotides. These adaptors consist of a set
of small RNA molecules known as transfer RNAs
(tRNAs), each about 80 nucleotides in length.
Figure 3-18. Phenylalanine tRNA of yeast.
(A) The molecule is drawn with a cloverleaf shape to show
the complementary base-pairing (short gray
bars) that occurs in the helical regions of the molecule. (B)
The actual shape of the molecule, based on x-ray diffraction analysis, is
shown schematically. Complementary base pairs are indicated as long gray bars. In addition, the nucleotides
involved in unusual base-pair interactions that hold different parts of the
molecule together are colored red and are connected by a red line in both (A) and (B). The pairs are numbered
in (B). (C) One of the unusual base-pair interactions. Here one base
forms hydrogen-bond interactions with two others; several such "base
triples" help fold up this tRNA molecule.
A tRNA molecule has a folded three-dimensional conformation that is
held together in part by noncovalent base-pairing interactions like those that
hold together the two strands of the DNA helix. In the single-stranded tRNA
molecule, however, the complementary base pairs form between nucleotide residues in
the
same chain, which causes the tRNA molecule to fold up in a unique way that
is important for its function as an adaptor. Four short segments of the
molecule contain a double-helical structure, producing a molecule that looks like a
"cloverleaf" in two dimensions. This cloverleaf is in turn further compacted into
a highly folded, L-shaped conformation that is held together by more
complex hydrogen-bonding interactions (Figure 3-18). Two sets of unpaired
nucleotide residues at either end of the "L" are especially important for the function of
the tRNA molecule in protein synthesis: one forms the
anticodon that base-pairs to a complementary triplet in an mRNA molecule (the codon), while the
CCA sequence at the 3' end of the molecule is attached covalently to a specific
amino acid (see Figure 3-19).
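The pairing between an anticodon and its codon can be sketched as a reverse complement, since the two RNA segments pair in antiparallel orientation. This minimal example assumes strict A-U and G-C pairing at all three positions and ignores the modified bases found in real tRNAs; `anticodon_for` is a hypothetical helper name.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon):
    """Return the anticodon (written 5'->3') that pairs with a codon
    (written 5'->3'), assuming strict complementary base-pairing."""
    return "".join(RNA_PAIR[base] for base in reversed(codon))

print(anticodon_for("UUC"))   # GAA, pairing with a phenylalanine codon
print(anticodon_for("AUG"))   # CAU
```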
The RNA Message Is Read from One End
to the Other by a Ribosome 18
Figure 3-19. Information flow in protein synthesis.
(A) The nucleotides in an mRNA molecule are
joined together to form a complementary copy of a segment of one strand
of DNA. (B) They are then matched three at a time to complementary
sets of three nucleotides in the anticodon regions of tRNA molecules. At
the other end of each type of tRNA molecule, a specific amino acid
is held in a high-energy linkage, and when matching occurs, this
amino acid is added to the end of the growing polypeptide chain.
Thus translation of the mRNA nucleotide sequence into an amino
acid sequence depends on complementary base-pairing between codons in
the mRNA and corresponding tRNA anticodons. The molecular basis
of information transfer in translation is therefore very similar to that in
DNA replication and transcription. Note that the mRNA is both
synthesized and translated starting from its
5' end.
Figure 3-20. Synthesis of a protein by ribosomes attached to an mRNA molecule.
Ribosomes
become attached to a start signal near the 5' end of the mRNA molecule and
then move toward the 3' end, synthesizing protein as they go. A single mRNA
will usually have a number of ribosomes traveling along it at the same
time, each making a separate but identical polypeptide chain; the
entire structure is known as a polyribosome.
The codon recognition process by which genetic information is transferred
from mRNA via tRNA to protein depends on the same type of base-pair
interactions that mediate the transfer of genetic information from DNA to DNA and from
DNA to RNA (Figure 3-19). But the mechanics of ordering the tRNA molecules on
the mRNA are complicated and require a ribosome, a complex of more than 50
different proteins associated with several structural RNA molecules (rRNAs).
Each ribosome is a large protein-synthesizing machine on which tRNA
molecules position themselves so as to read the genetic message encoded in an
mRNA molecule. The ribosome first finds a specific start site on the mRNA that sets
the reading frame and determines the amino-terminal end of the protein. Then,
as the ribosome moves along the mRNA molecule, it translates the nucleotide
sequence into an amino acid sequence one codon at a time, using tRNA
molecules to add amino acids to the growing end of the polypeptide chain (Figure 3-20).
When a ribosome reaches the end of the message, both it and the freshly
made carboxyl end of the protein are released from the
3' end of the mRNA molecule into the cytoplasm.
Ribosomes operate with remarkable efficiency: in one second a single
bacterial ribosome adds about 20 amino acids to a growing polypeptide
chain. Ribosome structure and the mechanism of protein synthesis are discussed
in Chapter 6.
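The overall logic of reading a message can be sketched as follows: find a start site, then decode one codon at a time until a stop codon is reached. This sketch simplifies heavily. It treats the first AUG as the start site, which is an assumption made for illustration, and the message and the 300-residue protein used for the timing estimate are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")   # '*' = stop codon
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Decode an mRNA (5'->3') from the first AUG to the first stop codon."""
    start = mrna.find("AUG")          # simplification: first AUG sets the frame
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # end of the message
            break
        protein.append(amino_acid)    # added at the carboxyl end of the chain
    return "".join(protein)

print(translate("GGAUGGUGGAGUAAGC"))  # MVE

ELONGATION_RATE = 20                  # amino acids per second, from the text
print(300 / ELONGATION_RATE)          # 15.0 seconds for a 300-residue protein
```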
Some RNA Molecules Function as Catalysts 19
Figure 3-21. A self-splicing RNA molecule.
The diagram shows the self-splicing reaction in which
an intron sequence catalyzes its own excision from a Tetrahymena ribosomal RNA molecule. As
shown, the reaction is initiated when a G nucleotide is added to the
intron sequence, cleaving the RNA chain in the process; the newly created
3' end of the RNA chain then attacks the other side of the intron to
complete the reaction.
Figure 3-22. An enzymelike reaction catalyzed by the purified Tetrahymena intron sequence.
In this reaction, which corresponds
to the first step in Figure 3-21, both a specific substrate RNA molecule and
a G nucleotide become tightly bound to the surface of the catalytic
RNA molecule. The nucleotide is then covalently attached to the
substrate RNA molecule, cleaving it at a specific site. The release of the resulting
two RNA chains frees the intron sequence for further cycles of reaction.
RNA molecules have commonly been viewed as strings of nucleotides with
a relatively uninteresting chemistry. In 1981 this view was shattered by the
discovery of a catalytic RNA molecule with the type of sophisticated chemical
reactivity that biochemists had previously associated only with proteins. The
ribosomal RNA molecules of the ciliated protozoan
Tetrahymena are initially synthesized as a large precursor from which one of the rRNAs is produced by an
RNA-splicing reaction. The surprise came with the discovery that this splicing can occur
in vitro in the absence of protein. It was subsequently shown that the intron
sequence itself has an enzymelike catalytic activity that carries out the two-step
reaction illustrated in Figure 3-21. The 400-nucleotide-long intron sequence was
then synthesized in a test tube and shown to fold up to form a complex surface
that can function like an enzyme in reactions with other RNA molecules. For
example, it can bind two specific substrates tightly - a guanine nucleotide and an
RNA chain - and catalyze their covalent attachment so as to sever the RNA chain
at a specific site (Figure 3-22).
In this model reaction, which mimics the first step in Figure 3-21, the
same intron sequence acts repeatedly to cut many RNA chains. Although RNA
splicing is most commonly achieved by means that are not autocatalytic
(discussed in
Chapter 8), self-splicing RNAs with intron sequences related to that in
Tetrahymena have been discovered in other types of cells, including fungi and
bacteria. This suggests that these RNA sequences may have arisen before the
eucaryotic and procaryotic lineages diverged about 1.5 billion years ago.
Figure 3-23. A peptidyl transferase reaction catalyzed by a deproteinized ribosomal RNA molecule.
The puromycin molecule mimics a tRNA charged with
the amino acid tyrosine, and it acts as a powerful inhibitor of
protein synthesis in cells by adding to the growing end of a polypeptide
chain on a ribosome. In this model reaction the growing polypeptide chain end
is mimicked by a hexanucleotide (red, representing a tRNA) that
is covalently linked to N-formyl methionine (representing
the polypeptide). A highly purified large rRNA molecule catalyzes the
addition of the puromycin to the N-formyl methionine, forming a new
peptide bond and releasing the hexanucleotide.
Several other families of catalytic RNAs have recently been discovered.
Most tRNAs, for example, are initially synthesized as a larger precursor RNA, and
an RNA molecule has been shown to play the major catalytic role in an
RNA-protein complex that recognizes these precursors and cleaves them at specific
sites. A catalytic RNA sequence also plays an important part in the life cycle of
many plant viroids. Most remarkably, ribosomes are now suspected to function
largely by RNA-based catalysis, with the ribosomal proteins playing a supporting role
to the ribosomal RNAs (rRNAs), which make up more than half the mass of
the ribosome. The large rRNA by itself, for example, has peptidyl transferase
activity and will catalyze the formation of new peptide bonds (Figure 3-23).
Figure 3-24. A three-dimensional view of the catalytic core of the type of intron RNA sequence illustrated in Figures 3-21 and 3-22.
(A) The folded molecule, with
hydrogen-bond interactions shown in red.
This molecule, which is about 240 nucleotides long, is
shown immediately after the initial cut at the
5' side of the intron (yellow). (B) Schematic of the molecule in (A) in
its unfolded form. (Adapted from L. Jaeger, E. Westhof, and F. Michel, J. Mol. Biol. 221:1153-1164, 1991.)
How is it possible for an RNA molecule to act like an enzyme? The
example of tRNA indicates that RNA molecules can fold up in highly specific ways. A
proposed three-dimensional structure for the core of the self-splicing
Tetrahymena intron sequence is shown in Figure 3-24. Interactions between different parts
of this RNA molecule (analogous to the unusual hydrogen bonds in tRNA
molecules - see Figure 3-18) are responsible for folding it to create a complex
three-dimensional surface with catalytic activity. An unusual juxtaposition of
atoms presumably strains covalent bonds and thereby makes selected atoms in
the folded RNA chain unusually reactive.
As explained in Chapter 1, the discovery of catalytic RNA molecules has
profoundly changed our views of how the first living cells arose.
Summary
Genetic information is carried in the linear sequence of nucleotides in DNA.
Each molecule of DNA is a double helix formed from two complementary strands of
nucleotides held together by hydrogen bonds between G-C and A-T base pairs.
Duplication of the genetic information occurs by the polymerization of a new
complementary strand onto each of the old strands of the double helix during DNA replication.
The expression of the genetic information stored in DNA involves the
translation of a linear sequence of nucleotides into a co-linear sequence of amino acids in
proteins. A limited segment of DNA is first copied into a complementary strand of
RNA. This primary RNA transcript is spliced to remove intron sequences, producing
an mRNA molecule. Finally, the mRNA is translated into protein in a complex set of
reactions that occur on a ribosome. The amino acids used for protein synthesis are
first attached to a family of tRNA molecules, each of which recognizes, by
complementary base-pairing interactions, particular sets of three nucleotides in the mRNA
(codons). The sequence of nucleotides in the mRNA is then read from one end to the other
in sets of three, according to a universal genetic code.
Other RNA molecules in cells function as enzymelike catalysts. These RNA
molecules fold up to create a surface containing nucleotides that have become
unusually reactive. One of these catalysts is the large rRNA of the ribosome, which catalyzes
the formation of peptide bonds during protein synthesis.
Protein
Structure 20
Introduction
To a large extent, cells are made of protein, which constitutes more than half
of their dry weight (see
Table 3-1). Proteins determine the shape and structure
of the cell and also serve as the main instruments of molecular recognition
and catalysis. Although DNA stores the information required to make a cell, it has
little direct influence on cellular processes. The gene for hemoglobin, for
example, cannot carry oxygen; that is a property of the protein specified by the gene.
DNA and RNA are chains of nucleotides that are chemically very similar
to one another. In contrast, proteins are made from an assortment of 20 very
different amino acids, each with a distinct chemical personality (see Panel 2-5,
pp. 56-57). This variety allows for enormous versatility in the chemical properties
of different proteins, and it presumably explains why evolution eventually
selected proteins rather than RNA molecules to catalyze most cellular reactions.
The Shape of a Protein Molecule Is Determined
by Its Amino Acid Sequence 21
Many of the bonds in a long polypeptide chain allow free rotation of the
atoms they join, giving the protein backbone great flexibility. In principle, then,
any protein molecule could adopt an almost unlimited number of shapes
(conformations). Most polypeptide chains, however, fold into only one particular
conformation determined by their amino acid sequence. This is because the
backbones and side chains of the amino acids associate with one another and with water
to form various weak noncovalent bonds (see
Panel 3-1, pp. 92-93). Provided
that the appropriate side chains are present at crucial positions in the chain,
large forces are developed that make one particular conformation especially stable.
Most proteins can fold spontaneously into their correct shape. By
treatment with certain solvents, a protein can be unfolded, or denatured, to give a flexible polypeptide chain that has lost its native conformation. When the
denaturing solvent is removed, the protein will usually refold spontaneously into its
original conformation, indicating that all the information necessary to specify
the shape of a protein is contained in the amino acid sequence itself.
Figure 3-25. How a protein folds into a globular conformation.
The polar amino acid side chains tend to
gather on the outside of the protein, where they can interact with water.
The nonpolar amino acid side chains are buried on the inside to form
a hydrophobic core that is "hidden" from water.
Figure 3-26. Hydrogen bonding.
Some of the hydrogen bonds (shown in color) that can form between
the amino acids in a protein. The peptide bonds are shaded in gray.
Figure 3-27. Details of intramolecular hydrogen bonds in a protein.
In this region of the
enzyme lysozyme, hydrogen bonds form between two side chains
(blue), between a side chain and an atom in a peptide bond
(yellow), or between atoms in two peptide bonds
(red). For reference, see Figure 3-26. (After
C.K. Mathews and K.E. van Holde, Biochemistry. Redwood City,
CA: Benjamin/Cummings, 1990.)
One of the most important factors governing the folding of a protein is
the distribution of its polar and nonpolar side chains. The many hydrophobic
side chains in a protein tend to be pushed together in the interior of the
molecule, which enables them to avoid contact with the aqueous environment (just as
oil droplets coalesce after being mechanically dispersed in water). By contrast,
the polar side chains tend to arrange themselves near the outside of the protein
molecule, where they can interact with water and with other polar molecules
(Figure 3-25). Since the peptide bonds are themselves polar, they tend to
interact both with one another and with polar side chains to form hydrogen bonds
(Figure 3-26); nearly all polar residues buried within the protein are paired in this
way (Figure 3-27). Hydrogen bonds thus play a major part in holding together
different regions of polypeptide chain in a folded protein molecule. They are also
crucially important for many of the binding interactions that occur on protein
surfaces.
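A first, very crude look at the distribution of polar and nonpolar side chains in a sequence can be had by labeling each residue. The sketch below assumes one common textbook grouping of the 20 amino acids; the assignment of borderline residues such as glycine, cysteine, and tyrosine varies between sources. The peptide is hypothetical. Labeling residues in this way says nothing about where they end up in the folded structure.

```python
# Assumed grouping by side-chain polarity (one-letter codes).
NONPOLAR = set("AVLIPFMWGC")
POLAR = set("STYNQDEKRH")

def polarity_pattern(seq):
    """Mark each residue 'n' (nonpolar) or 'p' (polar)."""
    pattern = []
    for aa in seq.upper():
        if aa in NONPOLAR:
            pattern.append("n")
        elif aa in POLAR:
            pattern.append("p")
        else:
            raise ValueError(f"unknown amino acid code: {aa}")
    return "".join(pattern)

def nonpolar_fraction(seq):
    """Fraction of residues whose side chains are classed as nonpolar."""
    return polarity_pattern(seq).count("n") / len(seq)

peptide = "MKVLAEDIFS"                # hypothetical sequence
print(polarity_pattern(peptide))      # npnnnppnnp
print(nonpolar_fraction(peptide))     # 0.6
```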
Figure 3-28. Disulfide-bond formation.
The drawing illustrates
the formation of a covalent disulfide bond between the side chains
of neighboring cysteine residues in a protein.
Secreted or cell-surface proteins often form additional
covalent intrachain bonds. Most notably, the formation of
disulfide bonds (also called S-S bonds) between the two
-SH groups of neighboring cysteine residues in a
folded polypeptide chain (Figure 3-28) often serves to stabilize the
three-dimensional structure of extracellular proteins. These bonds are not required for the
specific folding of proteins, since folding occurs normally in the presence of
reducing agents that prevent S-S bond formation. In fact,
S-S bonds are rarely, if ever, formed in protein molecules in the cytosol because the high cytosolic
concentration of -SH reducing agents breaks such bonds.
The net result of all the individual amino acid interactions is that most
protein molecules fold up spontaneously into precisely defined
conformations. Those that are compact and globular have an inner core composed of
clustered hydrophobic side chains - packed into a tight, nearly crystalline
arrangement - while a very complex and irregular exterior surface is formed by the more
polar side chains. The positioning and chemistry of the different atoms on this
intricate surface make each protein unique and enable it to bind specifically to
other macromolecular surfaces and to certain small molecules (discussed below).
From both a chemical and a structural standpoint, proteins are the most
sophisticated molecules known.
Common Folding Patterns Recur in Different
Protein Chains 22
Although all the information required for the folding of a protein chain is
contained in its amino acid sequence, we have not yet learned how to "read"
this information so as to predict the detailed three-dimensional structure of a
protein whose sequence is known. Consequently, the folded conformation can
be determined only by an elaborate x-ray diffraction
analysis performed on crystals of the protein or, if the protein is very small, by nuclear magnetic resonance
techniques (see Chapter 4). So far, more than 100 types of protein folds have been
discovered by this technique. Each protein has a specific conformation so
intricate and irregular that it would require a chapter to describe it in full
three-dimensional detail.
When the three-dimensional structures of different protein molecules
are compared, it becomes clear that, although the overall conformation of each
protein is unique, several structural patterns recur repeatedly in parts of these
macromolecules. Two patterns are particularly common because they result
from regular hydrogen-bonding interactions between the peptide bonds
themselves rather than between the side chains of particular amino acids. Both patterns
were correctly predicted in 1951 from model-building studies based on the
different x-ray diffraction patterns of silk and hair. The two regular patterns discovered
are now known as the β sheet, which occurs in the protein fibroin, found in silk,
and the α helix, which occurs in the protein
α-keratin, found in skin and its appendages, such as hair, nails, and feathers.
Figure 3-29. A β sheet is a common structure formed by parts of the polypeptide chain in globular proteins.
At the top, a domain of
115 amino acids from an immunoglobulin molecule is shown; it consists of
a sandwich-like structure of two β sheets, one of which is drawn in
color. At the bottom, a perfect antiparallel β sheet is shown in detail, with
the amino acid side chains denoted R. Note that every peptide bond
is hydrogen-bonded to a neighboring peptide bond. The actual
sheet structures in globular proteins are usually less regular than the
β sheet shown here, and most sheets are slightly twisted (see ).
The core of most (but not all) globular proteins contains extensive
regions of β sheet. In the example illustrated in Figure 3-29, which shows part of an
antibody molecule, an
antiparallel β sheet is formed when an extended
polypeptide chain folds back and forth upon itself, with each section of the chain running
in the direction opposite to that of its immediate neighbors. This gives a very
rigid structure held together by hydrogen bonds that connect the peptide bonds
in neighboring chains. The antiparallel β sheet and the closely related
parallel β sheet (which is formed by regions of polypeptide chain that run in the same
direction) frequently serve as the framework around which globular proteins
are constructed.
Figure 3-30. An α helix is another common structure formed by parts of the polypeptide chain in proteins.
(A) The oxygen-carrying
molecule myoglobin (153 amino acids long) is shown, with one region of
α helix outlined in color. (B) A perfect α helix is shown in outline. (C) As in the
β sheet, every peptide bond in an α helix is hydrogen-bonded to
a neighboring peptide bond. Note that for clarity in (B) both the side
chains [which protrude radially along the outside of the helix and are
denoted by R in (C)] and the hydrogen atom are omitted on the
α-carbon atom
of each amino acid (see also ).
An α helix is generated when a single polypeptide chain turns regularly
about itself to make a rigid cylinder in which each peptide bond is regularly
hydrogen-bonded to other peptide bonds nearby in the chain. Many globular
proteins contain short regions of such α helices (Figure 3-30), and those portions
of a transmembrane protein that cross the lipid bilayer are usually
α helices because of the constraints imposed by the hydrophobic lipid environment (discussed
in
Chapter 10).
In aqueous environments an isolated α helix is usually not stable on its
own. Two identical α helices that have a repeating arrangement of nonpolar
side chains, however, will twist around each other gradually to form a
particularly stable structure known as a coiled-coil (see p. 125). Long rodlike coiled-coils
are found in many fibrous proteins, such as the intracellular
α-keratin fibers that reinforce skin and its appendages.
Figure 3-31. Space-filling models of an α helix and a β sheet with (right) and without (left) their amino acid side chains.
(A) An α helix (part of the structure of myoglobin). (B) A
region of β sheet (part of the structure of an immunoglobulin domain). In
the photographs on the left, each side chain is represented by a single
darkly shaded atom (the R groups in Figures 3-29 and 3-30), while the entire
side chain is shown on the right. (Courtesy of Richard J. Feldmann.)
Space-filling representations of an α helix and a
β sheet from actual proteins are shown with and without their side chains in Figure 3-31.
Proteins Are Amazingly Versatile Molecules 23
Figure 3-32. Contrast between collagen and elastin.
(A) Collagen is a triple helix formed by three
extended protein chains that wrap around each other. Many
rodlike collagen molecules are cross-linked together
in the extracellular space to form inextensible
collagen fibrils (top) that have the tensile strength of steel.
(B) Elastin polypeptide chains are cross-linked together
to form elastic fibers. Each elastin molecule uncoils into
a more extended conformation when the fiber is
stretched. The striking contrast between the physical properties
of elastin and collagen is due entirely to their very
different amino acid sequences.
Because of the variety of their amino acid side chains, proteins are
remarkably versatile with respect to the types of structures they can form. Contrast, for
example, two abundant proteins secreted by cells in connective
tissue - collagen and elastin - both present in the extracellular matrix. In
collagen molecules three separate polypeptide chains, each rich in the amino acid proline and
containing the amino acid glycine at every third residue, are wound around one another
to generate a regular triple helix. These collagen molecules are packed together
into fibrils in which adjacent molecules are tied together by covalent cross-links
between neighboring lysine residues, giving the fibril enormous tensile
strength (Figure 3-32A).
Elastin is at the opposite extreme. Its relatively loose and
unstructured polypeptide chains are cross-linked covalently to generate a rubberlike
elastic meshwork that enables tissues such as arteries and lungs to deform and
stretch without damage. As illustrated in Figure 3-32B, the elasticity is due to the
ability of individual protein molecules to uncoil reversibly whenever a stretching
force is applied.
Figure 3-33
.
Some possible sizes and shapes of a protein molecule 300 amino acid residues long
The structure formed is determined by
the amino acid sequence. (Adapted from D.E. Metzler, Biochemistry. New
York: Academic Press, 1977.)
It is remarkable that the same basic chemical structure - a chain of
amino acids - can form so many different structures: a rubberlike elastic
meshwork (elastin), an inextensible cable with the tensile strength of steel (collagen), or
any of the wide variety of catalytic surfaces on the globular proteins that function
as enzymes. Figure 3-33 illustrates and compares the range of shapes that
could, in theory, be adopted by a polypeptide chain 300 amino acids long. As we
have already emphasized, the conformation actually adopted depends on the
amino acid sequence.
Proteins Have Different Levels of Structural
Organization 24
In describing the structure of a protein, it is helpful to distinguish various
levels of organization. The amino acid sequence is called the
primary structure of the protein. Regular hydrogen-bond interactions within contiguous stretches
of polypeptide chain give rise to α helices and β sheets, which constitute
the protein's secondary structure. Certain combinations of
α helices and β sheets pack together to form compactly folded globular units, each of which is called
a protein domain. Domains are usually constructed from a section of
polypeptide chain that contains between 50 and 350 amino acids, and they seem to be
the modular units from which proteins are constructed (see below). While
small proteins may contain only a single domain, larger proteins contain a number
of domains, which are often connected by relatively open lengths of
polypeptide chain. Finally, individual polypeptides often serve as subunits for the
formation of larger molecules, sometimes called protein assemblies or protein
complexes, in which the subunits are bound to one another by a large number of
weak, noncovalent interactions; in extracellular proteins these interactions are
often stabilized by disulfide bonds.
Figure 3-34
.
Basic pancreatic trypsin inhibitor (BPTI)
The three-dimensional conformation of this small protein is shown in five
commonly used representations. (A) A stereo pair illustrating the positions of
all nonhydrogen atoms. The main chain is shown with heavy lines and the
side chains with thin lines. (B) Space-filling model showing the van der
Waals radii of all atoms (see
Panel 3-1). (C) Backbone wire
model composed of lines that connect each α carbon along the
polypeptide backbone. (D) "Ribbon model," which represents all regions of
regular hydrogen-bonded interactions as either helices
(α helices) or sets of arrows (β sheets) pointing toward the carboxyl-terminal end of the chain.
(E) "Sausage model," which shows the course of the polypeptide chain
but omits all detail. In the bottom three panels the
hairpin β motif is colored
green; this motif is also found in many other proteins (see text). Note
that the core of all globular proteins is densely packed with atoms. Thus
the impression of an open structure produced by models (C), (D), and (E)
is misleading. (B and C, courtesy of Richard J. Feldmann; A and D, courtesy
of Jane Richardson.)
The three-dimensional structure of a protein can be illustrated in
various ways. Consider the unusually small protein basic pancreatic trypsin
inhibitor (BPTI), which contains 58 amino acid residues folded into one domain. BPTI
can be shown as a stereo pair displaying all of its nonhydrogen atoms (Figure 3-34A) or as an accurate space-filling model, where most of the details are
obscured (Figure 3-34B). Alternatively, it can be shown more schematically, with all of
the side chains and actual atoms omitted so that it is easier to follow the course
of the main polypeptide chain (Figure 3-34C, D, and E). An average-size
protein contains about six times more amino acid residues than BPTI, and many
proteins are more than 20 times its size. Schematic drawings are essential for
displaying the structure of these larger proteins, and we use them throughout this text.
Figure 3-35
.
Three levels of organization of a protein
The three-dimensional structure of a
protein can be described in terms of different levels of folding, each of which
is constructed from the preceding one in hierarchical fashion. These
levels are illustrated here using the catabolite activator protein (CAP),
a bacterial gene regulatory protein with two domains. When the large
domain binds cyclic AMP, it causes a conformational change in the
protein that enables the small domain to bind to a specific DNA sequence.
The amino acid sequence is termed the primary
structure and the first folding level the secondary
structure. As indicated under the brackets at
the bottom of this figure, the combination of the second and
third folding levels shown here is commonly termed the tertiary structure, and the fourth level
(the assembly of subunits) the quaternary
structure of a protein. (Modified from a drawing by Jane Richardson.)
Figure 3-35 shows how the structure of a large protein can be resolved
into several levels of organization, each level constructed from the one below it in
a hierarchical fashion. These levels of increased organizational complexity
may correspond to the steps by which a newly synthesized protein folds into its
final native structure inside the cell.
Domains Are Formed from a Polypeptide Chain
That Winds Back and Forth, Making Sharp Turns
at the Protein Surface 24
A protein domain can be viewed as the basic structural unit of a protein
structure. The core of each domain is largely composed of a set of interconnected
β sheets or α helices or both. These regular secondary structures are favored
because they permit an extensive hydrogen bonding between the backbone
atoms, which is essential for stabilizing the interior of the domain, where water is
not available to form hydrogen bonds with the polar carbonyl oxygen or amide
hydrogen of the peptide bond.
Figure 3-36
.
Example of a common protein motif
In the beta-alpha-beta motif two adjacent parallel
strands that form a β sheet structure are connected by an
α helix. Like the hairpin β motif highlighted
in Figure 3-34, this motif is found in many different proteins.
Because there are only a limited number of ways of combining
α helices and β sheets to make a globular structure, certain combinations of these
elements, called motifs, occur repeatedly in the core of many unrelated proteins. One
example is the hairpin beta motif found in BPTI (colored
green in Figure 3-34D), which consists of two antiparallel
β strands joined by a sharp turn formed by a loop of polypeptide chain. Another example is the
beta-alpha-beta motif, in which two adjacent parallel β strands are connected by a length of
α helix (Figure 3-36). Several other common motifs are discussed in
Chapter 9, where
we consider the various DNA-binding motifs found in several families of gene
regulatory proteins.
Figure 3-37
.
Ribbon models of the three-dimensional structure of several differently organized protein domains
(A) Cytochrome
b 562, a single-domain protein composed almost entirely of
α helices. (B) The NAD-binding domain of lactic dehydrogenase, composed of
a mixture of α helices and β sheets. (C) The variable domain of
an immunoglobulin light chain, composed of a sandwich of two
β sheets. In these examples the α helices are shown
in green, while strands organized as β sheets are denoted by red arrows. Note that the polypeptide chain generally traverses back and
forth across the entire domain, making sharp turns only at the
protein surface. The protruding loop
regions ( yellow) often form the binding
sites for other molecules. (Drawings courtesy of Jane Richardson.)
Various combinations of motifs form the protein domain itself, in which
the polypeptide chain tends to wind its way back and forth across the entire
structure, either as a β sheet or an α helix, reversing direction suddenly by making
a tight turn when it reaches the surface of the domain. As a result, a typical
domain is a compact structure whose surface is covered by protruding loops of
polypeptide chain (Figure 3-37). The loop regions, which vary in length and have an irregular shape, often form the binding sites for other molecules. Because the
loop regions are exposed to water, they are rich in hydrophilic amino acids, and
on this basis their positions can frequently be predicted from a careful
examination of the amino acid sequence of a protein.
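The idea that exposed loops can be spotted from sequence alone can be sketched with a sliding-window hydropathy average. This is only an illustration, not a method described in this chapter: it assumes the Kyte-Doolittle hydropathy scale, an arbitrary window of 7 residues, an arbitrary cutoff of -1.0, and a made-up test sequence. All function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence, window=7):
    """Average hydropathy of each full window, one value per window center."""
    if window % 2 == 0 or window < 1:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    scores = []
    for center in range(half, len(sequence) - half):
        segment = sequence[center - half:center + half + 1]
        scores.append(sum(KYTE_DOOLITTLE[res] for res in segment) / window)
    return scores


def hydrophilic_positions(sequence, window=7, threshold=-1.0):
    """Indices (0-based) whose window average falls below the cutoff."""
    half = window // 2
    scores = window_hydropathy(sequence, window)
    return [i + half for i, score in enumerate(scores) if score < threshold]


# Made-up sequence: hydrophobic stretch, hydrophilic stretch, hydrophobic stretch.
example = "VLIVAVL" + "DKGNSEK" + "IVLAVIL"
print(hydrophilic_positions(example))
```

Positions flagged this way are only candidates for surface loops; a real prediction would need more than a single hydropathy cutoff.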
Relatively Few of the Many Possible Polypeptide Chains Would Be Useful
Since each of the 20 amino acids is chemically distinct and each can, in
principle, occur at any position in a protein chain, there are 20
x 20 x 20 x 20 = 160,000 different possible polypeptide chains 4 amino acids long, or
20^n different possible polypeptide chains n amino acids long. For a typical protein length of about
300 amino acids, more than 10^390 different proteins can be made.
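The arithmetic behind these numbers can be checked directly. The short sketch below (function names are illustrative) counts the possible chains of a given length and uses a logarithm to obtain the order of magnitude for a 300-residue chain without printing a 391-digit number.

```python
import math

AMINO_ACID_TYPES = 20  # the 20 amino acids referred to in the text


def count_chains(length):
    """Number of distinct polypeptide chains of the given length (20**length)."""
    return AMINO_ACID_TYPES ** length


def order_of_magnitude(length):
    """log10 of the number of chains, i.e. the exponent x in 10**x."""
    return length * math.log10(AMINO_ACID_TYPES)


print(count_chains(4))                     # 160000
print(round(order_of_magnitude(300), 1))   # 390.3
```

Because the exponent is about 390.3, the count for 300 residues is indeed somewhat more than 10^390.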
We know, however, that only a very small fraction of these possible
proteins would adopt a stable three-dimensional conformation. The vast majority
would have many different conformations of roughly equal energy, each with
different chemical properties. Proteins with such variable properties would not be
useful and would therefore be eliminated by natural selection in the course of
evolution. Present-day proteins have an amazingly sophisticated structure and
chemistry because of their unique folding properties. Not only is the amino acid
sequence such that a single conformation is extremely stable, but this
conformation has the precise chemical properties that enable the protein to perform a
specific catalytic or structural function in the cell. Proteins are so precisely built that
the change of even a few atoms in one amino acid can sometimes disrupt the
structure and cause a catastrophic change in function.
New Proteins Usually Evolve by Alterations of Old
Ones 25
Cells have genetic mechanisms that allow genes to be duplicated, modified,
and recombined in the course of evolution. Consequently, once a protein with
useful surface properties has evolved, its basic structure can be incorporated in
many other proteins. Proteins of different but related function in present-day
organisms often have similar amino acid sequences. Such families of proteins are
believed to have evolved from a single ancestral gene that duplicated in the
course of evolution to give rise to other genes in which mutations gradually
accumulated to produce related proteins with new functions.
Figure 3-38
.
(A) Comparison of the amino acid sequences of two members of the serine protease family of enzymes
The carboxyl-terminal portions of the two
proteins are shown (amino acids 149 to 245). Identical amino acids are
connected by colored bars, and the serine residue in the active site at
position 195 is highlighted. In the
yellow
boxed sections of the polypeptide chains, each amino acid occupies a
closely equivalent position in the three-dimensional structures of the
two enzymes (see Figure 3-39). (B) The standard one-letter and
three-letter codes for amino acids. (Modified from J. Greer,
Proc. Natl. Acad. Sci. USA 77:3393-3397, 1980.)
Figure 3-39
.
Comparison of the conformations of the two serine proteases shown in Figure 3-38
Elastase is shown in (A) and chymotrypsin in (B). Although
only those amino acid residues in the polypeptide chain shaded in
green are the same in the two proteins,
their conformations are very similar everywhere. The active site, which
is circled in
red, contains an activated serine residue (see ). Chymotrypsin contains more than two chain termini because it
is formed by the proteolytic cleavage of chymotrypsinogen, an
inactive precursor.
Consider the serine proteases, a family of protein-cleaving
(proteolytic) enzymes that includes the digestive enzymes chymotrypsin, trypsin, and
elastase and some of the proteases in the blood-clotting and complement
enzymatic cascades. When two of these enzymes are compared, about 40% of the
positions in their amino acid sequences are found to be occupied by the same amino
acid (Figure 3-38A). The similarity of their three-dimensional conformations as
determined by x-ray crystallography is even more striking: most of the detailed
twists and turns in their polypeptide chains, which are several hundred amino
acids long, are identical (Figure 3-39).
Figure 3-40
.
Comparison of DNA-binding homeodomains from two organisms separated by more than a billion years of evolution
(A) Schematic of structure. (B) Trace of the α-carbon positions. The
three-dimensional structures shown were determined by x-ray
crystallography for the yeast α2 protein
( green) and the Drosophila engrailed
protein ( red). (C) Comparison of amino acid sequences for the region of
the proteins shown in (A) and (B). Orange dots demark the position of a three amino acid insert in the
α2 protein. (Adapted from C. Wolberger, et al., Cell 67:517-528, 1991.)
The story that we have told for the serine proteases could be repeated
for hundreds of other protein families. In many cases the amino acid sequences
have diverged much further than for the serine proteases, so that one cannot be
sure of a family relationship between two proteins without determining their
three-dimensional structures. The yeast α2 protein and the
Drosophila engrailed protein, for example, are both gene regulatory proteins in the homeodomain
family. Because they are identical in only 17 of their 60 amino acid residues,
their relationship became certain only when their three-dimensional structures
were compared (Figure 3-40).
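Figures such as "about 40% of the positions" or "17 of their 60 amino acid residues" are percent-identity values computed over aligned positions. A minimal sketch of that calculation follows; it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and the example strings are invented one-letter sequences, not real protein data.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared positions holding the same amino acid.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Invented, pre-aligned toy sequences (7 matches over 9 compared positions).
print(round(percent_identity("GSLKRTW-AC", "GALKQTWDAC"), 1))  # 77.8

# The homeodomain example from the text: 17 identical residues out of 60.
print(round(100.0 * 17 / 60, 1))  # 28.3
```

At roughly 28% identity, the two homeodomains sit well below the 40% seen among the serine proteases, which is why the text stresses that structure was needed to confirm the relationship.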
The various members of a large protein family will often have distinct
functions. Some of the amino acid changes that make these proteins different
were no doubt selected in the course of evolution because they resulted in changes
in biological activity, giving the individual family members the different
functional properties that they have today. Other amino acid changes are likely to be
"neutral," having neither a beneficial nor a damaging effect on the basic structure
and function of the protein. Since mutation is a random process, there must also
have been many deleterious changes that altered the three-dimensional structure
of these proteins sufficiently to inactivate them. Such inactive proteins would
have been lost whenever the individual organisms making them were at enough of
a disadvantage to be eliminated by natural selection. It is not surprising, then,
that cells contain whole sets of structurally related polypeptide chains that have
a common ancestry but different functions.
New Proteins Can Evolve by Recombining Preexisting Polypeptide
Domains 26
Once a number of stable protein surfaces have been made in a cell, new
surfaces with different binding properties can be generated by joining two or more
proteins together by noncovalent interactions between them, producing a protein complex. This combining of proteins to make larger, functional protein
assemblies is common. Many protein complexes have molecular weights of a
million or more, even though an average polypeptide chain has a molecular weight
of 40,000 (about 300 to 400 amino acids), and relatively few polypeptide chains
are more than three times this size.
Figure 3-41
.
The evolution of new ligand-binding sites
The general principle by which the
juxtaposition of separate protein surfaces in
the course of evolution has given rise to proteins that contain new
binding sites for other molecules (ligands; see p. 129). As indicated here,
the ligand-binding sites often lie at the interface between two
protein domains and are formed from loop regions on the protein surface
(see also ).
Figure 3-42
.
The structure of the glycolytic enzyme glyceraldehyde 3-phosphate dehydrogenase
The protein is composed of two
domains, each shown in a different color, with regions of
α helix represented by cylinders and regions of β
sheet represented by arrows. The details of the reaction catalyzed by the
enzyme are shown in Figure 2-22. Note that the three bound substrates lie at
an interface between the two domains. (Courtesy of Alan J. Wonacott.)
An alternative way of making a new protein from existing chains is to join
the corresponding DNA sequences to make a gene that encodes a single
large polypeptide chain. Proteins in which different parts of the polypeptide chain
fold independently into separate globular domains are believed to have evolved in
this way, perhaps after existing for a prolonged period as a protein complex
formed from separate polypeptides. Many proteins have such "multidomain"
structures, and, as might be expected from the evolutionary considerations discussed
above, the binding sites for substrate molecules frequently lie where the separate
domains are juxtaposed (Figure 3-41). Thus, for the multidomain protein
whose three-dimensional structure is shown in Figure 3-42, a protein surface on
one domain that binds NAD+ was apparently combined with a surface on a
second domain that binds a sugar, as part of the process of evolving an active site
that uses the NAD+ to catalyze sugar oxidation.
Another way of reutilizing an amino acid sequence is especially
widespread among long fibrous proteins such as collagen (see Figure 3-32). In these cases
a structure is formed from multiple internal repeats of an ancestral amino
acid sequence. Putting together amino acid sequences by joining preexisting
coding DNA sequences is clearly a much more efficient strategy for a cell than the
alternative of deriving new protein sequences from scratch by random DNA mutation.
Structural Homologies Can Help Assign Functions
to Newly Discovered Proteins 27
Figure 3-43
.
Domain shuffling
An extensive shuffling of blocks of protein sequence (protein
modules) has occurred during the evolution
of proteins. Those portions of a protein denoted by the same shape and
color are evolutionarily related but not identical. (A) The bacterial
catabolite gene activator protein (CAP) contains one domain
(blue triangle) that binds a specific DNA sequence and
a second domain (red rectangle) that binds cyclic AMP (see Figure 3-35). The DNA-binding domain here is related to the DNA-binding
domains of many other gene regulatory proteins, including the lac
repressor and cro repressor proteins. In addition, two copies of the
cyclic-AMP-binding domain are found in eucaryotic protein kinases
regulated by the binding of cyclic nucleotides. (B) Serine proteases
like chymotrypsin are formed from two domains
(brown). In some related proteases that are highly
regulated and more specialized, the two protease domains are connected
to one or more domains homologous to domains found in epidermal
growth factor (green hexagon), to a calcium-binding protein
(yellow triangle), or to a "kringle" domain
(blue square) that contains three internal
disulfide bridges.
The development of techniques for rapidly sequencing DNA molecules has
made it possible to determine the amino acid sequences of many thousands of
proteins from the nucleotide sequences of their genes. A rapidly enlarging
protein data base is therefore available that biologists routinely scan by computer to
search for possible sequence homologies between a newly sequenced protein and
previously studied ones. Although sequences have so far been determined for
only a few percent of the proteins in eucaryotic organisms, it is common to find
that a newly sequenced protein is homologous to some other, known protein over
part of its length, indicating that most proteins may have descended from
relatively few ancestral types. As expected, the sequences of many large proteins often
show signs of having evolved by the joining of preexisting domains in new
combinations - a process called domain shuffling (Figure 3-43).
These protein comparisons are important because related structures
often imply related functions. Many years of experimentation can be saved by
discovering an amino acid sequence homology with a protein of known function.
Such sequence homologies, for example, first indicated that certain cell-cycle
regulatory genes in yeast cells and certain genes that cause mammalian cells to
become cancerous are protein kinases. In the same way many of the proteins that
control pattern formation in the fruit fly Drosophila were recognized to be gene regulatory proteins, while another protein involved in pattern formation was
identified as a serine protease.
The discovery of domain homologies can also be useful in another way. It
is much more difficult to determine the three-dimensional structure of a
protein than to determine its amino acid sequence. But the conformation of a
newly sequenced protein domain can be guessed if it is homologous to a domain of
a protein whose conformation has already been determined by x-ray
diffraction analysis. By assuming that the twists and turns of the polypeptide chain will
be conserved in the two proteins despite the presence of discrepancies in amino
acid sequence, one can often sketch the structure of the new protein with
reasonable accuracy (see ).
Many new protein sequences are being added to the data base each
year, each one increasing the chance of finding useful homologies.
Protein-sequence comparisons have therefore become a very important tool in cell biology.
Protein Subunits Can Assemble into Large
Structures 28
The same principles that enable several protein domains to associate to
form binding sites for small molecules operate to generate much larger structures
in the cell. Supramolecular structures such as enzyme complexes, ribosomes,
protein filaments, viruses, and membranes are not made as single, giant,
covalently linked molecules; instead they are formed by the noncovalent assembly of
many preformed molecules, which are called subunits of the final structure.
There are several advantages to the use of smaller subunits to build
larger structures: (1) building a large structure from one or a few repeating
smaller subunits reduces the amount of genetic information required; (2) both
assembly and disassembly can be readily controlled, since the subunits
associate through multiple bonds of relatively low energy; and (3) errors in the
synthesis of the structure can be more easily avoided, since correction mechanisms
can operate during the course of assembly to exclude malformed subunits.
A Single Type of Protein Subunit Can Interact with Itself
to Form Geometrically Regular Assemblies 29
Figure 3-44
.
The formation of a dimer from a single type of protein subunit
A protein with a binding site that recognizes itself will often
form symmetrical dimers. These may then pair with other subunits to form tetramers and
larger assemblies (not shown).
Figure 3-45
.
Ribbon model of a dimer formed from two identical protein subunits (monomers)
The protein shown is the bacterial catabolite gene activator
protein (CAP) illustrated previously in Figure 3-35. (Courtesy of Jane Richardson.)
If a protein has a binding site that is complementary to a region of its own
surface, it will assemble spontaneously to form a larger structure. In the
simplest case, a binding site recognizes itself and forms a symmetrical
dimer. Many enzymes and other proteins form dimers of this kind, which frequently act as
subunits in the formation of larger assemblies (Figures 3-44 and 3-45).
Figure 3-46
.
Rings or helices can form if a single type of protein subunit interacts with itself repeatedly
The formation of a helix was illustrated in ; a
ring forms instead of a helix if the subunits run into one another,
stopping further growth of the chain.
Figure 3-47
.
An actin filament
There are about two globular protein subunits per turn in this
important filament, which is discussed in detail in Chapter 16.
Figure 3-49
.
Hexagonally packed globular protein subunits can form either a flat sheet or a tube
If the binding site of a protein is complementary to a region of its surface
that does not include the binding site itself, a chain of subunits will be
formed. For certain special orientations of the two binding sites, the chain will soon
run into itself and terminate, forming a closed ring of subunits (Figure 3-46).
More commonly, an extended polymer of subunits will result, and provided that
each subunit is bound to its neighbor in an identical way, the subunits in
the polymer will be arranged in a helix that can be extended indefinitely (see ). An
actin filament, for example, is a helical structure formed from a
single globular protein subunit called actin; actin filaments are major components in the cytosol of most eucaryotic cells (Figure 3-47). As we discuss below,
globular proteins may also associate with like neighbors to form extended sheets or
tubes (see Figure 3-49).
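The geometric point in this paragraph, that repeating one identical subunit-to-subunit contact produces either a helix or a closed ring, can be sketched with a simplified model. The model below is an assumption for illustration: each subunit is treated as a point, and every subunit is related to the previous one by the same rotation about a common axis (twist) plus the same translation along it (rise). The radius, twist, and rise values are arbitrary, not measurements of any real filament.

```python
import math


def subunit_positions(count, radius, twist_deg, rise):
    """Coordinates of subunits related by a fixed twist and rise per step."""
    positions = []
    for i in range(count):
        angle = math.radians(i * twist_deg)
        positions.append((radius * math.cos(angle),
                          radius * math.sin(angle),
                          i * rise))
    return positions


def closes_into_ring(twist_deg, rise, tol=1e-9):
    """True if the chain returns to its start after one full turn.

    In this simplified model that requires zero rise and a twist that
    divides 360 degrees evenly.
    """
    if abs(rise) > tol or twist_deg <= 0:
        return False
    subunits_per_turn = 360.0 / twist_deg
    return abs(subunits_per_turn - round(subunits_per_turn)) < tol


print(closes_into_ring(60.0, 0.0))   # True: a ring of six subunits
print(closes_into_ring(60.0, 1.5))   # False: nonzero rise gives a helix
```

With a nonzero rise the seventh subunit sits above the first instead of colliding with it, so the polymer can keep growing as a helix.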
Coiled-Coil Proteins Help Build Many Elongated Structures in
Cells 30
Figure 3-48
.
The structure of a coiled-coil
In (A) a single α helix is shown, with successive amino
acid side chains labeled in a sevenfold sequence "abcdefg" (from bottom
to top). Amino acids "a" and "d" in
such a sequence lie close together on the cylinder surface, forming a
"stripe" (shaded in red) that winds
slowly around the α helix. Proteins that form coiled-coils typically
have hydrophobic amino acids at positions "a" and "d." Consequently, as
shown in (B), the two α helices can wrap around each other with
the hydrophobic side chains of one α helix interacting with
the hydrophobic side chains of the other, while the more hydrophilic
amino acid side chains are left exposed to the aqueous environment. (C)
The atomic structure of a coiled-coil determined by x-ray
crystallography. The red side chains are
hydrophobic. (C, from T. Alber, Curr. Opin.
Genet. Devel. 2:205-210, 1992. ©
Current Science.)
Where mechanical strength is of major importance, supramolecular
assemblies are usually made from fibrous rather than globular subunits. Such assemblies
can be stabilized by extensive regions of protein-protein contact when the
subunits are wound around one another as a multistranded helix. A particularly
stable structural unit that is used repeatedly for this purpose is known as the
coiled-coil. It forms by the pairing of two α-helical subunits that have a repeating
arrangement of nonpolar side chains. The two α-helical subunits are usually
identical and run in parallel (that is, in the same direction from amino to carboxyl
terminal). They coil gradually around each other to produce a stiff filament with
a diameter of about 2 nm (Figure 3-48). Whereas short coiled-coils serve as
dimerization domains in several families of gene regulatory proteins, more
commonly a coiled-coil will extend for more than 100 nm and serve as a building block
for a large fibrous structure, such as the thick filaments in a muscle cell.
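The "repeating arrangement of nonpolar side chains" is the sevenfold "abcdefg" pattern, with hydrophobic residues at positions "a" and "d". A minimal sketch of how one might test a sequence for that pattern is given below. The set of residues counted as hydrophobic and the toy sequence are assumptions chosen for illustration, and the function names are hypothetical.

```python
HEPTAD = "abcdefg"
# Illustrative choice of residues treated as hydrophobic (an assumption).
HYDROPHOBIC = set("AVLIMFWC")


def heptad_register(sequence, start="a"):
    """Assign a heptad letter to every residue, beginning at `start`."""
    offset = HEPTAD.index(start)
    return [HEPTAD[(offset + i) % 7] for i in range(len(sequence))]


def core_hydrophobic_fraction(sequence, start="a"):
    """Fraction of 'a' and 'd' positions occupied by hydrophobic residues."""
    register = heptad_register(sequence, start)
    core = [res for res, pos in zip(sequence, register) if pos in "ad"]
    if not core:
        raise ValueError("sequence too short to contain an 'a' or 'd' position")
    return sum(res in HYDROPHOBIC for res in core) / len(core)


def best_register(sequence):
    """Starting letter that puts the most hydrophobic residues at 'a'/'d'."""
    return max(HEPTAD, key=lambda start: core_hydrophobic_fraction(sequence, start))


toy = "LKELEDK" * 4  # invented repeat with leucine at positions a and d
print(core_hydrophobic_fraction(toy))  # 1.0
print(best_register(toy))              # a
```

A high fraction in one register and low fractions in the others is the kind of signal that suggests a coiled-coil-forming helix, though real sequences are far less regular than this toy repeat.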
Proteins Can Assemble into Sheets, Tubes, or
Spheres 31
Some protein subunits assemble into flat sheets in which the subunits are
arranged in hexagonal arrays. Specialized membrane proteins are sometimes
arranged in this way in lipid bilayers. With a slight change in the geometry of
the individual subunits, a hexagonal sheet can be converted into a tube (Figure 3-49) or, with more changes, into a hollow sphere. Protein tubes and spheres
that bind specific RNA and DNA molecules form the coats of viruses.
Figure 3-50
.
The structure of a spherical virus
In many viruses, identical protein subunits
pack together to create a spherical shell (a capsid) that encloses the
viral genome, composed of either RNA or DNA (see Figure 6-72). For
geometric reasons, no more than 60 identical subunits can pack together in
a precisely symmetrical way. If slight irregularities are allowed,
however, more subunits can be used to produce a larger capsid. The
tomato bushy stunt virus (TBSV) shown here, for example, is a spherical virus
about 33 nm in diameter that is formed from 180 identical copies of a
386 amino acid capsid protein plus an RNA genome of 4500 nucleotides.
To form such a large capsid, the protein must be able to fit into
three somewhat different environments, each of which is differently colored
in the particle shown here. The postulated pathway of assembly
is shown; the precise three-dimensional structure has been determined by
x-ray diffraction. (Courtesy of Steve Harrison.)
The formation of closed structures, such as rings, tubes, or spheres,
provides additional stability because it increases the number of bonds that can form
between the protein subunits. Moreover, because such a structure is formed
by mutually dependent, cooperative interactions between subunits, it can be
driven to assemble or disassemble by a relatively small change that affects the
subunits individually. These principles are dramatically illustrated in the protein
capsid of many simple viruses, which takes the form of a hollow sphere. These coats
are often made of hundreds of identical protein subunits that enclose and
protect the viral nucleic acid (Figure 3-50). The protein in such a capsid must have a
particularly adaptable structure, since it must make several different kinds of
contacts and also change its arrangement to let the nucleic acid out to initiate
viral replication once the virus has entered a cell.
Many Structures in Cells Are Capable of
Self-assembly 32
Figure 3-51
.
The structure of tobacco mosaic virus (TMV)
(A) Electron micrograph of a tobacco mosaic
virus (TMV), which consists of a single long RNA molecule enclosed in
a cylindrical protein coat composed of a tight helical array of
identical protein subunits. (B) A model showing part of the structure of
TMV. A single-stranded RNA molecule of 6000 nucleotides is packaged in
a helical coat constructed from 2130 copies of a coat protein 158
amino acids long. Fully infective virus particles can self-assemble in a
test tube from purified RNA and protein molecules. (A, courtesy of
Robley Williams; B, courtesy of Richard J. Feldmann.)
The information for forming many of the complex assemblies of
macromolecules in cells must be contained in the subunits themselves, since under
appropriate conditions the isolated subunits can spontaneously assemble in a test tube
into the final structure. The first large macromolecular aggregate shown to be
capable of self-assembly from its component parts was
tobacco mosaic virus (TMV). This virus is a long rod in which a cylinder of protein is arranged around a helical
RNA core (Figure 3-51). If the dissociated RNA and protein subunits are mixed
together in solution, they recombine to form fully active virus particles. The
assembly process is unexpectedly complex and includes the formation of double
rings of protein, which serve as intermediates that add to the growing virus coat.
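The TMV numbers quoted in the figure caption (2130 copies of a 158 amino acid coat protein, a 6000-nucleotide RNA) make the genetic economy of repeating subunits easy to quantify. The sketch below assumes three nucleotides per amino acid and ignores start and stop codons; the variable names are illustrative.

```python
NUCLEOTIDES_PER_AMINO_ACID = 3  # one codon per residue; stop codon ignored


def coding_nucleotides(residues):
    """Nucleotides needed to encode a chain with this many residues."""
    return residues * NUCLEOTIDES_PER_AMINO_ACID


tmv_copies = 2130              # coat protein copies per particle (from the caption)
tmv_subunit_residues = 158     # residues per coat protein (from the caption)
tmv_genome_nucleotides = 6000  # RNA length (from the caption)

per_subunit = coding_nucleotides(tmv_subunit_residues)
as_single_chain = coding_nucleotides(tmv_copies * tmv_subunit_residues)

print(per_subunit)                                      # 474
print(as_single_chain)                                  # 1009620
print(round(as_single_chain / tmv_genome_nucleotides))  # 168
```

Encoding the coat once takes 474 nucleotides, a small part of the genome, whereas encoding the whole coat as one covalent chain would need about a million nucleotides, roughly 168 times the entire viral RNA.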
Another complex macromolecular aggregate that can reassemble from
its component parts is the bacterial ribosome. These ribosomes are composed
of about 55 different protein molecules and 3 different rRNA molecules. If the
individual components are incubated under appropriate conditions in a test
tube, they spontaneously re-form the original structure. Most important, such
reconstituted ribosomes are able to carry out protein synthesis. As might be
expected, the reassembly of ribosomes follows a specific pathway: certain proteins first
bind to the RNA, and this complex is then recognized by other proteins, and so on
until the structure is complete.
Figure 3-52
.
Three ways in which a large protein assembly can be made to a fixed length
(A) Coassembly along an elongated core protein
or other macromolecule that acts as a measuring device. (B) Termination
of assembly because of strain that accumulates in the
polymeric structure as additional subunits are added, so that beyond a
certain length the energy required to fit another subunit onto the
chain becomes excessively large. (C) A vernier type of assembly, in
which two sets of rodlike molecules differing in length form a staggered
complex that grows until their ends exactly match.
Figure 3-53
.
Electron micrograph of bacteriophage lambda
The tip of the virus tail attaches to a specific
protein on the surface of a bacterial cell, following which the tightly
packaged DNA in the head is injected through the tail into the cell. The tail has
a precise length, which is determined by the mechanism shown in .
It is still not clear how some of the more elaborate self-assembly
processes are regulated. Many structures in the cell, for example, appear to have a
precisely defined length that is many times greater than that of their component
macromolecules. How such length determination is achieved is in most cases a
mystery. Three possible mechanisms are illustrated in . In the
simplest case a long core protein or other macromolecule provides a scaffold that
determines the extent of the final assembly. This is the mechanism that
determines the length of the TMV particle, where the RNA chain provides the core.
Similarly, a core protein is thought to determine the length of the thin filaments in
muscle, as well as the long tails of some bacterial viruses ( ).
Not All Biological Structures Form by
Self-assembly 33
Figure 3-54
.
The polypeptide hormone insulin cannot spontaneously re-form if its disulfide bonds are disrupted
It is synthesized as a
larger protein (proinsulin) that is cleaved by a proteolytic enzyme after
the protein chain has folded into a specific shape. Excision of part of
the proinsulin polypeptide chain causes an irretrievable loss of the
information needed for the protein to fold spontaneously into its normal conformation.
Some cellular structures held together by noncovalent bonds are not capable
of self-assembly. A mitochondrion, a cilium, or a myofibril, for example,
cannot form spontaneously from a solution of its component macromolecules
because part of the information for their assembly is provided by special enzymes
and other cellular proteins that perform the function of jigs or templates but do
not appear in the final assembled structure. Even small structures may lack some
of the ingredients necessary for their own assembly. In the formation of some
bacterial viruses, for example, the head structure, which is composed of a
single protein subunit, is assembled on a temporary scaffold composed of a
second protein. The second protein is absent from the final virus particle, and so the
head structure cannot spontaneously reassemble once it is taken apart. Other
examples are known in which proteolytic cleavage is an essential and irreversible step
in the assembly process. This is the case for the coats of some bacterial viruses
and even for some simple protein assemblies, including the structural protein
collagen and the hormone insulin ( ).
From these relatively simple
examples, it seems very likely that the assembly of a structure as complex as a
mitochondrion or a cilium will involve both temporal and spatial ordering
imparted by other cellular components, as well as irreversible processing steps
catalyzed by degradative enzymes.
Summary
The three-dimensional conformation of a protein molecule is determined by
its amino acid sequence. The folded structure is stabilized by noncovalent
interactions between different parts of the polypeptide chain. The amino acids with
hydrophobic side chains tend to cluster in the interior of the molecule, and local
hydrogen-bond interactions between neighboring peptide bonds give rise to
α helices and β sheets. Globular regions known as domains are the modular units from which
many proteins are constructed; small proteins typically contain only a single domain,
while large proteins contain several domains linked together by short lengths of
polypeptide chain. As proteins evolved, domains were modified and combined with
other domains to construct new proteins.
Proteins are brought together into larger structures by the same
noncovalent forces that determine protein folding. Proteins with binding sites for their own
surface can assemble into dimers, closed rings, spherical shells, or helical polymers.
Although mixtures of proteins and nucleic acids can assemble spontaneously into
complex structures in the test tube, many assembly processes involve irreversible
steps. Consequently, not all structures in the cell are capable of spontaneous
reassembly after they are dissociated into their component parts.
Proteins as
Catalysts 34
Introduction
The chemical properties of a protein molecule depend almost entirely on
its exposed surface residues, which are able to form weak, noncovalent bonds
with other molecules. When a protein molecule binds to another molecule, the
second molecule is commonly referred to as a
ligand. Because an effective interaction between a protein molecule and a ligand requires that many weak bonds
be formed simultaneously between them, the only ligands that can bind tightly
to a protein are those that fit precisely onto its surface.
Figure 3-55
.
The ligand-binding site of the catabolite gene activator protein (CAP)
Hydrogen bonding between CAP and its ligand,
cyclic AMP (green), was determined by x-ray crystallographic analysis of
the complex. As indicated, the two identical subunits of the
dimer cooperate to form this binding site (see also ). (Courtesy
of Tom Steitz.)
The region of a protein that associates with a ligand, known as its
binding site, usually consists of a cavity formed by a specific arrangement of amino
acids on the protein surface. These amino acids often belong to widely
separated regions of the polypeptide chain ( ), and they represent only a
minor fraction of the total amino acids present. The rest of the protein molecule is
presumably necessary to maintain the polypeptide chain in the correct position
and to provide additional binding sites for regulatory purposes; the interior of the
protein is often important only insofar as it gives the surface of the molecule
the appropriate shape and rigidity.
A Protein's Conformation Determines Its
Chemistry 20
Neighboring surface residues on a protein often interact in a way that alters
the chemical reactivity of selected amino acid side chains. These interactions are
of several types.
Figure 3-56
.
Competition for hydrogen bonding
The ability of water molecules to make
favorable hydrogen bonds with groups on the protein surface greatly reduces
the tendency of these groups to pair with each other.
First, neighboring parts of the polypeptide chain may interact in a way
that restricts the access of water molecules to other parts of the protein surface.
Because water molecules tend to form hydrogen bonds, they compete with
ligands for selected side chains on the protein surface ( ). The tightness
of hydrogen bonds (and ionic interactions) between proteins and their ligands
is therefore greatly increased if water molecules are excluded. At first sight it is
hard to imagine a mechanism that would exclude a molecule as small as water
from a protein surface without affecting the access of the ligand itself. Because of
their strong tendency for hydrogen bonding, however, water molecules exist in a
large hydrogen-bonded network (see
Panel 2-1, pp. 48-49), and it is often
energetically unfavorable for individual molecules to break away from this network to
reach into a crevice on the protein surface.
Figure 3-57
.
An unusually reactive amino acid at the active site of an enzyme
The example shown is the "catalytic triad" found
in chymotrypsin, elastase, and other serine proteases (see ). The aspartic acid side chain induces the histidine to remove the
proton from serine 195; this activates the serine to form a covalent bond
with the enzyme substrate, hydrolyzing a peptide bond as illustrated later
in .
Second, the clustering of neighboring polar amino acid side chains can
alter their reactivity. If a number of negatively charged side chains are forced
together against their mutual repulsion by the way the protein folds, for
example, the affinity of the site for a positively charged ion is greatly increased.
Selected amino acid side chains can also interact with one another through
hydrogen bonds, which can activate normally unreactive side groups (such as the
CH₂OH on the serine shown in ) so that they are able to enter into
reactions that make or break selected covalent bonds.
The surface of each protein molecule therefore has a unique chemical
reactivity that depends not only on which amino acid side chains are exposed,
but also on their exact orientation relative to one another. For this reason even
two slightly different conformations of the same protein molecule may differ
greatly in their chemistry.
Figure 3-58
.
Coenzymes
Coenzymes, such as thiamine
pyrophosphate (TPP), shown here in gray, are
small molecules that bind to an enzyme's surface and enable it to
catalyze specific reactions. The reactivity of TPP depends on its "acidic"
carbon atom, which readily exchanges its hydrogen atom for a carbon atom of
a substrate molecule. Other regions of the TPP molecule act as "handles"
by which the enzyme holds the coenzyme in the correct
position. Coenzymes presumably evolved first in an "RNA world," where they
were bound to RNA molecules to help with catalysis (discussed in Chapter 1).
Figure 3-59
.
Computer-generated space-filling models of two enzymes
In (A) cytochrome c is shown with its bound heme coenzyme. In (B)
egg-white lysozyme is shown with a bound oligosaccharide substrate.
In both cases the bound ligand is red. (Courtesy of Richard J. Feldmann.)
Where side-chain reactivities are insufficient, proteins often enlist the
help of selected nonpolypeptide molecules that the proteins bind to their
surface. These ligands serve as coenzymes in enzyme-catalyzed reactions, and they
may be so tightly bound to the protein that they are effectively part of the protein
itself. Examples are the iron-containing
hemes in hemoglobin and cytochromes,
thiamine pyrophosphate in enzymes involved in aldehyde-group transfers, and
biotin in enzymes involved in carboxyl-group transfers. Most coenzymes are very
complex organic molecules that have been selected for the unique chemical
reactivity they acquire when bound to a protein surface. Besides its reactive center
such a coenzyme has other residues designed to bind it to its host protein ( ). A space-filling model of an enzyme bound to a coenzyme is shown in .
Substrate Binding Is the First Step in Enzyme
Catalysis 35
Figure 3-60
.
Enzyme kinetics
The rate of an enzyme reaction
(V) increases as the substrate concentration increases until
a maximum value (Vmax) is reached.
At this point all substrate-binding sites on the enzyme molecules are
fully occupied, and the rate of reaction is limited by the rate of the
catalytic process on the enzyme surface. For most enzymes the concentration
of substrate at which the reaction rate is half-maximal
(KM) is a measure of how tightly the substrate is
bound, with a large value of KM corresponding to weak binding.
One of the most important functions of proteins is to act as enzymes that
catalyze specific chemical reactions. The ligand in this case is called a
substrate molecule, and the binding of the substrate to the enzyme is an essential
prelude to the chemical reaction (see ). If we denote the enzyme by E,
the substrate by S, and the product by P, the basic reaction path is
E + S ⇌ ES ⇌ EP ⇌ E + P. From this simple outline of an enzyme-catalyzed reaction, we see
that there is a limit to the amount of substrate that a single enzyme molecule
can process in a given time. If the concentration of substrate is increased, the rate
at which product is formed also increases, up to a maximum value ( ). At that point the enzyme molecule is saturated with substrate and the rate
of reaction (denoted Vmax) depends only on how rapidly the substrate molecule
can be processed. This rate divided by the enzyme concentration is called the
turnover number. The turnover number is often about 1000 substrate molecules
processed per second per enzyme molecule, but it can be much greater in
extreme cases.
The other kinetic parameter frequently used to characterize an enzyme is
its KM, which is the substrate concentration that allows the reaction to proceed
at one-half its maximum rate (see ). A
low KM value means that the enzyme reaches its maximum catalytic rate at a
low concentration of substrate and generally indicates that the enzyme binds its substrate very tightly.
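The saturation behavior described here can be sketched numerically. The text does not name a rate law, so the block below assumes the standard Michaelis-Menten form, V = Vmax[S] / (KM + [S]), which reproduces both properties stated above: the rate levels off at Vmax, and it equals half of Vmax when [S] = KM. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
def reaction_rate(substrate_conc, v_max, k_m):
    """Rate V under the assumed Michaelis-Menten model: Vmax * [S] / (KM + [S])."""
    if substrate_conc < 0 or v_max < 0 or k_m <= 0:
        raise ValueError("need [S] >= 0, Vmax >= 0 and KM > 0")
    return v_max * substrate_conc / (k_m + substrate_conc)


def turnover_number(v_max, enzyme_conc):
    """Turnover number as defined in the text: Vmax divided by enzyme concentration."""
    if enzyme_conc <= 0:
        raise ValueError("enzyme concentration must be positive")
    return v_max / enzyme_conc


if __name__ == "__main__":
    # Hypothetical values, not taken from the text.
    v_max = 1.0e-6   # moles per liter per second
    k_m = 1.0e-5     # moles per liter
    enzyme = 1.0e-9  # moles per liter

    # The fraction of Vmax rises with [S] and approaches 1 at saturation.
    for s in (k_m / 10, k_m, 10 * k_m, 1000 * k_m):
        fraction = reaction_rate(s, v_max, k_m) / v_max
        print(f"[S] = {s:.1e} M -> V/Vmax = {fraction:.3f}")

    print("turnover number (per second):", turnover_number(v_max, enzyme))
```

With these illustrative numbers the turnover number comes out near the figure of about 1000 per second quoted in the text, but that agreement is built into the chosen inputs rather than derived.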
Enzymes Speed Reactions by Selectively Stabilizing Transition
States 36
Figure 3-61
.
Enzymes accelerate chemical reactions by decreasing the activation energy
Often both
the uncatalyzed reaction (A) and the enzyme-catalyzed reaction (B)
go through several transition states.
It is the transition state with the highest energy
(Sᵀ and ESᵀ) that determines the activation energy
and limits the rate of the reaction.
(S = substrate; P = product of the reaction.)
Extremely high rates of chemical reaction are achieved by enzymes - far
higher than for any synthetic catalysts. This efficiency is attributable to several
factors. The enzyme serves, first, to increase the local concentration of substrate
molecules at the catalytic site and to hold all of the appropriate atoms in the
correct orientation for the reaction that is to follow. More important, however, some
of the binding energy contributes directly to the catalysis. Substrate molecules
pass through a series of intermediate forms of altered geometry and electron
distribution before they form the ultimate products of the reaction, and the free
energies of these intermediate forms - especially of those in the most
unstable transition states - are the major determinants of the rate of reaction.
Enzymes have a much greater affinity for these transition states of the substrate than
they have for the stable forms. Because this binding interaction lowers the
energies of crucial transition states, the enzyme greatly accelerates one particular
reaction ( ).
Figure 3-62
.
Catalytic antibodies
The stabilization of a transition
state by an antibody creates an enzyme. (A) The reaction path for hydrolysis of
an amide bond goes through a tetrahedral intermediate, which is the high-energy transition state for
the reaction. (B) The molecule shown on the left was covalently linked to
a protein and used as an antigen to generate an antibody that
binds tightly to the region of the molecule shown in yellow. Because this antibody also bound tightly to
the transition state in (A), it was found to function as an enzyme that
efficiently catalyzed the hydrolysis of the amide bond in the molecule shown on
the right.
A dramatic demonstration of how stabilizing a transition state can
greatly increase reaction rates is provided by the intentional production of
antibodies that act like enzymes. Consider, for example, the hydrolysis of an amide
bond, which is similar to the peptide bond that joins adjacent amino acids in a
protein. In an aqueous solution an amide bond hydrolyzes very slowly by the
mechanism illustrated in . In the central intermediate, or transition state,
the carbonyl carbon is bonded to four atoms that are arranged at the corners of
a tetrahedron. By generating monoclonal antibodies that bind tightly to a
stable analogue of this very unstable
tetrahedral intermediate, as illustrated in , an antibody that functions like an enzyme can be obtained. This
catalytic antibody binds to and stabilizes the tetrahedral intermediate and thereby
increases the spontaneous rate of amide-bond hydrolysis more than 10,000-fold.
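To give a feel for the energies involved, the sketch below converts a rate increase into the corresponding drop in activation energy. It assumes that the rate is proportional to exp(-Ea/RT) and that catalysis changes only Ea, a common simplification that the text does not itself state. The temperature of 310 K and the function names are likewise assumptions made for illustration.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal per mole per kelvin


def barrier_reduction(rate_ratio, temperature_k=310.0):
    """Drop in activation energy (kcal/mol) implied by a given fold increase in rate.

    Assumes rate is proportional to exp(-Ea / RT) with an unchanged prefactor.
    """
    if rate_ratio <= 0:
        raise ValueError("rate ratio must be positive")
    return GAS_CONSTANT * temperature_k * math.log(rate_ratio)


def acceleration_from_barrier_drop(barrier_drop_kcal, temperature_k=310.0):
    """Inverse calculation: fold increase in rate for a given drop in Ea."""
    return math.exp(barrier_drop_kcal / (GAS_CONSTANT * temperature_k))


if __name__ == "__main__":
    # The 10,000-fold acceleration quoted for the catalytic antibody.
    drop = barrier_reduction(1.0e4)
    print(f"10,000-fold faster -> barrier lowered by about {drop:.1f} kcal/mol")
```

Under these assumptions a 10,000-fold acceleration corresponds to lowering the barrier by only a few kilocalories per mole, which is comparable to the energy of a small number of the weak bonds discussed earlier in the chapter.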
Enzymes Can Promote the Making and Breaking
of Covalent Bonds Through Simultaneous Acid
and Base Catalysis 37
Figure 3-63
.
Acid catalysis and base catalysis
(A) The start of the uncatalyzed reaction shown in is diagrammed, with
blue shading as a schematic indicator
of electron distribution in the water and carbonyl bonds. (B) An acid likes
to donate a proton (H⁺) to other atoms. By pairing with the carbonyl
oxygen, an acid causes electrons to move away from the carbonyl
carbon, making this atom much more attractive to the
electronegative oxygen of an attacking water molecule. (C) A base likes to take
up H⁺; by pairing with a hydrogen of
the attacking water molecule, a base causes electrons to move toward
the water oxygen, making it a better attacking group for the
carbonyl carbon. (D) By having appropriately positioned atoms on its surface,
an enzyme can carry out both acid catalysis and base catalysis at
the same time.
Enzymes are better catalysts than catalytic antibodies. In addition to
binding tightly to the transition state, the active site of an enzyme contains precisely
positioned atoms that speed up the reaction by altering the distribution of
electrons in those atoms involved in the making and breaking of covalent bonds.
Peptide bonds, for example, can be hydrolyzed in the absence of an enzyme by
exposing a polypeptide to either a strong acid or a strong base, as explained in . Enzymes are unique, however, in being able to use acid and
base catalysis simultaneously, since the acidic and basic residues required are
prevented from combining with each other (as they would do in solution) by
being tied to the rigid framework of the protein itself ( D).
The fit between an enzyme and its substrate needs to be precise. A
small change introduced by genetic engineering in the active site of an enzyme
can have a profound effect. Replacing a glutamic acid with an aspartic acid in
one enzyme, for example, shifts the position of the catalytic carboxylate ion by
only 1 Å (about the radius of a hydrogen atom), and yet this is enough to reduce
the activity of the enzyme a thousandfold.
Enzymes Can Further Increase Reaction Rates by
Forming Covalent Intermediates with Their
Substrates 38
In addition to the above roles, many enzymes further speed the reaction
they catalyze by interacting covalently with one of their substrates, thereby
temporarily attaching the substrate to an amino acid or to a coenzyme molecule.
Generally, one substrate enters the binding site, becomes covalently bound, and
then reacts with a second molecule on the enzyme surface that breaks the
covalent attachment just made. At the end of each reaction cycle, the free enzyme is
regenerated.
Figure 3-64
.
Some enzymes form covalent bonds with their substrates
In the example shown here,
a carbonyl group in a polypeptide chain (shown in
green) forms a covalent bond with a specially activated
serine residue (see ) of a serine protease (shown in
gray), which cleaves the polypeptide chain.
When the unbound portion of the polypeptide chain has diffused
away, a second step occurs in which a water molecule hydrolyzes the
newly formed covalent bond, thereby releasing the portion of
the polypeptide bound to the enzyme surface and freeing the serine
for another cycle of reaction. Note that two unstable
tetrahedral intermediates (shaded in
yellow)
serve as transition states in this reaction and both are stabilized by
the enzyme.
Consider, for example, the mechanism of action of the serine proteases.
The reaction they catalyze, the hydrolysis of a peptide bond, is greatly accelerated
by the enzymes' affinity for the tetrahedral intermediate of the reaction. But a
serine protease does more than a typical catalytic antibody: instead of waiting for
an oxygen from a water molecule to attack the carbonyl carbon, it makes the
reaction go much more quickly by first using a precisely positioned amino acid
side chain for this purpose (the activated serine in ). This step breaks
the peptide bond, but it leaves the enzyme covalently linked to the carboxyl
group. Then, in a rapid second step, this covalent intermediate is destroyed by the
enzyme-catalyzed addition of water, completing the reaction and regenerating
the free enzyme ( ). Even though this two-step reaction is less direct
than a one-step reaction (in which water is added to the peptide bond), it is
faster because each step has a relatively low activation energy.
Enzymes Accelerate Chemical Reactions but Cannot Make Them Energetically More Favorable
No matter how sophisticated an enzyme is, it cannot make the chemical
reaction it catalyzes either more or less energetically favorable. It cannot alter the
free-energy difference between the initial substrates and the final products of
the reaction. Like the simple binding interactions already discussed, any given
chemical reaction has an
equilibrium point, at which the backward and forward
reaction fluxes are equal, so that no net change occurs (see ). If an
enzyme speeds up the rate of the forward reaction, A + B → AB, by a factor of 10⁸,
it must speed up the rate of the backward reaction, AB → A + B, by a factor of 10⁸ as
well. The
ratio of the forward to the backward rates of reaction depends only on
the concentrations of A, B, and AB. The equilibrium point remains precisely the
same whether or not the reaction is catalyzed by an enzyme.
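The argument can be checked with a few lines of arithmetic. The sketch below assumes simple mass-action kinetics, in which the equilibrium constant is the ratio of the forward and backward rate constants. The rate constants and function names are hypothetical.

```python
import math


def equilibrium_constant(k_forward, k_backward):
    """K = k_forward / k_backward for A + B <-> AB (mass-action kinetics assumed)."""
    return k_forward / k_backward


def catalyze(k_forward, k_backward, acceleration):
    """Model an enzyme as scaling both rate constants by the same factor."""
    return k_forward * acceleration, k_backward * acceleration


if __name__ == "__main__":
    k_f, k_b = 2.0, 0.5  # hypothetical uncatalyzed rate constants
    fast_f, fast_b = catalyze(k_f, k_b, 1.0e8)

    before = equilibrium_constant(k_f, k_b)
    after = equilibrium_constant(fast_f, fast_b)
    print("K without enzyme:", before)
    print("K with enzyme:   ", after)
    print("unchanged:", math.isclose(before, after))
```

Because the acceleration factor appears in both the numerator and the denominator, it cancels, and only the time needed to reach equilibrium changes.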
Enzymes Determine Reaction Paths by Coupling
Selected Reactions to ATP Hydrolysis 39
The living cell is a chemical system that is far from equilibrium. The product
of each enzyme usually serves as a substrate for another enzyme in the
metabolic pathway and is rapidly consumed. More important, by means of a reaction
pathway that is determined by enzymes, many reactions are driven in one
direction by being coupled to the energetically favorable hydrolysis of ATP to ADP and
inorganic phosphate, as previously described in Chapter 2. To make this
strategy effective, the ATP pool is itself maintained at a level far from its equilibrium
point, with a high ratio of ATP to its hydrolysis products (discussed in Chapter 14).
This ATP pool thereby serves as a "storage battery" that keeps energy and atoms
continually passing through the cell directed along pathways determined by the
enzymes present. For a living system, approaching chemical equilibrium
means decay and death.
Multienzyme Complexes Help to Increase
the Rate of Cell Metabolism 40
The efficiency of enzymes in accelerating chemical reactions is crucial to
the maintenance of life. Cells, in effect, must race against the unavoidable
processes of decay, which run downhill toward chemical equilibrium. If the rates of
desirable reactions were not greater than the rates of competing side reactions, a
cell would soon die. Some idea of the rate at which cellular metabolism proceeds
can be obtained by measuring the rate of ATP utilization. A typical mammalian
cell turns over (that is, completely degrades and replaces) its entire ATP pool
once every 1 or 2 minutes. For each cell this turnover represents the utilization
of roughly 10⁷ molecules of ATP per second (or, for the human body, about a
gram of ATP every minute).
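These two figures together imply a rough size for the ATP pool of a single cell. The short calculation below uses only the numbers given in the text (a consumption of about 10⁷ molecules per second and a turnover time of 1 to 2 minutes); the function name is hypothetical, and the result is an order-of-magnitude estimate only.

```python
def atp_pool_size(molecules_per_second, turnover_seconds):
    """Molecules in the pool if the entire pool is replaced once per turnover time."""
    return molecules_per_second * turnover_seconds


if __name__ == "__main__":
    rate = 10**7  # ATP molecules used per second per cell (from the text)
    for minutes in (1, 2):
        pool = atp_pool_size(rate, minutes * 60)
        print(f"turnover every {minutes} min -> pool of about {pool:.1e} molecules")
```

On these figures the pool holds on the order of 10⁹ ATP molecules per cell.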
The rates of cellular reactions are rapid because of the effectiveness of
enzyme catalysis. Many important enzymes have become so efficient that there
is no possibility of further useful improvement: the factor limiting the reaction
rate is no longer the intrinsic speed of action of the enzyme; rather, it is the
frequency with which the enzyme collides with its substrate. Such a reaction is said to
be diffusion-limited.
If a reaction is diffusion-limited, its rate will depend on the concentration
of both the enzyme and its substrate. For a sequence of reactions to occur very
rapidly, each metabolic intermediate and enzyme involved must therefore be
present in high concentration. Given the enormous number of different reactions
carried out by a cell, there are limits to the concentrations of substrates that can
be achieved. In fact, most metabolites are present in micromolar
(10⁻⁶ M) concentrations, and most enzyme concentrations are much lower. How is it
possible, therefore, to maintain very fast metabolic rates?
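The dependence on both concentrations can be made concrete with a minimal sketch. It assumes simple second-order kinetics, rate = k[E][S], for a diffusion-limited reaction. The collision rate constant used here is an assumed order of magnitude and is not given in the text, and the enzyme concentration is hypothetical, chosen only to be well below the micromolar substrate level.

```python
def encounter_rate(k_collision, enzyme_conc, substrate_conc):
    """Rate (molar per second) of a diffusion-limited reaction.

    Assumes second-order kinetics: rate = k * [E] * [S].
    """
    if min(k_collision, enzyme_conc, substrate_conc) < 0:
        raise ValueError("all inputs must be non-negative")
    return k_collision * enzyme_conc * substrate_conc


if __name__ == "__main__":
    K_COLLISION = 1.0e8  # per molar per second; assumed value, not from the text
    substrate = 1.0e-6   # micromolar metabolite, as stated in the text
    enzyme = 1.0e-8      # hypothetical; "much lower" than the substrate

    base = encounter_rate(K_COLLISION, enzyme, substrate)
    boosted = encounter_rate(K_COLLISION, enzyme, 10 * substrate)
    print("rate at micromolar substrate:", base, "M/s")
    print("rate with tenfold more substrate:", boosted, "M/s")
```

In this model the only ways to speed up the reaction are to raise one of the two concentrations or to raise them locally, which is the problem the rest of this section addresses.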
The answer lies in the spatial organization of cell components. Reaction
rates can be increased without raising substrate concentrations by bringing the
various enzymes involved in a reaction sequence together to form a large
protein assembly known as a multienzyme complex. In this way the product of
enzyme A is passed directly to enzyme B and so on to the final product, and diffusion
rates need not be limiting even when the concentration of substrate in the cell as
a whole is very low. Such enzyme complexes are very common (the structure
of one, pyruvate dehydrogenase, was shown in Figure 2-41), and they are
involved in nearly all aspects of metabolism, including the central genetic processes
of DNA, RNA, and protein synthesis. In fact, it may be that few enzymes in
eucaryotic cells diffuse freely in solution; instead, most may have evolved binding
sites that concentrate them with other proteins of related function in particular
regions of the cell, thereby increasing the rate and efficiency of the reactions that
they catalyze.
Figure 3-65
.
Compartmentalization
A large increase in the
concentration of interacting molecules can be achieved by confining them to
the same membrane-bounded compartment in a eucaryotic cell.
Cells have another way of increasing the rate of metabolic reactions. It
depends on the extensive intracellular membrane systems of eucaryotic cells.
These membranes can segregate certain substrates and the enzymes that act on
them into the same membrane-bounded compartment, such as the
endoplasmic reticulum or the cell nucleus. If, for example, the compartment occupies a
total of 10% of the volume of the cell, the concentration of reactants in the
compartment can be 10 times greater than in a similar cell with no
compartmentalization ( ). Reactions that would otherwise be limited by the speed
of diffusion can thereby be speeded up by the same factor.
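The factor of 10 follows directly from the volume fraction, as this small sketch shows. It assumes that all of the reactant molecules are confined to the compartment and that the total number of molecules is unchanged; the function name is hypothetical.

```python
def concentration_factor(volume_fraction):
    """Fold increase in concentration when all molecules are confined to a
    compartment occupying the given fraction of the cell volume."""
    if not 0 < volume_fraction <= 1:
        raise ValueError("volume fraction must be in the range (0, 1]")
    return 1.0 / volume_fraction


if __name__ == "__main__":
    for fraction in (1.0, 0.5, 0.1):
        factor = concentration_factor(fraction)
        print(f"compartment = {fraction:.0%} of cell volume -> {factor:.0f}x concentration")
```

A smaller compartment gives a proportionally larger gain, provided the molecules really are retained inside it.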
Further details of protein structure and function will be presented in Chapter 5, where we discuss how cells construct tiny machines out of proteins.
Summary
The biological function of a protein depends on the detailed chemical properties
of its surface. Binding sites for ligands are formed as surface cavities in which
precisely positioned amino acid side chains are brought together by protein folding. In
this way, normally unreactive amino acid side chains can be activated. Enzymes
greatly speed up reaction rates by binding the high-energy transition states in a
reaction especially tightly; they also carry out acid catalysis and base catalysis
simultaneously. The rates of enzyme reactions are often so fast that they are limited only by
diffusion; rates can be further increased if enzymes that act sequentially on a substrate
are joined into a single multienzyme complex or if the enzymes and their substrates
are confined to the same compartment of the cell.
Copyright © 1994 Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson. Published by Garland Publishing, a member of the Taylor & Francis Group. No part of the publication may be reproduced or used in any form or by any means known now or invented hereafter without the permission of the publisher.