RNA and Protein Synthesis
Introduction
Proteins constitute more than half the total dry mass of a cell, and their
synthesis is central to cell maintenance, growth, and development. Protein synthesis
occurs on ribosomes. It depends on the collaboration of several classes of RNA
molecules and begins with a series of preparatory steps. First, a molecule of messenger RNA (mRNA) must be copied from the DNA that encodes the protein.
Meanwhile, in the cytoplasm, each of the 20 amino acids from which the protein is to be
built must be attached to its specific transfer RNA
(tRNA) molecule, and the subunits of the ribosome on which the new protein is to be made must be preloaded
with auxiliary protein factors. Protein synthesis begins when all of these
components come together in the cytoplasm to form a functioning ribosome. As a single
molecule of mRNA moves stepwise through a ribosome, the sequence of
nucleotides in the mRNA molecule is translated into a corresponding sequence of amino
acids to produce a distinctive protein chain, as specified by the DNA sequence
of its gene. We begin by considering how the many different RNA molecules in a
cell are made.
RNA Polymerase Copies DNA into RNA:
The Process of DNA Transcription 1
RNA is synthesized on a DNA template by a process known as
DNA transcription. Transcription generates the mRNAs that carry the information for
protein synthesis, as well as the transfer, ribosomal, and other RNA molecules that
have structural or catalytic functions. All of these RNA molecules are synthesized
by RNA polymerase enzymes, which make an RNA copy of a DNA sequence.
In eucaryotes three kinds of RNA polymerase molecules synthesize different
types of RNA, as described in Chapter 8. These RNA polymerases are thought to
have derived during evolution from the single enzyme present in bacteria that
mediates all bacterial RNA synthesis.
Figure 6-2. The synthesis of an RNA molecule by RNA polymerase.
The enzyme binds to the
promoter sequence on the DNA and begins its synthesis at a start site within
the promoter. It completes its synthesis at a stop (termination)
signal, whereupon both the polymerase and its completed RNA chain are
released. During RNA chain elongation, polymerization rates average about
30 nucleotides per second at 37°C. Therefore, an RNA chain of
5000 nucleotides takes about 3 minutes to complete.
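The timing estimate in this caption can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a constant elongation rate and ignores the time needed for initiation and termination, and the function name is chosen here for convenience.

```python
# Average elongation rate quoted in the caption (nucleotides per second at 37°C).
ELONGATION_RATE_NT_PER_S = 30


def transcription_time_seconds(chain_length_nt, rate_nt_per_s=ELONGATION_RATE_NT_PER_S):
    """Time to elongate an RNA chain at a constant rate.

    Assumption: initiation and termination are ignored.
    """
    return chain_length_nt / rate_nt_per_s


seconds = transcription_time_seconds(5000)
minutes = seconds / 60
print(f"{seconds:.0f} s, about {minutes:.1f} min")  # 167 s, about 2.8 min
```

The result, just under 3 minutes, agrees with the figure given in the caption.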
Figure 6-3. The chain elongation reaction catalyzed by an RNA polymerase enzyme.
In each step an incoming ribonucleoside triphosphate is selected for its
ability to base-pair with the exposed DNA template strand; a
ribonucleoside monophosphate is then added to the growing,
3'-OH end of the RNA chain (red arrow), and pyrophosphate
is released (red atoms). The new RNA chain therefore grows by
one nucleotide at a time in the 5'-to-3' direction, and it is complementary
in sequence to the DNA template strand. The reaction is driven both by
the favorable free-energy change that accompanies the release
of pyrophosphate and by the subsequent hydrolysis of
the pyrophosphate to inorganic phosphate (see Figure 2-30).
The bacterial RNA polymerase is a large multisubunit enzyme associated
with several additional protein subunits that enter and leave the
polymerase-DNA complex at different stages of transcription. Free RNA polymerase
molecules collide randomly with the bacterial chromosome, sliding along it but sticking
only weakly to most DNA. The polymerase binds very tightly, however, when it
contacts a specific DNA sequence, called the
promoter, that contains the
start site for RNA synthesis and signals where RNA synthesis should begin. The
reactions that ensue are outlined in . After binding to the promoter, the
RNA polymerase opens up a local region of the double helix to expose the
nucleotides on a short stretch of DNA on each strand. One of the two exposed DNA
strands acts as a template for complementary base-pairing with incoming
ribonucleoside triphosphate monomers, two of which are joined together by the
polymerase to begin an RNA chain. The RNA polymerase molecule then moves stepwise
along the DNA, unwinding the DNA helix just ahead to expose a new region
of the template strand for complementary base-pairing. In this way the
growing RNA chain is extended by one nucleotide at a time in the
5'-to-3' direction (). The chain elongation process continues until the enzyme encounters
a second special sequence in the DNA, the stop (termination)
signal, where
the polymerase halts and releases both the DNA template and the newly
made RNA chain.
By convention, when a DNA sequence associated with a gene is specified,
it is the sequence of the nontemplate strand that is given, and it is written in
the 5'-to-3' direction. This convention is adopted because the sequence of
the nontemplate strand corresponds to the sequence of the RNA that is made.
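The relationship between the template strand, the nontemplate strand, and the RNA can be made concrete with a short sketch. The sequence used is hypothetical, and the function names are chosen here only for illustration; the code assumes the template is written 3'-to-5' so that each output is read 5'-to-3'.

```python
# Base-pairing partners used when a DNA template is copied into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base-pairing partners between the two DNA strands.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (5'-to-3') made from a template strand written 3'-to-5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def nontemplate_strand(template_3_to_5):
    """Return the nontemplate DNA strand (5'-to-3') that pairs with the template."""
    return "".join(DNA_COMPLEMENT[base] for base in template_3_to_5)


template = "TACGGCATT"                  # hypothetical template, written 3'-to-5'
rna = transcribe(template)              # AUGCCGUAA
coding = nontemplate_strand(template)   # ATGCCGTAA
print(rna, coding)
print(rna == coding.replace("T", "U"))  # True
```

Apart from the substitution of U for T, the RNA and the nontemplate strand are identical, which is why the nontemplate strand is the one conventionally written out.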
Figure 6-4. Start and stop signals for RNA synthesis by a bacterial RNA polymerase.
Here, the lower strand
of DNA is the template strand, whereas the upper strand corresponds
in sequence to the RNA that is made (note the substitution of U in RNA
for T in DNA). (A) The polymerase begins transcribing at the start site.
Two short sequences (shaded red), about
-35 and -10 nucleotides from the start, determine where
the polymerase binds; close relatives of these two hexanucleotide
sequences, properly spaced from each other, specify the promoter for most E. coli genes. (B) A stop (termination)
signal. The E. coli RNA polymerase stops when it synthesizes a run of
U residues (shaded blue) from a complementary run of A residues
on the template strand, provided that it has just synthesized a
self-complementary RNA nucleotide sequence (shaded
green), which rapidly forms a hairpin helix that is crucial
for stopping transcription. The sequence of nucleotides in the
self-complementary region can vary widely.
Nucleotide sequences that act as start sites and stop signals for the
bacterial RNA polymerase are illustrated in . Nucleotide sequences that
are found in many examples of a particular type of region in DNA (such as a
promoter) are called consensus sequences. In bacteria strong promoters (those
associated with genes that produce large amounts of mRNA) have sequences
that match the promoter consensus sequences closely (as in ),
whereas weak promoters (those associated with genes that produce relatively
small amounts of mRNA) match these sequences less well.
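One crude way to express how closely a promoter matches the consensus is to count matching positions in its two hexanucleotide sequences. The sketch below assumes the commonly cited E. coli consensus hexamers (TTGACA near -35 and TATAAT near -10), which are not spelled out in this section, and it scores hypothetical promoter sequences. A match count is only a rough proxy: real promoter strength also depends on the spacing between the hexamers and on other factors.

```python
# Commonly cited E. coli consensus hexamers (an assumption; not given in this section).
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def matches(hexamer, consensus):
    """Count the positions at which a hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(hexamer, consensus))


def promoter_score(minus35, minus10):
    """Total number of consensus matches (0 to 12) over both hexamers."""
    return matches(minus35, CONSENSUS_35) + matches(minus10, CONSENSUS_10)


strong = promoter_score("TTGACA", "TATAAT")  # perfect match to the consensus
weak = promoter_score("TTTACA", "TATGTT")    # hypothetical weaker promoter
print(strong, weak)  # 12 9
```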
Only Selected Portions of a Chromosome Are Used to Produce RNA
Molecules 2
Figure 6-5. DNA unwinding and rewinding by RNA polymerase.
A moving RNA polymerase molecule
is continuously unwinding the DNA helix ahead of the polymerization
site while rewinding the two DNA strands behind this site to displace the
newly formed RNA chain. A short region of DNA/RNA helix is therefore
formed only transiently, and the final RNA product is released as a
single-stranded copy of one of the two DNA strands.
As an RNA polymerase molecule moves along the DNA, an RNA/DNA
double helix is formed at the enzyme's active site. This helix is very short because
the RNA just made is displaced, allowing the DNA/DNA helix immediately at the
rear of the polymerase to rewind (). As a result, each completed RNA
chain is released from the DNA template as a free, single-stranded RNA molecule,
typically between 70 and 10,000 nucleotides long.
Figure 6-6. RNA polymerase orientation determines which DNA strand serves as template.
The DNA strand serving as template must
be traversed from its 3' end to its 5'
end, as illustrated in . Thus the direction of RNA
polymerase movement determines which of the two DNA strands will serve as
a template for the synthesis of RNA, as shown here. Polymerase direction
is, in turn, determined by the orientation of the promoter sequence, where
the RNA polymerase initially binds.
Figure 6-7. Directions of transcription along a short portion of a bacterial chromosome.
Note that some genes are transcribed from
one DNA strand, while others are transcribed from the other
DNA strand. Approximately 0.2% of the E. coli chromosome is depicted here. (Adapted from D.L. Daniels et
al., Science 257:771-777, 1992.)
In principle, any region of the DNA double helix could be copied into
two different RNA molecules - one from each of the two DNA strands. In reality,
only one DNA strand is used as a template in each region. The RNA made is
equivalent in nucleotide sequence to the opposite, nontemplate DNA strand. Which
of the two strands is copied varies along the length of a single DNA molecule
and is determined by the promoter of each gene. As illustrated in , a
promoter is an oriented DNA sequence that points the RNA polymerase in one
direction or the other, and this orientation determines which DNA strand is
copied (). The DNA strand that is copied into RNA can be either
different or the same for neighboring genes ().
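The effect of promoter orientation can be sketched in code. In this illustration the region is given as its upper strand written 5'-to-3', and the labels "rightward" and "leftward" are names invented here for the two possible directions of polymerase movement. The sequence is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the opposite DNA strand, written 5'-to-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def rna_from_region(top_strand_5_to_3, direction):
    """Return the RNA (5'-to-3') made from a region, given polymerase direction.

    'rightward': the lower strand is the template, so the RNA matches the upper strand.
    'leftward':  the upper strand is the template, so the RNA matches the lower strand.
    """
    if direction == "rightward":
        nontemplate = top_strand_5_to_3
    elif direction == "leftward":
        nontemplate = reverse_complement(top_strand_5_to_3)
    else:
        raise ValueError("direction must be 'rightward' or 'leftward'")
    return nontemplate.replace("T", "U")


region = "ATGGCCTTA"  # hypothetical upper strand, 5'-to-3'
print(rna_from_region(region, "rightward"))  # AUGGCCUUA
print(rna_from_region(region, "leftward"))   # UAAGGCCAU
```

The same stretch of double helix yields two entirely different RNA molecules depending on the direction of transcription, which is why the orientation of the promoter matters.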
Both bacterial and eucaryotic RNA polymerases are large, complicated
molecules, with multiple subunits and a total mass of more than 500,000
daltons. Some bacterial viruses, in contrast, encode single-chain RNA polymerases of
one-fifth this mass that catalyze RNA synthesis at least as well as the host-cell
enzyme. Presumably, the multiple subunit composition of the cellular RNA
polymerases is important for various regulatory aspects of cellular RNA synthesis that have
not yet been well defined.
This brief outline of DNA transcription omits many details. Other
complex steps usually must occur before an mRNA molecule is produced. Gene regulatory proteins, for example, help to determine which regions of DNA are
transcribed by the RNA polymerase and thereby play a major part in determining which
proteins are made by a cell. Moreover, although mRNA molecules are produced
directly by DNA transcription in procaryotes, in higher eucaryotic cells most
RNA transcripts are altered extensively - by a process called RNA splicing - before they leave the cell nucleus and enter the cytoplasm as mRNA molecules. All of
these aspects of mRNA production are discussed in Chapters 8 and 9, where we
consider the cell nucleus and the control of gene expression, respectively. For
now, let us assume that functional mRNA molecules have been produced and
proceed to examine how they direct protein synthesis.
Transfer RNA Molecules Act as Adaptors That
Translate Nucleotide Sequences into Protein
Sequences 3
All cells contain a set of transfer RNAs
(tRNAs), each of which is a small RNA molecule (most have a length between 70 and 90 nucleotides). The tRNAs,
by binding at one end to a specific codon in the mRNA and at their other end to
the amino acid specified by that codon, enable amino acids to line up according
to the sequence of nucleotides in the mRNA. Each tRNA is designed to carry
only one of the 20 amino acids used for protein synthesis: a tRNA that carries
glycine is designated tRNAGly and so on. Each of the 20 amino acids has at least one
type of tRNA assigned to it, and most have several tRNAs. Before an amino acid is
incorporated into a protein chain, it is attached by its carboxyl end to the
3' end of an appropriate tRNA molecule. This attachment serves two purposes. First,
and most important, it covalently links the amino acid to a tRNA containing the
correct anticodon - the sequence of three nucleotides that is complementary to
the three-nucleotide codon that specifies that amino acid on an mRNA
molecule. Codon-anticodon pairings enable each amino acid to be inserted into a
growing protein chain according to the dictates of the sequence of nucleotides in
the mRNA, thereby allowing the genetic code to be used to translate nucleotide
sequences into protein sequences. This is the essential "adaptor" function of
the tRNA molecule: with one end attached to an amino acid and the other paired
to a codon, the tRNA converts sequences of nucleotides into sequences of
amino acids.
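Codon-anticodon complementarity can be illustrated with a small sketch. It assumes strict Watson-Crick pairing at all three positions and ignores the modified nucleotides found in real tRNAs; the function names are chosen here for illustration. Because the two RNA strands pair in antiparallel orientation, the anticodon written 5'-to-3' is the reverse complement of the codon.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon_5_to_3):
    """Return the strictly complementary anticodon, written 5'-to-3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon_5_to_3))


def pairs_with(codon, anticodon):
    """True if the anticodon is exactly complementary to the codon."""
    return anticodon_for(codon) == anticodon


print(anticodon_for("UUC"))       # GAA
print(pairs_with("UUC", "GAA"))   # True
print(pairs_with("UUC", "GCA"))   # False
```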
The second function of the amino acid attachment is to activate the
amino acid by generating a high-energy linkage at its carboxyl end so that it can
react with the amino group of the next amino acid in the protein sequence to form
a peptide bond. The activation process is necessary for protein synthesis
because nonactivated amino acids cannot be added directly to a growing
polypeptide chain. (In contrast, the reverse process, in which a peptide bond is
hydrolyzed by the addition of water, is energetically favorable and can occur spontaneously.)
Figure 6-8. The "cloverleaf" structure of tRNA.
This is a view
of the molecule shown in after it has been partially
unfolded. There are many different tRNA molecules, including at least one
for each kind of amino acid. Although they differ in nucleotide
sequence, they all have the three stem loops shown plus an amino
acid-accepting arm. The particular tRNA molecule shown binds phenylalanine and
is therefore denoted tRNAPhe. In all tRNA molecules the amino acid
is attached to the A residue of a CCA sequence at the
3' end of the molecule. Complementary base-pairings are shown by
red bars.
Figure 6-9. The folded structure of a typical tRNA molecule.
Two views of the three-dimensional
conformation determined by x-ray diffraction are shown. Note that the molecule is
L-shaped; one end is designed to accept the amino acid, while the other
end contains the three nucleotides of the anticodon. Each loop is colored
to match .
The function of a tRNA molecule depends on its precisely folded
three-dimensional structure. A few tRNAs have been crystallized and their
complete structures determined by x-ray diffraction analyses. Both intramolecular
complementary base-pairings and unusual base interactions are required to fold a
tRNA molecule (see
Figure 3-18). The nucleotide sequences of tRNA molecules
from many types of organisms reveal that tRNAs can form the loops and
base-paired stems of a "cloverleaf" structure (), and all are thought to fold
further to adopt the L-shaped conformation detected in crystallographic analyses. In
the native structure the amino acid is attached to one end of the "L," while the
anticodon is located at the other ().
Figure 6-10. A few of the unusual nucleotides found in tRNA molecules.
These nucleotides
are produced by covalent modification of a normal nucleotide after it has
been incorporated into an RNA chain. In most tRNA molecules about 10%
of the nucleotides are modified (see ).
The nucleotides in a completed nucleic acid chain (like the amino acids
in proteins) can be covalently modified to modulate the biological activity of
the nucleic acid molecule. Such posttranscriptional modifications are
especially common in tRNA molecules, which contain a variety of modified
nucleotides (). Some of the modified nucleotides affect the conformation
and base-pairing of the anticodon and thereby facilitate the recognition of the
appropriate mRNA codon by the tRNA molecule.
Specific Enzymes Couple Each Amino Acid to Its Appropriate tRNA
Molecule 4
Only the tRNA molecule, and not its attached amino acid, determines where
the amino acid is added during protein synthesis. This was established by an
ingenious experiment in which an amino acid (cysteine) was chemically
converted into a different amino acid (alanine) after it was already attached to its
specific tRNA. When such "hybrid" tRNA molecules were used for protein synthesis in
a cell-free system, the wrong amino acid was inserted at every point in the
protein chain where that tRNA was used. Thus the accuracy of protein synthesis is
crucially dependent on the accuracy of the mechanism that normally links
each activated amino acid specifically to its corresponding tRNA molecules.
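The logic of the hybrid tRNA experiment can be mimicked with a toy model. In the sketch below a tRNA pool is represented as a mapping from the codon a tRNA reads to the amino acid it carries; this representation, the short message, and the function name are simplifications introduced here. The key point it encodes is that decoding consults only the tRNA, never the identity of the attached amino acid.

```python
def translate(codons, trna_pool):
    """Decode a message using only the codon each tRNA reads.

    The amino acid is whatever that tRNA happens to carry; it is never checked.
    """
    return [trna_pool[codon] for codon in codons]


# Correctly charged tRNAs (UGU specifies cysteine, GCU alanine, AUG methionine).
normal_pool = {"AUG": "Met", "UGU": "Cys", "GCU": "Ala"}

# Hybrid pool: the cysteine already attached to its tRNA is converted to alanine.
hybrid_pool = dict(normal_pool)
hybrid_pool["UGU"] = "Ala"

message = ["AUG", "UGU", "GCU", "UGU"]  # hypothetical mRNA, split into codons
print(translate(message, normal_pool))  # ['Met', 'Cys', 'Ala', 'Cys']
print(translate(message, hybrid_pool))  # ['Met', 'Ala', 'Ala', 'Ala']
```

Every cysteine codon now yields alanine, just as the wrong amino acid appeared wherever the hybrid tRNA was used in the cell-free system.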
Figure 6-11. Amino acid activation.
The two-step process in which an amino acid (with its side
chain denoted by R) is activated for protein synthesis by an
aminoacyl-tRNA synthetase enzyme is shown. As indicated, the energy of
ATP hydrolysis is used to attach each amino acid to its tRNA molecule in
a high-energy linkage. The amino acid is first activated through the
linkage of its carboxyl group directly to an AMP moiety, forming an adenylated amino acid; the linkage of the
AMP, normally an unfavorable reaction, is driven by the hydrolysis of the
ATP molecule that donates the AMP. Without leaving the
synthetase enzyme, the AMP-linked carboxyl group on the amino acid is
then transferred to a hydroxyl group on the sugar at the
3' end of the tRNA molecule. This transfer joins
the amino acid by an activated ester linkage to the tRNA and forms
the final aminoacyl-tRNA molecule. The synthetase enzyme is not shown
in these diagrams.
Figure 6-12. The structure of the aminoacyl-tRNA linkage.
The carboxyl end of the amino acid forms an ester bond
to ribose. Because the hydrolysis of this ester bond is
associated with a large favorable change in free energy, an amino
acid held in this way is said to be
activated. (A) Schematic drawing of the structure. (B) Actual structure corresponding to
boxed region in (A). As in , the "R-group" indicates
the side chain of the amino acid (see
Panel 2-5, pp. 56-57).
How does a tRNA molecule become covalently linked to the one amino
acid in 20 that is its appropriate partner? The mechanism depends on enzymes
called aminoacyl-tRNA synthetases, which couple each amino acid to its
appropriate set of tRNA molecules. There is a different synthetase enzyme for every
amino acid (20 synthetases in all): one attaches glycine to all tRNAGly molecules,
another attaches alanine to all tRNAAla molecules, and so on. The coupling reaction
that creates an aminoacyl-tRNA molecule is catalyzed in two steps, as illustrated
in . The structure of the amino acid-RNA linkage is shown in .
Figure 6-13. The recognition of a tRNA molecule by its aminoacyl-tRNA synthetase.
For this
tRNA (tRNAGln), specific nucleotides in
both the anticodon (bottom) and the amino acid-accepting arm allow
the correct tRNA to be recognized by the synthetase enzyme
(blue). (Courtesy of Tom Steitz.)
Figure 6-14. The genetic code is translated by means of two sequential "adaptors."
The first adaptor is the aminoacyl-tRNA
synthetase enzyme, which couples a particular amino acid to its
corresponding tRNA; the second adaptor is the tRNA molecule, whose anticodon forms base pairs with the
appropriate nucleotide sequence (codon) on
the mRNA. An error in either step will cause the wrong amino acid to
be incorporated into a protein chain.
Although the tRNA molecules serve as the final adaptors in converting
nucleotide sequences into amino acid sequences, the aminoacyl-tRNA
synthetase enzymes are adaptors of equal importance to the decoding process (). Thus the genetic code is translated by two sets of adaptors that act
sequentially, each matching one molecular surface to another with great specificity; it is
their combined action that associates each sequence of three nucleotides in the
mRNA molecule - that is, each codon - with its particular amino acid ().
Amino Acids Are Added to the Carboxyl-Terminal End of a Growing Polypeptide Chain
Figure 6-15. The incorporation of an amino acid into a protein.
A polypeptide chain grows by the stepwise addition of amino acids to its
carboxyl-terminal end. The formation of each peptide bond is energetically
favorable because the growing carboxyl terminus has been activated by the
covalent attachment of a tRNA molecule. The peptidyl-tRNA linkage that
activates the growing end is regenerated in each cycle. The amino acid side
chains have been abbreviated as R1,
R2, R3, and R4; as a reference point, all of
the atoms in the second amino acid in the polypeptide chain are shaded gray.
The fundamental reaction of protein synthesis is the formation of a peptide
bond between the carboxyl group at the end of a growing polypeptide chain and a
free amino group on an amino acid. Consequently, a protein is synthesized
stepwise from its amino-terminal end to its carboxyl-terminal end. Throughout the
entire process the growing carboxyl end of the polypeptide chain remains activated
by its covalent attachment to a tRNA molecule (a
peptidyl-tRNA molecule). This high-energy covalent linkage is disrupted in each cycle but is immediately
replaced by the identical linkage on the most recently added amino acid
(). In this way each amino acid added carries with it the activation energy
for the addition of the
next amino acid rather than the energy for its own
addition - an example of the "head growth" type of polymerization described in Chapter
2 (see
Figure 2-36).
The Genetic Code Is Degenerate 5
Figure 6-16. Decoding an mRNA molecule.
Each amino acid added to the growing end of a polypeptide chain is selected by complementary
base-pairing between the anticodon on its attached tRNA molecule and the
next codon on the mRNA chain.
In the course of protein synthesis, the translation machinery moves in the
5'-to-3' direction along an mRNA molecule and the mRNA sequence is read
three nucleotides at a time. As we have seen, each amino acid is specified by the
triplet of nucleotides (codon) in the mRNA molecule that pairs with a sequence of
three complementary nucleotides at the anticodon tip of a particular tRNA.
Because only one of the many types of tRNA molecules in a cell can base-pair with
each codon, the codon determines the specific amino acid residue to be added to
the growing polypeptide chain end ().
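Reading an mRNA three nucleotides at a time can be sketched as follows. The table is the standard genetic code written as a 64-letter string of one-letter amino acid abbreviations, with `*` marking codons that specify no amino acid; the mRNA sequence and the function name are hypothetical, and the sketch simply starts at the first nucleotide rather than modeling how the reading frame is chosen.

```python
from itertools import product

# Standard genetic code; codons are enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an mRNA 5'-to-3', one codon at a time, until a '*' codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":   # no amino acid is specified; the chain ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUGGCUAA"))  # MFG
```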
Figure 6-17. The genetic code.
The standard one-letter abbreviation for each amino acid is presented
below its three-letter abbreviation. Codons are written with the
5'-terminal nucleotide on the left. Note that
most amino acids are represented by more than one codon and that variation
is common at the third nucleotide (see also Figure 3-16).
Since RNA is constructed from four types of nucleotides, there are 64
possible sequences composed of three nucleotides (4 × 4 × 4). Three of these 64 sequences do not code for amino acids but instead specify the termination of
a polypeptide chain; they are known as
stop
codons. That leaves 61 codons to specify only 20 different amino acids. For this reason, most of the amino
acids are represented by more than one codon () and the genetic code
is said to be
degenerate. Two amino acids, methionine and tryptophan, have
only one codon each, and they are the least abundant amino acids in proteins.
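These counts can be verified directly from the standard genetic code. The sketch below builds the code table from a 64-letter string of one-letter abbreviations (`*` marks a stop codon) and tallies the codons; the variable names are chosen here for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons are enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = {c: aa for c, aa in CODON_TABLE.items() if aa != "*"}
codons_per_amino_acid = Counter(sense_codons.values())
single_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)

print(len(CODON_TABLE))            # 64
print(stop_codons)                 # ['UAA', 'UAG', 'UGA']
print(len(sense_codons))           # 61
print(len(codons_per_amino_acid))  # 20
print(single_codon)                # ['M', 'W'] (methionine and tryptophan)
```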
The degeneracy of the genetic code implies either that there is more than
one tRNA for each amino acid or that a single tRNA molecule can base-pair with
more than one codon. In fact, both situations occur. For some amino acids there
is more than one tRNA molecule, and some tRNA molecules are constructed so
that they require accurate base-pairing only at the first two positions of the codon
and can tolerate a mismatch (or wobble) at the third. This
wobble base-pairing explains why so many of the alternative codons for an amino acid differ only in
their third nucleotide (see ). The standard wobble pairings make it
possible to fit the 20 amino acids to 61 codons with as few as 31 kinds of tRNA
molecules; in animal mitochondria a more extreme wobble allows protein synthesis
with only 22 tRNAs (discussed in
Chapter 14).
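The concentration of degeneracy at the third codon position can be seen by grouping the 64 codons according to their first two nucleotides. The sketch below, which uses the standard genetic code and variable names invented for this illustration, lists the groups in which all four third-position choices give the same amino acid. It says nothing about which tRNAs actually read these codons; it only shows the pattern in the code that wobble pairing helps to explain.

```python
from itertools import product

# Standard genetic code; codons are enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by their first two nucleotides.
boxes = {}
for codon, amino_acid in CODON_TABLE.items():
    boxes.setdefault(codon[:2], set()).add(amino_acid)

# Groups in which the third nucleotide makes no difference at all.
third_base_irrelevant = sorted(prefix for prefix, aas in boxes.items() if len(aas) == 1)
print(len(boxes), len(third_base_irrelevant))  # 16 8
print(third_base_irrelevant)
```

In half of the 16 groups the third nucleotide is irrelevant, and in most of the remaining groups it distinguishes only between two alternatives.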
The Events in Protein Synthesis Are Catalyzed
on the Ribosome 6
Figure 6-18. The ribosome.
A three-dimensional model of the
bacterial ribosome as viewed from two angles. The positions of many
ribosomal proteins in this structure have been determined by using an
electron microscope to visualize the positions where specific antibodies bind,
as well as by measuring the neutron scattering from ribosomes containing
one or more deuterated proteins. (After J.A. Lake, Annu. Rev. Biochem. 54:507-530, 1985. © 1985 by Annual Reviews Inc.)
The protein synthesis reactions just described require a complex catalytic
machinery to guide them. The growing end of the polypeptide chain, for
example, must be kept in register with the mRNA molecule to ensure that each
successive codon in the mRNA engages precisely with the anticodon of a tRNA molecule
and does not slip by one nucleotide, thereby changing the reading frame (see
Figure 3-17). This precise movement and the other events in protein synthesis are
catalyzed by ribosomes, which are large complexes of RNA and protein
molecules. Eucaryotic and procaryotic ribosomes are very similar in design and
function. Both are composed of one large and one small subunit that fit together to
form a complex with a mass of several million daltons (). The small
subunit binds the mRNA and tRNAs, while the large subunit catalyzes peptide
bond formation.
Figure 6-19. The structure of the rRNA in the small subunit.
This model of E.
coli 16S rRNA is indicative of the complex folding that
underlies the catalytic activities of the RNAs in the ribosome. The 16S rRNA
molecule contains 1540 nucleotides, and it is folded into three domains:
5' (blue), central (red), and
3' (green). (Adapted from S. Stern, B. Weiser, and
H.F. Noller, J. Mol. Biol.
204:447-481, 1988.)
Figure 6-20. A comparison of the structures of procaryotic and eucaryotic ribosomes.
Ribosomal components are commonly designated by their "S values,"
which indicate their rate of sedimentation in an ultracentrifuge. Despite
the differences in the number and size of their rRNA and protein
components, both types of ribosomes have nearly the same structure and they
function in very similar ways. Although the 18S and 28S rRNAs of the
eucaryotic ribosome contain many extra nucleotides not present in
their bacterial counterparts, these nucleotides are present as
multiple insertions that are thought to protrude as loops and leave the
basic structure of each rRNA largely unchanged.
More than half of the weight of a ribosome is RNA, and there is
increasing evidence that the ribosomal RNA (rRNA) molecules play a central part in its
catalytic activities. Although the rRNA molecule in the small ribosomal subunit
varies in size depending on the organism, its complicated folded structure is
highly conserved (); there are also close homologies between the rRNAs
of the large ribosomal subunits in different organisms. Ribosomes contain a
large number of proteins (), but many of these have been relatively
poorly conserved in sequence during evolution, and a surprising number seem not
to be essential for ribosome function. Therefore, it has been suggested that the
ribosomal proteins mainly enhance the function of the rRNAs and that the
RNA molecules rather than the protein molecules catalyze many of the reactions
on the ribosome.
A Ribosome Moves Stepwise Along the mRNA
Chain 7
Figure 6-21. The three major RNA-binding sites on a ribosome.
An empty ribosome is shown on the
left and a loaded ribosome on the right. The representation of a
ribosome used here and in the next three figures is highly schematic; for a
more accurate view, see and .
A ribosome contains three binding sites for RNA molecules: one for mRNA
and two for tRNAs. One site, called the peptidyl-tRNA-binding
site, or P-site, holds the tRNA molecule that is linked to the growing end of the polypeptide
chain. Another site, called the aminoacyl-tRNA-binding
site, or A-site, holds the incoming tRNA molecule charged with an amino acid. A tRNA molecule is held
tightly at either site only if its anticodon forms base pairs with a complementary
codon on the mRNA molecule that is bound to the ribosome. The A- and P-sites are
so close together that the two tRNA molecules are forced to form base pairs
with adjacent codons in the mRNA molecule ().
Figure 6-22. The elongation phase of protein synthesis on a ribosome.
The three-step cycle shown is repeated over and over during the
synthesis of a protein chain. An aminoacyl-tRNA molecule binds to the A-site on
the ribosome in step 1, a new peptide bond is formed in step 2, and the
ribosome moves a distance of three nucleotides along the mRNA chain in
step 3, ejecting an old tRNA molecule and "resetting" the ribosome so that
the next aminoacyl-tRNA molecule can bind. As indicated in , the
P-site is drawn on the left side of the ribosome, with the A-site on the right.
The process of polypeptide chain elongation on a ribosome can be
thought of as a cycle with three discrete steps ():
1. In step 1, an aminoacyl-tRNA molecule becomes bound to a vacant
ribosomal A-site (adjacent to an occupied P-site) by forming base pairs with
the three mRNA nucleotides (codon) exposed at the A-site.
2. In step 2, the carboxyl end of the polypeptide chain is uncoupled from
the tRNA molecule in the P-site and joined by a peptide bond to the amino
acid linked to the tRNA molecule in the A-site. This central reaction of
protein synthesis (see ) is catalyzed by a
peptidyl transferase enzyme. Recent experiments with ribosomes that have been experimentally
stripped of proteins show that this catalysis is mediated not by a protein but by
a specific region of the major rRNA molecule in the large subunit (see
Figure 3-23).
3. In step 3, the new peptidyl-tRNA in the A-site is translocated to the
P-site as the ribosome moves exactly three nucleotides along the mRNA
molecule. This step requires energy and is driven by a series of
conformational changes induced in one of the ribosomal components by the hydrolysis
of a GTP molecule.
As part of the translocation process of step 3, the free tRNA molecule that
was generated in the P-site during step 2 is released from the ribosome to reenter
the cytoplasmic tRNA pool. Upon completion of step 3, the unoccupied A-site is
free to accept a new tRNA molecule linked to the next amino acid, which starts
the cycle again. In a bacterium each cycle requires about
1/20th of a second under optimal conditions, so that the complete synthesis of an average-sized
protein of 400 amino acids is accomplished in about 20 seconds. Ribosomes move
along an mRNA molecule in the 5'-to-3' direction, which is also the direction of
RNA synthesis (see ).
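The three-step cycle and the timing estimate above can be followed in a highly schematic simulation. Everything about the representation is an assumption made for illustration: each ribosomal site is a small dictionary, the pool of charged tRNAs is reduced to a codon-to-amino-acid mapping, and elongation factors and GTP are left out entirely.

```python
CYCLES_PER_SECOND = 20  # rate quoted in the text for a bacterium, optimal conditions


def elongate(codons, amino_acid_for_codon, first_amino_acid="Met"):
    """Run the elongation cycle over every codon after the first.

    Returns the finished chain and the number of free tRNAs released.
    """
    # The P-site starts out holding a tRNA that carries the first amino acid.
    p_site = {"codon": codons[0], "chain": [first_amino_acid]}
    released_trnas = 0
    for codon in codons[1:]:
        # Step 1: an aminoacyl-tRNA pairs with the codon exposed at the A-site.
        a_site = {"codon": codon, "chain": [amino_acid_for_codon[codon]]}
        # Step 2: the chain leaves the P-site tRNA and joins the A-site amino acid.
        a_site["chain"] = p_site["chain"] + a_site["chain"]
        # Step 3: translocation by three nucleotides; the free tRNA is released
        # and the new peptidyl-tRNA now occupies the P-site.
        released_trnas += 1
        p_site = a_site
    return p_site["chain"], released_trnas


chain, released = elongate(["AUG", "UUU", "GGC"], {"UUU": "Phe", "GGC": "Gly"})
print(chain, released)          # ['Met', 'Phe', 'Gly'] 2
print(400 / CYCLES_PER_SECOND)  # 20.0 seconds for a 400-residue protein
```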
Figure 6-31. Kinetic proofreading selects for the correct tRNA molecule on the ribosome.
This more detailed view of step 1 of the elongation
phase of protein synthesis shows how, in the initial binding event, an
aminoacyl-tRNA molecule that is tightly bound to an elongation factor
pairs transiently with the codon at the A-site. This pairing triggers
GTP hydrolysis by the elongation factor, enabling the factor to dissociate
from the aminoacyl-tRNA molecule, which can now participate in
chain elongation (see ). A delay between aminoacyl tRNA binding
and its availability for protein synthesis is thereby inserted into the
protein synthesis mechanism. As a result, only those tRNAs with the
correct anticodon are likely to remain paired to the mRNA long enough to
be added to the growing polypeptide chain.
The elongation factor, which is an abundant protein, is called EF-Tu
in procaryotes and EF-1 in eucaryotes. The dramatic change in the
three-dimensional structure of EF-Tu that is caused by GTP hydrolysis
is illustrated in Figure 5-20.
In most cells protein synthesis consumes more energy than any other
biosynthetic process. At least four high-energy phosphate bonds are split to
make each new peptide bond: two of these are required to charge each tRNA
molecule with an amino acid (see ), and two more drive steps in the cycle
of reactions occurring on the ribosome during synthesis itself: one for
the aminoacyl-tRNA binding in step 1 (see ) and one for the
ribosome translocation in step 3.
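The energy bookkeeping in this paragraph can be written out as a short calculation. The constants restate the numbers given above; the function name and the example protein length are chosen here for illustration, and the result is a lower bound that leaves out any cost of initiation or termination.

```python
# High-energy phosphate bonds split per peptide bond, as itemized in the text.
BONDS_TO_CHARGE_TRNA = 2      # attaching the amino acid to its tRNA
BONDS_FOR_TRNA_BINDING = 1    # aminoacyl-tRNA binding in step 1
BONDS_FOR_TRANSLOCATION = 1   # ribosome translocation in step 3

BONDS_PER_PEPTIDE_BOND = (
    BONDS_TO_CHARGE_TRNA + BONDS_FOR_TRNA_BINDING + BONDS_FOR_TRANSLOCATION
)


def minimum_bonds_split(peptide_bonds):
    """Lower bound on high-energy phosphate bonds split for a given number of peptide bonds."""
    return BONDS_PER_PEPTIDE_BOND * peptide_bonds


print(BONDS_PER_PEPTIDE_BOND)    # 4
# A chain of 400 amino acids contains 399 peptide bonds.
print(minimum_bonds_split(399))  # 1596
```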
A Protein Chain Is Released from the Ribosome
When Any One of Three Stop Codons Is Reached 8
Figure 6-23. The final phase of protein synthesis.
The binding of release factor to a stop
codon terminates translation. The completed polypeptide is
released, and the ribosome dissociates into its two separate subunits.
Of the 64 possible codons in an mRNA molecule, 3 (UAA, UAG, and UGA) are
stop codons, which terminate the translation process. Cytoplasmic proteins
called
release factors bind directly to any stop codon that reaches the A-site on the
ribosome. This binding alters the activity of the peptidyl transferase, causing it
to catalyze the addition of a water molecule instead of an amino acid to
the peptidyl-tRNA. This reaction frees the carboxyl end of the growing
polypeptide chain from its attachment to a tRNA molecule, and since only this
attachment normally holds the growing polypeptide to the ribosome, the completed
protein chain is immediately released into the cytoplasm. The ribosome releases
the mRNA and dissociates into its two separate subunits (), which
can assemble on another mRNA molecule to begin a new round of protein
synthesis by the process to be described next.
The Initiation Process Sets the Reading Frame
for Protein Synthesis 9
In principle, an RNA sequence can be translated in any one of three
reading frames, each of which will specify a completely different polypeptide chain
(see Figure 3-17). Which of the three frames is actually read is determined by the
RNA sequence, which determines how the ribosome assembles. During the initiation phase of protein synthesis, the two subunits of the ribosome are brought
together at the exact spot on the mRNA where the polypeptide chain is to begin.
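The three possible reading frames can be made concrete with a minimal Python sketch. The function name and the nine-nucleotide example sequence are hypothetical; the code simply groups the same RNA string into triplets starting at each of the three possible offsets.

```python
# Splitting one RNA sequence into its three reading frames (illustrative).
def reading_frames(rna: str) -> list[list[str]]:
    """Return the complete codons read from offsets 0, 1, and 2."""
    frames = []
    for offset in range(3):
        # Step through the sequence three nucleotides at a time and
        # keep only complete triplets.
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        frames.append(codons)
    return frames


if __name__ == "__main__":
    for offset, codons in enumerate(reading_frames("AUGGCCUAA")):
        print(offset, codons)
```

Each offset yields a different series of codons from the same nucleotides, which is why the point at which the ribosome assembles fixes the polypeptide that is made.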
The initiation process is complicated, involving a number of steps
catalyzed by proteins called initiation factors
(IFs), many of which are themselves composed of several polypeptide chains. Because the process is so complex, many
of the details of initiation are still uncertain. It is clear, however, that each
ribosome is assembled onto an mRNA chain in two steps: only after the small
ribosomal subunit loaded with initiation factors finds the start codon (AUG, see below)
does the large subunit bind.
Before a ribosome can begin a new protein chain, it must bind an
aminoacyl tRNA molecule in its P-site, where normally only peptidyl tRNA molecules
are bound. (As explained previously, the peptidyl tRNA is translocated to the
P-site during step 3 of the elongation reaction.) A special tRNA molecule is required
for this purpose. This initiator tRNA provides the amino acid that starts a
protein chain, and it always carries methionine (N-formylmethionine in
bacteria). In eucaryotes the initiator tRNA molecule must be loaded onto the small
ribosomal subunit before this subunit can bind to an mRNA molecule. An
initiation factor called eucaryotic initiation factor 2
(eIF-2) is required to position the initiator tRNA on the small subunit. One molecule of eIF-2 becomes tightly
bound to each initiator tRNA molecule as soon as this tRNA acquires its methionine,
and in some cells the overall rate of protein synthesis is controlled by this factor
(see Figure 9-82).
Figure 6-24. The initiation phase of protein synthesis in eucaryotes
Step 1 and step 2 refer to steps in
the elongation reaction shown in .
Figure 6-25. A three-dimensional model of a functioning bacterial ribosome
The small (dark green) subunit and the large
(light green) subunit form a complex
through which the messenger RNA is threaded. Although the exact paths
of the mRNA and the nascent polypeptide chain are unknown,
the addition of amino acids occurs in the general region shown, with the
tRNAs held in the pocket formed between the large and small subunit.
(Modified from J.A. Lake, Annu. Rev.
Biochem. 54:507-530, 1985. © 1985 by
Annual Reviews Inc.)
As described in more detail in the next section, the small ribosomal
subunit helps its bound initiator tRNA molecule find a special AUG codon (the
start codon) on an mRNA molecule. Once this has occurred, the several initiation
factors that were previously associated with the small ribosomal subunit are
discharged to make way for the binding of a large ribosomal subunit to the
small one. Because the initiator tRNA molecule is bound to the P-site of the
ribosome, the synthesis of a protein chain can begin directly with the binding of a
second aminoacyl-tRNA molecule to the A-site of the ribosome (). Thus
a complete functional ribosome is assembled, with the mRNA molecule
threaded through it (). Further steps in the elongation phase of protein
synthesis then proceed as described previously (see ). Because an
initiator tRNA molecule has begun each polypeptide chain, all newly made proteins
have a methionine (or the aminoformyl derivative of methionine in bacteria) as
their amino-terminal residue. The methionine is often removed shortly after its
incorporation by a specific aminopeptidase; this trimming process is important
because the amino acid left at the amino terminus can determine the protein's
lifetime in the cell by its effects on a ubiquitin-dependent
protein-degradation pathway (see
Figure 5-39).
Evidently the correct initiation site on the mRNA molecule must be
selected by the small subunit acting in concert with initiation factors but in the
absence of the large subunit. This requirement helps to explain why all ribosomes
are formed from two separate subunits. We shall now consider how the correct
start codon is selected.
Only One Species of Polypeptide Chain Is Usually Synthesized from Each mRNA Molecule in
Eucaryotes 10
A messenger RNA molecule will typically contain many AUG sequences, each
of which can code for methionine. In eucaryotes, however, only one of these
AUG sequences will normally be recognized by the initiator tRNA and thereby
serve as a start codon. How does the ribosome distinguish this start codon?
Figure 6-26. The structure of the cap at the 5' end of eucaryotic mRNA molecules
Note the unusual 5'-to-5' linkage to the positively charged
7-methylguanosine and the methylation of the
2' hydroxyl group on the first ribose sugar in the
RNA. (The second sugar may or may not be methylated.)
Eucaryotic RNAs (except those that are synthesized in mitochondria
and chloroplasts) are extensively modified in the nucleus immediately after their
transcription (discussed in
Chapter 8). Two general modifications are the addition
of a unique "cap" structure, composed of a 7-methylguanosine residue linked to
a triphosphate at the 5' end () and the addition of a run of about
200 adenylic residues ("poly A") at the
3' end. What part the poly A plays in the
translation process is uncertain (see
Figure 9-87), but the
5' cap structure is essential for efficient protein synthesis. Experiments carried out with extracts of
eucaryotic cells have shown that the small ribosomal subunit first binds at the
5' end of an mRNA chain, aided by recognition of the
5' cap (see ). This subunit then propels itself along the mRNA chain in a scanning mode, carrying
its bound initiator tRNA in search of an AUG start codon. The requirements for
a start codon apparently are not very stringent, since the small subunit
usually selects the first AUG it encounters; however, a few nucleotides in addition to
the AUG are also important for the selection process. For most eucaryotic RNAs,
once a start codon near the 5' end has been selected, none of the many other
AUG codons farther down the chain will serve as initiation sites. As a result, only
a single species of polypeptide chain is usually synthesized from an mRNA
molecule (for exceptions see p. 467).
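The scanning behavior described above can be sketched in a few lines of Python. This is a simplified illustration rather than a model of the actual mechanism: it assumes that the first AUG from the 5' end is always chosen, it ignores the flanking nucleotides that influence selection, and the function name and example sequence are made up.

```python
# Toy version of eucaryotic start-codon scanning (illustrative sketch).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_for_orf(mrna: str) -> list[str]:
    """Return the codons read from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")  # scan from the 5' end for the first AUG
    if start == -1:
        return []  # no start codon, so nothing is translated
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # a stop codon in frame ends translation
        codons.append(codon)
    return codons


if __name__ == "__main__":
    # The second AUG lies in frame and is read as an internal methionine,
    # not as a new initiation site.
    print(scan_for_orf("GGCAUGGCUAUGUAAGG"))
```

Because only the first AUG is used as a start site in this sketch, the message yields a single polypeptide, in keeping with the monocistronic character of most eucaryotic mRNAs.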
Figure 6-27. A comparison of the structures of procaryotic and eucaryotic messenger RNA molecules
Although both mRNAs are synthesized with a
triphosphate group at the 5' end, the
eucaryotic RNA molecule immediately acquires a
5' cap, which is part of the structure recognized by the small
ribosomal subunit. Protein synthesis therefore begins at a start codon near the
5' end of the mRNA (see ). In procaryotes, by contrast, the
5' end has no special significance, and there can be multiple
ribosome-binding sites (called
Shine-Dalgarno
sequences) in the interior of an mRNA chain, each resulting in the
synthesis of a different protein.
The mechanism for selecting a start codon in bacteria is different.
Bacterial mRNAs have no 5' cap structure. Instead, they contain a specific
ribosome-binding site sequence, up to six nucleotides long, which can occur at several
places in the same mRNA molecule. These sequences are located four to seven
nucleotides upstream from an AUG, and they form base pairs with a specific region
of the rRNA in a ribosome to signal the initiation of protein synthesis at this
nearby start codon. Bacterial ribosomes, unlike eucaryotic ribosomes, bind directly
to start codons in the interior of an mRNA molecule to initiate protein synthesis.
As a result, bacterial messenger RNAs are commonly
polycistronic; that is, they encode multiple proteins that are separately translated from the same mRNA
molecule. Eucaryotic mRNAs, in contrast, are typically
monocistronic, with only one species of polypeptide chain being translated per messenger molecule ().
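The bacterial rule can be contrasted with eucaryotic scanning in a short Python sketch. The motif AGGAGG, the function name, and the test sequence are assumptions chosen for illustration (the text above specifies only that the site is up to six nucleotides long), and the gap is counted here as the number of nucleotides lying between the end of the motif and the AUG.

```python
# Finding AUG codons preceded by a ribosome-binding site (illustrative).
def bacterial_start_sites(
    mrna: str,
    motif: str = "AGGAGG",  # assumed ribosome-binding sequence
    min_gap: int = 4,
    max_gap: int = 7,
) -> list[int]:
    """Return positions of AUGs with the motif 4 to 7 nucleotides upstream."""
    starts = []
    pos = mrna.find("AUG")
    while pos != -1:
        for gap in range(min_gap, max_gap + 1):
            motif_start = pos - gap - len(motif)
            if motif_start >= 0 and mrna[motif_start:pos - gap] == motif:
                starts.append(pos)
                break
        pos = mrna.find("AUG", pos + 1)  # keep looking farther downstream
    return starts


if __name__ == "__main__":
    mrna = ("AGGAGG" + "CCCCC" + "AUGGCUUAA" + "C" + "AUG" + "C"
            + "AGGAGG" + "UUUU" + "AUGAAAUAG")
    print(bacterial_start_sites(mrna))
```

In this hypothetical message two interior AUGs qualify as start sites while a third, lacking an upstream motif, does not, which is the sense in which a single bacterial mRNA can be polycistronic.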
The Binding of Many Ribosomes to an Individual
mRNA Molecule Generates Polyribosomes 11
Figure 6-28. A polyribosome
Schematic drawing showing how a series of ribosomes can
simultaneously translate the same mRNA molecule.
Figure 6-29. Freeze-etch (A) and transmission (B) electron micrographs of typical polyribosomes in a eucaryotic cell
The cell cytoplasm is generally crowded with such polyribosomes, some free in the cytosol
and some membrane-bound. (A, courtesy of John Heuser; B, courtesy of
George Palade.)
Figure 6-30. The isolation of polyribosomes
Polyribosomes are separated from single ribosomes
(and their subunits) by sedimentation in a centrifuge. This method is based
on the fact that large molecular aggregates move faster than
small ones in a strong gravitational field. Generally, the sedimentation is
done through a gradient of sucrose to stabilize the solution
against convective mixing. Note that most of the growing polypeptide chains
(red line) are associated with the polyribosomes.
The complete synthesis of a protein takes 20 to 60 seconds on average. But
even during this very short period, multiple initiations usually take place on
each mRNA molecule being translated. A new ribosome hops onto the
5' end of the mRNA molecule almost as soon as the preceding ribosome has translated
enough of the amino acid sequence to be out of the way. Such mRNA molecules are
thus found in the cell as polyribosomes, or
polysomes, formed by several ribosomes spaced as close as 80 nucleotides apart along a single messenger molecule
( and ). Polyribosomes are a common feature of cells. They can
be isolated and separated from single ribosomes in the cytosol by
ultracentrifugation after cell lysis (). The mRNA purified from these
polyribosomes can be used to determine if the protein encoded by a particular DNA
sequence is being actively synthesized in the cells used to prepare the polyribosomes.
These mRNA molecules can also serve as the starting material for the preparation
of specialized cDNA libraries (discussed in
Chapter 7).
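The numbers quoted above suggest a simple back-of-the-envelope estimate, sketched below in Python. The function names are hypothetical, the 400-residue protein and 30-second synthesis time are assumed example values, and the calculation idealizes the polyribosome as a steady stream of evenly spaced ribosomes at the closest spacing of 80 nucleotides.

```python
# Rough estimate of polyribosome loading and output (illustrative sketch).
def max_ribosomes(n_residues: int, spacing_nt: int = 80) -> int:
    """Upper bound on ribosomes translating one coding region at once."""
    coding_nt = 3 * n_residues  # three nucleotides per amino acid
    return max(1, coding_nt // spacing_nt)


def seconds_between_completions(synthesis_seconds: float, n_ribosomes: int) -> float:
    """Average interval between finished chains at steady state."""
    if n_ribosomes < 1:
        raise ValueError("at least one ribosome is required")
    return synthesis_seconds / n_ribosomes


if __name__ == "__main__":
    n = max_ribosomes(400)  # assumed average-sized protein
    print(n, seconds_between_completions(30.0, n))
```

Under these assumptions a fully loaded message releases a finished chain every few seconds, even though each individual chain takes much longer to complete.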
In eucaryotes the nuclear envelope keeps transcription and protein
synthesis separate. But in procaryotes, RNA is accessible to ribosomes as soon as it is
made. Thus, ribosomes will begin synthesizing a polypeptide chain at the
5' end of a nascent mRNA molecule and then follow behind the RNA polymerase as it
completes an mRNA chain.
The Overall Rate of Protein Synthesis in Eucaryotes
Is Controlled by Initiation Factors 12
As we discuss in Chapter 17, the cells in a multicellular organism proliferate
only when they are stimulated to do so by specific growth factors. Although
the mechanisms by which growth factors act are incompletely understood, one
of their major effects must be to increase the overall rate of protein synthesis,
for cells must double their contents before they divide. What determines the rate
of protein synthesis? When eucaryotic cells in culture are starved of nutrients,
there is a marked reduction in the rate of polypeptide chain initiation. This is the
result of inactivation of the protein synthesis initiation factor eIF-2 (see Figure 9-82). The initiation factors required for protein synthesis are much more
numerous and complex in eucaryotes than in procaryotes, even though they
perform the same basic functions. Many of the extra components may be regulatory
proteins that respond to growth factors and help coordinate cell growth and
proliferation in multicellular organisms. Less complex controls are needed in
bacteria, which generally grow as fast as the nutrients in their environment allow.
The Fidelity of Protein Synthesis Is Improved
by Two Proofreading Mechanisms 13
The error rate in protein synthesis can be estimated by monitoring the
frequency of incorporation of an amino acid into a protein that normally lacks that
amino acid. Error rates of about 1 amino acid misincorporated for every
10^4 amino acids polymerized are observed, which means that only about 1 in every 25
protein molecules of average size (400 amino acids) should contain an error. The
fidelity of protein synthesis depends on the accuracy of the two adaptor
mechanisms previously discussed: the linking of each amino acid to its corresponding
tRNA molecule and the base-pairing of the codons in mRNA to the anticodons in
tRNA (see ). Not surprisingly, cells have evolved "proofreading"
mechanisms to reduce the number of errors in both these crucial steps of protein synthesis.
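The figure of about 1 faulty molecule in 25 follows from the per-residue error rate, as the brief Python calculation below shows. The function name is hypothetical, and the sketch assumes that errors at different positions occur independently.

```python
# Fraction of protein molecules containing at least one misincorporation.
def fraction_with_error(n_residues: int, error_rate: float = 1e-4) -> float:
    """Probability that a chain of n_residues contains one or more errors."""
    # Probability that every residue is correct, assuming independence.
    all_correct = (1.0 - error_rate) ** n_residues
    return 1.0 - all_correct


if __name__ == "__main__":
    f = fraction_with_error(400)  # average-sized protein from the text
    print(f, 1.0 / f)
```

For a 400-residue protein the result is close to 0.04, or roughly 1 molecule in 25.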
Two fundamentally different proofreading mechanisms are used, each
representative of strategies used in other processes in the cell. Both involve
expenditure of free energy, since, as discussed in
Chapter 2, a price must be paid
for any increase in order in the cell. A relatively simple mechanism is used to
improve the accuracy of amino acid attachment to tRNA. Many aminoacyl tRNA
synthetases have two active sites, one that carries out the loading reaction
shown earlier () and one that recognizes an incorrect amino acid
attached to its tRNA molecule and removes it by hydrolysis. The correction process
is energetically costly because to be effective it must remove an appreciable
fraction of correctly attached amino acids as well. The same type of costly
two-step proofreading process is used in DNA replication (see ).
A more subtle "kinetic proofreading" mechanism is used to improve the
fidelity of codon-anticodon pairing. Thus far we have given a simplified
account of this pairing. In fact, once tRNA molecules have acquired an amino acid,
they form a complex with an abundant protein called an
elongation factor (EF), which binds tightly to both the amino acid end of a tRNA and to a molecule of GTP.
It is this complex, and not free tRNA, that pairs with the appropriate codon in
an mRNA molecule. The bound elongation factor allows correct
codon-anticodon pairing to occur but prevents the amino acid from being incorporated into
the growing polypeptide chain. The initial codon recognition, however, triggers
the elongation factor to hydrolyze its bound GTP (to GDP and inorganic
phosphate), whereupon the factor dissociates from the ribosome without its tRNA,
allowing protein synthesis to proceed. The elongation factor thereby introduces a
short delay between codon-anticodon base-pairing and polypeptide chain
elongation, which provides an opportunity for the bound tRNA molecule to exit from
the ribosome. An incorrect tRNA molecule forms a smaller number of
codon-anticodon hydrogen bonds than a correct one; it therefore binds more weakly to
the ribosome and is more likely to dissociate during this period. Because the
delay introduced by the elongation factor causes most incorrectly bound tRNA
molecules to leave the ribosome without being used for protein synthesis, this
factor increases the ratio of correct to incorrect amino acids incorporated into
protein ().
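The effect of the delay can be illustrated with a deliberately simple Python model. It is a toy sketch under stated assumptions, not a description of measured ribosome kinetics: each bound tRNA is assumed to dissociate at a constant rate during the delay, the incorrect tRNA is given a higher dissociation rate than the correct one, and all rate constants and names are hypothetical.

```python
# Toy model of discrimination by a delay before incorporation (illustrative).
import math


def survival_probability(k_off: float, delay: float) -> float:
    """Probability that a tRNA is still bound when the delay ends."""
    return math.exp(-k_off * delay)


def selectivity(k_off_correct: float, k_off_incorrect: float, delay: float) -> float:
    """Ratio of correct to incorrect tRNAs that survive the delay."""
    return (survival_probability(k_off_correct, delay)
            / survival_probability(k_off_incorrect, delay))


if __name__ == "__main__":
    # Hypothetical rates: the incorrect tRNA dissociates ten times faster.
    for delay in (0.0, 0.5, 1.0):
        print(delay,
              survival_probability(1.0, delay),
              selectivity(1.0, 10.0, delay))
```

In this model a longer delay improves the ratio of correct to incorrect tRNAs but also lets more correct tRNAs escape, which illustrates why a gain in accuracy of this kind has a cost.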
Many Inhibitors of Procaryotic Protein Synthesis
Are Useful as Antibiotics 14
Table 6-1
Inhibitors of Protein or RNA Synthesis
| Inhibitor | Specific Effect |
| Acting Only on Procaryotes* |
| Tetracycline | blocks binding of aminoacyl-tRNA to A-site of ribosome |
| Streptomycin | prevents the transition from initiation complex to chain-elongating ribosome and also causes miscoding |
| Chloramphenicol | blocks the peptidyl transferase reaction on ribosomes (step 2 in Figure 6-22) |
| Erythromycin | blocks the translocation reaction on ribosomes (step 3 in Figure 6-22) |
| Rifamycin | blocks initiation of RNA chains by binding to RNA polymerase (prevents RNA synthesis) |
| Acting on Procaryotes and Eucaryotes |
| Puromycin | causes the premature release of nascent polypeptide chains by its addition to growing chain end |
| Actinomycin D | binds to DNA and blocks the movement of RNA polymerase (prevents RNA synthesis) |
| Acting Only on Eucaryotes |
| Cycloheximide | blocks the translocation reaction on ribosomes (step 3 in Figure 6-22) |
| Anisomycin | blocks the peptidyl transferase reaction on ribosomes (step 2 in Figure 6-22) |
| α-Amanitin | blocks mRNA synthesis by binding preferentially to RNA polymerase II |
Many of the most effective antibiotics used in modern medicine are
compounds made by fungi that act by inhibiting bacterial protein synthesis. A number
of these drugs exploit the structural and functional differences between
procaryotic and eucaryotic ribosomes so as to interfere with the function of procaryotic
ribosomes preferentially. Thus some of these compounds can be taken in
high doses without undue toxicity to humans. Because different antibiotics bind to
different regions of bacterial ribosomes, they often inhibit different steps in the
synthetic process. Some of the more common antibiotics of this kind are listed
in
Table 6-1 along with several other commonly used inhibitors of protein
synthesis, some of which act on eucaryotic cells and therefore cannot be used as antibiotics.
Because they block specific steps in the processes that lead from DNA
to protein, many of the compounds listed in
Table 6-1 are useful for cell
biological studies. Among the most commonly used drugs in such experimental
studies are
chloramphenicol, cycloheximide, and
puromycin, all of which specifically inhibit protein synthesis. In a eucaryotic cell, for example, chloramphenicol
inhibits protein synthesis on ribosomes only in mitochondria (and in
chloroplasts in plants), presumably reflecting the procaryotic origins of these organelles
(discussed in
Chapter 14). Cycloheximide, on the other hand, affects only
ribosomes in the cytosol. The difference in the sensitivity of protein synthesis to these
two drugs provides a powerful way to determine in which cell compartment a
particular protein is translated. Puromycin is especially interesting because it is
a structural analogue of a tRNA molecule linked to an amino acid; the
ribosome mistakes it for an authentic amino acid and covalently incorporates it at the
carboxyl terminus of the growing polypeptide chain, thereby causing the
premature termination and release of the polypeptide (see
Figure 3-23). As might be
expected, puromycin inhibits protein synthesis in both procaryotes and eucaryotes.
How Did Protein Synthesis Evolve? 15
The molecular processes underlying protein synthesis seem inexplicably
complex. Although we can describe many of them, they do not make conceptual sense
in the way that DNA transcription, DNA repair, and DNA replication do. As we
have seen, protein synthesis in present-day organisms centers on the ribosome,
which consists of proteins arranged around a core of rRNA molecules. Why should
rRNA molecules exist at all, and how did they come to play such a dominant part in
the structure and function of the ribosome?
Before the discovery of mRNA in the early 1960s, it was suspected that
the large amount of RNA in ribosomes served a "messenger" function, carrying
genetic information from DNA to proteins. Now we know, however, that all of
the ribosomes in a cell contain an identical set of rRNA molecules that have no
such informational role. In bacterial ribosomes, rRNA molecules have been shown
to have catalytic functions in protein synthesis. As mentioned earlier, the
major rRNA of the large ribosomal subunit appears to be the peptidyl transferase; in
addition, the rRNA of the small ribosomal subunit forms a short base-paired
helix with the initiation site sequence on bacterial mRNA molecules, positioning
the neighboring AUG start codon at the P-site. A variety of specific base-pair
interactions likewise form between tRNA molecules and bacterial rRNAs,
although these interactions involve individual bases on the rRNA that are far apart in
the nucleotide sequence, suggesting complex sets of interactions that depend on
the tertiary structure of the rRNA.
Protein synthesis also relies heavily on a large number of proteins that
are bound to the rRNAs in a ribosome (see ). The complexity of a
process with so many interacting components has made many biologists despair of
ever understanding the pathway by which protein synthesis evolved. The
discovery that RNA molecules can act as enzymes, however, has provided a new way
of viewing the pathway. As discussed in
Chapter 1, early biological reactions
probably used RNA molecules rather than protein molecules as catalysts. In the
earliest cells tRNA molecules on their own may have formed catalytic surfaces
that allowed them to bind and activate specific amino acids without
requiring aminoacyl-tRNA synthetase enzymes. Likewise, rRNA molecules may have
served by themselves as the entire "ribosome," folding up in complex ways to
generate an intricate set of surfaces that both guided tRNA pairings with mRNA
codons and catalyzed the polymerization of the tRNA-linked amino acids (see
Figure 1-7). Over the course of evolution individual proteins have been added to
this machinery, each one making the process a little more accurate and efficient,
or adding regulatory controls. In this view the large amount of RNA in
present-day ribosomes is a remnant of a very early stage in evolution, before proteins
dominated biological catalysis.
Summary
Before the synthesis of a particular protein can begin, the corresponding
mRNA molecule must be produced by DNA transcription. Then a small ribosomal
subunit binds to the mRNA molecule at a start codon (AUG) that is recognized by a
unique initiator tRNA molecule. A large ribosomal subunit binds to complete the
ribosome and initiate the elongation phase of protein synthesis. During this phase
aminoacyl tRNAs, each bearing a specific amino acid, sequentially bind to the appropriate
codon in mRNA by forming complementary base pairs with the tRNA anticodon.
Each amino acid is added to the carboxyl-terminal end of the growing polypeptide
by means of a cycle of three sequential steps: aminoacyl-tRNA binding, followed by
peptide bond formation, followed by ribosome translocation. The ribosome
progresses from codon to codon in the
5'-to-3' direction along the mRNA molecule until one
of three stop codons is reached. A release factor then binds to the stop codon,
terminating translation and releasing the completed polypeptide from the ribosome.
Eucaryotic and procaryotic ribosomes are highly homologous, despite
substantial differences in the number and size of their rRNA and protein components. The
predominant role of rRNA in ribosome structure and function is likely to reflect
the ancient origin of protein synthesis, which is thought to have evolved in an
environment dominated by RNA-mediated catalysis.
DNA Repair 16
Introduction
The long-term survival of a species may be enhanced by genetic changes, but
the survival of the individual demands genetic stability. Maintaining genetic
stability requires not only an extremely accurate mechanism for replicating the
DNA before a cell divides, but also mechanisms for repairing the many accidental
lesions that occur continually in DNA. Most such spontaneous changes in DNA
are temporary because they are immediately corrected by processes
collectively called DNA repair. Only rarely do the cell's DNA maintenance processes fail
and allow a permanent change in the DNA. Such a change is called a
mutation, and it can destroy an organism if the change occurs in a vital position in the DNA
sequence.
Before examining the mechanisms of DNA repair, we briefly discuss
the maintenance of DNA sequences from one generation to the next.
DNA Sequences Are Maintained with Very High
Fidelity 17
The rate at which stable changes occur in DNA sequences (the mutation rate) can be estimated only indirectly. One way is to compare the amino acid sequence
of the same protein in several species. The fraction of the amino acids that are
different can then be compared with the estimated number of years since each
pair of species diverged from a common ancestor, as determined from the
fossil record. In this way one can calculate the number of years that elapse, on
average, before an inherited change in the amino acid sequence of a protein
becomes fixed in the species. Because each such change will commonly reflect the
alteration of a single nucleotide in the DNA sequence of the gene encoding that
protein, this value can be used to estimate the average number of years required
to produce a single, stable mutation in the gene.
Such calculations always will substantially underestimate the actual
mutation rate because most mutations will spoil the function of the protein and
vanish from the population through natural selection. But there is one family of
proteins whose sequence does not seem to matter, and so the genes that encode them
can accumulate mutations without being selected against. These proteins are
the fibrinopeptides - 20-residue-long fragments that are discarded from the
protein fibrinogen when it is activated to form fibrin during blood clotting. Since the function of fibrinopeptides apparently does not depend on their amino acid
sequence, they can tolerate almost any amino acid change. Sequence analysis
of the fibrinopeptides indicates that an average-sized protein 400 amino acids
long would be randomly altered by an amino acid change roughly once every
200,000 years. More recently, DNA sequencing technology has made it possible to
compare corresponding nucleotide sequences in regions of the genome that do
not code for protein. Comparisons of such sequences in several mammalian
species produce estimates of the mutation rate during evolution that are in
excellent agreement with those obtained from the fibrinopeptide studies.
The Observed Mutation Rates in Proliferating Cells
Are Consistent with Evolutionary Estimates 18
The mutation rate can be estimated more directly by observing the rate at
which spontaneous genetic changes arise in a large population of cells followed over
a relatively short period of time. This can be done either by estimating the
frequency with which new mutants arise in very large animal populations (in
a colony of fruit flies or mice, for example) or by screening for changes in
specific proteins in cells growing in culture. Although they are only approximate,
the numbers obtained in both cases are consistent with an error frequency of 1
base-pair change in roughly 10^9 base pairs for each cell generation. Consequently,
a single gene that encodes an average-sized protein (containing about
10^3 coding base pairs) would suffer a mutation once in about
10^6 cell generations. This number is at least roughly consistent with the evolutionary estimate described
above, in which one mutation appears in an average gene in the germ line every
200,000 years.
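The arithmetic behind the per-gene estimate is simple enough to write out as a short Python sketch. The function and parameter names are hypothetical; the values are the approximate figures quoted above.

```python
# Cell generations expected per mutation in one gene (illustrative sketch).
def generations_per_gene_mutation(
    coding_bp: int = 10**3,            # coding base pairs in an average gene
    bp_per_change: int = 10**9,        # one change per this many base pairs
) -> float:
    """Average number of cell generations between mutations in the gene."""
    # One base-pair change per bp_per_change base pairs per generation means
    # a gene of coding_bp base pairs is hit once every bp_per_change / coding_bp
    # generations.
    return bp_per_change / coding_bp


if __name__ == "__main__":
    print(generations_per_gene_mutation())
```

A gene ten times longer would, on the same reasoning, be expected to mutate ten times as often.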
Most Mutations in Proteins Are Deleterious
and Are Eliminated by Natural Selection 19
Figure 6-32. Different proteins evolve at very different rates
A comparison of the rates of amino
acid change found in hemoglobin, cytochrome c, and the
fibrinopeptides. Hemoglobin and cytochrome c have changed much more slowly
during evolution than the fibrinopeptides. In determining rates of change per
year (as in
Table 6-2), it is important to realize that two species that
diverged from a common ancestor 100 million years ago are separated by 200
million years of evolutionary time.
Table 6-2
Observed Rates of Change of the Amino Acid Sequences in Four Proteins over Evolutionary Time
| Protein | Unit Evolutionary Time (in millions of years) |
| Fibrinopeptide | 0.7 |
| Hemoglobin | 5 |
| Cytochrome c | 21 |
| Histone H4 | 500 |
When the number of amino acid differences in a particular protein is plotted
for several pairs of species against the time since the species diverged, the result
is a reasonably straight line. That is, the longer the period since divergence,
the larger the number of differences. For convenience, the slope of this line can
be expressed in terms of the "unit evolutionary time" for that protein, which is
the average time required for 1 amino acid change to appear in a sequence of
100 amino acid residues. When various proteins are compared, each shows a
different but characteristic rate of evolution (). Since all DNA base pairs
are thought to be subject to roughly the same rate of random mutation, these
different rates must reflect differences in the probability that an organism with a
random mutation in the given protein will survive and propagate. Changes
in amino acid sequence are evidently much more harmful for some proteins
than for others. From
Table 6-2 we can estimate that about 6 of every 7 random
amino acid changes are harmful over the long term in hemoglobin, about 29 of
every 30 amino acid changes are harmful in cytochrome c, and virtually all amino
acid changes are harmful in histone H4. We assume that individuals who
carried such harmful mutations have been eliminated from the population by natural
selection.
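These fractions can be reproduced from the values in Table 6-2 with a few lines of Python. The sketch rests on the assumption, taken from the earlier discussion, that essentially every amino acid change in the fibrinopeptides is tolerated, so that their unit evolutionary time reflects the underlying rate of random change; the variable and function names are hypothetical.

```python
# Fraction of amino acid changes eliminated by selection (illustrative sketch).
UNIT_EVOLUTIONARY_TIME = {  # values from Table 6-2
    "Fibrinopeptide": 0.7,
    "Hemoglobin": 5.0,
    "Cytochrome c": 21.0,
    "Histone H4": 500.0,
}


def fraction_harmful(protein: str, neutral: str = "Fibrinopeptide") -> float:
    """Fraction of random changes that fail to persist in the given protein."""
    # A longer unit evolutionary time means fewer changes survive; the ratio
    # to the neutral reference gives the fraction that is tolerated.
    tolerated = UNIT_EVOLUTIONARY_TIME[neutral] / UNIT_EVOLUTIONARY_TIME[protein]
    return 1.0 - tolerated


if __name__ == "__main__":
    for name in ("Hemoglobin", "Cytochrome c", "Histone H4"):
        print(name, round(fraction_harmful(name), 4))
```

Because only ratios of the table values enter the calculation, the result does not depend on the units in which unit evolutionary time is expressed.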
Low Mutation Rates Are Necessary for Life as We Know
It 19
Since most mutations are deleterious, no species can afford to allow them
to accumulate at a high rate in its germ cells. We discuss later why the
observed mutation frequency, low though it is, is nevertheless thought to limit the
number of essential proteins that any organism can encode in its germ line to
about 60,000. By an extension of the same arguments, a mutation frequency
tenfold higher would limit an organism to about 6000 essential proteins. In this
case evolution would probably have stopped at an organism no more complex
than a fruit fly.
While germ cells must be protected against high rates of mutation in
order to maintain the species, the other cells of a multicellular organism (its somatic cells) must be protected from genetic change to safeguard the individual.
Nucleotide changes in somatic cells can give rise to variant cells, some of
which, through a process of natural selection, grow rapidly at the expense of the rest
of the organism. In the extreme case the uncontrolled cell proliferation known
as cancer results, which is responsible for about 30% of the deaths that occur in
Europe and North America. These deaths are due largely to the accumulation
of changes in the DNA sequences of somatic cells (discussed in Chapter 24). A
tenfold increase in the mutation frequency would presumably cause a
disastrous increase in the incidence of cancer by accelerating the rate at which somatic
cell variants arise. Thus, both for the perpetuation of a species with 60,000
proteins (germ cell stability) and for the prevention of cancer resulting from mutations
in somatic cells (somatic cell stability), eucaryotes depend on the remarkably
high fidelity with which DNA sequences are maintained.
Low Mutation Rates Mean That Related Organisms
Must Be Made from Essentially the Same
Proteins 20
Humans, as a genus distinct from the great apes, have existed for only a
few million years. Each human gene has therefore had the chance to
accumulate relatively few nucleotide changes since our inception, and most of these
have been eliminated by natural selection. A comparison of humans and monkeys,
for example, shows that their cytochrome c molecules differ in about 1% and
their hemoglobins in about 4% of their amino acid positions. Clearly, a great deal
of our genetic heritage must have been formed long before Homo sapiens appeared, during the evolution of mammals (which started about 300 million years ago)
and even earlier. Because the proteins of mammals as different as whales and
humans are very similar, the evolutionary changes that have produced such striking
morphological differences must involve relatively few changes in the molecules
from which we are made. Instead, it is thought that the morphological differences
arise from differences in the temporal and spatial pattern of gene expression
during embryonic development, which then determine the size, shape, and other
characteristics of the adult. At the end of Chapter 8 we discuss the mechanisms
that are thought to underlie such evolutionary changes in gene expression.
If Left Uncorrected, Spontaneous DNA Damage Would Rapidly Change DNA
Sequences 21
The physicist Erwin Schroedinger pointed out in 1945 that, whatever its
chemical nature (at that time unknown), a gene must be extremely small and
composed of few atoms. Otherwise the very large number of genes thought to be
necessary to generate an organism would not fit in the cell nucleus. On the other
hand, because it was so small, a gene would be expected to undergo significant
changes as a result of spontaneous reactions induced by random thermal collisions
with solvent molecules. This poses a serious dilemma, since genetic data imply
that genes are composed of a remarkably stable substance in which
spontaneous changes (mutations) occur rarely.
Figure 6-33
.
Deamination and depurination
These hydrolytic reactions are the two most
frequent spontaneous chemical reactions known to create serious DNA
damage in cells. Only a single example is shown for each type of reaction.
(See also .)
Figure 6-34
.
A summary of spontaneous alterations likely to require DNA repair
(A) The sites on each nucleotide that are known to be modified
by spontaneous oxidative damage (red arrows), hydrolytic attack
(blue arrows), and uncontrolled methylation by the methyl group donor
S-adenosylmethionine (green arrows) are indicated, with the size of each
arrow indicating the relative frequency of each event. The two most
frequent types of hydrolytic events are illustrated in more detail in .
(B) The thymine dimer, a type of damage introduced into DNA in cells
that are exposed to ultraviolet irradiation (as in sunlight). A similar dimer
will form between any two neighboring pyrimidine bases (C or T residues)
in DNA. (A, after T. Lindahl,
Nature 362:709-715, 1993. © 1993
Macmillan Magazines Ltd.)
This dilemma is real. DNA does undergo major changes as a result of
thermal fluctuations. We now know, for example, that about 5000 purine bases
(adenine and guanine) are lost per day from the DNA of each human cell
because of the thermal disruption of their N-glycosyl linkages to deoxyribose
(depurination). Similarly, spontaneous
deamination of cytosine to uracil in DNA is estimated to occur at a rate of 100 bases per genome per day (). DNA bases are also subject to change by reactive metabolites (including
reactive forms of oxygen) that can alter their base-pairing abilities and by ultraviolet
light from the sun, which promotes a covalent linkage of two adjacent
pyrimidine bases in DNA (forming, for example, the
thymine dimers shown in ). These are only a few of many changes that can occur in our DNA (). Most of them would be expected to lead either to deletion of one or more
base pairs in the daughter DNA chain after DNA replication or to a base-pair
substitution (each C → U deamination, for example, would eventually change a
C-G base pair to a T-A base pair, since U closely resembles T and forms a
complementary base pair with A). As we have seen, a high rate of such random changes
would have disastrous consequences for an organism.
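To give a feel for the numbers, the daily lesion counts quoted above can be scaled to a year, and the fate of a single unrepaired C → U deamination can be traced through two rounds of replication. The following Python sketch is only an illustration: the function names and the simplified pairing table are our own assumptions, and each copying step is assumed to be error-free.

```python
# Per-cell daily lesion counts quoted in the text.
DEPURINATIONS_PER_DAY = 5000
DEAMINATIONS_PER_DAY = 100

def lesions_per_year(per_day, days=365):
    """Lesions expected per cell per year if none were repaired."""
    return per_day * days

# Pairing rules used when a strand is copied; U templates A, as T would.
COPY_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}

def copy_base(template_base):
    """Return the base that would be paired with template_base during copying."""
    return COPY_PARTNER[template_base]

def fate_of_deaminated_c():
    """Follow the damaged strand of a C-G pair through two replications."""
    damaged = "U"                         # the C has been deaminated to U
    first_copy = copy_base(damaged)       # U templates A
    second_copy = copy_base(first_copy)   # A templates T
    return second_copy + "-" + first_copy

print(lesions_per_year(DEPURINATIONS_PER_DAY))  # depurinations per cell per year
print(lesions_per_year(DEAMINATIONS_PER_DAY))   # deaminations per cell per year
print(fate_of_deaminated_c())                   # base pair that replaces C-G
```

Without repair, a single cell would thus accumulate well over a million depurinated sites in a year, and the original C-G pair would be replaced by T-A in one line of descendants.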
The Stability of Genes Depends on DNA Repair 22
Figure 6-35
.
DNA repair
The three steps common to most types of
repair are excision (step 1), resynthesis (step 2), and ligation (step 3). In step 1
the damage is excised; in steps 2 and 3 the original DNA sequence is
restored.
DNA polymerase fills in the gap created by the excision events, and
DNA ligase seals the nick left in the repaired strand. Nick sealing consists of
the re-formation of a broken phosphodiester bond (see ).
Despite the thousands of random changes created every day in the DNA of
a human cell by heat energy and metabolic accidents, only a few stable
changes (mutations) accumulate in the DNA sequence of an average cell in a year. We
now know that fewer than one in a thousand accidental base changes in DNA
causes a mutation; the rest are eliminated with remarkable efficiency by
DNA repair. There are a variety of repair mechanisms, each catalyzed by a different set
of enzymes. Nearly all of these mechanisms depend on the existence of two
copies of the genetic information, one in each strand of the DNA double helix: if
the sequence in one strand is accidentally changed, information is not lost
irretrievably because a complementary copy of the altered strand remains in the
sequence of nucleotides in the other strand. The basic pathway for DNA repair
is illustrated schematically in . As indicated, it involves three steps:
1. The altered portion of a damaged DNA strand is recognized and
removed by enzymes called DNA repair nucleases, which hydrolyze the
phosphodiester bonds that join the damaged nucleotides to the rest of the
DNA molecule, leaving a small gap in the DNA helix in this region.
2. Another enzyme, DNA polymerase, binds to the
3'-OH end of the cut DNA strand and fills in the gap by making a complementary copy of the
information stored in the "good"
(template) strand.
3. The break or "nick" in the damaged strand left when the DNA
polymerase has filled in the gap is sealed by a third type of enzyme,
DNA ligase, which completes the restoration process.
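The three steps just listed can be mimicked in a few lines of Python. This is a deliberately simplified sketch, not a model of any real enzyme: the two strands are written so that position i of one pairs with position i of the other (strand polarity is ignored), a gap is represented by None, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def excise(strand, start, end):
    """Step 1: a repair nuclease removes positions start..end-1, leaving a gap."""
    return [None if start <= i < end else base for i, base in enumerate(strand)]

def resynthesize(gapped, template):
    """Step 2: DNA polymerase fills the gap by copying the template strand."""
    return [COMPLEMENT[template[i]] if base is None else base
            for i, base in enumerate(gapped)]

def ligate(filled):
    """Step 3: DNA ligase seals the nick; here the pieces become one string."""
    return "".join(filled)

def repair(damaged, template, start, end):
    """Run excision, resynthesis, and ligation in order."""
    return ligate(resynthesize(excise(damaged, start, end), template))

template = "TACGGA"   # the undamaged ("good") strand
damaged = "ATGUCT"    # position 3 should be C but has been deaminated to U
print(repair(damaged, template, 3, 4))
```

The point of the sketch is that the damaged strand itself contributes no information to the repaired region; the restored sequence comes entirely from the complementary strand.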
Figure 6-36
.
The DNA polymerase enzyme
(A) The reaction catalyzed by DNA polymerase. This
enzyme catalyzes the stepwise addition of a deoxyribonucleotide to the
3'-OH end of a polynucleotide chain (the primer strand) that is paired to a
second, template strand. The new DNA strand therefore grows in the
5'-to-3' direction. Because each
incoming deoxyribonucleoside triphosphate must pair with the template strand
in order to be recognized by the polymerase, this strand
determines which of the four possible deoxyribonucleotides (A, C, G, or
T) will be added. As in the case of RNA polymerase, the reaction is driven
by a large favorable free-energy change (see ). (B) The structure
of an
E. coli DNA polymerase molecule has been determined by
x-ray crystallography. This drawing illustrates how the polymerase
is thought to function during the DNA synthesis involved in DNA repair.
(B, adapted from L.S. Beese,
V. Derbyshire, and T.A. Steitz,
Science 260:352-355, 1993. © 1993 the AAAS.)
Figure 6-37
.
The reaction catalyzed by DNA ligase
This enzyme seals a broken phosphodiester bond. As shown, DNA ligase uses a molecule of
ATP to activate the 5' end at the nick (step 1) before forming the new bond
(step 2). In this way the energetically unfavorable nick-sealing reaction is
driven by being coupled to the energetically favorable process of ATP
hydrolysis. In Bloom's syndrome, an inherited human disease, individuals are
partially defective in DNA ligation and therefore deficient in DNA repair; as
a result, they have a dramatically increased incidence of cancer.
Both DNA polymerase and DNA ligase have important general roles in
DNA metabolism; both function in DNA replication as well as in DNA repair, for
example. The reactions that these two enzymes catalyze are illustrated in and , respectively.
DNA Damage Can Be Removed by More Than
One Pathway 23
The details of the excision step in DNA repair depend on the type of
damage. Depurination, for example, which is by far the most frequent lesion that
occurs in DNA, leaves a deoxyribose sugar with a missing base (see ).
This exposed sugar is rapidly recognized by the enzyme
AP endonuclease, which cuts the DNA phosphodiester backbone at the
5' side of the altered site. After excision of the sugar phosphate residue by a phosphodiesterase enzyme, an
undamaged DNA sequence is restored by DNA polymerase and DNA ligase (see ).
Figure 6-38
.
Comparison of two major DNA repair pathways
(A)
Base excision repair. This pathway starts with a
DNA glycosylase. Here the enzyme uracil DNA glycosylase removes an accidentally deaminated cytosine in DNA. After
the action of this glycosylase (or another DNA glycosylase that recognizes a different kind of damage) the sugar
phosphate with the missing base is cut out by the sequential action of AP endonuclease and a phosphodiesterase, the
same enzymes that initiate the repair of depurinated sites. The gap of a single nucleotide is then filled by DNA
polymerase and DNA ligase. The net result is that the U that was created by accidental deamination is restored to a C. The
AP endonuclease derives its name from the fact that it recognizes any site in the DNA helix that contains a
deoxyribose sugar with a missing base; such sites can arise either by the loss of a purine
(apurinic sites) or by the loss of a pyrimidine (apyrimidinic sites). (B) Nucleotide excision
repair. After a multienzyme complex recognizes a bulky
lesion such as a pyrimidine dimer (see ), one cut is made on each side of the lesion, and an associated
DNA helicase then removes the entire portion of the damaged strand. The multienzyme complex in bacteria leaves the
gap of 12 nucleotides shown; the gap produced in human DNA is more than twice this size.
A related repair pathway, called base excision repair, involves a battery of
enzymes called DNA glycosylases. Each DNA glycosylase recognizes an
altered base in DNA and catalyzes its hydrolytic removal. There are at least six types
of these enzymes, including those that remove deaminated Cs, deaminated As,
different types of alkylated or oxidized bases, bases with opened rings, and
bases in which a carbon-carbon double bond has been accidentally converted to a
carbon-carbon single bond. As an example of the general mechanism that
operates in all cases, the removal of a deaminated C by uracil DNA glycosylase is
shown in . The DNA glycosylase reaction produces a deoxyribose sugar
with a missing base. Because this sugar phosphate is the same substrate
recognized by the AP endonuclease, the subsequent steps in the repair process proceed
in the same way as for depurinated sites. The importance of removing
accidentally deaminated DNA bases has been directly demonstrated. In mutant bacteria
that lack the enzyme uracil DNA glycosylase, the normally low spontaneous rate
of change of a C-G to a T-A base pair is increased about twentyfold.
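The way in which deaminated and depurinated sites converge on the same downstream steps can be sketched as follows. This is a toy illustration under stated assumptions: an abasic site is written as "_", the two strands are index-aligned (polarity is ignored), and the function names are ours rather than those of any library.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def uracil_glycosylase(strand):
    """Remove every U base, leaving an abasic sugar phosphate marked '_'."""
    return strand.replace("U", "_")

def ap_endonuclease_and_fill(strand, opposite):
    """Cut out each abasic site and refill it from the opposite strand."""
    return "".join(COMPLEMENT[opposite[i]] if base == "_" else base
                   for i, base in enumerate(strand))

def base_excision_repair(strand, opposite):
    """Glycosylase step followed by the steps shared with depurination repair."""
    return ap_endonuclease_and_fill(uracil_glycosylase(strand), opposite)

opposite = "TACGGA"
print(base_excision_repair("ATGUCT", opposite))      # deaminated C at position 3
print(ap_endonuclease_and_fill("AT_CCT", opposite))  # depurinated site at position 2
```

Both calls return the same restored sequence, because once the glycosylase has acted, a deaminated site is indistinguishable from a site that has lost its base spontaneously.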
Cells have a separate nucleotide excision
repair pathway capable of removing almost any type of DNA damage that creates a large change in the
DNA double helix. Such "bulky lesions" include those created by the covalent
reaction of DNA bases with large hydrocarbons (such as the carcinogen benzopyrene),
as well as the various pyrimidine dimers (T-T, T-C, and C-C) caused by sunlight.
In these cases a large multienzyme complex scans the DNA for a distortion in
the double helix rather than for a specific base change. Once a bulky lesion is
found, the phosphodiester backbone of the abnormal strand is cleaved on both sides
of the distortion, and the portion of the strand containing the lesion (an
oligonucleotide) is peeled away from the DNA double helix by a
DNA helicase enzyme (discussed later). The gap produced in the DNA helix is then repaired in the
usual manner by DNA polymerase and DNA ligase ().
The importance of these repair processes is indicated by the large
investment that cells make in DNA repair enzymes. A comprehensive genetic analysis of
a yeast suggests that these cells contain more than 50 different genes that code
for DNA repair functions. DNA repair pathways are likely to be at least as
complex in humans. Individuals with the genetic disease xeroderma pigmentosum, for example, are defective in a nucleotide excision repair process that can be
shown by genetic analysis to require at least seven different gene products. Such
individuals develop severe skin lesions, including skin cancer, because of the
accumulation of pyrimidine dimers in cells that are exposed to sunlight.
Cells Can Produce DNA Repair Enzymes in Response
to DNA Damage 24
Cells have evolved a number of mechanisms to help them survive in a
hazardous world. Often an extreme environmental insult activates a battery of
genes whose products protect the cell from its effects. One such mechanism shared
by all cells is the heat-shock response, which is evoked by the exposure of cells
to unusually high temperatures. The induced "heat-shock proteins" include
some that are thought to help stabilize and repair partially denatured cell proteins
(see Figure 5-29).
Many cells also have mechanisms that enable them to synthesize DNA
repair enzymes as an emergency response to severe DNA damage. The
best-studied example is the SOS response in E. coli. In this bacterium any block to DNA replication caused by DNA damage produces a signal (thought to be an excess
of single-stranded DNA) that induces an increase in the transcription of more
than 15 genes, many of which code for proteins that function in DNA repair. The
signal first activates the E. coli RecA protein (discussed later), which then destroys
a negatively acting gene regulatory protein (a repressor) that normally suppresses the transcription of the entire set of SOS response genes. Studies of mutant
bacteria deficient in different parts of the response indicate that the newly
synthesized proteins have two effects. First, as would be expected, the induction of
new DNA repair enzymes increases cell survival. When the mutants deficient in
this part of the SOS response are treated with a DNA-damaging agent such as
ultraviolet radiation, an unusually high proportion of them die. Second, several of
the induced proteins transiently increase the mutation rate by greatly increasing
the number of errors made in copying DNA sequences. While this has little effect
on short-term survival, it is presumably advantageous in the long term because
it produces a burst of genetic variability in the bacterial population and hence
increases the chance that a mutant cell with increased fitness will arise.
The DNA repair system activated in the SOS response is not the only
inducible DNA repair system known. Bacteria have another system that is
activated specifically by the presence of methylated nucleotides in DNA, and there is
at least one inducible DNA repair system in yeast cells. Some higher eucaryotic
cells have been reported to adapt to DNA damage in similar ways.
The Structure and Chemistry of the DNA Double
Helix Make It Easy to Repair 25
The DNA double helix seems to be optimally constructed for repair. As
discussed in
Chapter 1, RNA is thought to have evolved before DNA, and it seems likely
that the genetic code was initially carried in the four nucleotides A, C, G, and U.
This raises the question of why the U in RNA has been replaced in DNA by T
(which is 5-methyl U). We have seen that spontaneous C deamination converts C to
U but that this event is rendered harmless by uracil DNA glycosylase (see ). One can imagine how any repair enzyme designed to recognize and
excise such accidents would be confused by the normal U nucleotides in a
U-containing DNA molecule. Thus it is not surprising that U is not used in DNA.
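The argument can be made concrete with a small thought experiment in Python. The sequences and function names below are invented purely for illustration; the "U-containing DNA" is hypothetical.

```python
def deaminate(strand, position):
    """Model spontaneous deamination: replace the C at position with U."""
    assert strand[position] == "C"
    return strand[:position] + "U" + strand[position + 1:]

def uracil_sites(strand):
    """Positions that a uracil-specific glycosylase would treat as damage."""
    return [i for i, base in enumerate(strand) if base == "U"]

t_genome = "ATGCCT"                     # DNA as it is: T pairs with A
u_genome = t_genome.replace("T", "U")   # hypothetical DNA that uses U instead

damaged_t = deaminate(t_genome, 3)
damaged_u = deaminate(u_genome, 3)

print(uracil_sites(damaged_t))   # only the deaminated position is flagged
print(uracil_sites(damaged_u))   # legitimate U residues are flagged as well
```

In the T-containing sequence the enzyme finds exactly the one damaged site, whereas in the U-containing sequence it has no way of telling the accident from the normal nucleotides.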
Figure 6-39
.
The deamination of DNA nucleotides
In each case the oxygen atom added from the
reaction with water is colored
red. (A) The spontaneous deamination products
of A and G are recognizable as unnatural when they occur in DNA and thus
are readily recognized and repaired. The deamination of C to U was
illustrated in , and T has no
amino group to deaminate. (B) A few percent of the C nucleotides in
vertebrate DNAs are methylated to help control gene expression. When these
5-methyl C nucleotides are accidentally deaminated, they form
T. This T will be paired with a G on the opposite strand, forming
a mismatched base pair.
This line of argument is strengthened by the observation that every
possible deamination event in DNA yields an unnatural base, which can therefore be
directly recognized and removed by a specific DNA glycosylase.
Hypoxanthine, for example, is the simplest purine base capable of pairing specifically with C,
but hypoxanthine is the direct deamination product of A. The addition of a
second amino group to hypoxanthine produces G, which cannot be formed from A
by spontaneous deamination and whose deamination product is likewise
unique ().
A special situation occurs in vertebrate DNA, where selected C
nucleotides are methylated at specific CG sequences associated with inactive genes
(discussed in
Chapter 9). As illustrated in , the accidental deamination of
these methylated C nucleotides produces the natural nucleotide T, which forms a
mismatched base pair with a G on the opposite DNA strand. To help protect
methylated C nucleotides against such mutations, a special DNA glycosylase
recognizes a mismatched base pair involving T in the sequence TG and removes the T.
This DNA repair mechanism must be relatively ineffective, however, as methylated
C nucleotides are common sites for mutations in vertebrate DNA. Even though
only about 3% of the C nucleotides in human DNA are methylated, mutations in
these methylated nucleotides account for about one-third of the single-base
mutations that have been observed in inherited human diseases (see also
Figure 9-71).
Whereas the chemistry of the bases ensures that deamination will be
detected, accurate repair - and the fundamental answer to Schroedinger's
dilemma - depends on the existence of separate copies of the genetic
information in the two strands of the double helix. Only in the very unlikely event that
both strands are damaged simultaneously at the same base pair is the cell left
without one good copy to serve as a template for DNA repair. Even in this case
mechanisms have evolved that are sometimes able to repair the damage. These
repair mechanisms require that a second DNA helix of the same sequence be
present in the cell, and they use genetic recombination mechanisms to transfer the
missing information from one DNA helix to another - a process called gene conversion, which we discuss later.
Genetic information is stored in single-stranded DNA or RNA molecules
only in some very small viruses with genomes of a few thousand nucleotides.
The types of repair processes that we have described cannot operate on such
nucleic acids, and the chance of a nucleotide change occurring in these viruses is
very high. It seems that only organisms with tiny genomes can afford to encode
their genetic information in a structure other than a DNA double helix.
Summary
The fidelity with which DNA sequences are maintained in higher eucaryotes can
be estimated from the rates at which changes have occurred in nonessential protein
and DNA sequences over evolutionary time. This fidelity is so high that a
mammalian germ-line cell with a genome of 3 × 10^9 base pairs is subjected on average to
only about 10 to 20 base-pair changes per year. But unavoidable chemical processes
damage thousands of DNA nucleotides in a typical mammalian cell every day.
Genetic information can be stored stably in DNA sequences only because a large variety
of DNA repair enzymes continuously scan the DNA and replace the damaged
nucleotides.
The process of DNA repair depends on the presence of a separate copy of the
genetic information in each strand of the DNA double helix. An accidental lesion on
one strand can therefore be cut out by a repair enzyme and a good strand
resynthesized from the information in the undamaged strand. Most of the damage to DNA
bases is excised by one of two major pathways. In base excision repair an altered base
is removed by a DNA glycosylase enzyme, followed by excision of the resulting
sugar phosphate. In nucleotide excision repair a small region of the strand surrounding
the damage is removed from the DNA helix as an oligonucleotide. In both cases the
small gap left in the DNA helix is filled in by the sequential action of DNA polymerase
and DNA ligase.
DNA Replication 26
Introduction
Besides maintaining the integrity of DNA sequences by DNA repair, all
organisms must duplicate their DNA accurately before every cell division. DNA replication occurs at polymerization rates of about 500 nucleotides per second in
bacteria and about 50 nucleotides per second in mammals. Clearly, the proteins that
catalyze this process must be both accurate and fast. Speed and accuracy are
achieved by means of a multienzyme complex that guides the process and constitutes
an elaborate "replication machine."
Base-pairing Underlies DNA Replication
as well as DNA Repair 27
DNA templating is the process in which the nucleotide sequence of a DNA
strand (or selected portions of a DNA strand) is copied by complementary
base-pairing (A with T or U, and G with C) into a complementary nucleic acid sequence
(either DNA or RNA). The process entails the recognition of each nucleotide in
the DNA strand by an unpolymerized complementary nucleotide and requires
that the two strands of the DNA helix be separated, at least transiently, so that
the hydrogen bond donor and acceptor groups on each base become exposed
for base-pairing. The appropriate incoming single nucleotides are thereby
aligned for their enzyme-catalyzed polymerization into a new nucleic acid chain. In
1957 the first such nucleotide polymerizing enzyme,
DNA polymerase, was discovered. The substrates for this enzyme were found to be deoxyribonucleoside
triphosphates, which are polymerized on a single-stranded DNA template. The
stepwise mechanism of this reaction is the one previously illustrated in
in connection with DNA repair. The discovery of DNA polymerase led to the
isolation of
RNA polymerase, which was correctly inferred to use ribonucleoside
triphosphates as its substrates.
During DNA replication each of the two old DNA strands serves as a
template for the formation of an entire new strand. Because each of the two daughters
of a dividing cell inherits a new DNA double helix containing one old and one
new strand (see Figure 3-13), DNA is said to be replicated "semiconservatively"
by DNA polymerase.
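Templating by complementary base-pairing, and the semiconservative outcome of replication, can be expressed compactly in Python. In this sketch every strand is written 5'-to-3', so the copy of a template is its reverse complement; the function names and the example sequence are illustrative assumptions only.

```python
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def template_copy(template_5to3, product="DNA"):
    """Return the strand complementary to a DNA template, written 5'-to-3'."""
    partner = DNA_PARTNER if product == "DNA" else RNA_PARTNER
    # The new strand is antiparallel to its template, hence the reversal.
    return "".join(partner[base] for base in reversed(template_5to3))

def replicate(duplex):
    """Semiconservative replication: each old strand templates one new strand."""
    old_a, old_b = duplex
    return (old_a, template_copy(old_a)), (old_b, template_copy(old_b))

parent = ("ATGC", template_copy("ATGC"))
daughter_1, daughter_2 = replicate(parent)
print(parent, daughter_1, daughter_2)
print(template_copy("ATGC", product="RNA"))
```

Each daughter duplex contains one strand taken unchanged from the parent and one newly made strand, which is what "semiconservative" means.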
The DNA Replication Fork Is Asymmetrical 28
Autoradiographic analyses carried out in the early 1960s on whole
replicating chromosomes labeled with a short pulse of the radioactive DNA precursor ³H-thymidine revealed a localized region of replication that moves along the
parental DNA double helix. Because of its Y-shaped structure, this active region is
called a DNA replication fork. At a replication fork the DNA of both new
daughter strands is synthesized by a multienzyme complex that contains the DNA
polymerase.
Figure 6-40
.
An incorrect model for DNA replication
Although it might appear to be the simplest
mechanism for DNA replication, the mechanism illustrated here is not the one
that cells use. Note that in this scheme both daughter DNA strands
would grow continuously, using the energy of hydrolysis of the yellow phosphates to add the next nucleotide on
each strand. This would require chain growth in both the
5'-to-3' direction (bottom) and the
3'-to-5' direction (top). No enzyme that catalyzes
3'-to-5' nucleotide polymerization has
ever been found.
Initially, the simplest mechanism of DNA replication appeared to be
continuous growth of both new strands, nucleotide by nucleotide, at the replication
fork as it moves from one end of a DNA molecule to the other. But because of
the antiparallel orientation of the two DNA strands in the DNA double helix
(see
Figure 3-10 and
Panel 3-2, pp. 100-101), this mechanism would require
one daughter strand to grow in the
5'-to-3' direction and the other in the
3'-to-5'
direction. Such a replication fork would require two different DNA
polymerase enzymes. One would polymerize in the
5'-to-3' direction (see ),
where each incoming deoxyribonucleoside triphosphate carries the triphosphate
activation needed for its own addition. The other would move in the
3'-to-5'
direction and work by so-called "head growth," in which the end of the
growing DNA chain carries the triphosphate activation required for the addition of
each subsequent nucleotide. Although head-growth polymerization occurs
elsewhere in biochemistry (see
Figure 2-36), it does not occur in DNA synthesis;
no 3'-to-5' DNA polymerase has ever been found ().
How, then, is overall DNA chain growth in the 3'-to-5' direction achieved? The answer was first
suggested in the late 1960s by experiments in which highly radioactive
³H-thymidine was added to dividing bacteria for a few seconds so that only the most recently
replicated DNA, just behind the replication fork, became radiolabeled. This
selective labeling method revealed the transient existence of pieces of DNA that
were 1000 to 2000 nucleotides long, now commonly known as
Okazaki fragments, at the bacterial growing fork. (Such replication intermediates were later found
in eucaryotes, where they are only 100 to 200 nucleotides long.) The Okazaki
fragments were shown to be synthesized only in the
5'-to-3' chain direction and to be joined together after their synthesis to create long DNA chains by the
same
DNA ligase enzyme that seals nicks during DNA repair (see ).
Figure 6-41
.
The structure of a DNA replication fork
Because both daughter DNA strands
(colored) are synthesized in the
5'-to-3' direction, the DNA synthesized on the
lagging strand must be made initially as a series of short DNA molecules,
called Okazaki fragments.
A replication fork has an asymmetric structure. The DNA daughter
strand that is synthesized continuously is known as the
leading strand, and its synthesis slightly precedes the synthesis of the daughter strand that is synthesized
discontinuously, which is known as the lagging
strand. The synthesis of the lagging strand is delayed because it must wait for the leading strand to expose the
template strand on which each Okazaki fragment is synthesized ().
The synthesis of the lagging strand by a discontinuous, "backstitching"
mechanism means that only the 5'-to-3' type of DNA polymerase is needed for DNA
replication.
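The backstitching idea can be illustrated with a short Python sketch. It rests on several simplifying assumptions that are ours, not the text's: the fork opens from left to right, the lagging-strand template is written 5'-to-3' in that same direction, every fragment has the same fixed length, and RNA primers are ignored.

```python
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}

def lagging_fragments(template_5to3, fragment_len):
    """Return Okazaki fragments, each written 5'-to-3', in order of synthesis."""
    fragments = []
    for start in range(0, len(template_5to3), fragment_len):
        exposed = template_5to3[start:start + fragment_len]
        # Synthesis runs backward along the newly exposed template, so the
        # fragment read 5'-to-3' is the reverse complement of that stretch.
        fragments.append("".join(PARTNER[base] for base in reversed(exposed)))
    return fragments

def ligate_fragments(fragments):
    """Join the fragments; the most recently made one lies at the 5' end."""
    return "".join(reversed(fragments))

template = "ATGCCGTAAG"
pieces = lagging_fragments(template, 4)
print(pieces)
print(ligate_fragments(pieces))
```

Every fragment is made in the 5'-to-3' direction, yet the joined product extends, fragment by fragment, in the direction of fork movement, which is the overall 3'-to-5' growth that no single polymerase could achieve directly.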
The High Fidelity of DNA Replication Requires
a Proofreading Mechanism 29
The fidelity of copying that is observed after DNA replication has occurred is
such that only about 1 error is made in every
10^9 base-pair replications, as required to maintain the mammalian genome of
3 × 10^9 DNA base pairs. This fidelity
is much higher than expected, given that the standard complementary base
pairs are not the only ones possible. With small changes in helix geometry, for
example, two hydrogen bonds will form between G and T in DNA. In addition, rare
tautomeric forms of the four DNA bases occur transiently in ratios of 1 part to
10^4 or 10^5. These forms will mispair without a change in helix geometry: the rare
tautomeric form of C pairs with A instead of G, for example. If the DNA
polymerase accepts a mispairing that occurs between an incoming
deoxyribonucleoside triphosphate and the DNA template, the wrong nucleotide can be
incorporated into the new DNA chain, producing a mutation. The high fidelity of DNA
replication depends on several "proofreading" mechanisms that act sequentially
to remove errors brought about in these ways.
One important proofreading process depends on special properties of
the DNA polymerase enzyme. Unlike RNA polymerases, DNA polymerases do
not begin a new polynucleotide chain by linking two nucleoside triphosphates
together. They absolutely require the 3'-OH end of a base-paired
primer strand on which to add further nucleotides (see ). Moreover, DNA
molecules with a mismatched (not base-paired) nucleotide at the
3'-OH end of the primer strand are not effective as templates. DNA polymerase molecules are able to
deal with such mismatched DNAs by means of either a separate catalytic subunit
or a covalently linked, separate catalytic site that clips off any unpaired residues
at the primer terminus. Clipping by this
3'-to-5' proofreading exonuclease
activity continues until enough nucleotides have been removed from the
3' end to regenerate a base-paired terminus that can prime DNA synthesis. In this way
DNA polymerase functions as a "self-correcting" enzyme that removes its own
polymerization errors as it moves along the DNA. illustrates how
this proofreading process can correct a base-pairing error.
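The logic of self-correction can be caricatured in Python as a loop that adds a nucleotide to the 3' end, checks whether the new terminus is base-paired, and clips it off if it is not. This is a schematic sketch only: the function name is hypothetical, the template is listed in the order in which the enzyme reads it, and the exonuclease is assumed to catch every mismatch, which a real enzyme does not.

```python
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize_with_proofreading(template, incoming):
    """Extend a new strand along template, clipping any mismatched 3' terminus.

    template: bases in the order in which they are read by the polymerase.
    incoming: nucleotides offered to the enzyme in turn, some possibly wrong.
    Returns the new strand and the number of nucleotides that were excised.
    """
    new_strand = []
    excisions = 0
    supply = iter(incoming)
    while len(new_strand) < len(template):
        new_strand.append(next(supply))      # polymerization at the 3' end
        position = len(new_strand) - 1
        if new_strand[-1] != PARTNER[template[position]]:
            new_strand.pop()                 # 3'-to-5' exonuclease removes it
            excisions += 1
    return "".join(new_strand), excisions

# The third nucleotide offered (T) mispairs with the template C and is removed.
print(synthesize_with_proofreading("TACG", "ATTGC"))
```

Note that the check is made on the terminus of the growing strand, which is why the enzyme needs a base-paired 3' end before it can continue.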
The requirement for a perfectly base-paired terminus is essential to the
self-correcting properties of the DNA polymerase. For such an enzyme to start
synthesis in the complete absence of a primer without losing any of its
discrimination between base-paired and unpaired growing
3'-OH termini is apparently not possible. By contrast, the RNA polymerase enzymes involved in gene
transcription need not be self-correcting: errors in making RNA are not passed on to
the next generation, and an occasional defective molecule has no significance.
RNA polymerases are able to start new polynucleotide chains without a primer,
and an error frequency of about 1 in
10^4 is found both in RNA synthesis and in
the separate process of translating mRNA sequences into protein sequences.
Only DNA Replication in the 5'-to-3' Direction Allows Efficient Error Correction
The need for accuracy probably explains why DNA replication occurs only in
the 5'-to-3' direction. If there were a DNA polymerase that added deoxyribonucleoside triphosphates in such a way as to cause chains to grow in the
3'-to-5' chain direction, the growing
5'-chain end rather than the incoming mononucleotide would carry the activating triphosphate. In this case the mistakes
in polymerization could not be simply hydrolyzed away, since the bare
5'-chain end thus created would immediately terminate DNA synthesis. It is much
easier, therefore, to correct a mismatched base that has just been added to the
3' end than one that has just been added to the
5' end of a DNA chain. Although the type of mechanism for DNA replication shown in seems at first sight
much more complex than the incorrect mechanism depicted in ,
it is much more accurate because it involves DNA synthesis only in the
5'-to-3' direction.
A Special Nucleotide Polymerizing Enzyme
Synthesizes Short RNA Primer Molecules on the Lagging
Strand 30
Figure 6-43
.
RNA primer synthesis
A schematic view of the reaction catalyzed by DNA primase,
the enzyme that synthesizes the short RNA primers made on the
lagging strand. Unlike DNA polymerase, this enzyme can start a
new polynucleotide chain by joining two nucleoside triphosphates
together. The primase stops after a short polynucleotide has been
synthesized and makes the 3' end of this
primer available for the DNA polymerase.
Figure 6-44
.
The synthesis of one of the many DNA fragments on the lagging strand
In eucaryotes the RNA primers are made at intervals
spaced by about 200 nucleotides on the lagging strand, and each RNA primer is
10 nucleotides long. This primer is erased by a special DNA repair enzyme
that recognizes an RNA strand in an RNA/DNA helix and excises it; this leaves
a gap that is filled in by DNA polymerase and DNA ligase, as we saw for
the DNA repair process (see ).
For the leading strand a special primer is needed only at the start of
replication; once a replication fork is established, the DNA polymerase is continuously
presented with a base-paired chain end on which to add new nucleotides. But
the DNA polymerase on the lagging side of the fork requires only about 4 seconds
to complete each short DNA fragment, after which it must start synthesizing a
completely new fragment at a site farther along the template strand (see ). A special mechanism is needed to produce the base-paired primer strand
required by this DNA polymerase molecule. The mechanism involves an
enzyme called
DNA primase, which uses ribonucleoside triphosphates to synthesize
short RNA primers (). These primers are about 10 nucleotides long in
eucaryotes, and they are made at intervals on the lagging strand, where they
are elongated by the DNA polymerase to begin each Okazaki fragment. The
synthesis of each Okazaki fragment ends when this DNA polymerase runs into the
RNA primer attached to the 5' end of the previous fragment. To produce a
continuous DNA chain from the many DNA fragments made on the lagging strand,
a special DNA repair system acts quickly to erase the old RNA primer and
replace it with DNA. DNA ligase then joins the
3' end of the new DNA fragment to the 5' end of the previous one to complete the process ().
Why might an erasable RNA primer be preferred to a DNA primer that
need not be erased? The argument that a self-correcting polymerase cannot
start chains de novo also implies its converse: an enzyme that starts chains de novo cannot be efficient at self-correction. Thus any enzyme that primes the
synthesis of Okazaki fragments will of necessity make a relatively inaccurate copy (at
least 1 error in 10^5). Even if the copies retained in the final product constituted as
little as 5% of the total genome (for example, 10 nucleotides per 200-nucleotide
DNA fragment), the resulting increase in overall mutation rate would be enormous.
It therefore seems likely that the evolution of RNA rather than DNA for
priming entailed a powerful advantage, since the ribonucleotides in the primer
automatically mark these sequences as "bad copy" to be removed.
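The arithmetic behind this argument can be sketched in a few lines. The primer length, primer spacing, and priming error rate are the figures quoted above. The background error rate of 1 in 10^9 for fully proofread DNA is an assumption introduced here purely for comparison; it is not stated in this section, and the function names are illustrative.

```python
# Rough estimate of the mutational cost of keeping primer-derived sequence.
# Figures quoted in the text: 10-nucleotide primers on 200-nucleotide
# fragments, and a priming error rate of at least 1 in 10^5.
PRIMER_LENGTH = 10
FRAGMENT_LENGTH = 200
PRIMING_ERROR_RATE = 1e-5

# ASSUMPTION (not from this section): error rate of proofread DNA synthesis.
BACKGROUND_ERROR_RATE = 1e-9


def retained_primer_fraction(primer_length, fragment_length):
    """Fraction of the final product that was copied by the priming enzyme."""
    return primer_length / fragment_length


def overall_error_rate(primer_fraction, priming_rate, background_rate):
    """Average error rate per nucleotide over primed and proofread regions."""
    return primer_fraction * priming_rate + (1 - primer_fraction) * background_rate


fraction = retained_primer_fraction(PRIMER_LENGTH, FRAGMENT_LENGTH)
rate = overall_error_rate(fraction, PRIMING_ERROR_RATE, BACKGROUND_ERROR_RATE)
fold_increase = rate / BACKGROUND_ERROR_RATE

print(f"fraction copied by the priming enzyme: {fraction:.0%}")
print(f"overall error rate per nucleotide: {rate:.2e}")
print(f"fold increase over the assumed background: {fold_increase:.0f}")
```

Under the assumed background rate, retaining the primer-derived 5% would raise the overall error rate roughly 500-fold, which is the sense in which the increase would be "enormous."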
Special Proteins Help Open Up the DNA Double Helix
in Front of the Replication Fork 31
The DNA double helix must be opened up ahead of the replication fork so
that the incoming deoxyribonucleoside triphosphates can form base pairs with
the template strand. The DNA double helix is very stable under normal conditions:
the base pairs are locked in place so strongly that temperatures approaching
that of boiling water are required to separate the two strands in a test tube. For
this reason most DNA polymerases can copy DNA only when the template strand
has already been separated from its complementary strand. Additional proteins
are needed to help open the double helix and thus provide the appropriate
exposed DNA template for the DNA polymerase to copy. Two types of replication
proteins contribute to this process - DNA helicases and single-strand DNA-binding
proteins.
Figure 6-45. The assay used to test for DNA helicase enzymes
A short DNA fragment is annealed to a long DNA single strand to form a region of
DNA double helix. The double helix is melted as the helicase runs along the
DNA single strand, releasing the short DNA fragment in a reaction that
requires the presence of both the helicase protein and ATP. The movement of
the helicase is powered by its ATP hydrolysis (see Figure 5-22).
DNA helicases were first isolated as proteins that hydrolyze ATP when
they are bound to single strands of DNA. As described in Chapter 5, the hydrolysis
of ATP can change the shape of a protein molecule in a cyclical manner that
allows the protein to perform mechanical work. DNA helicases utilize this principle
to move rapidly along a DNA single strand; where they encounter a region of
double helix, they continue to move along their strand, thereby prying apart the
helix (). We have previously described how a special DNA repair
helicase functions in nucleotide excision repair (see ).
The unwinding of the template DNA helix at a replication fork could in
principle be catalyzed by two DNA helicases acting in concert - one running
along the leading strand and one along the lagging strand. These two helicases
would need to move in opposite directions along a DNA single strand and
therefore would have to be different enzymes. Both types of DNA helicase, in fact, do
exist, although in bacteria the DNA helicase on the lagging strand plays the
predominant role, for reasons that will become clear shortly.
Figure 6-46. The effect of single-strand binding proteins on the structure of single-stranded DNA
Because each protein molecule prefers to bind next to a
previously bound molecule (cooperative binding), long rows of this protein will form on a DNA single strand.
This cooperative binding straightens out the DNA template and facilitates
the DNA polymerization process. The "hairpin helices" shown in the
bare single-stranded DNA result from a chance matching of short regions
of complementary nucleotide sequence; they are similar to the short
helices that typically form in RNA molecules.
Single-strand DNA-binding (SSB) proteins - also called
helix-destabilizing proteins - bind to exposed DNA strands without covering the bases, which
therefore remain available for templating. These proteins are unable to open a
long DNA helix directly, but they aid helicases by stabilizing the unwound,
single-stranded conformation. In addition, their cooperative binding completely
coats the regions of single-stranded DNA on the lagging strand, thereby preventing
formation of the short hairpin helices that would otherwise impede synthesis by
the DNA polymerase ().
A Moving DNA Polymerase Molecule Is Kept Tethered
to the DNA by a Sliding Ring 32
On their own, most DNA polymerase molecules will synthesize only a short
string of nucleotides before falling off a DNA template. This tendency to leave a
DNA molecule quickly allows the DNA polymerase molecule that has just
finished synthesizing one Okazaki fragment on the lagging strand to be recycled
quickly to begin the synthesis of the next Okazaki fragment on the same strand.
This rapid dissociation, however, would make it difficult for the polymerase to
synthesize long DNA strands at a replication fork were it not for an accessory
protein that functions as a regulated clamp. This clamp keeps the polymerase
firmly on the DNA when it is moving, but releases it as soon as the polymerase stops.
Figure 6-47. The regulated sliding clamp that holds DNA polymerase on the DNA
(A) The structure of the sliding clamp from E. coli, with a DNA helix added to indicate how
the protein fits around DNA. A similar protein is present in eucaryotic
cells. (B) Schematic illustration of how the clamp is thought to hold a
moving DNA polymerase molecule on the DNA. (A, from X.-P. Kong et al., Cell 69:425-437, 1992. © Cell Press.)
How can a clamp prevent the polymerase from dissociating without at
the same time impeding the polymerase's rapid movement along the DNA
molecule? The three-dimensional structure of a clamp protein, determined by x-ray
diffraction, indicates that it forms a large ring around the DNA helix. One side of the
ring binds to the back of the DNA polymerase, and the whole ring slides freely as
the polymerase moves along a DNA strand (). The assembly of the
clamp around DNA requires ATP hydrolysis by special accessory proteins that bind
both to the clamp protein and to DNA; it is not known how the clamp is
disassembled to remove it from the DNA.
The Proteins at a Replication Fork Cooperate
to Form a Replication Machine 33
Although we have discussed DNA replication as though it were carried out by
a mixture of replication proteins that act independently, in reality most of the
proteins are held together in a large multienzyme complex that moves rapidly
along the DNA. This complex can be likened to a tiny sewing machine composed
of protein parts and powered by nucleoside triphosphate hydrolyses. Although
the replication complex has been best characterized in E. coli and several of its viruses, a very similar complex operates in eucaryotes (see p. 358).
Figure 6-48. The proteins at a DNA replication fork
The major types of proteins that act at a DNA
replication fork are illustrated, showing their positions on the DNA.
The functions of the subunits of the replication machine are summarized
in the two-dimensional diagram of the complete replication fork shown in . Two identical DNA polymerase molecules work at the fork, one on the
leading strand and one on the lagging strand. The DNA helix is opened by a
DNA polymerase molecule clamped on the leading strand, acting in concert with
a DNA helicase molecule running along the lagging strand; helix opening is
aided by cooperatively bound molecules of single-strand DNA-binding protein.
While the DNA polymerase molecule on the leading strand can operate in a
continuous fashion, the DNA polymerase molecule on the lagging strand must restart
at short intervals, using a short RNA primer made by a DNA primase molecule.
Figure 6-49. A replication fork in three dimensions
This diagram shows a current view of how
the replication proteins are arranged at a replication fork when the fork
is moving. The two-dimensional structure of has
been altered by folding the DNA on the lagging strand to bring the
lagging-strand DNA polymerase molecule into a complex with the
leading-strand DNA polymerase molecule. This folding process also brings the
3' end of each completed Okazaki fragment close to the start site for the
next Okazaki fragment (compare with ). Because the
lagging-strand DNA polymerase molecule is held to the rest of the
replication proteins, it can be reused to synthesize successive
Okazaki fragments; thus it is about to let go of its completed DNA fragment
and move to the RNA primer that will be synthesized nearby, as required
to start the next DNA fragment. Note that one daughter DNA helix
extends toward the bottom right and the other toward the top left in this diagram.
The efficiency of replication is greatly increased by the close association
of all these protein components. The primase molecule is linked directly to the
DNA helicase to form a unit on the lagging strand called a
primosome. Powered by the DNA helicase, the primosome moves with the fork, synthesizing RNA primers
as it goes. Similarly, the DNA polymerase molecule that synthesizes DNA on
the lagging strand moves in concert with the rest of the proteins, synthesizing a
succession of new Okazaki fragments. To accommodate this arrangement, its
DNA template strand is thought to be folded back in the manner shown in . The replication proteins are thus linked together into a single large unit
(total mass > 10^6 daltons) that moves rapidly along the DNA, enabling DNA to be
synthesized on both sides of the fork in a coordinated and efficient manner.
This DNA replication machine leaves behind on the lagging strand a
series of unsealed Okazaki fragments, which still contain the RNA that primed
their synthesis at their 5' ends. This RNA must be removed and the fragments
joined up by DNA repair enzymes that operate behind the replication fork (see ).
A Mismatch Proofreading System Removes Replication Errors That Escape from the Replication
Machine 34
Bacteria such as E. coli are capable of dividing once every 30 minutes, so it
is relatively easy to screen large populations to find rare mutants that are
altered in a specific process. One interesting class of mutants contains alterations in
so-called mutator genes, which greatly increase the rate of spontaneous
mutation. Not surprisingly, one such mutant produces a defective form of the
3'-to-5' proofreading exonuclease (discussed earlier) that is a subunit of the DNA
polymerase enzyme (see ). When this protein is defective, the DNA
polymerase no longer proofreads effectively, and many replication errors that
would otherwise have been removed accumulate in the DNA.
The study of other E. coli mutants that exhibit abnormally high
mutation rates has uncovered another proofreading system that removes replication
errors missed by the proofreading exonuclease. This
mismatch proofreading system (also called a mismatch
repair system) differs from most DNA repair systems
in that it does not depend on the presence in the DNA of abnormal nucleotides
that can be recognized and excised. Instead, it detects the distortion on the
outside of the helix that results from the misfit between noncomplementary base
pairs. But if the proofreading system simply recognized a mismatch in newly
replicated DNA and randomly excised one of the two mismatched nucleotides, it
would make the mistake of "correcting" the original template strand to match the
error exactly half the time and would not therefore lower the overall error rate.
To be effective, the proofreading system must be able to distinguish and remove
the mismatched nucleotide only on the new strand, where the replication error
occurred.
The recognition system used by the mismatch proofreading system in E. coli depends on the methylation of selected A residues in the DNA. Methyl groups
are added to all A residues in the sequence GATC, but not until some time after
the A has been incorporated into a newly synthesized DNA chain. Because only
the new strands just behind a replication fork will contain GATC sequences that
have not yet been methylated, these new DNA strands can be distinguished from
old ones.
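The strand-discrimination rule described above can be illustrated with a minimal sketch. It models a strand as a sequence (written 5' to 3') plus a set of methylated positions; this representation and the function names are hypothetical conveniences, not a description of how the bacterial proteins actually scan DNA.

```python
def gatc_adenine_positions(sequence):
    """Return the 0-based index of the A in every GATC site of the sequence."""
    positions = []
    start = sequence.find("GATC")
    while start != -1:
        positions.append(start + 1)  # the A is the second base of GATC
        start = sequence.find("GATC", start + 1)
    return positions


def looks_newly_synthesized(sequence, methylated_positions):
    """Treat a strand as new if any of its GATC sites is still unmethylated.

    A strand with no GATC sites at all cannot be classified this way and
    is reported as not new.
    """
    sites = gatc_adenine_positions(sequence)
    return any(position not in methylated_positions for position in sites)


# Illustrative sequences only: the second strand is the complement of the
# first, written 5' to 3'. GATC is palindromic, so both strands carry sites.
old_strand = "TTGATCAAGGATCC"
new_strand = "GGATCCTTGATCAA"

old_methylation = set(gatc_adenine_positions(old_strand))  # fully methylated
new_methylation = set()                                    # not yet methylated

print(looks_newly_synthesized(old_strand, old_methylation))
print(looks_newly_synthesized(new_strand, new_methylation))
```

The first call reports the fully methylated template as old, and the second reports the unmethylated copy as new, which is the distinction the proofreading system needs in order to correct only the new strand.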
Figure 6-50. A model for mismatch proofreading in eucaryotes
The two proteins shown are present in both bacteria and eucaryotic cells:
MutS binds specifically to a mismatched base pair, while MutL scans the
nearby DNA for a nick. Once a nick is found, MutL triggers the degradation of
the nicked strand all the way back through the mismatch. Because nicks
are largely confined to newly replicated strands in eucaryotes,
replication errors are selectively removed. In bacteria the mechanism is the
same except that an additional protein in the complex (MutH)
nicks unmethylated (and therefore newly replicated) GATC sequences
and thereby begins the process that is illustrated here. We know the
mechanism because these reactions have been reconstituted in a cell-free
system containing purified bacterial proteins and DNA.
More recently, eucaryotic proteins have been discovered that are
homologous in their amino acid sequence to several of the bacterial proteins that
catalyze mismatch proofreading. As expected, when the genes that encode these
proteins are deleted in a yeast cell, mutation rates can increase by 100-fold or more.
There must, however, be some important differences between the bacterial and
eucaryotic proofreading mechanisms. The mechanism for distinguishing the
newly synthesized strand from the parental template strand at the site of a
mismatch cannot depend on DNA methylation as in bacteria, since some eucaryotes,
such as yeasts and Drosophila, do not methylate any of their DNA. Newly
synthesized DNA strands are known to be preferentially
nicked, and it has been suggested that such nicks (single-strand breaks) provide the signal that directs mismatch
proofreading to the appropriate strand in a eucaryotic cell ().
Replication Forks Initiate at Replication
Origins 35
Figure 6-51. Replication fork initiation
The figure outlines the processes involved in the initiation
of replication forks at replication origins. (See also .)
In both bacteria and mammals replication forks originate at a structure called
a replication bubble, a local region where the two strands of the parental DNA
helix have been separated from each other to serve as templates for DNA
synthesis (). For bacteria, yeasts, and several viruses that grow in
mammalian cells, replication bubbles have been shown to form at special DNA
sequences called replication origins, which can be as long as 300 nucleotides. For
reasons that are not clear, the replication origins in mammalian chromosomes have
thus far been very difficult to characterize at the molecular level.
Figure 6-52. The proteins that initiate DNA replication
The major types of proteins involved in
the formation of replication forks at the E. coli and bacteriophage lambda replication origins are indicated.
The mechanism shown was established by in vitro studies utilizing a mixture of highly purified proteins.
Subsequent steps result in the initiation of three more DNA chains (see )
by a pathway that is not yet clear. For E. coli DNA replication, the major initiator protein is the dnaA
protein; for both lambda and E. coli, the primosome is composed of the
dnaB (DNA helicase) and dnaG (DNA primase) proteins.
For several well-defined replication origins, it has been possible to
reproduce the fork initiation reaction in vitro. The in vitro studies reveal that fork
initiation in bacteria and bacterial viruses starts in the manner indicated in .
Initiator proteins bind in multiple copies to specific sites at the replication
origin, wrapping the DNA around them to form a large protein-DNA complex.
This complex then binds the DNA helicase and loads it onto an exposed DNA
single strand in an adjacent region of helix. The DNA primase also binds, forming
the primosome, which moves away from the origin and makes an RNA primer
that starts the first DNA chain. This quickly leads to assembly of the remaining
proteins to create two replication protein complexes moving away from the
origin in opposite directions (see ); these continue to synthesize DNA
until all of the DNA template downstream of each fork has been replicated.
Replication fork initiation in eucaryotic chromosomes is discussed in
detail in Chapter 8.
DNA Topoisomerases Prevent DNA Tangling
During Replication 36
Figure 6-53. The "winding problem" that arises during DNA replication
For a bacterial replication
fork moving at 500 nucleotides per second, the parental DNA helix
ahead of the fork must rotate at 50 revolutions per second.
When we draw the DNA helix (incorrectly) as a flat, ladderlike structure, we
are ignoring the "winding problem" that arises during DNA replication. Every 10
base pairs replicated at the fork correspond to one complete turn about the axis of
the parental double helix. Therefore, for a replication fork to move, the entire
chromosome ahead of the fork would normally have to rotate rapidly (), which would require large amounts of energy for long chromosomes. An
alternative strategy is used during DNA replication: a swivel is formed in the DNA
helix by proteins known as DNA topoisomerases.
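The rotation rate implied by the winding problem follows directly from the two numbers given in the text: 10 base pairs per helical turn and a bacterial fork speed of 500 nucleotides per second. The short sketch below simply makes that division explicit; the function name is illustrative.

```python
BASE_PAIRS_PER_TURN = 10  # one complete turn of the parental helix, from the text


def helix_rotation_rate(fork_speed_nt_per_s, bp_per_turn=BASE_PAIRS_PER_TURN):
    """Revolutions per second required of the helix ahead of a moving fork."""
    return fork_speed_nt_per_s / bp_per_turn


bacterial_fork_speed = 500  # nucleotides per second, from the figure caption
revolutions_per_second = helix_rotation_rate(bacterial_fork_speed)

print(f"{revolutions_per_second:.0f} revolutions per second")
print(f"{revolutions_per_second * 60:.0f} revolutions per minute")
```

Without a swivel, the whole chromosome ahead of the fork would have to sustain this rotation, which is why a local break introduced by a topoisomerase is the more economical solution.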
A DNA topoisomerase can be viewed as a reversible nuclease that adds
itself covalently to a DNA phosphate, thereby breaking a phosphodiester bond in
a DNA strand. Because the covalent linkage that joins a topoisomerase to a
DNA phosphate retains the energy of the cleaved phosphodiester bond, the
cleavage reaction is reversible; resealing is rapid and does not require additional
energy input. The rejoining mechanism is different in this respect from that of the
enzyme DNA ligase, discussed previously (see ).
Figure 6-54. The reversible nicking reaction catalyzed by a eucaryotic DNA topoisomerase I enzyme
As indicated, these enzymes form a transient covalent bond with DNA
so as to allow free rotation about the covalent bonds linked to
the blue phosphate.
One type of topoisomerase (topoisomerase I) causes a single-strand break (or
nick), which can allow the two sections of DNA helix on either side of the
nick to rotate freely relative to each other, using the phosphodiester bond in the
strand opposite the nick as a swivel point (). Any tension in the DNA
helix will drive this rotation in the direction that relieves the tension. As a result,
DNA replication can occur with the rotation of only a short length of helix - the
part just ahead of the fork. The analogous problem that arises during DNA
transcription is solved in a similar way.
Figure 6-55. DNA topoisomerase II
An example of a DNA-helix-passing reaction catalyzed by a type II
DNA topoisomerase. Unlike type I topoisomerases, these
enzymes require ATP hydrolysis for their function, and some of the
bacterial versions can introduce superhelical tension into DNA (see p. 438). Type
II topoisomerases are largely confined to proliferating cells in
eucaryotes; partly for that reason, they have been popular targets for anticancer drugs.
A second type of DNA topoisomerase (topoisomerase II) forms a covalent linkage to both strands of the DNA helix at the same time, making a
transient double-strand break in the helix. These enzymes are activated by sites on
chromosomes where two double helices cross over each other. When the
topoisomerase binds to such a crossing site, it (1) breaks one double helix
reversibly to create a DNA "gate," (2) causes the second, nearby double helix to pass
through this break, and (3) reseals the break and dissociates from the DNA. In this
way type II DNA topoisomerases can efficiently separate two interlocked DNA
circles (). The same reaction prevents the severe DNA tangling problems
that would otherwise arise during DNA replication. For example, mutant yeast
cells have been isolated that produce, in place of the normal topoisomerase II, a
version that is inactive at 37°C. When the mutant yeast cells are warmed to this
temperature, their chromosomes remain intertwined at mitosis and are unable
to separate. The usefulness of topoisomerase II for untangling chromosomes
can readily be appreciated by anyone who has struggled to remove a tangle from
a fishing line without the aid of scissors.
DNA Replication Is Basically Similar in Eucaryotes
and Procaryotes 37
Much of what we know about DNA replication comes from studies of
purified bacterial and bacteriophage multienzyme systems capable of DNA replication in vitro. The development of these systems in the 1970s was greatly facilitated
by the prior isolation of mutants in a variety of replication genes; these mutants
were exploited to identify and purify the corresponding replication proteins.
Less is known about the detailed enzymology of DNA replication in
eucaryotes, largely because it is difficult to obtain replication-deficient mutants.
Nevertheless, the basic mechanisms of DNA replication, including both the
geometry of the replication fork and the protein components of the
multiprotein replication machine, are similar for procaryotes and eucaryotes (see Figure 8-35). The major difference is that eucaryotic DNA is replicated not as bare DNA
but as chromatin, in which the DNA is complexed with tightly bound proteins
called histones. As described in Chapter 8, histones form disclike structures
around which the eucaryotic DNA is wound, creating a repeating structural unit
called a nucleosome. Nucleosomes are spaced at intervals of about 200 base pairs
along the DNA, which may be why new Okazaki fragments are synthesized on the
lagging strand at intervals of 100 to 200 nucleotides in eucaryotes instead of at
intervals of 1000 to 2000 nucleotides as in bacteria. Nucleosomes may also act
as barriers that slow down the movement of DNA polymerase molecules,
which could explain why eucaryotic replication forks move only one-tenth as fast
as bacterial replication forks.
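The fragment lengths and fork speeds quoted in this chapter can be combined into a rough estimate of how long each Okazaki fragment takes to make. The sketch below assumes that lagging-strand synthesis simply keeps pace with the fork, which is a simplification; the bacterial fork speed of 500 nucleotides per second is the figure quoted earlier for the winding problem, and the eucaryotic speed is taken as one-tenth of that.

```python
BACTERIAL_FORK_SPEED = 500.0                      # nucleotides per second
EUCARYOTIC_FORK_SPEED = BACTERIAL_FORK_SPEED / 10  # "one-tenth as fast"


def seconds_per_fragment(fragment_length_nt, fork_speed_nt_per_s):
    """Time to lay down one fragment, ASSUMING synthesis keeps pace with the fork."""
    return fragment_length_nt / fork_speed_nt_per_s


cases = (
    ("eucaryotes", EUCARYOTIC_FORK_SPEED, (100, 200)),
    ("bacteria", BACTERIAL_FORK_SPEED, (1000, 2000)),
)
for label, speed, (shortest, longest) in cases:
    fastest = seconds_per_fragment(shortest, speed)
    slowest = seconds_per_fragment(longest, speed)
    print(f"{label}: {fastest:.0f} to {slowest:.0f} seconds per fragment")
```

Under this assumption both kinds of cell spend a few seconds on each fragment: the tenfold shorter eucaryotic fragments are offset by the tenfold slower fork, consistent with the figure of about 4 seconds per fragment given earlier.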
Summary
A self-correcting DNA polymerase catalyzes nucleotide polymerization in a
5'-to-3' direction, copying a DNA template with remarkable fidelity. Since the two strands
of a DNA double helix are antiparallel, this
5'-to-3' DNA synthesis can take place
continuously on only one of the strands at a replication fork (the leading strand). On
the lagging strand short DNA fragments are made by a "backstitching" process.
Because the self-correcting DNA polymerase cannot start a new chain, these
lagging-strand DNA fragments are primed by short RNA primer molecules that are
subsequently erased and replaced with DNA.
DNA replication requires the cooperation of many proteins, including (1)
DNA polymerase and DNA primase to catalyze nucleoside triphosphate
polymerization, (2) DNA helicases and single-strand binding proteins to help open up the DNA
helix so that it can be copied, (3) DNA ligase and an enzyme that degrades RNA
primers to seal together the discontinuously synthesized lagging-strand DNA
fragments, (4) DNA topoisomerases to help relieve helical winding and tangling problems,
and (5) initiator proteins that bind to specific DNA sequences at a replication origin
and catalyze the formation of a replication fork at that site. At a replication origin a
specialized protein-DNA structure is formed that subsequently loads a DNA helicase
onto the DNA template; other proteins are then added to form the multienzyme
"replication machine" that catalyzes DNA synthesis.
Genetic
Recombination 38
Introduction
In the two preceding sections we discussed the mechanisms by which DNA
sequences in cells are maintained from generation to generation with very
little change. Although such genetic stability is crucial for the survival of
individuals, in the longer term the survival of organisms may depend on genetic
variation, through which they can adapt to a changing environment. Thus an
important property of the DNA in cells is its ability to undergo rearrangements that can
vary the particular combination of genes present in any individual genome, as well
as the timing and the level of expression of these genes. These DNA
rearrangements are caused by genetic recombination. Two broad classes of genetic
recombination are commonly recognized - general recombination and site-specific
recombination.
In general recombination, genetic exchange takes place between any pair
of homologous DNA sequences, usually located on two copies of the same
chromosome. One of the most important examples is the exchange of sections of
homologous chromosomes (homologues) in the course of meiosis. This "crossing-over" occurs between tightly apposed chromosomes early in the development
of eggs and sperm (discussed in Chapter 20), and it allows different versions
(alleles) of the same gene to be tested in new combinations with other genes,
increasing the chance that at least some members of a mating population will survive in
a changing environment. Although meiosis occurs only in eucaryotes, the
advantage of this type of gene mixing is so great that mating and the reassortment
of genes by general recombination are also widespread in bacteria.
Figure 6-80. The life cycle of bacteriophage lambda
The lambda genome contains about
50,000 nucleotide pairs and encodes about 50 proteins. Its double-stranded
DNA can exist in either linear or circular forms. As shown, the
bacteriophage can multiply by either a lytic or a lysogenic pathway in the
E. coli bacterium. When the bacteriophage
is growing in the lysogenic state, damage to the cell causes
the integrated viral DNA (provirus) to exit from the host chromosome and
shift to lytic growth. The entrance and exit of the DNA from the chromosome
are site-specific genetic recombination events catalyzed by the
lambda integrase protein (see ).
DNA homology is not required in site-specific recombination. Instead, exchange occurs at short, specific nucleotide sequences (on either one or both
of the two participating DNA molecules) that are recognized by a variety of
site-specific recombination enzymes. Site-specific recombination therefore alters
the relative positions of nucleotide sequences in genomes. In some cases
these changes are scheduled and organized, as when an integrated bacterial virus
is induced to leave a chromosome of a bacterium under stress (see ); in others they are haphazard, as when the DNA sequence of a transposable
element is inserted at a randomly selected site in a chromosome.
As for DNA replication, most of what we know about the biochemistry
of genetic recombination has come from studies of procaryotic organisms,
especially of E. coli and its viruses.
General Recombination Is Guided by Base-pairing Interactions Between Complementary Strands
of Two Homologous DNA Molecules 39
Figure 6-56. General recombination
The breaking and rejoining of two homologous DNA double
helices creates two DNA molecules that have "crossed over."
Figure 6-57. A heteroduplex joint
This structure unites two DNA molecules where they have
crossed over. Such a joint is often thousands of nucleotides long.
General recombination involves DNA strand-exchange intermediates that
require some effort to understand. Although the exact pathway followed is
likely to be different in different organisms, detailed genetic analyses of
viruses, bacteria, and fungi suggest that the major outcome of general recombination
is always the same. (1) Two homologous DNA molecules "cross over"; that is,
their double helices break and the two broken ends join to their opposite partners
to re-form two intact double helices, each composed of parts of the two initial
DNA molecules (). (2) The site of exchange (that is, where a red double helix is joined to a
green double helix in ) can occur anywhere in the
homologous nucleotide sequences of the two participating DNA molecules. (3)
At the site of exchange, a strand of one DNA molecule becomes base-paired to
a strand of the second DNA molecule to create a staggered joint (usually called a
heteroduplex joint) between the two double helices (). The
heteroduplex region can be thousands of base pairs long; we shall explain later how
it forms. (4) No nucleotide sequences are altered at the site of exchange; the
cleavage and rejoining events occur so precisely that not a single nucleotide is lost
or gained. Despite this precision, general recombination creates DNA molecules
of novel sequence: the heteroduplex joint can contain a small number of
mismatched base pairs, and, more important, the two DNAs that cross over are
usually not exactly the same on either side of the joint.
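The outcome listed in points (1), (2), and (4) can be illustrated with a deliberately simplified sketch. Each double helix is represented by a single string, the two homologues are assumed to be aligned and of equal length, and the heteroduplex joint of point (3) is ignored; the function name and sequences are hypothetical.

```python
def cross_over(molecule_a, molecule_b, site):
    """Exchange everything to the right of `site` between two aligned homologues.

    Simplified model: one string per double helix, no heteroduplex region.
    """
    if len(molecule_a) != len(molecule_b):
        raise ValueError("this sketch assumes aligned homologues of equal length")
    if not 0 <= site <= len(molecule_a):
        raise ValueError("exchange site lies outside the molecules")
    recombinant_1 = molecule_a[:site] + molecule_b[site:]
    recombinant_2 = molecule_b[:site] + molecule_a[site:]
    return recombinant_1, recombinant_2


# Two homologues that differ at two positions (index 4 and index 11).
homologue_a = "ATGCCGTAAGCT"
homologue_b = "ATGCTGTAAGCA"

product_1, product_2 = cross_over(homologue_a, homologue_b, site=6)
print(product_1)
print(product_2)
```

Both products keep the original length, so no nucleotide is lost or gained, yet each carries a combination of differences found in neither parent, which is how a precise exchange still yields molecules of novel sequence.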
The mechanism of general recombination ensures that two regions of
DNA double helix undergo an exchange reaction only if they have extensive
sequence homology. The formation of a heteroduplex joint requires that such
homology be present because it involves a long region of complementary base-pairing
between a strand from one of the two original double helices and a
complementary strand from the other. But how does this heteroduplex joint arise, and
how do the two homologous regions of DNA at the site of crossing-over recognize
each other? As we shall see, recognition takes place by means of a direct
base-pairing interaction. The formation of base pairs between complementary strands
from the two DNA molecules then guides the general recombination process,
allowing it to occur only between long regions of matching DNA sequence.
General Recombination Can Be Initiated at a Nick
in One Strand of a DNA Double Helix 40
Each of the two strands in a DNA molecule is helically wound around the
other. As a result, extensive base-pair interactions can occur between two
homologous DNA double helices only if a nick is first made in a strand of one of them,
freeing that strand for the unwinding and rewinding events required to form a
heteroduplex with another DNA molecule. For the same reason, any
exchange of strands between two DNA double helices requires at least two nicks, one in
a strand of each interacting double helix. Finally, to produce the heteroduplex
joint illustrated in , each of the four strands present must be cut to
allow each to be joined to a different partner. In general recombination, these
nicking and resealing events are coordinated so that they occur only when two
DNA helices share an extensive region of matching DNA sequence.
Figure 6-58. One way to start a recombination event
The RecBCD protein is an enzyme required
for general genetic recombination in E. coli. The protein enters the DNA from one end of the double helix and
then uses energy derived from the hydrolysis of bound ATP molecules
to propel itself in one direction along the DNA at a rate of about
300 nucleotides per second. A special recognition site (a DNA sequence
of eight nucleotides scattered throughout the E. coli chromosome) is cut in the traveling loop of
DNA created by the RecBCD protein, and thereafter a single-stranded whisker
is displaced from the helix, as shown. This whisker is thought to
initiate genetic recombination by pairing with a homologous helix, as in .
Figure 6-59. The initial strand exchange in general recombination
A nick in a single DNA strand frees
the strand, which then invades a homologous DNA double helix
to form a short pairing region with one of the strands in the second
helix. Only two DNA molecules that are complementary in
nucleotide sequence can base-pair in this way and thereby initiate a
general recombination event. All of the steps shown here can be catalyzed
by known enzymes (see and ).
There is evidence from a number of sources that a single nick in only
one strand of a DNA molecule is sufficient to initiate general recombination.
Chemical agents or types of irradiation that introduce single-strand nicks, for example,
will trigger a genetic recombination event. Moreover, one of the special
proteins required for general recombination in E. coli - the RecBCD protein - has been shown to make single-strand nicks in DNA molecules. The RecBCD protein is
also a DNA helicase, hydrolyzing ATP and traveling along a DNA helix,
transiently exposing its strands. By combining its nuclease and helicase activities,
the RecBCD protein will create a single-stranded "whisker" on the DNA double
helix (). shows how such a whisker could initiate a
base-pairing interaction between two complementary stretches of DNA
double helix.
DNA Hybridization Reactions Provide a Simple Model
for the Base-pairing Step in General
Recombination 41
Figure 6-60
.
DNA hybridization
DNA double helices re-form from
their separated strands in a reaction that depends on the random collision
of two complementary strands (see
p. 300). Most such collisions are not productive, as shown at the left, but
a few result in a short region where complementary base pairs
have formed (helix nucleation). A rapid zippering then leads to the
formation of a complete double helix. A DNA strand can use this
trial-and-error process to find its complementary partner in the midst of millions
of nonmatching DNA strands. Trial-and-error recognition of a
complementary partner DNA sequence appears to initiate all general
recombination events.
In its simplest form, the type of base-pairing interaction central to general
recombination can be mimicked in a test tube by allowing a DNA double helix to
re-form from its separated single strands. This process, called
DNA renaturation or hybridization, occurs when a rare random collision juxtaposes
complementary nucleotide sequences on two matching DNA single strands, allowing the
formation of a short stretch of double helix between them. This relatively slow
helix nucleation step is followed by a very rapid "zippering" step as the region
of double helix is extended to maximize the number of base-pairing
interactions ().
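The two-step character of hybridization (a short nucleation event followed by zippering) can be illustrated with a minimal Python sketch. The function names and the four-base seed length are assumptions chosen for illustration only; real nucleation depends on temperature, salt, and sequence composition, none of which is modeled here.

```python
# Watson-Crick pairing rules for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the strand (written 5'->3') that pairs antiparallel with `strand`."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def can_nucleate(strand_a: str, strand_b: str, seed_length: int = 4) -> bool:
    """True if any `seed_length`-base stretch of strand_a can pair with strand_b.

    The seed length is an illustrative assumption, not a measured value.
    """
    for i in range(len(strand_a) - seed_length + 1):
        seed = strand_a[i:i + seed_length]
        if reverse_complement(seed) in strand_b:
            return True
    return False


def can_zipper(strand_a: str, strand_b: str) -> bool:
    """True if the two strands are complementary along their whole length."""
    return strand_b == reverse_complement(strand_a)


probe = "ATGGCATTC"
partner = reverse_complement(probe)   # fully complementary strand
bystander = "TTTTTTTTT"               # a nonmatching strand in the mixture
```

In this sketch a collision with `bystander` fails at the nucleation test, whereas a collision with `partner` passes both tests, which mirrors the trial-and-error search described above.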
Formation of a new double helix in this way requires that the
annealing strands be in an open, unfolded conformation. For this reason
in vitro hybridization reactions are carried out at high temperature or in the presence of
an organic solvent such as formamide; these conditions "melt out" the short
hairpin helices formed where base-pairing interactions occur within a single
strand that folds back on itself. Bacterial cells could not survive such harsh
conditions and instead use a single-strand binding protein, the
SSB protein, to open their helices. This protein is essential for DNA replication as well as for general
recombination in E. coli; it binds tightly and cooperatively to the sugar-phosphate
backbone of all single-stranded regions of DNA, holding them in an extended
conformation with their bases exposed (see ). In this extended
conformation a DNA single strand can base-pair efficiently with either a nucleoside
triphosphate molecule (in DNA replication) or a complementary section of another
DNA single strand (in genetic recombination). When hybridization reactions are
carried out in vitro under conditions that mimic those inside a cell, the SSB
protein speeds up the rate of DNA helix nucleation and thereby the overall rate of
strand annealing by a factor of more than 1000.
The RecA Protein Enables a DNA Single Strand to Pair
with a Homologous Region of DNA Double Helix in E. coli 42
Figure 6-61
.
The structure of the RecA protein
A string of three RecA monomers is shown, with
the position of each ATP in red. The white spheres show the putative position of the single-strand DNA in the
filament, with three nucleotides (each shown as a sphere) bound per monomer.
(From R.M. Story, I.T. Weber, and T.A. Steitz, Nature 355:318-325, 1992. © 1992 Macmillan Magazines Ltd.)
Figure 6-62
.
DNA synapsis catalyzed by the RecA protein
In vitro experiments show that several
types of complexes are formed between a DNA single strand covered with
RecA protein (red) and a DNA double helix (green). First a non-base-paired complex is formed, which
is converted to a three-stranded structure as soon as a region
of homologous sequence is found. This complex is presumably
unstable because it involves an unusual form of DNA, and it spins out a
DNA heteroduplex (one strand green and the other strand red) plus a displaced
single strand from the original helix (green); thus the structure shown in this diagram migrates to the
left, reeling in the "input DNAs" while producing the "output DNAs."
The net result is a DNA strand exchange identical to that diagrammed
earlier in . (Adapted from S.C. West,
Annu. Rev. Biochem. 61:603-640, 1992. © Annual Reviews Inc.)
General recombination is more complex than the simple hybridization
reactions just described. In the course of general recombination, a single DNA strand
from one DNA double helix must invade another double helix (see ). In E. coli
this requires the RecA protein, produced by the recA gene, which was identified
in 1965 as being essential for recombination between chromosomes.
Long sought by biochemists, this elusive gene product was finally purified to
homogeneity in 1976, a feat that allowed its detailed characterization ().
Like a single-strand binding (SSB) protein, the RecA protein binds tightly and in
large cooperative clusters to single-stranded DNA to form a nucleoprotein
filament. This filament has several distinctive properties. The RecA protein has more
than one DNA-binding site, for example, and it can therefore hold a single strand
and a double helix together. These sites allow the RecA protein to catalyze a
multistep reaction (called synapsis) between a DNA double helix and a
homologous region of single-stranded DNA. The crucial step in synapsis occurs when a
region of homology is identified by an initial base-pairing between
complementary nucleotide sequences. The nucleation step in this case appears to involve a
three-stranded structure, in which the DNA single strand forms nonconventional
base pairs in the major groove of the DNA double helix (). This begins
the pairing shown previously in and so initiates the exchange of
strands between two recombining DNA double helices. Studies in vitro suggest that the
E. coli SSB protein cooperates with the RecA protein to facilitate these reactions.
Figure 6-63
.
Two types of DNA branch migration observed in experiments in vitro
(A) Spontaneous branch migration is a back-and-forth, random-walk type
of process, and it therefore makes little progress over long distances.
(B) RecA-protein-directed branch migration proceeds at a uniform
rate in one direction, and it may be driven by the polarized assembly of the
RecA protein filament on a DNA single strand, which occurs in the
direction indicated. In addition, special DNA helicases that catalyze
protein-directed branch migration even
more efficiently are involved in recombination.
Once synapsis has occurred, a short heteroduplex region where the
strands from two different DNA molecules have begun to pair is enlarged through
protein-directed branch migration, which can also be catalyzed by the RecA
protein. Branch migration can take place at any point where two single DNA strands
with the same sequence are attempting to pair with the same complementary
strand; an unpaired region of one of the single strands will displace a paired region
of the other single strand, moving the branch point without changing the
total number of DNA base pairs. Spontaneous branch migration proceeds equally
in both directions, and so it makes little progress and is unlikely to complete
recombination efficiently (). Because the RecA protein catalyzes
unidirectional branch migration, it readily produces a region of heteroduplex that is
thousands of base pairs long ().
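The contrast between the two modes of branch migration can be made concrete with a small simulation. The sketch below treats the branch point as moving one base pair per step; the step counts and function names are illustrative assumptions, and the model says nothing about actual rates.

```python
import random


def spontaneous_migration(steps: int, rng: random.Random) -> int:
    """Random-walk model: each step moves the branch point one base pair
    left (-1) or right (+1) with equal probability."""
    position = 0
    for _ in range(steps):
        position += rng.choice((-1, 1))
    return position


def directed_migration(steps: int) -> int:
    """Unidirectional model (as proposed for RecA): every step moves the
    branch point one base pair in the same direction."""
    position = 0
    for _ in range(steps):
        position += 1
    return position


def mean_abs_displacement(steps: int, trials: int, seed: int = 0) -> float:
    """Average distance from the origin reached by many random walks."""
    rng = random.Random(seed)
    total = sum(abs(spontaneous_migration(steps, rng)) for _ in range(trials))
    return total / trials
```

For a random walk the typical net displacement grows only as the square root of the number of steps, so extending a heteroduplex by thousands of base pairs would require on the order of millions of spontaneous steps, whereas the directed model needs only thousands.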
The catalysis of branch migration depends on a further property of the
RecA protein. In addition to having two DNA-binding sites, the RecA protein is a
DNA-dependent ATPase, with an additional site for binding and hydrolyzing ATP.
The protein associates much more tightly with DNA when it has ATP bound
than when it has ADP bound. Moreover, new RecA molecules with ATP bound
are preferentially added at one end of the RecA protein filament, and the ATP is
then hydrolyzed to ADP. The RecA protein filaments that form on DNA may
therefore share many of the dynamic assembly properties displayed by the
cytoskeletal filaments formed from actin or tubulin (discussed in
Chapter 16); an ability of
the protein to "treadmill" unidirectionally along a DNA strand, for example,
could drive the branch migration reaction shown in .
General Genetic Recombination Usually Involves
a Cross-Strand Exchange 43
Figure 6-64
.
The formation of a cross-strand exchange
There are many possible pathways that can
lead from a single-strand exchange (see ) to a
cross-strand exchange, but only one is shown.
Exchanging a single strand between two double helices is presumed to be
the slow and difficult step in a general recombination event (see ).
After this initial exchange, extending the region of pairing and establishing
further strand exchanges between the two closely apposed helices is thought to
proceed rapidly. During these events a limited amount of nucleotide excision and
local DNA resynthesis often occurs, resembling some of the events in DNA
repair. Because of the large number of possibilities, different organisms are likely
to follow different pathways at this stage. In most cases, however, an
important intermediate structure, the cross-strand
exchange, will be formed by the two participating DNA helices. One of the simplest ways in which this structure
can form is shown in .
Figure 6-65
.
The isomerization of a cross-strand exchange
Without isomerization, cutting the two crossing strands would terminate
the exchange and crossing over would not occur. With isomerization (steps
B and C), cutting the two crossing strands creates two DNA molecules
that have crossed over (bottom). Isomerization is therefore thought to
be required for the breaking and rejoining of two homologous DNA
double helices that result from general genetic recombination. Step A
was illustrated previously (see ).
In the cross-strand exchange (also called a
Holliday junction) the two homologous DNA helices that initially paired are held together by mutual
exchange of two of the four strands present, one originating from each of the helices.
No disruption of base-pairing is necessary to maintain this structure, which has
two important properties: (1) the point of exchange between the two
homologous DNA double helices (where the two strands cross in ) can
migrate rapidly back and forth along the helices by a double branch migration; (2)
the cross-strand exchange contains two pairs of strands: one pair of crossing
strands and one pair of noncrossing strands. The structure can
isomerize, however, by undergoing a series of rotational movements, so that the two original
noncrossing strands become crossing strands and vice versa ().
In order to regenerate two separate DNA helices and thus terminate
the pairing process, the two crossing strands must be cut. If the crossing strands
are cut before isomerization, the two original DNA helices separate from each
other nearly unaltered, with only a very short piece of single-stranded DNA
exchanged. If the crossing strands are cut after isomerization, however, one section of
each original DNA helix is joined to a section of the other DNA helix; in other
words, the two DNA helices have crossed over (see ).
The isomerization of the cross-strand exchange should occur
spontaneously at some rate, but it may also be enzymatically driven or otherwise regulated
by cells. Some kind of control probably operates during meiosis, when the
two DNA double helices that pair are constrained in an elaborate structure called
the synaptonemal complex (discussed in Chapter 20).
Gene Conversion Results from Combining General Recombination and Limited DNA
Synthesis 44
It is a fundamental law of genetics that each parent makes an equal genetic
contribution to the offspring, one complete set of genes being inherited from
the father and one from the mother. Thus, when a diploid cell undergoes meiosis
to produce four haploid cells (discussed in Chapter 20), exactly half of the genes
in these cells should be maternal (genes that the diploid cell inherited from
its mother) and the other half paternal (genes that the diploid cell inherited
from its father). In a complex animal, such as a human, it is not possible to check
this prediction directly. But in other organisms, such as fungi, where it is possible
to recover and analyze all four of the daughter cells produced from a single cell
by meiosis, one finds cases in which the standard genetic rules have apparently
been violated. Occasionally, for example, meiosis yields three copies of the
maternal version of a gene (allele) and only one copy of the paternal allele, indicating
that one of the two copies of the paternal allele has been changed to a copy of the
maternal allele. This phenomenon is known as gene
conversion. It often occurs in association with general genetic recombination events, and it is thought to be
important in the evolution of certain genes (see Figure 8-74). Gene conversion
is believed to be a straightforward consequence of the mechanisms of general
recombination and DNA repair.
During meiosis heteroduplex joints are formed at the sites of
crossing-over between homologous maternal and paternal chromosomes. If the maternal
and paternal DNA sequences are slightly different, the heteroduplex joint may
include some mismatched base pairs. The resulting mismatch in the double helix
may then be corrected by the DNA repair machinery, which can either erase
nucleotides on the paternal strand and replace them with nucleotides that match
the maternal strand or vice versa. The consequence of this mismatch repair will
be a gene conversion. Gene conversion can also take place by a number of
other mechanisms, but they all require some type of general recombination event
that brings two copies of a closely related DNA sequence together. Because an
extra copy of one of the two DNA sequences is generated, a limited amount of
DNA synthesis must also be involved. Genetic studies show that usually only small
sections of DNA undergo gene conversion, and in many cases only part of a gene
is changed.
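The bookkeeping behind the 3:1 ratio can be followed in a short Python sketch. It assumes, purely for illustration, that a single maternal strand has been transferred into one paternal chromatid, so that only that chromatid carries a heteroduplex; the function name and the "M"/"P" labels are hypothetical.

```python
from collections import Counter
from typing import Optional


def tetrad_after_meiosis(repair_template: Optional[str] = None) -> Counter:
    """Count maternal ("M") and paternal ("P") alleles in the four haploid
    products of one meiosis.

    repair_template:
        None -> no heteroduplex forms at this gene
        "M"  -> mismatch repair copies the maternal strand
        "P"  -> mismatch repair copies the paternal strand
    """
    if repair_template not in (None, "M", "P"):
        raise ValueError("repair_template must be None, 'M', or 'P'")

    # Four chromatids after premeiotic DNA replication; each is a duplex
    # represented by the allele carried on each of its two strands.
    chromatids = [["M", "M"], ["M", "M"], ["P", "P"], ["P", "P"]]

    if repair_template is not None:
        # Assumed strand transfer: one paternal chromatid now holds a
        # maternal strand paired with a paternal strand (a heteroduplex).
        chromatids[2] = ["M", "P"]
        # Mismatch repair makes both strands match the chosen template.
        chromatids[2] = [repair_template, repair_template]

    return Counter(strands[0] for strands in chromatids)
```

Under these assumptions, repair that uses the maternal strand as template yields three maternal alleles and one paternal allele, whereas repair in the opposite direction restores the normal 2:2 ratio and leaves no genetic trace.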
Figure 6-66
.
One general recombination pathway that can cause gene conversion
The process begins when a nick is formed in one of the
strands in the red DNA helix. In step 1 DNA polymerase begins the synthesis of
an extra copy of a strand in the red helix, displacing the original copy as
a single strand. This single strand then pairs with the homologous region
of the green helix in the manner shown in . In step 2 the
short region of unpaired green strand produced in step 1 is degraded,
completing the transfer of nucleotide sequences. The result is normally seen in the
next cell cycle, after DNA replication has separated the two
nonmatching strands (step 3). As described in the text, the repair of mismatched
base pairs in a heteroduplex joint also causes gene conversion.
Gene conversion can also occur in mitotic cells, but it does so more
rarely. As in meiotic cells, some gene conversions in mitotic cells probably result
from a mismatch repair process operating on heteroduplex DNA. Another
likely mechanism in both meiotic and mitotic cells is illustrated in .
Mismatch Proofreading Can Prevent Promiscuous
Genetic Recombination Between Two Poorly Matched
DNA Sequences 45
As previously discussed, general recombination is triggered whenever two
DNA strands of complementary sequence pair to form a heteroduplex joint
between two double helices (see ). Experiments carried out in vitro
with purified RecA protein show that pairing can occur efficiently even when the
sequences of the two DNA strands do not match well - when, for example, only
four out of every five nucleotides on average can form base pairs. How, then, do
vertebrate cells avoid promiscuous general recombination between the many
thousands of copies of closely related DNA sequences that are repeated in their
genomes (see p. 395)?
Figure 6-67
.
Proofreading prevents general recombination from destabilizing genomes that contain repeated sequences
Studies with bacterial and yeast cells suggest
that the mismatch proofreading system diagrammed previously in has the additional function shown here.
Although the answer is not known, studies with bacteria and yeasts
demonstrate that the same mismatch proofreading system that removes
replication errors (see ) has the additional role of interrupting genetic
recombination events between imperfectly matched DNA sequences. It has long
been known, for example, that homologous genes in two closely related
bacteria, Escherichia coli and Salmonella typhimurium, generally will not recombine,
even though their nucleotide sequences are 80% identical; when the mismatch
proofreading system is inactivated by mutation, however, there is a 1000-fold
increase in the frequency of such interspecies recombination events. It is thought,
then, that the mismatch proofreading system normally recognizes the mispaired
bases in an initial strand exchange and prevents the subsequent steps required to
break and rejoin the two paired DNA helices. This mechanism protects the
bacterial genome from the sequence changes that would otherwise be caused by
recombination with foreign DNA molecules that occasionally enter the cell. In
vertebrate cells, which contain many closely related DNA sequences, the same
type of proofreading is thought to help prevent promiscuous recombination
events that would otherwise scramble the genome ().
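The logic of this proofreading step can be sketched as a simple gate on sequence identity. In the Python sketch below, the 80% figure for RecA-mediated pairing comes from the text, whereas the 99% cutoff applied when proofreading is active is a hypothetical value chosen only to illustrate the idea; the actual stringency of the mismatch system is not specified here.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def recombination_proceeds(
    seq_a: str,
    seq_b: str,
    proofreading_active: bool,
    pairing_cutoff: float = 80.0,       # RecA pairs strands matching ~4 of 5 bases
    proofreading_cutoff: float = 99.0,  # hypothetical, for illustration only
) -> bool:
    """Decide whether an initial strand exchange goes on to completion."""
    identity = percent_identity(seq_a, seq_b)
    if identity < pairing_cutoff:
        return False                    # strands never pair in the first place
    if proofreading_active and identity < proofreading_cutoff:
        return False                    # mispaired bases detected; exchange aborted
    return True


host_gene = "ACGTACGTAC"
related_gene = "ACGTTCGTAA"  # differs from host_gene at 2 of 10 positions
```

With these two sequences, which are 80% identical, the exchange proceeds only when proofreading is switched off, a toy counterpart of the increase in interspecies recombination seen in proofreading-deficient mutants.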
Site-specific Recombination Enzymes Move Special
DNA Sequences into and out of Genomes 46
Site-specific genetic recombination, unlike general recombination, is guided
by a recombination enzyme that recognizes specific nucleotide sequences
present on one or both of the recombining DNA molecules. Base-pairing between
the recombining DNA molecules need not be involved, and even when it is, the
heteroduplex joint that is formed is only a few base pairs long. By separating
and joining double-stranded DNA molecules at specific sites, this type of
recombination enables various types of mobile DNA sequences to move about within
and between chromosomes.
Figure 6-68
.
The insertion of bacteriophage lambda DNA into the bacterial chromosome
In this example of site-specific recombination, the lambda
integrase enzyme binds to a specific "attachment site" DNA sequence
on each chromosome, where it makes cuts that bracket a short
homologous DNA sequence; the integrase thereby switches the partner strands
and reseals them so as to form a heteroduplex joint 7 base pairs
long. Each of the four strand-breaking and strand-joining reactions
required resembles that made by a DNA topoisomerase, inasmuch as
the energy of a cleaved phosphodiester bond is stored in a transient
covalent linkage between the DNA and the enzyme (see ).
Site-specific recombination was first discovered as the means by which
a bacterial virus, bacteriophage lambda, moves its genome into and out of the
E. coli chromosome. In its integrated state the virus is hidden in the
bacterial chromosome and replicated as part of the host's DNA. When the virus enters
a cell, a virus-encoded enzyme called lambda integrase is synthesized. This enzyme
catalyzes a recombination process that begins when several molecules of
the integrase protein bind tightly to a specific DNA sequence on the circular
bacteriophage chromosome. The resulting DNA-protein complex can now bind to
a related but different specific DNA sequence on the bacterial chromosome,
bringing the bacterial and bacteriophage chromosomes close together. The
integrase then catalyzes the required DNA cutting and resealing reactions, using a short
region of sequence homology to form a tiny heteroduplex joint at the point of
union (). The integrase resembles a DNA topoisomerase in that it forms
a reversible covalent linkage to DNA wherever it breaks a DNA chain.
The same type of site-specific recombination mechanism can also be
carried out in reverse by the lambda bacteriophage, enabling it to exit from its
integration site in the E. coli chromosome in order to multiply rapidly within the
bacterial cell. This excision reaction is catalyzed by a complex of the integrase
enzyme with a second bacteriophage protein, which is produced by the virus
only when its host cell is stressed. If the two sites recognized by such a
recombination enzyme are in inverted orientation relative to each other, the DNA
between them will be inverted rather than excised (see Figure 9-57).
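The dependence of the outcome on site orientation can be captured in a few lines of Python. In this sketch each recognition site is reduced to a single orientation marker (">" or "<"), and the function and key names are hypothetical; it is a bookkeeping model of the products, not of the enzyme chemistry.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Sequence read 5'->3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def recombine(left: str, site1: str, middle: str, site2: str, right: str) -> dict:
    """Products of recombination between two recognition sites on one DNA molecule.

    site1 and site2 are orientation markers, ">" or "<".
    Same orientation     -> the DNA between the sites is excised as a circle.
    Opposite orientation -> the DNA between the sites is inverted in place.
    """
    if site1 not in "<>" or site2 not in "<>" or len(site1) != 1 or len(site2) != 1:
        raise ValueError("sites must be '>' or '<'")
    if site1 == site2:
        return {"chromosome": left + site1 + right, "circle": middle + site2}
    inverted = reverse_complement(middle)
    return {"chromosome": left + site1 + inverted + site2 + right, "circle": None}


excision = recombine("AAA", ">", "AACG", ">", "TTT")
inversion = recombine("AAA", ">", "AACG", "<", "TTT")
```

Note that excision leaves one site on the chromosome and one on the excised circle, which is why the same pair of sites can support reintegration.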
Figure 6-69
.
Using a site-specific recombination enzyme to turn on a gene in a group of cells in a transgenic animal
(A) The DNA molecule shown has been
engineered so that the gene of interest is transcribed only when a
site-specific recombination enzyme is activated, which both removes the marker
gene and brings the promoter next to the gene of interest. The
recombination enzyme is encoded by a second DNA molecule (not shown) that
is engineered so that the enzyme is made only when the temperature
is increased. Both DNA molecules are introduced into the chromosomes
of the same transgenic animal. When the temperature of this animal
is transiently increased, there is a brief burst of synthesis of
the recombination enzyme, which causes a DNA rearrangement in
an occasional cell such that the marker gene is removed and the gene
of interest is simultaneously activated. (B) The strategy can be used to
turn on a gene of interest permanently in small clones of cells in a
developing animal. The clones can be identified by their loss of the marker
gene product, which, for example, could cause a change in the pigmentation
of the cells. This technique therefore allows one to study the effect
of expressing any gene of interest in a group of cells in an intact animal.
Many other enzymes that catalyze site-specific recombination
resemble lambda integrase in requiring a short region of identical DNA sequence on
the two regions of DNA helix to be joined. Because of this requirement, each
enzyme in this class is fastidious with respect to the DNA sequences that it
recombines, and it can be expected to catalyze one particular DNA joining event that is
useful to the virus, plasmid, transposable element, or cell that contains it. These
enzymes can be exploited as tools in transgenic animals to study the influence
of specific genes on cell behavior, as illustrated in .
Site-specific recombination enzymes that break and rejoin two DNA
double helices at specific sequences on each DNA molecule often do so in a
reversible way: as for lambda bacteriophage, the same enzyme system that joins two
DNA molecules can take them apart again, precisely restoring the sequences of the
two original DNA molecules. This type of recombination is therefore called conservative site-specific recombination to distinguish it from the mechanistically
distinct transpositional site-specific
recombination that we discuss next.
Transpositional Recombination Can Insert a Mobile Genetic Element into Any DNA
Sequence 47
Figure 6-70
.
Transpositional site-specific recombination
(A) Outline of the strand-breaking and
-rejoining events that lead to integration of the linear double-stranded DNA of
a retrovirus (red) into an animal cell chromosome
(blue). In an initial endonuclease step the
integrase enzyme makes a cut in one strand at each end of the viral DNA
sequence, exposing a protruding 3'-OH group. Each of these
3'-OH ends then directly attacks a
phosphodiester bond on opposite strands of a randomly selected site on a
target chromosome. This inserts the viral DNA sequence into the
target chromosome, leaving short gaps on each side that are filled in by
DNA repair processes. Because of the gap filling, this type of mechanism
leaves short repeats of target DNA sequence [3 to 12 nucleotides in length
(black), depending on the integrase enzyme] on either side of the integrated
DNA segment. (B) An atomic-level view of the attack by one DNA chain end
in (A) on a phosphodiester bond of the target DNA
(blue). This mechanism resembles that used in RNA
splicing, and is distinctly different from the topoisomerase-like activity of
lambda integrase. (Adapted from
K. Mizuuchi, J. Biol. Chem. 267:21273-21276, 1992.)
Many mobile DNA sequences, including many viruses and transposable
elements, encode integrases that insert their DNA into a chromosome by a
mechanism that is different from that used by bacteriophage lambda. Like the
lambda integrase, each of these enzymes recognizes a specific DNA sequence in the
particular mobile genetic element whose recombination it catalyzes. Unlike
the lambda enzyme, however, these integrases do not require a specific DNA
sequence in the "target" chromosome and they do not form a heteroduplex
joint. Instead, they introduce cuts into both ends of the linear DNA sequence of
the mobile genetic element and then catalyze a direct attack by these DNA ends
on the target DNA molecule, breaking two closely spaced phosphodiester bonds
in the target molecule. Because of the way that these breaks are made, two
short single-stranded gaps are left in the recombinant DNA molecule, one at each
end of the mobile element; these are filled in by DNA polymerase to complete
the recombination process. As illustrated in , this mechanism creates
a short duplication of the adjacent target DNA sequence; such flanking
duplications are the hallmark of a transpositional site-specific recombination event.
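The origin of the flanking duplication can be traced with a minimal Python sketch that follows one strand of the target. The function name, the example sequences, and the lowercase notation for the mobile element are illustrative assumptions; the stagger between the two cuts corresponds to the 3 to 12 nucleotides mentioned in the figure legend.

```python
def transpose(target: str, element: str, cut_position: int, stagger: int) -> str:
    """Insert `element` into `target` by staggered cuts and gap filling.

    The two target strands are cut `stagger` nucleotides apart. After the
    element is joined and the single-stranded gaps are filled in, the
    target bases lying between the two cuts appear on both sides of the
    element.
    """
    if stagger < 0 or not 0 <= cut_position <= len(target) - stagger:
        raise ValueError("cut site must lie within the target sequence")
    left = target[:cut_position + stagger]  # ends with the bases between the cuts
    right = target[cut_position:]           # begins with the same bases
    return left + element + right


target = "AAACCGTTT"
element = "tgca"  # lowercase marks the mobile element
product = transpose(target, element, cut_position=3, stagger=3)
```

In this example the three target bases `CCG` lie between the staggered cuts and therefore flank the inserted element on both sides in `product`.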
An integrase enzyme of this type was first purified in active form from
bacteriophage Mu. Like the bacteriophage lambda integrase, it carries out all of
its cutting and rejoining reactions without requiring an energy source (such as
ATP). Very similar enzymes are present in organisms as diverse as bacteria, fruit
flies, and humans - all of which contain mobile genetic elements, as we discuss next.
Summary
Genetic recombination mechanisms allow large sections of DNA double helix to
move from one chromosome to another. There are two broad classes of
recombination events. In general recombination the initial reactions rely on extensive
base-pairing interactions between strands of the two DNA double helices that will recombine.
As a result, general recombination occurs only between two homologous DNA
molecules, and although it moves sections of DNA back and forth between chromosomes, it
does not normally change the arrangement of the genes in a chromosome.
Site-specific recombination, on the other hand, alters the relative positions of nucleotide
sequences in chromosomes because the pairing reactions depend on a
protein-mediated recognition of the two DNA sequences that will recombine, and extensive
sequence homology is not required. Two site-specific recombination mechanisms
are common: (1) conservative site-specific recombination, which produces a very
short heteroduplex and therefore requires some DNA sequence that is the same on the
two DNA molecules, and (2) transpositional site-specific recombination, which
produces no heteroduplex and usually does not require a specific sequence on the target DNA.
Viruses, Plasmids, and Transposable
Genetic Elements 48
Introduction
In our description of the basic genetic mechanisms, we have so far focused
on their selective advantage for the cell. We saw that the short-term survival of
the cell depends on the maintenance of genetic information by DNA repair, while
the multiplication of the cell requires rapid and accurate DNA replication. On
a longer time scale the appearance of genetic variants, on which evolution of
the species depends, is greatly facilitated by the reassortment of genes and the
occasional rearrangement of DNA sequences caused by genetic recombination.
We shall now examine a group of genetic elements that seem to act as
parasites, subverting the genetic mechanisms of the cell for their own benefit. These
genetic elements are interesting in their own right. In addition, because they must
heavily exploit the metabolism of the host cell in order to multiply, they serve as
powerful tools for investigating the normal cell machinery.
Many DNA sequences can replicate independently of the rest of the
genome. Such sequences have widely different degrees of independence from their
host cells. Of these, virus chromosomes are the most independent because they
have a protein coat that allows them to move freely from cell to cell. To varying
degrees, the viruses are closely related to plasmids and transposable elements,
which are DNA sequences that lack a coat and are therefore more
host-cell-dependent and confined to replicate within a single cell and its progeny. More primitive
still are some DNA sequences that are suspected of being mobile because they
are repeated many times in a cell's chromosome. They move or multiply so
rarely, however, that it is not clear if they should be considered as separate genetic
elements at all.
We begin our discussion with viruses, which are the best understood of
the mobile genetic elements. Then we describe the properties of plasmids and
transposable elements, some of which bear a remarkable resemblance to viruses
and may in fact have been their ancestors. The many repetitive DNA sequences
in vertebrate chromosomes are discussed in Chapter 8.
Viruses Are Mobile Genetic Elements 49
Viruses were first described as disease-causing agents that can multiply only
in cells and that by virtue of their tiny size pass through ultrafine filters that
hold back even the smallest bacteria. Before the advent of the electron
microscope, their nature was obscure, although it was suspected that they might be
naked genes that had somehow acquired the ability to move from one cell to
another. The use of ultracentrifuges in the 1930s made it possible to separate viruses
from host cell components, and by the early 1940s the generalization emerged that
all viruses contain nucleic acids. The idea that viruses and genes carry out
similar functions was confirmed by studies on
bacteriophages, which are bacterial viruses. In 1952 it was shown for the bacteriophage T2 that only the phage
DNA, and not the phage protein, enters the bacterial host cell and initiates the
replication events that lead to the production of several hundred progeny viruses
in every infected cell.
These observations led to the notion of viruses as genetic elements
enclosed by a protective coat that enables them to move from one cell to another.
Virus multiplication per se is often lethal to the cells in which it occurs; in many
cases the infected cell breaks open (lyses) and thereby allows the progeny viruses
access to nearby cells. Many of the clinical manifestations of viral infection
reflect this cytolytic effect of the virus. Both the cold sores formed by herpes
simplex virus and the lesions caused by smallpox, for example, reflect the killing of
the epithelial cells in a local area of the skin.
As we shall see, the type of nucleic acid in a virus, the structure of its
coat, its mode of entry into the host cell, and its mechanism of replication once
inside all vary from one type of virus to another.
The Outer Coat of a Virus May Be a Protein Capsid
or a Membrane Envelope 50
Figure 6-71
.
The simplest of all viral life cycles
The hypothetical virus shown consists of a small
double-stranded DNA molecule that codes for only a single viral capsid protein.
No known virus is this simple.
Initially, it was thought that the outer coat of a virus might be constructed
from a single type of protein molecule. Viral infections were believed to start with
the dissociation of the viral chromosome (its nucleic acid) from its protein coat,
followed by replication of the chromosome inside the host cell, to form many
identical copies. After the synthesis of new copies of the virus-specific coat
protein from virally encoded messenger RNA molecules, formation of the progeny
virus particles would occur by the spontaneous assembly of these coat protein
molecules around the progeny viral chromosomes ().
Figure 6-72
.
The capsids of some viruses, all shown at the same scale
(A) Tomato bushy stunt virus;
(B) poliovirus; (C) simian virus 40 (SV40); (D) satellite tobacco necrosis
virus. The structures of all of these capsids have been determined by
x-ray crystallography and are known in atomic detail. (Courtesy of
Robert Grant, Stephan Crainic, and James M. Hogle.)
Figure 6-73
.
Acquisition of a viral envelope
(A) Electron micrograph of a thin section of an animal cell
from which several copies of an enveloped virus (Semliki forest virus)
are budding. (B) Schematic view of the envelope assembly and
budding process. Whereas the lipid bilayer that surrounds the capsid is
parasitized directly from the plasma membrane of the host cell, the only proteins
in this lipid bilayer are those encoded by the viral genome. (A, courtesy of
M. Olsen and G. Griffiths.)
Figure 6-74
.
The coats of viruses
These electron micrographs of negatively stained virus particles are all at the same scale. (A) Bacteriophage T4, a large DNA-containing virus that infects E. coli. The DNA is stored in the bacteriophage head and injected into the bacterium through the
cylindrical tail. (B) Potato virus X, a filamentous plant virus that contains an
RNA genome. (C) Adenovirus, a DNA-containing virus that can infect
human cells. The protein capsid forms the outer surface of this virus. (D) Influenza virus, a large RNA-containing animal virus whose protein capsid is
further enclosed in a lipid-bilayer-based envelope containing protruding spikes
of viral glycoprotein. (A, courtesy of James Paulson; B, courtesy of
Graham Hills; C, courtesy of Mei Lie Wong; D, courtesy of R.C. Williams and
H.W. Fisher.)
It is now known that these ideas vastly oversimplify the diversity of virus
life cycles. The protein shell that surrounds the nucleic acid of most viruses
(the capsid), for example, contains more than one type of polypeptide chain,
often arranged in several layers (). In many viruses, moreover, the
protein capsid is further enclosed by a lipid bilayer membrane that contains
proteins. Many of these
enveloped viruses acquire their envelope in the process of
budding from the plasma membrane (). This budding process allows the
virus particles to leave the cell without disrupting the plasma membrane
and, therefore, without killing the cell. Electron micrographs that emphasize the
differences among viral coats are presented in .
Viral Genomes Come in a Variety of Forms and Can
Be Either RNA or DNA 51
As discussed earlier, the DNA double helix has the advantages of stability and
easy repair. If one polynucleotide chain is accidentally damaged, its
complementary chain permits the damage to be readily corrected. This concern with repair,
however, need not bother small viral chromosomes that contain only several
thousand nucleotides. The chance of accidental damage is very small compared
with the risk to a cell genome containing millions of nucleotides.
Figure 6-75
.
Schematic drawings of several types of viral genomes
The smallest viruses contain only a few genes and can have an RNA or a DNA genome; the
largest viruses contain hundreds of genes and have a double-stranded DNA genome.
Some examples of these types of viruses are as follows: single-stranded RNA (tobacco mosaic virus, bacteriophage R17, poliovirus); double-stranded RNA (reovirus); single-stranded DNA (parvovirus); single-stranded circular
DNA (M13 and φX174 bacteriophages); double-stranded circular
DNA (SV40 and polyomaviruses); double-stranded
DNA (T4 bacteriophage, herpes virus); double-stranded DNA with covalently linked
terminal protein (adenovirus); double-stranded DNA with covalently sealed
ends (poxvirus). The peculiar ends (as well as the circular forms) overcome the difficulty of replicating
the last few nucleotides at the end of a DNA chain (see pp. 388 and 364).
The genetic information of a virus can, therefore, be carried in a variety
of unusual forms, including RNA instead of DNA. A viral chromosome may be
a single-stranded RNA chain, a double-stranded RNA helix, a circular
single-stranded DNA chain, or a linear single-stranded DNA chain. Moreover,
although some viral chromosomes are simple linear DNA double helices, circular
DNA double helices and more complex linear DNA double helices are also
common. Several viruses have protein molecules covalently attached to the
5' ends of their DNA strands, for example, and the DNA double helices from the very large
poxviruses have their opposite strands at each end covalently joined
through phosphodiester linkages ().
A Viral Chromosome Codes for Enzymes Involved
in the Replication of Its Nucleic Acid 52
Figure 6-76
.
The T4 bacteriophage chromosome, showing the positions of the more than 30 genes involved in T4 DNA replication
The genome of bacteriophage T4 consists
of 169,000 nucleotide pairs and encodes about 300 different proteins.
Each type of viral genome requires unique enzymatic tricks for its replication
and thus must encode not only the viral coat protein but also one or more of
the enzymes needed to replicate the viral nucleic acid. The amount of
information that a virus brings into a cell to ensure its own selective replication varies
greatly. The DNA of the relatively large bacteriophage T4, for example, contains about
300 genes, including at least 30 genes that ensure the rapid replication of the T4
chromosome in its E. coli host cell (). T4 DNA replication has the
unusual feature that 5-hydroxymethylcytosine is incorporated in place of cytosine in its DNA.
The unusual base composition of the T4 DNA makes it readily distinguishable
from host DNA and selectively protects it from nucleases encoded in the T4
genome, which therefore degrade only the E. coli chromosome. Still other proteins alter
host-cell RNA polymerase molecules so that they are unable to transcribe
E. coli DNA and instead transcribe different sets of bacteriophage genes at different stages of
infection, according to the needs of the phage.
Smaller DNA viruses, such as the monkey virus SV40 and the tiny
bacteriophage M13, carry much less genetic information. They rely heavily on
host-cell enzymes to carry out their DNA synthesis, parasitizing most of the host-cell
DNA replication proteins. Most DNA viruses, however, code for proteins that
selectively initiate the synthesis of their own DNA, recognizing a particular nucleotide
sequence in the virus that serves as a replication
origin. This is important because a virus must override the cellular control signals that would otherwise cause
the viral DNA to replicate in pace with the host cell DNA, doubling only once in
each cell cycle. We do not yet understand very much about how eucaryotic cells
regulate their own DNA synthesis, and the mechanisms used by viruses to
escape from this regulation - which are much more accessible to study - provide
insights into the host mechanisms.
RNA viruses have particularly specialized requirements for replication,
since to reproduce their genomes they must copy RNA molecules, which means
polymerizing nucleoside triphosphates on an RNA template. Cells normally do
not have enzymes to carry out this reaction, so even the smallest RNA viruses
must encode their own RNA-dependent polymerase enzymes in order to replicate.
We now look in more detail at the replication mechanisms of the various types of
viruses.
Both RNA Viruses and DNA Viruses Replicate Through
the Formation of Complementary Strands 53
Like DNA replication, the replication of the genomes of RNA viruses
occurs through the formation of complementary strands. For most RNA viruses
this process is catalyzed by specific RNA-dependent RNA polymerase enzymes
(replicases). These enzymes are encoded by the viral RNA chromosome and are
often incorporated into the progeny virus particles, so that upon entry of the
virus into a cell, they can immediately begin replicating the viral RNA. Replicases
are always packaged into the capsid of the so-called negative-strand RNA viruses, such as influenza or vesicular stomatitis virus. Negative-strand viruses are
so called because the infecting single strand does not code for protein; instead
its complementary strand carries the coding sequences. Thus the infecting
strand remains impotent without a preformed replicase. In contrast, the viral RNA
of positive-strand RNA viruses, such as poliovirus, can serve as mRNA and
produce a replicase once it enters the cell; therefore the naked genome itself is infectious.
The synthesis of viral RNA always begins at the
3' end of the RNA template, starting with the synthesis of the
5' end of the new viral RNA molecule and progressing in the
5'-to-3' direction until the 5' end of the template is reached.
There are no error-correcting mechanisms for viral RNA synthesis, and error rates
are similar to those in DNA transcription (about 1 error in
10^4 nucleotides synthesized). This is not a serious deficiency as long as the RNA chromosome is
relatively short; for this reason the genomes of all RNA viruses are small relative
to those of the large DNA viruses.
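The direction of copying and the error-rate argument can be made concrete with a minimal Python sketch. The function names, the example sequence, and the genome lengths are hypothetical and chosen only for illustration; the sketch models nothing about a real replicase beyond base pairing, the 3'-to-5' reading of the template, and the approximate rate of 1 error in 10,000 nucleotides quoted above.

```python
# Watson-Crick pairing rules for RNA.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def copy_rna_template(template_5_to_3: str) -> str:
    """Return the complementary strand, written 5'-to-3'.

    The template is read from its 3' end toward its 5' end, so the first
    nucleotide added (the 5' end of the new strand) pairs with the last
    nucleotide of the template as written.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template_5_to_3))


def expected_errors(genome_length: int, nucleotides_per_error: int = 10_000) -> float:
    """Expected number of miscopied nucleotides in one copy of the genome.

    Assumes errors are independent and occur about once per
    `nucleotides_per_error` nucleotides synthesized.
    """
    return genome_length / nucleotides_per_error


# Hypothetical positive-strand sequence, for illustration only.
plus_strand = "AUGGCUUAA"
minus_strand = copy_rna_template(plus_strand)
print(minus_strand)                     # UUAAGCCAU
print(copy_rna_template(minus_strand))  # AUGGCUUAA

# Illustrative genome lengths (assumed): a short RNA genome versus a
# hypothetical RNA genome as long as a large DNA virus chromosome.
for length in (7_500, 170_000):
    print(length, expected_errors(length))
```

Copying the complementary strand a second time regenerates the original sequence, which is how a negative strand gives rise to new positive strands and vice versa. Under the stated assumptions, a genome of several thousand nucleotides acquires fewer than one error per copy on average, whereas a genome of more than a hundred thousand nucleotides would carry many errors in every copy.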
All DNA viruses begin their replication at a replication origin, where
special initiator proteins bind and then attract the replication enzymes of the host
cell (see
Figure 8-34). There are many different replication pathways, however.
The complexity of these diverse replication schemes reflects, in part, the problem
of replicating the ends of a simple linear DNA molecule, given a DNA
polymerase enzyme that cannot begin synthesis without a primer (see pp. 253-254). DNA
viruses have solved this problem in a variety of ways: some have circular DNA
genomes and thus no ends; others have linear DNA genomes that repeat their
terminal sequences or end in loops; while still others have special terminal
proteins that serve to prime the DNA polymerase directly (see ).
Viruses Exploit the Intracellular Traffic Machinery
of Their Host Cells 54
All viruses have only a limited amount of nucleic acid in their genome, and
so they must parasitize host-cell pathways for most of the steps in their
reproduction. In fact, because viral products are usually synthesized in large
amounts during infection, and because during its life cycle the virus follows a
sequential route through the compartments of the host cell, virus-infected cells have
served as important models for tracing the pathways of intracellular transport and
for studying how essential biosynthetic reactions are compartmentalized in
eucaryotic cells.
Figure 6-77
.
The structure of Semliki forest virus
Schematic drawings of a cross-section (A) and an
exploded three-dimensional view (B) of the virus. (C) A
three-dimensional reconstruction of the surface of
the virus derived from cryoelectron micrographs of unstained
specimens. The virus has a total mass of 46 million daltons. (B, adapted from
S.C. Harrison, Curr. Opin. Struct. Biol. 2:293-299, 1992. Current Science;
C, courtesy of Stephen Fuller.)
Enveloped animal viruses, in which the genome is enclosed in a
lipid-bilayer membrane, have exploited the compartmentalization of the cell to an
especially fine degree. To follow the life cycle of an enveloped virus is to take a tour
through the cell. A well-studied example is Semliki forest
virus, which consists of a single-stranded RNA genome surrounded by a
capsid formed by a regularly arranged icosahedral (20-faced) shell composed of many copies of one protein (called
C protein). The
nucleocapsid (genome + capsid) is surrounded by a closely
apposed lipid bilayer that contains only three types of polypeptide chains, each
encoded by the viral RNA. These envelope proteins form heterotrimers that span the
lipid bilayer and interact with the C protein of the nucleocapsid, linking the
membrane and nucleocapsid together (). The glycosylated portions of the
envelope proteins are always on the outside of the lipid bilayer, and each trimer
forms a "spike" that can be seen in electron micrographs projecting outward from
the surface of the virus ().
Infection is initiated when an envelope protein on the virus binds to a
normal cell protein that serves as its receptor on the host-cell plasma membrane.
The virus then uses the cell's normal endocytic pathway to enter the cell by
receptor-mediated endocytosis and is delivered to early endosomes (discussed
in Chapter 13). But instead of being transferred from endosomes to lysosomes,
the virus escapes from the endosome by virtue of the special properties of one of
its envelope proteins. At the acidic pH of the endosome, this protein causes the
viral envelope to fuse with the endosome membrane, releasing the bare
nucleocapsid into the cytosol. The nucleocapsid is "uncoated" in the cytosol, releasing the
viral RNA, which is then translated by host-cell ribosomes to produce a
virus-encoded RNA polymerase. This in turn makes many copies of viral RNA, some of
which serve as mRNA molecules to direct the synthesis of the structural proteins of
the virus - the capsid C protein and the three envelope proteins.
Figure 6-78
.
The life cycle of the Semliki forest virus
The virus parasitizes the host cell for most of
its biosyntheses.
The newly synthesized capsid and envelope proteins follow separate
pathways through the cytoplasm. The envelope proteins, like the plasma
membrane proteins of the host cell, are synthesized by ribosomes that are bound to
the rough ER; in contrast, the capsid protein, like the cytosolic proteins of the
cell, is synthesized by ribosomes that are not membrane bound. The newly
synthesized capsid protein binds to the recently replicated viral RNA to form
new nucleocapsids. The envelope proteins, in contrast, are inserted into the
membrane of the ER, where they are glycosylated, transported to the Golgi
apparatus, and then delivered to the plasma membrane ().
The viral nucleocapsids and envelope proteins finally meet at the
plasma membrane. As a result of a specific interaction with a cluster of envelope
proteins, the nucleocapsid forms a bud whose envelope contains the envelope
proteins embedded in host-cell lipids. Finally, the bud pinches off and a free virus is
released on the outside of the cell. The clustering of envelope proteins as they
assemble around the nucleocapsid during viral budding excludes the host
plasma membrane proteins from the final virus particle.
Different Enveloped Viruses Bud from Different
Cellular Membranes 55
Figure 6-79
.
Two enveloped viruses that bud from different domains of the plasma membrane
Electron micrographs showing that one type
of enveloped virus buds from the apical plasma membrane while another
type buds from the basolateral plasma membrane of the same epithelial
cell line grown in culture. These cells grow with their basal surface
attached to the culture dish. The boxed area in each schematic drawing
corresponds to the indicated electron micrograph. (Micrographs courtesy of
E. Rodriguez-Boulan and D.D. Sabatini.)
Viral envelope proteins are all transmembrane proteins that are synthesized
in the ER. Like other ER proteins, they carry sorting signals that direct them to
a particular cell membrane (discussed in
Chapter 13). Their final location
determines the site of viral budding. Epithelial cell lines, for example, can form
polarized cell sheets when they are cultured on an appropriate surface, such as a
collagen-coated porous filter. When viruses infect such polarized cells,
which maintain distinct domains of apical and basolateral plasma membrane, some
of them (such as influenza virus) bud exclusively from the apical plasma
membrane, whereas others (such as Semliki forest virus and vesicular stomatitis virus)
bud only from the basolateral plasma membrane (). This polarity of
budding reflects the presence on the envelope proteins of distinct apical
or basolateral sorting signals, which direct the proteins to only one
cell-surface domain; the proteins in turn cause the virus to assemble in that domain.
Other viruses have envelope proteins with different kinds of sorting
signals. Herpes virus, for example, is a DNA virus that replicates in the nucleus, where
its nucleocapsid assembles, and then acquires an envelope by budding through
the inner nuclear membrane into the ER lumen; the envelope proteins therefore
must be specifically transported from the ER membrane to the inner nuclear
membrane, probably via the lipid bilayer that surrounds the nuclear pores. Flavivirus, in contrast, buds directly into the ER lumen, and bunyavirus buds into the Golgi apparatus, indicating that their envelope proteins carry signals for retention
in the ER and Golgi membranes, respectively. After budding, the enveloped
herpes virus, flavivirus, and bunyavirus particles become soluble in the ER and
Golgi lumen, and they move outward toward the cell surface exactly as if they
were secreted proteins; in the trans Golgi network they are incorporated into
transport vesicles and secreted from the cell by the constitutive secretory pathway
(discussed in Chapter 13).
Viral Chromosomes Can Integrate
into Host Chromosomes 56
The end result of the entry of a viral chromosome into a cell is not always
its immediate multiplication to produce large numbers of progeny. Many
viruses enter a latent state, in which their genomes are present but inactive in the
cell and no progeny are produced. Viral latency was discovered when it was
found that exposure to ultraviolet light induced many apparently uninfected
bacteria to produce progeny bacteriophages. Subsequent experiments showed that
these lysogenic bacteria carry in their chromosomes a dormant but complete viral
chromosome. Such integrated viral chromosomes are called
proviruses.
Bacteriophages that can integrate their DNA into bacterial chromosomes
are known as temperate bacteriophages. The prototypic example is the
bacteriophage lambda, discussed earlier. When lambda infects a suitable
E. coli host cell, it normally multiplies to produce several hundred progeny particles, which
are released when the bacterial cell lyses; this is called a lytic infection. More rarely, the free ends of the linear infecting DNA molecules join to form a DNA circle
that becomes integrated into the circular host E. coli chromosome by a site-specific recombination event. The resulting lysogenic bacterium, carrying the
proviral lambda chromosome, multiplies normally until it is subjected to an
environmental insult, such as exposure to ultraviolet light or ionizing radiation. The
resulting cell debilitation induces the integrated provirus to leave the host
chromosome and begin a normal cycle of viral replication. In this way the integrated
provirus need not perish with its damaged host cell but has a chance to escape to
other E. coli cells ().
The Continuous Synthesis of Some Viral Proteins
Can Make Cells Cancerous 57
Animal cells, like bacteria, can offer viruses an alternative to lytic growth. Permissive cells permit DNA viruses to multiply lytically and kill the cell. Nonpermissive cells may allow the DNA virus to enter but not to replicate lytically; in a
small percentage of such cells the viral chromosome either becomes integrated into
the host cell genome, where it is replicated along with the host chromosomes,
or forms a plasmid (a circular DNA molecule) that replicates in a controlled
fashion without killing the cell. Such nonpermissive infections sometimes result
in a genetic change in the host cell, causing it to proliferate in an ill-controlled
way and thus transforming it into its cancerous equivalent. In this case the DNA
virus is called a DNA tumor virus and the process is called virus-mediated neoplastic transformation. The most extensively studied DNA tumor viruses are
two papovaviruses, SV40 and polyoma. Their transforming ability has been traced
to several viral proteins that cooperate to stimulate quiescent cells to
proliferate - that is, they drive the cells from
G0 into S phase. In permissive cells the shift
to S phase (the phase of the cell cycle where DNA is synthesized) provides the
virus with all of the host-cell replication enzymes required for viral DNA
synthesis. When a provirus happens to make these viral proteins in a nonpermissive
cell, they can override some of the normal growth control mechanisms in the cell
and its progeny. By this means some DNA tumor viruses that infect humans
are known to contribute to the development of some types of human cancers
(although the great majority of human cancers are thought not to involve
tumor viruses).
RNA Tumor Viruses Are Retroviruses 58
For one group of RNA viruses, the so-called RNA tumor
viruses, the infection of a permissive cell often leads simultaneously to a nonlethal release of
progeny virus from the cell surface by budding and a permanent genetic change in
the infected cell that makes it cancerous. How RNA virus infection could lead to
a permanent genetic alteration was unclear until the discovery of the enzyme reverse transcriptase, which transcribes the infecting RNA chains of these
viruses into complementary DNA molecules that integrate into the host cell
genome. RNA tumor viruses - which include the first well-known tumor virus, the
Rous sarcoma virus - are members of a large class of viruses known as
retroviruses. These viruses are so named because as part of their normal life cycle they
reverse the normal process in which DNA is transcribed into RNA.
Figure 6-81
.
Reverse transcriptase
(A) The three-dimensional structure of the enzyme from HIV-1 (the
AIDS virus), determined by x-ray crystallography; (B) a schematic
view of a model for its activity on an RNA template. Note that the
polymerase domain (yellow) has a covalently attached RNAse domain
(red) that degrades an RNA strand in an
RNA/DNA helix. This activity helps the polymerase convert the
initial hybrid helix into a DNA double helix. (A, courtesy of Tom Steitz; B,
adapted from L.A. Kohlstaedt et al. ,
Science 256:1783-1790, 1992. © 1992
the AAAS.)
Figure 6-82
.
The life cycle of a retrovirus
The retrovirus genome consists of an RNA molecule of
about 8500 nucleotides; two such molecules are packaged into each viral
particle. The enzyme reverse transcriptase
first makes a DNA copy of the viral RNA molecule and then a second
DNA strand, generating a double-stranded DNA copy of the RNA genome.
The integration of this DNA double helix into the host chromosome,
catalyzed by the viral integrase, is required for the synthesis of new viral
RNA molecules by the host-cell RNA polymerase.
The enzyme reverse transcriptase is an unusual DNA polymerase that
uses either RNA or DNA as a template (); it is encoded by the
retrovirus RNA and is packaged inside each viral capsid during the production of new
virus particles. When the single-stranded RNA of the retrovirus enters a cell,
the reverse transcriptase brought in with the capsid first makes a DNA copy of
the RNA strand to form a DNA-RNA hybrid helix, which is then used by the
same enzyme to make a double helix with two DNA strands. The two ends of the
linear viral DNA molecule are recognized by a virus-encoded integrase that
catalyzes the insertion of the viral DNA into virtually any site on a host-cell
chromosome (see ). The next step in the infectious process is
transcription of the integrated viral DNA by host-cell RNA polymerase, producing large
numbers of viral RNA molecules identical to the original infecting genome.
Finally, these RNA molecules are translated to produce the capsid, envelope, and
reverse transcriptase proteins that are assembled with the RNA into new enveloped
virus particles, which bud from the plasma membrane ().
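The flow of sequence information through the retrovirus life cycle can be followed in a minimal Python sketch. All names and sequences here are hypothetical. The sketch deliberately ignores the terminal repeats, primers, and RNase activity involved in real reverse transcription, and it treats integration as a simple insertion into one strand of the host DNA at a chosen position.

```python
# Base-pairing tables for each copying step.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}  # reverse transcription
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}  # second-strand synthesis
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # transcription


def complement(strand: str, pairing: dict) -> str:
    """Copy a 5'-to-3' template into its complement, also written 5'-to-3'."""
    return "".join(pairing[base] for base in reversed(strand))


def reverse_transcribe(viral_rna: str) -> tuple:
    """Return (first DNA strand, second DNA strand) made from the viral RNA."""
    first_dna = complement(viral_rna, RNA_TO_DNA)    # forms the DNA-RNA hybrid
    second_dna = complement(first_dna, DNA_TO_DNA)   # completes the DNA double helix
    return first_dna, second_dna


def integrate(host_strand: str, viral_strand: str, site: int) -> str:
    """Insert the viral DNA into the host strand at position `site`."""
    return host_strand[:site] + viral_strand + host_strand[site:]


def transcribe(template_strand: str) -> str:
    """Host-cell RNA polymerase copies a DNA template strand into RNA."""
    return complement(template_strand, DNA_TO_RNA)


viral_rna = "GGUCAUGCAA"                  # hypothetical genome fragment
first_dna, second_dna = reverse_transcribe(viral_rna)
host = "TTTTCCCC"                         # hypothetical host sequence
provirus_in_host = integrate(host, second_dna, site=4)

print(first_dna)                          # TTGCATGACC
print(second_dna)                         # GGTCATGCAA
print(provirus_in_host)                   # TTTTGGTCATGCAACCCC
print(transcribe(first_dna) == viral_rna) # True
```

The final line shows the point made in the text: transcribing the integrated DNA regenerates RNA molecules identical in sequence to the original infecting genome. The insertion site is passed in explicitly here so that the output is reproducible; choosing it at random would mirror the text's statement that virtually any chromosomal site can be used.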
Both RNA and DNA tumor viruses transform cells because the
permanent presence of the viral DNA in the cell causes the synthesis of new proteins
that alter the control of host-cell proliferation. The genes that code for such
proteins are called oncogenes. Unlike DNA tumor viruses, whose oncogenes typically
encode normal viral proteins essential for viral multiplication, the oncogenes
carried by RNA tumor viruses are modified versions of normal host-cell genes that
are not required for viral replication. Since only a limited amount of RNA can
be packed into the capsid of a retrovirus, the acquired oncogene sequences
often replace an essential part of the retroviral genome. In Chapters 15 and 24 we
discuss how viral oncogenes have provided important clues to the causes and
nature of cancer, as well as to the normal mechanisms that control cell growth
and division in multicellular animals. We also discuss how the random integration
of viral DNA into genomes can alter normal genes and thereby affect cell
behavior (see Figure 24-24).
The Virus That Causes AIDS Is a Retrovirus 59
In 1982 physicians first became aware of a new sexually transmitted disease
that was associated with an unusual form of cancer (Kaposi's sarcoma) and a
variety of unusual infections. Because both of these problems reflect a severe
deficiency in the immune system - specifically in helper T lymphocytes - the disease
was named acquired immune deficiency syndrome
(AIDS). By culturing lymphocytes from patients with an early stage of the disease, a retrovirus was isolated that
is now known to be the causative agent of AIDS, which has become a
rapidly spreading epidemic that threatens to kill millions of people worldwide.
The retrovirus, called human immunodeficiency virus
(HIV), enters helper T lymphocytes by first binding to a functionally important plasma membrane
protein called CD4 (discussed in Chapter 23). There are two features of HIV
that make it especially deadly. First, it eventually kills the helper T cells that it
infects rather than living in symbiosis with them, as do most other retroviruses,
and helper T cells are vitally important in defending us against infection. Second,
the provirus tends to persist in a latent state in the chromosomes of an infected
cell without producing virus until it is activated by an unknown rare event; this
ability to hide greatly complicates any attempt to treat the infection with antiviral drugs.
Figure 6-83
.
A map of the HIV genome
The genome consists of about 9000 nucleotides and
contains nine genes, whose locations are shown in green and red. Three of the genes
(green) are common to all retroviruses: gag encodes capsid proteins, env encodes
envelope proteins, and pol encodes both the reverse transcriptase (see ) and the integrase (see ) proteins. The HIV genome is unusually complex, since it
contains six small genes (in red) in addition to the three (in green) that are normally required for the retrovirus life
cycle. At least some of these small genes encode proteins that regulate
viral gene expression, and it is tempting to speculate that it is this
extra complexity that makes HIV so deadly. As indicated by the
red lines, RNA splicing (see Figure 8-7) is required
to produce the Rev and Tat proteins.
Much current research on AIDS is aimed at understanding the life cycle
of HIV. The complete nucleotide sequence of the viral RNA has been
determined. This has made it possible to identify and study each of the proteins that it
encodes. The three-dimensional structure of its reverse transcriptase (see ) is being used to help design new drugs that inhibit the enzyme. The
nine genes of this retrovirus are displayed on the HIV genetic map in .
Some Transposable Elements Are Close Relatives
of Retroviruses 60
Because many viruses can move into and out of their host chromosomes,
any large genome is likely to contain a number of different proviruses. Most
genomes are also likely to house a variety of mobile DNA sequences that do not form
viral particles and cannot leave the cell. Such transposable
elements range in length from a few hundred to tens of thousands of base pairs, and they are
usually present in multiple copies per cell. One can consider these elements as tiny
parasites hidden in chromosomes. Each transposable element is occasionally
activated to move to another DNA site in the same cell by a process called transposition, catalyzed by its own site-specific recombination enzyme. These
integrases, also referred to as transposases, are often encoded in the DNA of the element
itself. Since most transposable elements move only very rarely (once in
10^5 cell generations for many elements in bacteria), it is often difficult to distinguish
them from nonmobile parts of the chromosome. It is not known what suddenly
triggers their movement.
Transposition can occur by a variety of mechanisms. One large family
of transposable elements uses a mechanism that is indistinguishable from part
of a retrovirus life cycle. These elements, called
retrotransposons, are present in organisms as diverse as yeasts, flies, and mammals. One of the
best-understood retrotransposons is the so-called Ty1 element of yeasts. The first step in its
transposition is the transcription of the entire transposable element, producing
an RNA copy of the element that is more than 5000 nucleotides long. This
transcript encodes a reverse transcriptase enzyme that makes a double-stranded DNA
copy of the RNA molecule via an RNA/DNA hybrid intermediate, precisely
mimicking the early stages of infection by a retrovirus (see ). The analogy
continues as the linear DNA molecule uses an integrase to integrate into a
randomly selected site on the chromosome. Although the resemblance to a retrovirus
is striking, unlike a retrovirus, the Ty1 element does not have a functional
protein coat and therefore can only move within a single cell and its progeny.
Other Transposable Elements Transfer Themselves Directly from One Site in the Genome to
Another 61
Figure 6-84
.
The direct movement of a transposable element from one chromosomal site to another
Transposable elements of this type can be recognized by the
inverted repeat DNA sequences (orange) at their ends. Experiments show
that these sequences, which can be as short as 20 nucleotides, are all that
is necessary for the DNA between them to be transposed by the
particular transposase enzyme associated with the element. The mechanism
shown here is closely related to that used by a retrovirus to integrate its
double-stranded DNA into a chromosome (compare with ).
Although the gap left in the donor chromosome is resealed, the process often alters
the DNA sequence, causing a mutation at the donor site (not shown).
Unlike retrotransposons, many transposable elements rarely exist free of the
host chromosome; the transposases that catalyze their movement can act on the
DNA of the element while it is still integrated in the host genome. The
transposase binds to a short sequence that is repeated in reverse orientation at each end
of the element, thereby holding these two ends close together while catalyzing
the subsequent recombination event. The mechanism is closely related to that
used by the retrovirus integrase (see ). For some transposable elements
the transposition mechanism differs only in that the linear DNA molecule to be
integrated must be cut out of a much longer DNA molecule, leaving a break in
the vacated chromosome (). This break is subsequently resealed, but
in the process the DNA sequence is often altered, resulting in a mutation at the
old chromosomal site.
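The recognition of an element by its inverted terminal repeats, followed by excision and reinsertion, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the chromosome is modeled as a single strand written 5'-to-3', the repeat is 5 nucleotides rather than the 20 or more found in real elements, all names and sequences are hypothetical, and the donor site is resealed perfectly, whereas the text notes that the real process often alters the sequence there.

```python
from typing import Optional, Tuple

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, written 5'-to-3'."""
    return "".join(DNA_PAIR[base] for base in reversed(seq))


def find_element(chromosome: str, repeat: str) -> Optional[Tuple[int, int]]:
    """Locate an element bounded by `repeat` and its inverted copy.

    Returns (start, end) slice indices, or None if no element is found.
    """
    start = chromosome.find(repeat)
    if start == -1:
        return None
    inverted = reverse_complement(repeat)
    right = chromosome.find(inverted, start + len(repeat))
    if right == -1:
        return None
    return start, right + len(inverted)


def cut_and_paste(chromosome: str, repeat: str, target: int) -> str:
    """Excise the element and reinsert it at index `target` of the vacated chromosome."""
    span = find_element(chromosome, repeat)
    if span is None:
        return chromosome
    start, end = span
    element = chromosome[start:end]
    vacated = chromosome[:start] + chromosome[end:]  # break resealed without change
    return vacated[:target] + element + vacated[target:]


repeat = "GGCCA"                                    # hypothetical terminal repeat
element = repeat + "CTCTCT" + reverse_complement(repeat)
chromosome = "ATAT" + element + "ACAC"

moved = cut_and_paste(chromosome, repeat, target=6)
print(find_element(chromosome, repeat))  # (4, 20)
print(moved)                             # ATATACGGCCACTCTCTTGGCCAC
```

The chromosome keeps the same length and a single copy of the element; only the element's position changes, which is what distinguishes this pathway from a mechanism that copies the element.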
Figure 6-85
.
The replicative movement of a transposable element within a chromosome
The element shown replicates during transposition, its
movement occurring without it being excised from its original site. The two
inverted repeat DNA sequences that commonly flank the two ends
of transposable elements are shown in orange. At the start of transposition the transposase cuts one of the
two DNA strands at each end of the element, and the element then
serves as a template for DNA synthesis, which begins by the addition
of nucleotides to the 3' ends of chromosomal DNA sequences.
Many details are known, but the process is too complex to be illustrated here.
Other transposable elements replicate when they move. In the
best-studied example, a covalent connection is first made between the transposable
element and a randomly selected target site; this connection then triggers a localized
synthesis of DNA that results in one copy of the replicated transposable
element being inserted at a new chromosomal site, while the other copy remains at
the old one. The mechanism is closely related to the
nonreplicative mechanism just described, and it starts in nearly the same way; indeed,
some transposable elements can move by either pathway.
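The difference in outcome between the two pathways can be summarized in a toy bookkeeping model. The sketch below is an assumption-laden simplification: it represents a chromosome as a list of labeled segments, uses the hypothetical label "TE" for the element, and tracks only where copies end up. It does not model the strand cutting, the DNA synthesis, or the sequence alterations that can occur at the donor site.

```python
from typing import List

ELEMENT = "TE"  # hypothetical label for the transposable element


def transpose_nonreplicative(chromosome: List[str], target: int) -> List[str]:
    """Cut the element out of its donor site and insert it at index `target`.

    `target` is an index into the chromosome after the element has been excised.
    """
    donor = chromosome.index(ELEMENT)  # raises ValueError if no element is present
    result = chromosome[:donor] + chromosome[donor + 1:]
    result.insert(target, ELEMENT)
    return result


def transpose_replicative(chromosome: List[str], target: int) -> List[str]:
    """Leave the original copy in place and insert a new copy at index `target`."""
    if ELEMENT not in chromosome:
        raise ValueError("no transposable element to copy")
    result = list(chromosome)
    result.insert(target, ELEMENT)
    return result


chromosome = ["a", ELEMENT, "b", "c", "d"]
moved = transpose_nonreplicative(chromosome, 3)
copied = transpose_replicative(chromosome, 4)
print(moved, moved.count(ELEMENT))
print(copied, copied.count(ELEMENT))
```

In this model the nonreplicative pathway leaves the copy number unchanged, whereas each replicative event adds one copy to the genome.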
In addition to moving themselves, all types of transposable elements
occasionally move or rearrange neighboring DNA sequences of the host genome.
They frequently cause deletions of adjacent nucleotide sequences, for example, or
carry them to another site. The presence of transposable elements makes the
arrangement of the DNA sequences in chromosomes much less stable than
previously thought, and it is likely that they have been responsible for many important
evolutionary changes in genomes (discussed in Chapter 8).
Are the transposable elements also of evolutionary importance as the
most ancient ancestors of viruses? Although the precursors of retroviruses were
almost certainly retrotransposons, all present-day transposable elements rely heavily
on DNA-based reaction mechanisms. But very early cells are thought to have
had RNA rather than DNA genomes, so we must look to RNA-based mechanisms
for the ultimate origin of viruses.
Most Viruses Probably Evolved from Plasmids 62
Even the largest viruses depend heavily on their host cells for biosynthesis;
no known virus makes its own ribosomes or generates the ATP it requires, for
example. Clearly, therefore, cells must have evolved before viruses. The
precursors of the first viruses were probably small nucleic acid fragments that developed
the ability to multiply independently of the chromosomes of their host cells.
Such independently replicating elements, called
plasmids, can replicate indefinitely outside the host chromosome. Plasmids occur in both DNA and RNA forms,
and, like viruses, they contain a special nucleotide sequence that serves as an
origin of replication. Unlike viruses, however, they cannot make a protein coat
and therefore cannot move from cell to cell in this way.
The first RNA plasmids may have resembled the viroids found in some plant cells. These small RNA circles, only 300 to 400 nucleotides long, are
replicated despite the fact that they do not code for any protein. Having no protein
coat, viroids exist as naked RNA molecules and pass from plant to plant only when
the surfaces of both donor and recipient cells are damaged so that there is no
membrane barrier for the viroid to cross. Under the pressure of natural selection,
such independently replicating elements could be expected to acquire
nucleotide sequences from the host cell that would facilitate their own multiplication,
including sequences that code for proteins. Some present-day plasmids are
indeed quite complex, encoding proteins and RNA molecules that regulate their
replication, as well as proteins that control their partitioning into daughter cells.
The largest known plasmids are double-stranded DNA circles more than 100,000
base pairs long.
The first virus probably appeared when an RNA plasmid acquired a
gene coding for a capsid protein. But a capsid can enclose only a limited amount
of nucleic acid; therefore a virus is limited in the number of genes it can
contain. Forced to make optimal use of their limited genomes, some small viruses
evolved overlapping genes, in which part of the nucleotide sequence encoding one
protein is used (in the same or a different reading frame) to encode a second
protein. Other viruses evolved larger capsids and consequently could
accommodate more genes.
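How one stretch of nucleotides can encode two different proteins is easiest to see by reading it in two reading frames. The following Python sketch uses the standard genetic code, written for the DNA coding strand, and a short invented sequence. The sequence and the function name are assumptions made for illustration, and the sketch ignores the requirement that each real gene begin at its own start codon.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate uppercase coding-strand DNA starting at offset `frame`.

    Translation stops at the first stop codon or when fewer than three bases remain.
    """
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


sequence = "ATGGCTTACGGATGA"  # invented sequence, not from a real virus
print(translate(sequence, frame=0))  # reads ATG GCT TAC GGA TGA
print(translate(sequence, frame=1))  # reads TGG CTT ACG GAT
```

Shifting the reading frame by a single nucleotide gives a completely different series of codons, so the two products share no amino acid sequence even though they are specified by the same nucleotides.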
With their unique ability to transfer nucleic acid sequences across
species barriers, viruses have almost certainly played an important part in the
evolution of the organisms they infect. Many recombine frequently with their
host-cell genome and with one another. In this way they can pick up small pieces of
host chromosome at random and carry them to different cells or organisms.
Moreover, integrated copies of viral DNA (proviruses) have become a normal part of
the genome of most organisms. Examples of such proviruses include the
lambda family of bacteriophages and the so-called endogenous retroviruses found
in numerous copies in vertebrate genomes. The integrated viral DNA can
become altered so that it cannot produce a complete virus but can still encode
proteins, some of which may be useful to the host cell. Therefore, viruses, like sexual
reproduction, can speed up evolution by promoting the mixing of gene pools.
The process in which DNA sequences are transferred between different
host-cell genomes by means of a virus is called DNA transduction, and several viruses that transduce DNA with particularly high frequencies are commonly used
by researchers to move genes from one cell to another. Viruses and their close
relatives - plasmids and transposable elements - have also been important to
cell biology in many other ways. Because of their relative simplicity, for
example, studies of their reproduction have progressed unusually rapidly and have
illuminated many of the basic genetic mechanisms in cells. In addition, both
viruses and plasmids have been crucial elements in the development of the
recombinant DNA technologies that will be described in Chapter 7.
Summary
Viruses are infectious particles that consist of a DNA or an RNA molecule (the
viral genome) packaged in a protein capsid, which in the enveloped viruses is
surrounded by a lipid-bilayer-based membrane. Both the structure of the viral genome and
its mode of replication vary widely among viruses. A virus can multiply only inside
a host cell, whose genetic mechanisms it subverts for its own reproduction. A
common outcome of a viral infection is the lysis of the infected cell and release of
infectious viral particles. In some cases, however, the viral chromosome instead integrates
into a host-cell chromosome, where it is replicated as a provirus along with the host
genome. Many viruses are thought to have evolved from plasmids, which are
self-replicating DNA or RNA molecules that lack the ability to wrap themselves in a
protein coat.
Transposable elements are DNA sequences that differ from viruses in being
able to multiply only in their host cell and its progeny; like plasmids, they cannot
exist stably outside of cells. Unlike plasmids, they normally replicate only as an
integral part of a chromosome. Some transposable elements, however, are closely related
to retroviruses and can move from place to place in the genome by the reverse
transcription of an RNA intermediate. Although both viruses and transposable elements
can be viewed as parasites, many of the DNA sequence rearrangements they cause
are important for the evolution of cells and organisms.
Copyright © 1994 Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson. Published by Garland Publishing, a member of the Taylor & Francis Group. No part of the publication may be reproduced or used in any form or by any means known now or invented hereafter without the permission of the publisher.