Figure 11-7. Overview of mRNA processing in eukaryotes
Shortly after RNA polymerase II initiates transcription at the first
nucleotide of the first exon of a gene, the 5′ end of the
nascent RNA is capped with 7-methylguanylate. Transcription by RNA
polymerase II terminates at any one of multiple termination sites
downstream from the poly(A) site, which is located at the
3′ end of the final exon. After the primary transcript is
cleaved at the poly(A) site, a string of adenine (A) residues is
added. The poly(A) tail contains ≈250 A residues in
mammals, ≈150 in insects, and ≈100 in
yeasts. For short primary transcripts with few introns,
polyadenylation, cleavage, and splicing usually follow termination,
as shown. For large genes with multiple introns, introns often are
spliced out of the nascent RNA before transcription of the gene is
complete. Note that the 5′ cap is retained in mature
mRNAs.
As discussed in Chapter 4, the initial primary transcript synthesized by RNA
polymerase II undergoes several processing steps before a functional mRNA is
produced. In this section, we take a closer look at how eukaryotic cells carry
out mRNA processing, which includes three major processes: 5′ capping, 3′
cleavage/polyadenylation, and RNA splicing (Figure 11-7). Processing occurs in
the nucleus, and the functional mRNA produced is transported to the cytoplasm by
mechanisms discussed later.
The 5′-Cap Is Added to Nascent RNAs Shortly after Initiation by RNA
Polymerase II
After nascent RNA molecules produced by RNA polymerase II reach a length of
25–30 nucleotides, 7-methylguanosine is added to their 5′ end. This
initial step in RNA processing is catalyzed by a dimeric capping enzyme, which
associates with the phosphorylated carboxyl-terminal tail domain (CTD) of RNA
polymerase II. Recall that the CTD becomes phosphorylated during transcription
initiation (see Figure 10-50). Because
the capping enzyme does not associate with polymerase I or III, capping is
specific for transcripts produced by RNA polymerase II.
Figure 11-8. Capping of the 5′ end of nascent RNA transcripts with
7-methylguanylate (m7G)
The first two reactions are catalyzed by a capping enzyme that
associates with the phosphorylated CTD of RNA polymerase II shortly
after transcription initiation. Two different methyltransferases
catalyze reactions 3 and 4. S-adenosylmethionine
(S-Ado-Met) is the source of the methyl
(CH3) group for the two methylation steps; the
guanylate (G) is methylated first, then the 2′ hydroxyl of
the first one or two nucleotides (N) in the transcript. See Figure 4-18 for structure of
the resulting 5′ cap. [See S. Venkatesan and B. Moss,
1982, Proc. Nat’l. Acad. Sci. USA
79:304.]
One subunit of the capping enzyme removes the γ-phosphate from the 5′ end of the
nascent RNA emerging from the surface of an RNA polymerase II (Figure 11-8). The
other subunit transfers the GMP moiety from GTP to the 5′-diphosphate of the
nascent transcript, creating the guanosine 5′-5′-triphosphate structure. In the
final steps, separate enzymes transfer methyl groups from S-adenosylmethionine
to the N7 position of the guanine and the 2′ oxygens of riboses at the 5′ end of
the nascent RNA.
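The order of the capping reactions can be summarized as a short sequence of state changes. The following Python sketch is illustrative only: the shorthand labels for the 5′ end (pppN, ppN, GpppN, m7GpppN, m7GpppNm) and the function name are assumptions introduced here, not nomenclature from the text.

```python
# Illustrative model of the 5' capping steps described above.
# The state labels are shorthand invented for this sketch.
CAPPING_STEPS = [
    ("pppN", "ppN", "capping-enzyme subunit removes the gamma-phosphate"),
    ("ppN", "GpppN", "capping-enzyme subunit transfers GMP from GTP"),
    ("GpppN", "m7GpppN", "methyltransferase methylates N7 of the guanine"),
    ("m7GpppN", "m7GpppNm", "methyltransferase methylates a ribose 2' oxygen"),
]


def cap_five_prime_end(state="pppN"):
    """Apply each capping step in order and return every state visited."""
    history = [state]
    for substrate, product, _description in CAPPING_STEPS:
        if history[-1] != substrate:
            raise ValueError(f"step expects {substrate}, got {history[-1]}")
        history.append(product)
    return history


if __name__ == "__main__":
    for label in cap_five_prime_end():
        print(label)
```

Each step accepts only the product of the previous one, which mirrors the fixed order of the four reactions.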
Pre-mRNAs Are Associated with hnRNP Proteins Containing Conserved RNA-Binding
Domains
Figure 11-9. Visualization of hnRNP protein associated with nascent
transcripts in an oocyte of the newt Notophthalmus viridescens
A portion of a “lampbrush” chromosome is shown.
DNA at the chromosome axis stains white with the DNA-specific dye
DAPI. The long red filaments are nascent transcripts bound by hnRNP
proteins, which fluoresce red after staining with a monoclonal
antibody against a specific hnRNP protein. [Courtesy of M. Roth and
J. Gall.]
Nascent RNA transcripts from protein-coding genes and mRNA processing
intermediates, collectively referred to as pre-mRNA, do not exist as free RNA
molecules in the nuclei of eukaryotic cells. From the time nascent transcripts
first emerge from RNA polymerase II until mature mRNAs are transported into the
cytoplasm, the RNA molecules are associated with an abundant set of nuclear
proteins, as numerous in growing eukaryotic cells as histones. These proteins
are the major protein components of heterogeneous ribonucleoprotein particles
(hnRNPs), which contain heterogeneous nuclear RNA (hnRNA), a collective term
referring to pre-mRNA and other nuclear RNAs of various sizes. The proteins in
these ribonucleoprotein particles can be dramatically visualized with
fluorescent-labeled monoclonal antibodies (Figure 11-9).
To identify hnRNP proteins, researchers exposed cells to high-dose
UV irradiation, which causes covalent cross-links to form between RNA bases and
closely associated proteins. Chromatography of nuclear extracts from treated
cells on an oligo-dT cellulose column, which binds RNAs with a poly(A) tail, was
used to recover proteins that had become cross-linked to nuclear mRNA in living
cells (i.e., hnRNP proteins). Subsequent treatment of cell extracts from
unirradiated human cells with monoclonal antibodies specific for the major hnRNP
proteins identified by this cross-linking technique revealed a complex set of
abundant hnRNP proteins ranging in size from 34 to 120 kDa. Characterization of
the mRNAs encoding these proteins has shown that some of them (e.g., A2 and B1)
are related proteins derived by alternative splicing of exons from the same
transcription unit.
Binding studies with purified hnRNP proteins suggest that different hnRNP
proteins associate with different regions of a newly made pre-mRNA molecule as
determined by the sequence of the RNA. For example, the hnRNP proteins A1, C,
and D bind preferentially to the pyrimidine-rich sequences at the 3′
ends of introns, discussed in a later section. Like transcription factors, most
hnRNP proteins have a modular structure. They contain one or more RNA-binding
domains and at least one other domain that is thought to interact with other
proteins. Several different RNA-binding motifs have been identified by
constructing deletions of hnRNP proteins and testing their ability to bind RNA.
Although some RNA-binding proteins contain domains with the zinc-finger motif
common in DNA-binding proteins (see Figure
10-41), this motif has not yet been described in any hnRNP
proteins.
Figure 11-10. Structure of complex between an RNP motif from U1A protein and RNA
(a) Diagram of the RNP motif domain. The conserved RNP1 and RNP2
regions are located in the two middle β strands.
(b) Surface representation of the U1A protein-RNA complex determined
by X-ray crystallography. The RNA forms a stem-loop with the
single-stranded portion of the loop bound to the surface of the
protein. The N- and C-termini are at the upper left and right,
respectively. Acidic and basic amino acids are colored red and blue,
respectively. [From K. Nagai et al., 1995, Trends Biochem.
Sci.
20:235.]
The RNP motif, also called the RNA-binding domain (RBD), is the most common
RNA-binding domain in hnRNP proteins. This ≈80-residue motif, which occurs in
many other RNA-binding proteins, contains two highly conserved regions (RNP1 and
RNP2) that allow the motif to be recognized in newly sequenced proteins. X-ray
crystallographic analysis has shown that the RNP motif consists of a
four-stranded β sheet flanked on one side by two α helices. The conserved RNP1
and RNP2 sequences lie side by side on the two central β strands, and their side
chains make multiple contacts with a single-stranded region of RNA. The
single-stranded RNA loop lies across the surface of the β sheet and fits into a
groove between the protein loop connecting strands β2 and β3 and the C-terminal
region (Figure 11-10).
The RGG box, another RNA-binding motif found in hnRNP proteins,
contains five Arg-Gly-Gly (RGG) repeats with several interspersed aromatic amino
acids. Although the structure of this motif has not yet been determined, its
arginine-rich nature is similar to the RNA-binding domains of the
λ-phage N and HIV Tat proteins.
The 45-residue KH
motif is found in the hnRNP K protein and several other RNA-binding
proteins; commonly two or more copies of the KH motif are interspersed with RGG
repeats. The three-dimensional structure of a representative KH motif,
determined by NMR methods (Section 3.5), is similar to that of the RNP motif but
smaller, consisting of a three-stranded β sheet supported from one side
by a single α helix. It is not yet clear how this motif binds RNA.
Mutations in the fragile-X gene (FMR1), which encodes a protein
containing the KH motif, are associated with the most common form of heritable
mental retardation. Although the molecular function of the Fmr1 protein is
unknown, it presumably involves RNA binding.
hnRNP Proteins May Assist in Processing and Transport of mRNAs
Figure 11-11. Hybridization of RNA molecules in vitro is accelerated by hnRNP
proteins
The presence of complex secondary structures within RNA molecules
inhibits hybridization between long complementary sequences in
separate molecules. Association of hnRNP proteins with RNA is
thought to prevent formation of RNA secondary structures, thereby
facilitating base-pairing between different complementary molecules.
These proteins may have a similar function in vivo. [Adapted from D.
S. Portman and G. Dreyfuss, 1994, EMBO J.
13:213.]
The association of pre-mRNAs with hnRNP proteins may prevent formation of short
secondary structures dependent on base-pairing of complementary regions, thereby
making the pre-mRNAs accessible for interaction with other macromolecules
(Figure 11-11). Moreover, pre-mRNAs associated with hnRNP proteins present a
more uniform substrate for further processing steps than would free, unbound
pre-mRNAs, each type of which forms a unique secondary structure dependent on
its specific sequence.
The diversity of hnRNP proteins suggests that they probably have other functions
as well. For example, various hnRNP proteins may interact with the RNA sequences
that specify RNA splicing or cleavage/polyadenylation and contribute to the
structure recognized by RNA-processing factors. Finally, cell-fusion experiments
have shown that some hnRNP proteins remain localized in the nucleus, whereas
others cycle in and out of the cytoplasm, suggesting that they function in the
transport of mRNA (see later section).
Pre-mRNAs Are Cleaved at Specific 3′ Sites and Rapidly
Polyadenylated
In animal cells, all mRNAs, except histone mRNAs, have a 3′ poly(A)
tail. Early studies of pulse-labeled adenovirus and SV40 RNA demonstrated that
the viral primary transcripts extend beyond the poly(A) site in the viral mRNAs.
These results suggested that A residues are added to a 3′ hydroxyl
generated by endonucleolytic cleavage, but the predicted downstream RNA
fragments are degraded so rapidly in vivo that they cannot be detected. However,
this cleavage mechanism was firmly established by detection of both predicted
cleavage products in in vitro processing reactions performed with extracts of
HeLa-cell nuclei.
Early sequencing of cDNA clones from animal cells showed that nearly all mRNAs
contain the sequence AAUAAA 10 – 35 nucleotides
upstream from the poly(A) tail. Polyadenylation of RNA transcripts from
transfected genes is virtually eliminated when template DNA encoding the AAUAAA
sequence is mutated to any other sequence except one encoding AUUAAA. The
unprocessed RNA transcripts produced from such mutant templates do not
accumulate in nuclei, but are rapidly degraded. Further mutagenesis of sequences
within a few hundred bases of poly(A) sites revealed that a second signal
downstream from the cleavage site is required for efficient cleavage and
polyadenylation of most pre-mRNAs in animal cells. This downstream
poly(A) signal is not a specific sequence but rather a GU-rich or simply a
U-rich region within ≈50 nucleotides of the cleavage site.
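The two signals described above lend themselves to a simple sequence scan. The Python sketch below is a rough illustration, not a validated predictor: it looks for AAUAAA or AUUAAA and then asks whether any short window in the following stretch of RNA is rich in G and U. The function names, the 10-nucleotide window, and the 0.8 cutoff are assumptions made here; only the 35- and 50-nucleotide distances come from the text.

```python
import re

# Upstream poly(A) signal: AAUAAA or the tolerated variant AUUAAA.
# A lookahead is used so that overlapping occurrences are not skipped.
UPSTREAM_SIGNAL = re.compile(r"(?=(A[AU]UAAA))")


def gu_fraction(window):
    """Fraction of bases in `window` that are G or U."""
    return sum(base in "GU" for base in window) / len(window)


def find_poly_a_signals(rna, max_gap=35, downstream=50, window=10, threshold=0.8):
    """Return (position, hexamer, has_downstream_element) for each upstream signal.

    max_gap and downstream follow the distances quoted in the text;
    window and threshold are arbitrary choices made for this sketch.
    """
    hits = []
    for match in UPSTREAM_SIGNAL.finditer(rna):
        start = match.start()
        search_from = start + 6  # first base after the hexamer
        region = rna[search_from:search_from + max_gap + downstream]
        # Slide a fixed-size window across the region and test its G+U content.
        has_downstream = any(
            gu_fraction(region[i:i + window]) >= threshold
            for i in range(len(region) - window + 1)
        )
        hits.append((start, match.group(1), has_downstream))
    return hits


if __name__ == "__main__":
    example = "CC" + "AAUAAA" + "C" * 20 + "GUGUUGUGUU" + "CC"
    print(find_poly_a_signals(example))
```

A transcript whose hexamer is mutated to anything other than AAUAAA or AUUAAA yields no hits, which parallels the loss of polyadenylation seen with such mutant templates.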
Figure 11-12. Model for cleavage and polyadenylation of pre-mRNAs in mammalian
cells
Cleavage-and-polyadenylation specificity factor (CPSF) binds to an
upstream AAUAAA polyadenylation signal. CStF interacts with a
downstream GU- or U-rich sequence and with bound CPSF, forming a
loop in the RNA; binding of CFI and CFII helps stabilize the complex.
Binding of poly(A) polymerase (PAP) then stimulates cleavage at a
poly(A) site, which usually is
10 – 35 nucleotides 3′ of
the upstream polyadenylation signal. The cleavage factors are
released, as is the downstream RNA cleavage product, which is
rapidly degraded. Bound PAP then adds ≈12 A residues at
a slow rate to the 3′-hydroxyl group generated by the
cleavage reaction. Binding of poly(A)-binding protein II (PABII) to
the initial short poly(A) tail accelerates the rate of addition by
PAP. After 200 – 250 A residues have
been added, PABII signals PAP to stop polymerization.
Identification and purification of the proteins required for cleavage and
polyadenylation of pre-mRNA have led to the model shown in Figure 11-12.
According to this model, a 360-kDa cleavage and polyadenylation specificity
factor (CPSF), composed of four different polypeptides, first forms an unstable
complex with the upstream AU-rich poly(A) signal. Then at least three additional
proteins bind to the CPSF-RNA complex: a 200-kDa heterotrimer called cleavage
stimulatory factor (CStF), a 150-kDa heterotrimer called cleavage factor I
(CFI), and a second cleavage factor (CFII) that is as yet poorly characterized.
Interaction between CStF and the GU- or U-rich downstream poly(A) signal
stabilizes the multiprotein complex. Finally, a poly(A) polymerase (PAP) must
bind to the complex before cleavage can occur. This requirement for PAP binding
links cleavage and polyadenylation, so that the free 3′ ends generated are
rapidly polyadenylated. Assembly of this large, multiprotein
cleavage-polyadenylation complex around the AU-rich poly(A) signal in a pre-mRNA
is analogous in many ways to formation of the transcription-initiation complex
at the AT-rich TATA box of a template DNA molecule (see Figure 10-50). In both
cases, multiprotein complexes assemble cooperatively through a network of
specific protein–nucleic acid and protein-protein interactions.
Following cleavage at the poly(A) site, polyadenylation proceeds in two phases.
Addition of the first 12 or so A residues occurs slowly, followed by rapid
addition of up to 200–250 more A residues. The rapid phase requires the binding
of multiple copies of a poly(A)-binding protein containing the RNP motif. This
protein is designated PABII to distinguish it from the poly(A)-binding protein
that binds to the poly(A) tail of cytoplasmic mRNAs. PABII binds to the short A
tail initially added by PAP, stimulating polymerization of additional A residues
by PAP (see Figure 11-12). PABII is also responsible for signaling poly(A)
polymerase to terminate polymerization when the poly(A) tail reaches a length of
200–250 residues, although the mechanism for measuring this length is not yet
understood.
Splicing Occurs at Short, Conserved Sequences in Pre-mRNAs via Two
Transesterification Reactions
Figure 11-13. Demonstration that introns are spliced out by electron microscopy
of RNA-DNA hybrid between adenovirus DNA and the mRNA encoding
hexon, a major viral protein
(a) Diagram of the EcoRI A fragment of adenovirus
DNA, which extends from the left end of the genome to just before
the end of the final exon of the hexon gene. The gene consists of
three short exons and one long (≈3.5-kb) exon separated
by three introns of ≈1, 2.5, and 9 kb. (b) Electron
micrograph (left) and schematic drawing
(right) of hybrid between an
EcoRI A fragment and hexon mRNA. The loops
marked A, B, and C correspond to the introns indicated in (a). Since
these intron sequences in the viral genomic DNA are not present in
mature hexon mRNA, they loop out between the exon sequences that
hybridize to their complementary sequences in the mRNA. [Micrograph
from S. M. Berget et al., 1977, Proc. Nat’l. Acad.
Sci. USA
74:3171; courtesy of P. A. Sharp.]
During the final step in formation of a mature, functional mRNA, the introns are
removed and exons are spliced together (see Figure 11-7). The discovery that
introns are removed during splicing came from electron microscopy of RNA-DNA
hybrids between adenovirus DNA and the mRNA encoding hexon, a major virion
capsid protein (Figure 11-13). Similar analyses of hybrids between RNA isolated
from the nuclei of infected cells and viral DNA revealed RNAs that were colinear
with the viral DNA (primary transcripts) and RNAs with one or two of the introns
removed (processing intermediates). These results, together with the findings
that the 5′ cap and 3′ poly(A) tail of mRNA precursors are retained in mature
cytoplasmic mRNAs, led to the realization that introns are removed from primary
transcripts as exons are spliced together. For short transcription units, RNA
splicing usually follows cleavage and polyadenylation of the 3′ end of the
primary transcript. But for long transcription units containing multiple exons,
splicing of exons in the nascent RNA usually begins before transcription of the
gene is complete.
Figure 11-14. Consensus sequences around 5′ and 3′ splice
sites in vertebrate pre-mRNAs
The only nearly invariant bases are the (5′)GU and
(3′)AG of the intron, although the flanking bases
indicated are found at frequencies higher than expected based on a
random distribution. A pyrimidine-rich region (light blue) near the
3′ end of the intron is found in most cases. The
branch-point adenosine, also invariant, usually is
20 – 50 bases from the 3′
splice site. The central region of the intron, which may range from
40 bases to 50 kilobases in length, generally is unnecessary for
splicing to occur. [See R. A. Padgett et al., 1986, Ann.
Rev. Biochem.
55:1119; E. B. Keller and W. A. Noon,
1984, Proc. Nat’l. Acad. Sci. USA
81:7417.]
The location of exon-intron junctions (i.e., splice sites) in a pre-mRNA can be
determined by comparing the sequence of genomic DNA with that of the cDNA
prepared from the corresponding mRNA. Sequences that are present in the genomic
DNA but absent from the cDNA represent introns and indicate the positions of
splice sites. Such analysis of a large number of different mRNAs revealed
moderately conserved, short consensus sequences at intron-exon boundaries in
eukaryotic pre-mRNAs; in higher organisms, a pyrimidine-rich region just
upstream of the 3′ splice site also is common (Figure 11-14). The most conserved
nucleotides are the (5′)GU and (3′)AG found at the ends of most introns.
Deletion analyses of the center portion of introns in various pre-mRNAs have
shown that generally only 30–40 nucleotides at each end of an intron are
necessary for splicing to occur at normal rates.
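The genomic-versus-cDNA comparison can be sketched as a small backtracking search. The Python example below is a toy under stated assumptions, not a real alignment tool: it assumes the cDNA matches the exons exactly, that the genomic sequence starts at the first exon base, and that every intron begins with GT and ends with AG (the DNA equivalent of the GU and AG in the RNA). The function name, the `min_intron` parameter, and the short sequences are invented for illustration.

```python
def locate_introns(genomic, cdna, min_intron=6):
    """Return introns as (start, end) genomic intervals, 0-based and end-exclusive.

    Returns None if the cDNA cannot be threaded through the genomic sequence
    using only exact matches and GT...AG introns. The search is recursive, so
    it is suited to short demonstration sequences only.
    """
    def search(g, c):
        if c == len(cdna):
            return []  # every cDNA base has been placed
        if g >= len(genomic):
            return None
        # Option 1: the next cDNA base is encoded at this genomic position.
        if genomic[g] == cdna[c]:
            rest = search(g + 1, c + 1)
            if rest is not None:
                return rest
        # Option 2: an intron obeying the GT-AG rule starts here.
        if c > 0 and genomic.startswith("GT", g):
            for end in range(g + min_intron, len(genomic) + 1):
                if genomic[end - 2:end] == "AG":
                    rest = search(end, c)
                    if rest is not None:
                        return [(g, end)] + rest
        return None

    return search(0, 0)


if __name__ == "__main__":
    exon1, intron, exon2 = "ATGGCC", "GTAAGTCTCTTTCAG", "GATTGA"
    print(locate_introns(exon1 + intron + exon2, exon1 + exon2))
```

Trying an exact match first and falling back to an intron only when matching fails keeps the sketch short; real splice-aware aligners must also tolerate mismatches and ambiguous boundaries.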
Recombinant DNAs containing the 5′ splice site of one transcription
unit (e.g., SV40 late region) and the 3′ splice site of another (e.g.,
mouse β-globin gene) have been prepared and introduced into cultured
cells. Spliced mRNA molecules are formed in which the two exon sequences are
joined and the chimeric intron is deleted precisely. The formation of correctly
spliced mRNAs in such experiments indicates that the cell’s splicing
machinery can recognize 5′ and 3′ splice sites and correctly
splice them together, with little influence from the intervening sequence in
most cases.
Figure 11-15. Analysis of RNA products formed in an in vitro splicing
reaction
A nuclear extract from HeLa cells was incubated with a 497-nucleotide
radiolabeled RNA (bottom) that contained portions
of two exons (orange and tan) from human β-globin mRNA
separated by a 130-nucleotide intron (blue). After incubation for
various times, the RNA was purified and subjected to electrophoresis
and autoradiography, along with RNA markers (lane M). The number of
nucleotides in the various species is indicated. Much of the
slower-migrating starting RNA (497) was correctly spliced, yielding
a 367-nucleotide product. The excised intron (130*) migrated slower
than expected based on its molecular weight, indicating that it is
not a linear molecule. Likewise, one of the reaction intermediates
(339*) exhibited an anomalously slow electrophoretic mobility.
Additional analysis indicated that in both cases the intron had a
lariat structure resulting in the slow mobility. The 252** band, an
aberrant product of the in vitro reaction, is greatly reduced in
reactions in which the RNA is capped. [From B. Ruskin et al., 1984,
Cell
38:317; photograph courtesy of Michael R. Green. See
also R. A. Padgett et al., 1984, Science
225:898.]
Analysis of the intermediates formed during splicing of pre-mRNAs in vitro led
to the conclusion that introns are removed as a lariat structure in which the 5′
G of the intron is joined in an unusual 2′,5′-phosphodiester bond to an
adenosine near the 3′ end of the intron (Figure 11-15). This A residue is called
the branch point because it forms an RNA branch in the lariat structure.
Figure 11-16. Splicing of exons in pre-mRNA occurs via two transesterification
reactions
In the first reaction, the ester bond between the 5′
phosphorus of the intron and the 3′ oxygen (red) of exon 1
is exchanged for an ester bond with the 2′ oxygen (dark
blue) of the branch-site A residue. In the second
reaction, the ester bond between the 5′ phosphorus of exon
2 and the 3′ oxygen (light blue) of the intron is
exchanged for an ester bond with the 3′ oxygen of exon 1,
releasing the intron as a lariat structure and joining the two
exons. Arrows show where the activated hydroxyl oxygens react with
phosphorus atoms.
The finding that excised introns have a branched lariat structure led to the
discovery that splicing of exons proceeds via two sequential transesterification
reactions (Figure 11-16). In each reaction, one phosphate-ester bond is
exchanged for another. Since the number of phosphate-ester bonds in the molecule
is not changed in either reaction, no energy is consumed. The net result of
these two transesterification reactions is that two exons are ligated and the
intervening intron is released as a branched lariat structure.
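The bookkeeping behind the two transesterification reactions can be checked with a little arithmetic. The Python sketch below uses the sizes reported for the in vitro substrate in Figure 11-15 (a 497-nucleotide RNA with a 130-nucleotide intron); the exon lengths of 158 and 209 nucleotides are not stated in the text but are inferred here from the 339-nucleotide intermediate, assumed to be the lariat intron joined to exon 2. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RNA:
    name: str
    length: int             # nucleotides
    branched: bool = False  # True if the molecule carries a 2',5' branch

    @property
    def ester_bonds(self):
        # A linear chain of n nucleotides has n - 1 phosphodiester bonds;
        # a lariat has one additional 2',5' bond at the branch point.
        return self.length - 1 + (1 if self.branched else 0)


def splice(exon1, intron, exon2):
    """Return the substrate and the RNA species present after each reaction."""
    pre = RNA("pre-mRNA", exon1 + intron + exon2)
    after_first = [
        RNA("exon 1", exon1),
        RNA("lariat intron-exon 2", intron + exon2, branched=True),
    ]
    after_second = [
        RNA("spliced exons", exon1 + exon2),
        RNA("lariat intron", intron, branched=True),
    ]
    return pre, after_first, after_second


if __name__ == "__main__":
    pre, first, second = splice(158, 130, 209)
    for stage in ([pre], first, second):
        print([(r.name, r.length) for r in stage],
              "bonds:", sum(r.ester_bonds for r in stage))
```

The total bond count is the same at every stage, which is the point of the statement that each reaction exchanges one phosphate-ester bond for another.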
Spliceosomes, Assembled from snRNPs and a Pre-mRNA, Carry Out
Splicing
Even before splicing was accomplished in vitro, several observations led to the
suggestion that small nuclear RNAs (snRNAs) assist in the splicing
reaction. First, the short consensus sequence at the 5′ end of introns
was found to be complementary to a sequence near the 5′ end of the
snRNA called U1. Second, snRNAs were found associated with hnRNPs in nuclear
extracts. Five U-rich snRNAs (U1, U2, U4, U5, and U6), ranging in length from
107 to 210 nucleotides, participate in RNA splicing.
In the nucleus of eukaryotic cells, snRNAs are associated with six to ten
proteins in small nuclear ribonucleoprotein particles (snRNPs).
Some of these proteins are common to all snRNPs, and some are specific for
individual snRNPs. Experiments with a synthetic oligonucleotide that hybridizes
with the 5′-end region of U1 snRNA and later studies with pre-mRNAs
that were mutated in the 5′ splice-site consensus sequence provided
strong evidence that base pairing between the 5′ splice site of a
pre-mRNA and the 5′ region of U1 snRNA is required for RNA
splicing.
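The complementarity argument can be made concrete by counting Watson-Crick pairs between two antiparallel RNA strands. In the Python sketch below, the two nine-nucleotide sequences are illustrative stand-ins for a 5′ splice-site region and the 5′ region of U1 snRNA, chosen to be perfectly complementary; they are assumptions for this example, not sequences given in the text, and the function names are hypothetical.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the RNA strand that pairs perfectly with `rna`, written 5' to 3'."""
    return "".join(WATSON_CRICK[base] for base in reversed(rna))


def count_pairs(strand_a, strand_b):
    """Count Watson-Crick pairs when two equal-length strands align antiparallel."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have the same length")
    return sum(
        WATSON_CRICK[a] == b for a, b in zip(strand_a, reversed(strand_b))
    )


if __name__ == "__main__":
    splice_site = "CAGGUAAGU"        # illustrative 5' splice-site region
    u1_region = reverse_complement(splice_site)
    mutant_site = "CAGAUAAGU"        # single-base change in the splice site
    print(u1_region)
    print(count_pairs(splice_site, u1_region))
    print(count_pairs(mutant_site, u1_region))
```

A single-base change in the splice site removes one pair, and a compensating change in the snRNA sequence would restore it; this is the logic of the mutation studies described above.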
Figure 11-17. Diagram of interactions between pre-mRNA, U1 snRNA, and U2
snRNA early in the splicing process
The 5′ region of U1 snRNA initially base-pairs with
nucleotides at the 5′ end of the intron (blue) and
3′ end of the 5′ exon (dark red) of the
pre-mRNA; U2 snRNA base-pairs with a sequence that includes the
branch-point A, although this residue is not base-paired. The
yeast branch-point sequence is shown here. Secondary structures
in the snRNAs that are not altered during splicing are shown in
diagrammatic line form. The purple rectangles represent
sequences that bind snRNP proteins recognized by anti-Sm
antibodies. For unknown reasons, antisera from patients with the
autoimmune disease systemic lupus erythematosus (SLE) contain
these antibodies. Such antisera have been useful in
characterizing components of the splicing reaction. [See E. J.
Sontheimer and J. A. Steitz, 1993, Science
262:1989; adapted from M. J. Moore et al., 1993, in
R. Gesteland and J. Atkins, eds., The RNA
World, Cold Spring Harbor Press, pp. 303-357.]
Involvement of U2 snRNA in splicing initially was suspected when it was found to
have an internal sequence that is largely complementary to the consensus
sequence flanking the branch point in pre-mRNAs (see Figure 11-14). Mutation
experiments, similar to those conducted with U1 snRNA and 5′ splice sites,
demonstrated that base pairing between U2 snRNA and the branch-point sequence in
pre-mRNA is critical to splicing. These studies with U1 and U2 snRNAs indicate
that during splicing they base-pair with pre-mRNA as shown in Figure 11-17.
Significantly, the branch-point A itself, which is not base-paired to U2 snRNA,
“bulges out,” allowing its 2′ hydroxyl to participate in the first
transesterification reaction of RNA splicing (see Figure 11-16).
Similar studies with other snRNAs demonstrated that RNA-RNA interactions
involving them also occur during splicing. For example, an internal region of U6
snRNA initially base-pairs with the 5′ end of U4 snRNA. Rearrangements
later in the splicing process result in U6 snRNA base pairing with the
5′ end of U2 snRNA, which remains base-paired to the branch-point
sequence in the intron. Later in the splicing process, base pairing of U5 snRNA
with four exon nucleotides adjacent to the splice sites displaces U1 snRNA from
the pre-mRNA.
Figure 11-18. Electron micrograph of a spliceosome
Extracts of HeLa cells were mixed with a β-globin pre-mRNA;
the reaction was interrupted before splicing was completed, so that
the spliceosomes, containing snRNPs and the pre-mRNA substrate,
could be purified. [From R. Reed et al., 1988, Cell
53:949; courtesy of J. Griffith.]
Figure 11-19. The spliceosomal splicing cycle
The splicing snRNPs (U1, U2, U4, U5, and U6) associate with the
pre-mRNA and with each other in an ordered sequence to form the
spliceosome. This large ribonucleoprotein complex then catalyzes
the two transesterification reactions that result in splicing of
the exons (light and dark red) and excision of the intron (blue)
as a lariat structure (see Figure 11-16). Although ATP hydrolysis
is not required for the transesterification reactions, it is
thought to provide the energy necessary for rearrangements of the
spliceosome structure that occur during the cycle. Note that the
snRNP proteins in the spliceosome are distinct from the hnRNP
proteins discussed earlier. In higher eukaryotes, the association
of U2 snRNP with pre-mRNA is assisted by an hnRNP protein called
U2AF, which binds to the pyrimidine-rich region near the 3′ splice
site. U2AF also probably interacts with other proteins required
for splicing through a domain containing repeats of the dipeptide
serine-arginine (the SR motif). The branch-point A in pre-mRNA is
indicated in boldface. [See S. W. Ruby and J. Abelson, 1991,
Trends Genet. 7:79; adapted from M. J. Moore et al., 1993, in R.
Gesteland and J. Atkins, eds., The RNA World, Cold Spring Harbor
Press, pp. 303-357.]
Based on the results of these experiments, identification of reaction
intermediates, and other biochemical analyses, the five splicing snRNPs are
thought to sequentially assemble on the pre-mRNA, forming a large
ribonucleoprotein complex called a spliceosome, which is roughly the size of a
ribosome (Figure 11-18). According to the model depicted in Figure 11-19,
assembly of a spliceosome begins with the base pairing of U1 and U2 snRNAs, as
part of the U1 and U2 snRNPs, to the pre-mRNA (see Figure 11-17). Extensive base
pairing between the snRNAs in the U4 and U6 snRNPs forms a complex that
associates with U5 snRNP. The U4/U6/U5 complex then associates, presumably via
protein-protein interactions, with the previously formed complex consisting of a
pre-mRNA base-paired to U1 and U2 snRNPs to yield a spliceosome.
After formation of the spliceosome, extensive rearrangements occur in the pairing
of snRNAs and the pre-mRNA, as noted previously. The rearranged spliceosome then
catalyzes the two transesterification reactions that result in RNA splicing.
After the second transesterification reaction, the ligated exons are released
from the spliceosome while the lariat intron remains associated with the snRNPs.
This final intron-snRNP complex is unstable and dissociates. The individual
snRNPs released participate in a new cycle of splicing. The excised intron is
rapidly degraded by a “debranching enzyme,” which hydrolyzes
the 5′,2′-phosphodiester bond at the branch point, and other
nuclear RNases.
It is estimated that at least one hundred proteins are involved in RNA splicing,
making this process comparable in complexity to protein synthesis and initiation
of transcription. Some of these splicing factors are associated with snRNPs, but
others are not. Sequencing of yeast genes encoding splicing factors has revealed
that they contain domains with the RNP motif, which interacts with RNA, and the
SR motif, which interacts with other proteins and may contribute to RNA binding.
Some splicing factors also exhibit sequence homologies to known RNA helicases;
these may be necessary for the base-pairing rearrangements that occur in snRNAs
during the spliceosomal splicing cycle.
Introns whose splice sites do not conform to the standard consensus sequence
recently were identified in some pre-mRNAs. This class of introns begins with AU
and ends with AC rather than following the usual “GU–AG rule” (see Figure
11-14). Research on the biochemistry of splicing for this special class of
introns soon identified four novel snRNPs. Together with the standard U5 snRNP,
these snRNPs appear to participate in a splicing cycle analogous to that
discussed above.
Portions of Two Different RNAs Are Trans-Spliced in Some Organisms
Virtually all functional mRNAs in vertebrate and insect cells are derived from a
single molecule of the corresponding pre-mRNA by removal of internal introns and
splicing of exons. However, in two types of protozoa (trypanosomes and
euglenoids), mRNAs are constructed by splicing together separate RNA molecules.
This process, referred to as trans-splicing, is also used in the synthesis of
10–15 percent of the mRNAs in the roundworm Caenorhabditis elegans, an important
model organism for studying embryonic development.
The parasitic trypanosomes produce abundant amounts of a single 140-nucleotide
leader RNA from tandemly repeated transcription units. In a two-step reaction
analogous to spliceosomal pre-mRNA splicing, a 39-nucleotide portion of the
leader RNA, termed a mini-exon, is spliced to the 5′
end of protein-coding exons in primary transcripts, which lack internal introns.
The 5′ mini-exon, present in all trypanosome mRNAs, is thought to
assist in initiation of translation. Because of trans-splicing, polycistronic
protein-coding transcription units in trypanosomes, which are common, yield
monocistronic mRNAs from their polycistronic primary transcripts. Splicing of a
5′ mini-exon to a coding region in a primary transcript triggers
cleavage and polyadenylation at the 3′ end of the exon. Consequently,
trypanosomes use trans-splicing and linked cleavage and polyadenylation to
combine the operon organization of polycistronic transcription units
characteristic of bacteria with the monocistronic organization of mRNAs
characteristic of eukaryotes.
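The way trans-splicing converts one polycistronic transcript into several monocistronic mRNAs can be sketched with simple string operations. The Python example below is a schematic under stated assumptions: the mini-exon is taken from the 5′ end of the leader RNA, the coding regions are supplied already separated, and the poly(A) tail length of 20 is arbitrary. The function name and the sequences are invented for illustration.

```python
def trans_splice(leader_rna, coding_regions, mini_exon_len=39, poly_a_len=20):
    """Return one monocistronic mRNA per coding region.

    Assumptions for this sketch: the mini-exon is the first `mini_exon_len`
    nucleotides of the leader RNA, and `poly_a_len` is an arbitrary tail length.
    """
    mini_exon = leader_rna[:mini_exon_len]
    # Each coding region gains the same 5' mini-exon and its own poly(A) tail.
    return [mini_exon + coding + "A" * poly_a_len for coding in coding_regions]


if __name__ == "__main__":
    leader = "AC" * 70  # stand-in for the 140-nucleotide leader RNA
    for mrna in trans_splice(leader, ["AUGGCCUAA", "AUGUUUUGA"]):
        print(len(mrna), mrna)
```

Every product carries the same 5′ mini-exon, matching the statement that the mini-exon is present in all trypanosome mRNAs.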
Self-Splicing Group II Introns Provide Clues to the Evolution of
snRNAs
Under certain nonphysiological in vitro conditions, pure preparations of some RNA
transcripts slowly splice out introns in the absence of any protein. This
observation led to recognition that some introns are
self-splicing. Two types of self-splicing introns have been
discovered: group I introns, present in nuclear rRNA genes of
protozoans, and group II introns, present in protein-coding genes
and some rRNA and tRNA genes of mitochondria and chloroplasts in plants and
fungi. Discovery of the catalytic activity of self-splicing introns
revolutionized concepts about the functions of RNA. As discussed in Chapter 4, RNA is now thought to
catalyze peptide-bond formation during protein synthesis in ribosomes. Here we
discuss the probable role of group II introns, now found only in mitochondrial
and chloroplast DNA, in the evolution of snRNAs; the functioning of group I
introns is considered in the later section on rRNA processing.
Figure 11-20
.
Schematic diagrams comparing the secondary structures of group II
self-splicing introns (a) and U snRNAs present in the spliceosome
(b)
The first transesterification reaction is indicated by black arrows;
the second reaction, by blue arrows. The branch-point A is
boldfaced. The similarity in these structures suggests that the
spliceosomal snRNAs evolved from group II introns, with the
trans-acting snRNAs being functionally analogous to the
corresponding domains in group II introns. [Adapted from P. A.
Sharp, 1991, Science
254:663.]
Even though their precise sequences are not highly conserved, all group II
introns fold into a conserved, complex secondary structure containing numerous
stem-loops (Figure 11-20a). Self-splicing by a group II intron occurs via two
transesterification reactions, involving intermediates and products analogous
to those found in nuclear pre-mRNA splicing. The mechanistic similarities
between group II intron self-splicing and spliceosomal splicing led to the
hypothesis that snRNAs function analogously to the stem-loops in the secondary
structure of group II introns. According to this hypothesis, snRNAs interact
with 5′ and 3′ splice sites of pre-mRNAs and with each other to produce an RNA
structure functionally analogous to that of group II self-splicing introns
(see Figure 11-20b).
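The order of events in the two transesterification reactions can be followed in a small Python sketch. This is a toy model, not a description of the chemistry: the function and class names are hypothetical, sequences are plain strings, and the lariat is represented only by recording the position of the branch-point A, since a string cannot capture the 2′,5′ branch.

```python
from typing import NamedTuple


class LariatIntermediate(NamedTuple):
    """State after the first reaction: free exon 1 plus lariat intron-exon 2."""
    free_exon1: str
    intron: str        # its 5' end is now bonded to the branch-point A
    exon2: str
    branch_index: int  # position of the branch-point A within the intron


class SplicingProducts(NamedTuple):
    """State after the second reaction: joined exons plus the excised lariat."""
    spliced_exons: str
    lariat_intron: str
    branch_index: int


def first_transesterification(exon1, intron, exon2, branch_index):
    """Branch-point A attacks the 5' splice site, releasing exon 1."""
    if intron[branch_index] != "A":
        raise ValueError("the branch-point residue must be an A")
    return LariatIntermediate(exon1, intron, exon2, branch_index)


def second_transesterification(intermediate):
    """The free 3' end of exon 1 attacks the 3' splice site, joining the exons."""
    return SplicingProducts(
        intermediate.free_exon1 + intermediate.exon2,
        intermediate.intron,
        intermediate.branch_index,
    )
```

Applying the two functions in sequence to an exon-intron-exon string yields the joined exons and the excised intron. The same bookkeeping applies to nuclear pre-mRNA splicing, where the reactions take place within the spliceosome.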
An extension of this hypothesis is that introns in present-day nuclear pre-mRNAs
evolved from ancient group II self-splicing introns through the progressive loss
of internal RNA structures, which concurrently evolved into trans-acting snRNAs
that perform the same functions. In support of this kind of evolutionary model,
group II intron mutants have been constructed in which domain V and part of
domain I are deleted. Such mutants are defective in self-splicing, but when RNA
molecules equivalent to the deleted regions are added to the in vitro reaction,
self-splicing occurs. This finding demonstrates that these domains in group II
introns can be trans-acting, like snRNAs.
The similarity in the mechanisms of group II intron self-splicing and
spliceosomal splicing of pre-mRNAs also suggests that the splicing reaction is
catalyzed by the snRNA, not the protein, components of spliceosomes. Although
group II introns can self-splice in vitro at elevated temperatures and
Mg²⁺ concentrations, under in vivo conditions proteins
called maturases, which bind to group II intron RNA, are
required for rapid splicing. Maturases, encoded by group II introns themselves,
are thought to stabilize the precise three-dimensional interactions of the
intron RNA required to catalyze the two splicing transesterification reactions.
By analogy, snRNP proteins in spliceosomes are thought to stabilize the precise
geometry of snRNAs and intron nucleotides required to catalyze pre-mRNA
splicing.
The evolution of snRNAs may have been an important step in the rapid evolution of
higher eukaryotes. As internal intron sequences were lost and their functions in
RNA splicing supplanted by trans-acting snRNAs, the remaining intron sequences
would be free to diverge. This in turn likely facilitated the evolution of new
genes through exon shuffling (Section 9.3). It also permitted the increase in
protein diversity that results from alternative RNA splicing and an additional
level of gene control resulting from regulated RNA splicing.
One more remarkable property of group II introns deserves mention, namely, their
ability to behave as mobile DNA elements in the genome. The maturases that
increase the rate of self-splicing of these introns also contain a domain that
is homologous to reverse transcriptase. Thus group II introns can move in the
genome like other nonviral retrotransposons discussed in Chapter 9. As is generally true for
mobile DNA elements, transposition of group II introns is rare. However, when a
group II intron does transpose, it does not inactivate the gene into which it
inserts, because the inserted intron is spliced out of the transcript produced
from the target gene by self-splicing!
Most Transcription and RNA Processing Occur in a Limited Number of Domains in
Mammalian Cell Nuclei
Figure 11-21
.
Localization of polyadenylated RNA and RNA splicing factors in
the nucleus of a mammalian fibroblast
Digital imaging microscopy was used to reconstruct a
1-μm-thick section of a stained human fibroblast nucleus. (a)
Section stained with red rhodamine-labeled poly-dT to detect
polyadenylated RNA (red) and with DAPI to detect DNA (blue).
Polyadenylated RNA is localized to a limited number of discrete foci
(speckles) between regions of chromatin, although not all regions
containing low levels of DNA contained detectable polyadenylated RNA
(arrow). (b) The same section shown in (a) stained to detect
polyadenylated RNA (red) and the essential RNA-splicing protein
SC-35, which was visualized with a green fluorescein-labeled
monoclonal antibody. Regions where the stains overlap appear yellow.
SC-35 is present in the center of many foci (arrow).
Nu = nucleolus. [From K. C. Carter
et al., 1993, Science
259:1330.]
The digital imaging micrographs in Figure 11-21 demonstrate that most of the
nuclear polyadenylated RNA (including unspliced and partially spliced pre-mRNA
and nuclear mRNA) occurs in discrete foci lying between dense regions of
chromatin and that a required protein splicing factor (SC-35) is localized to
the center of these same foci. The results of these and other studies suggest
that transcription and RNA processing do not occur randomly throughout the
eukaryotic nucleus; rather, the nucleus is organized into discrete domains
(≈20–100 in human fibroblasts) where the bulk of transcription and RNA
processing occurs.
Figure 11-22
.
Transmission electron micrograph showing the nuclear matrix
(skeleton) of a HeLa cell
Cells were treated with a nonionic detergent to remove membranes;
digested with DNase to remove most of the DNA; and then extracted
with 0.25 M ammonium sulfate to remove histones and
chromatin-associated protein. A whole mount of the remaining material
was prepared. [From S. Penman et al., 1982, Cold Spring
Harbor Symp. Quant. Biol.
46:1013.]
This highly organized view of the nucleus implies that there is an underlying
nuclear substructure. It has been known for many years that when mammalian cells
are treated with a mild nonionic detergent, DNase I, and high concentrations of
salt, a fibrillar network of protein and RNA remains in the region of the
nucleus (Figure 11-22). This protein network has been called the nuclear
matrix, or nuclear skeleton. It is composed of actin and numerous other protein
components that have not been fully characterized, including components of the
chromosomal scaffold that rearranges and condenses to form metaphase chromosomes
during mitosis (see Figure 9-34).
However, snRNPs remain associated with the nuclear matrix prepared from
detergent-extracted, DNase I–treated cells.
Moreover, when the nuclear matrix is prepared with a low concentration of salt,
pre-mRNAs associated with the matrix undergo splicing when ATP is added. These
results suggest that the RNA-processing foci observed microscopically may be
associated with specific regions of the nuclear matrix.
SUMMARY
-
Eukaryotic mRNA precursors are processed by
5′ capping, 3′ cleavage and polyadenylation, and RNA
splicing to remove introns before being transported to the cytoplasm,
where they are translated by ribosomes.
-
The cap is added to the 5′ end of
a nascent pre-mRNA transcript by a capping enzyme that associates with
the phosphorylated CTD of RNA polymerase II shortly after transcription
initiation.
-
Nascent pre-mRNA transcripts are associated
with a class of abundant RNA-binding proteins called hnRNP proteins.
-
In most protein-coding genes, a conserved
polyadenylation signal (AAUAAA) lies
10–30 nucleotides upstream from a
poly(A) site where cleavage and polyadenylation occur. A GU- or U-rich
sequence downstream from the poly(A) site contributes to the efficiency
of cleavage/polyadenylation.
-
A multiprotein complex that includes
poly(A) polymerase (PAP) carries out the cleavage and polyadenylation of
a pre-mRNA. A nuclear poly(A)-binding protein, PABII, stimulates
addition of A residues by PAP and stops addition once the poly(A) tail
reaches 200–250 residues.
-
RNA splicing is carried out by a very large
ribonucleoprotein complex, the spliceosome, that is assembled by
interactions of five different snRNP particles with each other and with
pre-mRNA. The
spliceosome catalyzes two transesterification reactions that join the
exons and remove the intron as a lariat structure, which is subsequently
degraded.
-
Group II self-splicing introns, which are
found in chloroplast genes and mitochondrial genes of plants and fungi,
exhibit a largely conserved secondary structure, which is necessary for
self-splicing. The snRNAs in the spliceosome are thought to have an
overall secondary structure similar to that of group II introns.
-
Most transcription and RNA processing in a
mammalian cell nucleus take place in a limited number of domains. A
nuclear matrix or scaffold is formed by a fibrous protein network
throughout the nucleus. This nuclear matrix may help to organize the
foci of RNA transcription and processing.