Figure 10.1. In bacteria, transcription and translation are often coupled.
Initiation of transcription, culminating with the RNA polymerase leaving the promoter and beginning synthesis of an RNA molecule, is simply the first step in the genome expression pathway. In this chapter and the next we will follow the process onwards and examine how transcription and translation eventually result in synthesis of the proteome.
We begin our detailed study of transcription by looking at the synthesis and processing of mRNAs, the molecules that make up the transcriptome and which specify the protein content of the cell. As the central players in genome expression, mRNAs have received the greatest attention from researchers and we now have a detailed picture of how they are produced. Events in bacteria are different in many respects from those in eukaryotes and so we will deal with the two types of organism in different sections. One aspect of eukaryotic mRNA processing - intron splicing - is so important that it requires a section of its own.
Bacterial mRNAs do not undergo any significant forms of processing: the primary transcript that is synthesized by the RNA polymerase is itself the mature mRNA, and its translation usually begins before transcription is complete (Figure 10.1).
Because there is just one bacterial RNA polymerase (Section 9.2.1), the general mechanism of transcription is the same for all bacterial genes. The following descriptions of elongation and termination, given in the context of mRNA synthesis, therefore apply equally well to the synthesis of non-coding RNA.
The chemical basis of the template-dependent synthesis of RNA was shown in Figure 3.5. Ribonucleotides are added one after another to the growing 3′ end of the RNA transcript, the identity of each nucleotide specified by the base-pairing rules: A base-pairs with T or U; G base-pairs with C. During each nucleotide addition, the β- and γ-phosphates are removed from the incoming nucleotide (see Figure 1.6), and the hydrogen is removed from the hydroxyl group attached to the 3′-carbon of the nucleotide present at the end of the chain.
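The base-pairing rules can be illustrated with a minimal sketch. The function name `transcribe_template` and the example sequence are illustrative assumptions rather than anything taken from the text. The template strand is written 3′→5′ so that the transcript reads 5′→3′.

```python
# Base-pairing rules for RNA synthesis on a DNA template:
# template A specifies U, T specifies A, G specifies C, C specifies G.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') specified by a DNA template written 3'->5'."""
    rna = []
    for base in template_3_to_5.upper():
        if base not in PAIRING:
            raise ValueError(f"unexpected DNA base: {base!r}")
        # Each ribonucleotide is added to the growing 3' end of the transcript.
        rna.append(PAIRING[base])
    return "".join(rna)


# Hypothetical nine-nucleotide template, chosen only for illustration.
print(transcribe_template("TACGGATTC"))
```

The sketch models only the identity of each added nucleotide; it says nothing about the chemistry of phosphodiester bond formation.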
Exactly how termination occurs is not known. Current thinking views transcription as a stepwise nucleotide-by-nucleotide process, with the polymerase pausing at each position and making a ‘choice’ between continuing elongation by adding another ribonucleotide to the transcript, or terminating by dissociating from the template. Which choice is selected depends on which alternative is more favorable in thermodynamic terms (von Hippel, 1998). This model emphasizes that, in order for termination to occur, the polymerase has to reach a position on the template where dissociation is more favorable than continued RNA synthesis.
Rho is a helicase that follows the RNA polymerase along the transcript. When the polymerase stalls at a hairpin, Rho catches up and breaks the RNA-DNA base pairs, releasing the transcript. Note that the diagram is schematic and does not reflect the relative sizes of Rho and the RNA polymerase.
In bacteria, two mechanisms have evolved for influencing the repeated choice that the polymerase has to make between elongation and termination when copying a template. Both mechanisms are important in regulating the expression of genes contained within operons.
The antiterminator protein attaches to the DNA and transfers to the RNA polymerase as it moves past, subsequently enabling the polymerase to continue transcription through termination signal number 1, so the second of the pair of genes in this operon is transcribed.
See the text for details.
Despite this equivalence, the overall processes for mRNA synthesis in bacteria and eukaryotes are quite different. The most striking dissimilarity is the extent to which eukaryotic mRNAs are processed during transcription. In bacteria, the transcripts of protein-coding genes are not processed at all: the primary transcripts are mature mRNAs. In contrast, all eukaryotic mRNAs have a cap added to the 5′ end, most are also polyadenylated by addition of a series of adenosines to the 3′ end, many contain introns and so undergo splicing, and a few are subject to RNA editing. A function has been assigned to capping, but the reason for polyadenylation largely remains a mystery. With splicing and editing we can appreciate why the events occur - the former removes introns that block translation of the mRNA; the latter changes the coding properties of the mRNA - but we do not understand why these mechanisms have evolved. Why do genes have introns in the first place? Why edit an mRNA rather than encoding the desired sequences in the DNA?
Eukaryotic mRNAs are processed while they are being synthesized. The cap is added as soon as transcription has been initiated, splicing and editing begin while the transcript is still being made, and polyadenylation is an inherent part of the termination mechanism for RNA polymerase II. To deal with all of these events together would be confusing, with too many different things being described at once. We will therefore postpone editing until the end of the chapter, which means it can be dealt with in tandem with similar forms of chemical modification occurring during rRNA and tRNA processing, and we will consider splicing after we have studied capping, elongation and polyadenylation.
Promoter clearance is the transition from the pre-initiation complex to a complex that has begun to synthesize RNA. Promoter escape occurs when the polymerase moves away from the promoter region and becomes committed to making a transcript. Note that the drawing is schematic and is not intended to indicate the shape or subunit composition of the RNA polymerase II complex that synthesizes the transcript.
The top part of the diagram shows the capping reaction in outline. A GTP molecule (drawn as Gppp) reacts with the 5′ end of the mRNA to give a triphosphate linkage. In the second step of the process, the terminal G is methylated at nitrogen number 7. The bottom part of the diagram shows the chemical structure of the type 0 cap, with asterisks indicating the positions where additional methylations might occur to produce type 1 and type 2 cap structures.
A second methylation replaces the hydrogen of the 2′-OH group of what is now the second nucleotide in the transcript. This results in a type 1 cap.
If this second nucleotide is an adenosine, then a methyl group might be added to nitrogen number 6 of the purine.
Another 2′-OH methylation might occur at the third nucleotide position, resulting in a type 2 cap.
All RNAs synthesized by RNA polymerase II are capped in one way or another. This means that as well as mRNAs, the snRNAs that are transcribed by this enzyme are also capped (see Table 9.3). The cap may be important for export of mRNAs and snRNAs from the nucleus (Section 10.5), but its best defined role is in translation of mRNAs, which is covered in Section 11.2.2.
As mentioned above, the fundamental aspects of transcript elongation are the same in bacteria and eukaryotes. The one major distinction concerns the length of transcript that must be synthesized. The longest bacterial genes are only a few kb in length and can be transcribed in a matter of minutes by the bacterial RNA polymerase, which has a polymerization rate of several hundred nucleotides per minute. In contrast, RNA polymerase II can take hours to transcribe a single gene, even though it can work at up to 2000 nucleotides per minute. This is because the presence of multiple introns in many eukaryotic genes (Section 10.1.3) means that considerable lengths of DNA must be copied. For example, the pre-mRNA for the human dystrophin gene is 2400 kb in length and takes about 20 hours to synthesize.
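The dystrophin figure follows directly from the numbers quoted above, as this short sketch shows. The function name is an illustrative assumption, and the calculation assumes the polymerase works at its maximum quoted rate without pausing.

```python
def transcription_time_hours(length_kb: float, rate_nt_per_min: float) -> float:
    """Time needed to synthesize a transcript at a constant polymerization rate."""
    if rate_nt_per_min <= 0:
        raise ValueError("rate must be positive")
    length_nt = length_kb * 1000          # 1 kb = 1000 nucleotides
    minutes = length_nt / rate_nt_per_min
    return minutes / 60


# Values from the text: a 2400 kb pre-mRNA copied at up to 2000 nucleotides per minute.
print(transcription_time_hours(2400, 2000))
```

With these inputs the result is 20 hours, matching the estimate given for the dystrophin pre-mRNA.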
| Elongation factor | Function |
|---|---|
| CSB, ELL, Elongin | These factors suppress ‘pausing’ of RNA polymerase II, which can occur when the enzyme transcribes through a region where intra-strand base pairs (e.g. a hairpin loop) can form |
| SII | Prevents arrest (complete cessation of elongation) |
| FACT, HMG14 | Thought to modify chromatin in order to assist elongation |
A second difference between bacterial and eukaryotic elongation is that RNA polymerase II, as well as the other eukaryotic nuclear polymerases, has to negotiate the nucleosomes that are attached to the template DNA that is being transcribed. At first glance it is difficult to imagine how the polymerase can elongate its transcript through a region of DNA wound around a nucleosome (see Figure 2.5). The solution to this problem is probably provided by elongation factors that are able to modify the chromatin structure in some way. In mammals, the elongation factor FACT has been shown to interact with histones H2A and H2B, possibly influencing nucleosome positioning, and less well defined interactions have been demonstrated for other factors (Orphanides and Reinberg, 2000). Yeast possesses a factor called elongator, which has tentatively been assigned a role in chromatin modification because it contains a subunit that has histone acetyltransferase activity (Section 8.2.1; Wittschieben et al., 1999), but so far a homolog of this complex has not been identified in mammals. An intriguing question is whether the first polymerase to transcribe a particular gene is a ‘pioneer’ with a special elongation factor complement that opens up the chromatin structure, with subsequent rounds of transcription being performed by standard polymerase complexes that take advantage of the changes induced by the pioneer.
Virtually all eukaryotic mRNAs have a series of up to 250 adenosines at their 3′ ends. These As are not specified by the DNA and are added to the transcript by a template-independent RNA polymerase called poly(A) polymerase (Bard et al., 2000). This polymerase does not act at the extreme 3′ end of the transcript, but at an internal site which is cleaved to create a new 3′ end to which the poly(A) tail is added.
See the text for details. Note that the diagram is schematic and is not intended to indicate the relative sizes and shapes of the various protein complexes, nor their precise positioning, although CPSF and CstF are thought to bind to the 5′-AAUAAA-3′ and GU-rich sequences, respectively, as shown. Note that ‘GU’ indicates a GU-rich sequence rather than the dinucleotide 5′-GU-3′.
CPSF is shown attached to the RNA polymerase II elongation complex that is synthesizing RNA. CPSF binds to the polyadenylation signal sequence as soon as it is transcribed. This changes the interaction between CPSF and the CTD of RNA polymerase II so that termination of transcription is now favored over continued elongation. Note that this is a schematic representation and ignores the possibility that CstF may also be a component of the elongation complex. This representation also shows CPSF leaving the complex in order to bind to the polyadenylation signal, when in reality it may maintain its attachment to RNA polymerase II during the polyadenylation process.
Even though polyadenylation can be identified as an inherent part of the termination process, this does not explain why it is necessary to add a poly(A) tail to the transcript. A role for the poly(A) tail has been sought for several years, but no convincing evidence has been found for any of the various suggestions that have been made. These suggestions include an influence on mRNA stability, which seems unlikely as some stable transcripts have very short poly(A) tails, and a role in initiation of translation. The latter proposal is supported by research showing that poly(A) polymerase is repressed during those periods of the cell cycle when relatively little protein synthesis occurs (Colgan et al., 1996).
| Intron type | Where found | Cross-reference |
|---|---|---|
| GU-AG introns | Eukaryotic nuclear pre-mRNA | Section 10.1.3 |
| AU-AC introns | Eukaryotic nuclear pre-mRNA | Section 10.1.3 |
| Group I | Eukaryotic nuclear pre-rRNA, organelle RNAs, few bacterial RNAs | Section 10.2.3 |
| Group II | Organelle RNAs, some prokaryotic RNAs | Box 10.2 |
| Group III | Organelle RNAs | Box 10.2 |
| Twintrons | Organelle RNAs | Box 10.2 |
| Pre-tRNA introns | Eukaryotic nuclear pre-tRNA | Section 10.2.3 |
| Archaeal introns | Various RNAs | Box 10.2 |
Few rules can be established for the distribution of introns in protein-coding genes, beyond the fact that introns are less common in lower eukaryotes: the 6000 genes in the yeast genome contain only 239 introns in total, whereas many individual mammalian genes contain 50 or more introns. When the same gene is compared in related species, we usually find that some of the introns are in identical positions but that each species has one or more unique introns. This implies that some introns remain in place for millions of years, retaining their positions while species diversify, whereas others appear or disappear during this same period. This leads to two competing hypotheses for the evolution of introns:
‘Introns late’ is the hypothesis that introns evolved relatively recently and are gradually accumulating in eukaryotic genomes.
‘Introns early’ is the alternative hypothesis, that introns are very ancient and are gradually being lost from eukaryotic genomes.
| Gene | Length (kb) | Number of introns | Amount of the gene taken up by the introns (%) |
|---|---|---|---|
| Insulin | 1.4 | 2 | 69 |
| β-globin | 1.6 | 2 | 61 |
| Serum albumin | 18 | 13 | 79 |
| Type VII collagen | 31 | 117 | 72 |
| Factor VIII | 186 | 25 | 95 |
| Dystrophin | 2400 | 78 | 98 |
Adapted from Strachan and Read (1999).
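The table can be turned into absolute lengths with a little arithmetic. The sketch below is an illustration only: the function name is an assumption, and because the percentages in the table are rounded, the derived values are approximate.

```python
# (gene, length in kb, number of introns, % of gene taken up by introns), from the table.
GENES = [
    ("Insulin", 1.4, 2, 69),
    ("β-globin", 1.6, 2, 61),
    ("Serum albumin", 18, 13, 79),
    ("Type VII collagen", 31, 117, 72),
    ("Factor VIII", 186, 25, 95),
    ("Dystrophin", 2400, 78, 98),
]


def intron_summary(length_kb: float, n_introns: int, intron_percent: float):
    """Return (intron kb, exon kb, mean intron kb) for one gene."""
    intron_kb = length_kb * intron_percent / 100
    exon_kb = length_kb - intron_kb
    mean_intron_kb = intron_kb / n_introns
    return intron_kb, exon_kb, mean_intron_kb


for name, length_kb, n_introns, percent in GENES:
    intron_kb, exon_kb, mean_kb = intron_summary(length_kb, n_introns, percent)
    print(f"{name}: introns {intron_kb:.2f} kb, exons {exon_kb:.2f} kb, "
          f"mean intron {mean_kb:.2f} kb")
```

For dystrophin this gives roughly 2352 kb of intron sequence and only about 48 kb of exon sequence.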
With the vast bulk of pre-mRNA introns, the first two nucleotides of the intron sequence are 5′-GU-3′ and the last two 5′-AG-3′. They are therefore called ‘GU-AG’ introns and all members of this class are spliced in the same way. These conserved motifs were recognized soon after introns were discovered and it was immediately assumed that they must be important in the splicing process. As intron sequences started to accumulate in the databases it was realized that the GU-AG motifs are merely parts of longer consensus sequences that span the 5′ and 3′ splice sites. These consensus sequences vary in different types of eukaryote; in vertebrates they can be described as:
5′ splice site 5′-AG↓GUAAGU-3′
3′ splice site 5′-PyPyPyPyPyPyNCAG↓-3′
In these designations, ‘Py’ is one of the two pyrimidine nucleotides (U or C), ‘N’ is any nucleotide, and the arrow indicates the exon-intron boundary. The 5′ splice site is also known as the donor site and the 3′ splice site as the acceptor site.
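The two vertebrate consensus sequences can be written as regular expressions, as in the sketch below. This is a deliberate simplification: real splice sites often deviate from the consensus, so exact matching will miss genuine sites and may report spurious ones. The function names and the test sequence are illustrative assumptions.

```python
import re

# Lookaheads are used so that overlapping candidate sites are all reported.
DONOR = re.compile(r"(?=AGGUAAGU)")              # 5'-AG|GUAAGU-3'
ACCEPTOR = re.compile(r"(?=[CU]{6}[ACGU]CAG)")   # 5'-PyPyPyPyPyPyNCAG|-3'


def find_donor_sites(rna: str) -> list[int]:
    """Indices of the first intron nucleotide at each consensus 5' splice site."""
    return [m.start() + 2 for m in DONOR.finditer(rna)]


def find_acceptor_sites(rna: str) -> list[int]:
    """Indices of the first exon nucleotide after each consensus 3' splice site."""
    return [m.start() + 10 for m in ACCEPTOR.finditer(rna)]


# Hypothetical pre-mRNA: exon, intron (GU ... AG), exon.
pre_mrna = "AUGGCCAG" + "GUAAGU" + "ACUAACGAA" + "UUUCCUACAG" + "GGCUAA"
print(find_donor_sites(pre_mrna), find_acceptor_sites(pre_mrna))
```

Each reported index marks the exon-intron boundary shown by the arrow in the consensus sequences.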
The longer consensus sequences around the splice sites are given in the text. Abbreviation: Py, pyrimidine nucleotide (U or C).
Cleavage of the 5′ splice site is promoted by the hydroxyl (OH) attached to the 2′-carbon of an adenosine nucleotide within the intron sequence. This results in the lariat structure and is followed by the 3′-OH group of the upstream exon inducing cleavage of the 3′ splice site. This enables the two exons to be ligated, with the released intron being debranched and degraded.
Cleavage of the 5′ splice site occurs by a transesterification reaction promoted by the hydroxyl group attached to the 2′ carbon of an adenosine nucleotide located within the intron sequence. In yeast, this adenosine is the last one in the conserved 5′-UACUAAC-3′ sequence. The result of the hydroxyl attack is cleavage of the phosphodiester bond at the 5′ splice site, accompanied by formation of a new 5′-2′ phosphodiester bond linking the first nucleotide of the intron (the G of the 5′-GU-3′ motif) with the internal adenosine. This means that the intron has now been looped back on itself to create a lariat structure.
Cleavage of the 3′ splice site and joining of the exons result from a second transesterification reaction, this one promoted by the 3′-OH group attached to the end of the upstream exon. This group attacks the phosphodiester bond at the 3′ splice site, cleaving it and so releasing the intron as the lariat structure, which is subsequently converted back to a linear RNA and degraded. At the same time, the 3′ end of the upstream exon joins to the newly formed 5′ end of the downstream exon, completing the splicing process.
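The outcome of the two transesterifications can be sketched as string operations. This models only which nucleotides end up in which product, not the chemistry. The function name, the coordinates and the example sequence are hypothetical, and the branch adenosine is simply supplied by the caller.

```python
def splice(pre_mrna: str, donor: int, branch: int, acceptor: int):
    """Return (mRNA, released intron) for a single-intron transcript.

    donor    : index of the first intron nucleotide (the G of 5'-GU-3')
    branch   : index of the internal adenosine whose 2'-OH attacks the 5' splice site
    acceptor : index of the first nucleotide of the downstream exon
    """
    if not 0 < donor < branch < acceptor < len(pre_mrna):
        raise ValueError("coordinates must satisfy 0 < donor < branch < acceptor < length")
    if pre_mrna[branch] != "A":
        raise ValueError("branch point must be an adenosine")

    # Step 1: cleavage at the 5' splice site frees the upstream exon; the intron's
    # first nucleotide becomes linked 5'-2' to the branch adenosine (the lariat).
    upstream_exon = pre_mrna[:donor]

    # Step 2: the 3'-OH of the upstream exon attacks the 3' splice site,
    # releasing the intron and joining the two exons.
    intron = pre_mrna[donor:acceptor]
    downstream_exon = pre_mrna[acceptor:]
    return upstream_exon + downstream_exon, intron


# Hypothetical transcript containing the yeast branch sequence 5'-UACUAAC-3'.
pre_mrna = "AUGGCCAG" + "GUAAGU" + "UACUAACGA" + "UUUCCUACAG" + "GGCUAA"
mrna, intron = splice(pre_mrna, donor=8, branch=19, acceptor=33)
print(mrna)
print(intron)
```

In this example the branch index points at the last adenosine of the 5′-UACUAAC-3′ motif, and the released intron begins with GU and ends with AG.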
The mammalian U1-snRNP comprises the 165-nucleotide U1-RNA plus ten proteins. Three of these (U1-70K, U1-A and U1-C) are specific to this snRNP, the other seven are Sm proteins that are found in all the snRNPs involved in splicing. The U1-RNA forms a base-paired structure as shown. The U1-70K and U1-A proteins attach to two of the major stem-loops of this base-paired structure, and U1-C attaches via a protein-protein interaction. The Sm proteins attach to the Sm site. Based on Stark et al. (2001).
See the text for details. There are several unanswered questions about the series of events occurring during splicing and it is unlikely that the scheme shown here is entirely accurate. The key point is that associations between the snRNPs are thought to bring the three critical parts of the intron - the two splice sites and the branch point - into close proximity.
The commitment complex initiates a splicing activity. This complex comprises U1-snRNP, which binds to the 5′ splice site, partly by RNA-RNA base-pairing, and the protein factors SF1, U2AF35 and U2AF65, which make protein-RNA contacts with the branch site, the polypyrimidine tract and the 3′ splice site, respectively.
The pre-spliceosome complex comprises the commitment complex plus U2-snRNP, the latter attached to the branch site. At this stage, an association between U1-snRNP and U2-snRNP brings the 5′ splice site into close proximity with the branch point.
The spliceosome is formed when U4/U6-snRNP (a single snRNP containing two snRNAs) and U5-snRNP attach to the pre-spliceosome complex. This results in additional interactions that bring the 3′ splice site close to the 5′ site and the branch point. All three key positions in the intron are now in proximity and the two transesterifications occur as a linked reaction, possibly catalyzed by U6-snRNP, completing the splicing process.
There is one final aspect of SR proteins that we should address. This is the possibility that a subset of these SR proteins, called CASPs (CTD-associated SR-like proteins) or SCAFs (SR-like CTD-associated factors), form a physical connection between the spliceosome and the CTD of the RNA polymerase II transcription complex, and hence provide a link between transcript elongation and processing. As with some of the polyadenylation proteins (Section 10.1.2), it is probable that these splicing factors ride with the polymerase as it synthesizes the transcript, and are deposited at their appropriate positions at intron splice sites as soon as these are transcribed. Electron microscopy studies have shown that transcription and splicing occur together, and the discovery of splicing factors that have an affinity for RNA polymerase provides a biochemical basis for this observation (Corden and Patturajan, 1997).
When introns were first discovered it was imagined that each gene always gives rise to the same mRNA: in other words, that there is a single splicing pathway for each primary transcript (Figure 10.19A).
(A) The cascade begins with sex-specific alternative splicing of the sxl pre-mRNA. In males all exons are present in the mRNA, but this means that a truncated protein is produced because exon 3 contains a termination codon. In females, exon 3 is skipped, leading to a full-length, functional SXL protein. (B) In females, SXL blocks the 3′ splice site in the first intron of the tra pre-mRNA. U2AF65 is unable to locate this site and instead directs splicing to a cryptic site in exon 2. This results in an mRNA that codes for a functional TRA protein. In males, there is no SXL so the 3′ splice site is not blocked and a dysfunctional mRNA is produced. (C) In males, exon 4 of the dsx pre-mRNA is skipped. The resulting mRNA codes for a male-specific DSX protein. In females, TRA stabilizes the attachment of SR proteins to an exonic splicing enhancer located within exon 4, so this exon is not skipped, resulting in the mRNA that codes for the female-specific DSX protein. The two versions of DSX are the primary determinants of the male and female physiologies. The female dsx mRNA ends with exon 4 because the intron between exons 4 and 5 has no 5′ splice site, meaning that exon 5 cannot be ligated to the end of exon 4. Instead a polyadenylation site at the end of exon 4 is recognized in females. See Chabot (1996) for more details. Note that the diagram is schematic and that the introns are not drawn to scale.
The gene comprises 35 exons, shown as boxes, eight of which (in blue) are optional and appear in different combinations in different slo mRNAs. There are 8! = 40 320 possible splicing pathways and hence 40 320 possible mRNAs, but only some 500 of these are thought to be synthesized in the human cochlea. Based on Graveley (2001).
At present we do not understand how alternative splicing is regulated and cannot describe the process that determines which of several splicing pathways is followed by a particular transcript. The players are thought to be the SR proteins in conjunction with ESEs and ESSs, but the way in which they control splice site selection is not known.
One of the more surprising events of recent years has been the discovery of a few introns in eukaryotic pre-mRNAs that do not fall into the GU-AG category, having different consensus sequences at their splice sites. These are the AU-AC introns which, to date, have been found in approximately 20 genes in organisms as diverse as humans, plants and Drosophila (Nilsen, 1996; Tarn and Steitz, 1997).
As well as the sequences at their splice sites, AU-AC introns have a conserved (though not invariant) branch site sequence with the consensus 5′-UCCUUAAC-3′, the last adenosine in this motif being the one that participates in the first transesterification reaction. This points us towards the remarkable feature of AU-AC introns: their splicing pathway is very similar to that for GU-AG introns, but involves a different set of splicing factors. Only the U5-snRNP is involved in the splicing mechanisms of both types of intron. The roles of U1-snRNP and U2-snRNP are taken by U11/U12-snRNP, a previously discovered complex that had never been assigned a function, and an entirely new U4atac/U6atac-snRNP has subsequently been isolated, completing the picture.
The splicing pathways for the ‘major’ and ‘minor’ types of intron are not identical but many of the interactions between the transcript and the snRNPs and other splicing proteins are remarkably similar. This means that AU-AC introns, rather than simply being a curiosity, are proving useful in testing models for interactions occurring during GU-AG intron splicing. The argument is that a predicted interaction between two components of the GU-AG spliceosome can be checked by seeing if the same interaction is possible with the equivalent AU-AC components. This has already been informative in helping to define a base-paired structure formed between the U2 and U6 snRNAs in the GU-AG spliceosome (Tarn and Steitz, 1996).
In bacteria, the same RNA polymerase synthesizes all types of RNA. The issues that we have already discussed regarding elongation and termination of bacterial mRNA (Section 10.1.1) therefore also hold for rRNA and tRNA synthesis, and the only outstanding areas that we have to cover are the processing of the pre-rRNAs and pre-tRNAs into the mature molecules. This processing involves cutting events and chemical modifications, both types of reaction being similar to equivalent processing events for eukaryotic rRNAs and tRNAs: we will therefore deal with bacterial and eukaryotic processing together, cutting events in Section 10.2.2 and chemical modifications in Section 10.3. The distinctive feature of eukaryotic rRNA and tRNA processing is the presence in some eukaryotic pre-RNAs of introns, different from the pre-mRNA introns described above; these will be covered in Section 10.2.3. First, however, there are issues regarding transcript elongation and termination by RNA polymerases I and III that we must address.
In general, we know less about transcript elongation and termination by RNA polymerases I and III than we do about equivalent processes for RNA polymerase II. The interaction of the polymerase with the template and transcript during elongation appears to be similar with all three enzymes, a reflection of the structural relatedness of the three largest subunits in each polymerase. One difference is the rate of transcription - RNA polymerase I, for example, being much slower than RNA polymerase II, managing a polymerization rate of only 20 nucleotides per minute, compared with 2000 per minute for mRNA synthesis. A second difference is that neither RNA polymerase I nor RNA polymerase III transcripts are capped. Various proteins that might act as elongation factors for RNA polymerase I or III have been isolated, including SGS1 and SRS2 of yeast, which code for two related DNA helicases. Mutations in the genes for SGS1 and SRS2 cause a reduction in RNA polymerase I transcription as well as DNA replication (Lee et al., 1999). SGS1 is interesting because it is a homolog of a pair of human proteins that are defective in the growth disorders Bloom's and Werner's syndromes (Section 7.4.2) but the exact involvement of SGS1 and SRS2, and other putative elongation factors, in transcription by RNA polymerases I and III is not known.
The major differences between the three polymerases are seen when the termination processes are compared. The polyadenylation system for RNA polymerase II termination (Section 10.1.2) is unique to that enzyme and no equivalent has been described for the other two polymerases. Termination by RNA polymerase I involves a DNA-binding protein, called Reb1p in Saccharomyces cerevisiae and TTF-I in mice, which attaches to the DNA at a recognition sequence located 12–20 bp downstream of the point at which transcription terminates (Figure 10.22).
The example shown results in synthesis of tRNATyr. The tRNA sequence in the primary transcript adopts its base-paired cloverleaf structure (see Figure 11.2) and two additional hairpin structures form, one on either side of the tRNA. Processing begins with the cut by ribonuclease E or F forming a new 3′ end just upstream of one of the hairpins. Ribonuclease D, which is an exonuclease, trims seven nucleotides from this new 3′ end and then pauses while ribonuclease P makes a cut at the start of the cloverleaf, forming the 5′ end of the mature tRNA. Ribonuclease D then removes two more nucleotides, creating the 3′ end of the mature molecule. With this tRNA the 3′-terminal CCA sequence is present in the RNA and is not removed by ribonuclease D. With some other tRNAs this sequence has to be completely or partly added by tRNA nucleotidyltransferase. Abbreviation: RNase, ribonuclease. Based on Turner et al. (1997).
Some eukaryotic pre-rRNAs and pre-tRNAs contain introns which must be spliced during the processing of these transcripts into mature RNAs. Neither type of intron is similar to the GU-AG and AU-AC introns of pre-mRNA.
The splicing pathway for Group I introns is similar to that of pre-mRNA introns in that two transesterifications are involved. The first is induced not by a nucleotide within the intron but by a free nucleoside or nucleotide, any one of guanosine or guanosine mono-, di- or triphosphate (Figure 10.25).
| Ribozyme | Description |
|---|---|
| Self-splicing introns | Some introns of Groups I, II and III splice themselves by an autocatalytic process. There is also growing evidence that the splicing pathway of GU-AG introns includes at least some steps that are catalyzed by snRNAs (Newman, 2001) |
| Ribonuclease P | The enzyme that creates the 5′ ends of bacterial tRNAs (see Section 10.2.2) consists of an RNA subunit and a protein subunit, with the catalytic activity residing in the RNA |
| Ribosomal RNA | The peptidyl transferase activity required for peptide bond formation during protein synthesis (Section 11.2.3) is associated with the 23S rRNA of the large subunit of the ribosome |
| tRNAPhe | Undergoes self-catalyzed cleavage in the presence of divalent lead ions |
| Virus genomes | Replication of the RNA genomes of some viruses involves self-catalyzed cleavage of chains of newly synthesized genomes linked head to tail. Examples are the plant viroids and virusoids and the animal hepatitis delta virus. These viruses form a diverse group with the self-cleaving activity specified by a variety of different base-paired structures, including a well-studied one that resembles a hammerhead. |
For more details of ribozymes, see Doherty and Doudna (2000).
The sequence of the intron is shown in capital letters, with the exons in lower case. Additional interactions fold the intron into a three-dimensional structure that brings the two splice sites close together. Reprinted with permission from Burke et al. (1987) Nucleic Acids Research, 15, 7217–7221, Oxford University Press.
See the text for details. In the second stage of the splicing pathway, the 2′,3′-P terminus is converted to a 3′-OH end by a phosphodiesterase, and the 5′-OH terminus is converted to 5′-P by a kinase. These two ends are then ligated together.
The processing events that we have studied so far have been either chemical modifications that affect the ends of transcripts (capping, polyadenylation) or physical changes to the lengths of transcripts (splicing, cutting events). The final type of processing that occurs with pre-RNAs is the chemical modification of nucleotides within the transcript. This occurs with pre-rRNAs and pre-tRNAs of both bacteria and eukaryotes and, to a much lesser extent, with pre-mRNAs of eukaryotes. Equivalent events in the archaea are poorly understood.
| Modification | Details | Example |
|---|---|---|
| Methylation | Addition of one or more -CH3 groups to the base or sugar | Methylation of guanosine gives 7-methylguanosine |
| Deamination | Removal of an amino (-NH2) group from the base | Deamination of adenosine gives inosine |
| Sulfur substitution | Replacement of oxygen with sulfur | 4-Thiouridine |
| Base isomerization | Changing the positions of atoms in the ring component of the base | Isomerization of uridine gives pseudouridine |
| Double-bond saturation | Converting a double bond to a single bond | Double-bond saturation converts uridine to dihydrouridine |
| Nucleotide replacement | Replacement of an existing nucleotide with a new one | Queuosine |
We know relatively little about how tRNA modifications are carried out, beyond the fact that there are a number of enzymes that catalyze these changes. How the enzymes are directed to the correct nucleotides on which they must act has not been explained. With the rRNA and mRNA modifications we know rather less about the reasons for the chemical alterations but rather more about how the alterations are carried out. These issues are covered in the next two sections.
(A) This example shows methylation of the C at position 1436 in the Saccharomyces cerevisiae 25S rRNA (equivalent to the 28S rRNA of vertebrates), directed by U24 snoRNA. The D box of the snoRNA is highlighted. Modification always occurs at the base pair five positions away from the D box. Note that the interaction between rRNA and snoRNA involves an unusual G-U base pair, which is permissible between RNA polynucleotides (see also Figure 11.7A). Based on Tollervey (1996). (B) Many snoRNAs are synthesized from intron RNA, as shown here for human U16 snoRNA, which is specified by a sequence in intron 3 of the gene for ribosomal protein L1.
The snoRNA system provides an elegant solution to site-specific chemical modification but it applies only to eukaryotic rRNAs. In contrast, the modifications made to bacterial rRNAs are carried out by enzymes that directly recognize the sequence and/or structures of the regions of RNA that contain the nucleotides to be modified. Often two or more nucleotides in the same region are modified at once. Bacterial rRNA modification is therefore similar to the systems for modifying tRNAs in both bacteria and eukaryotes.
Conversion of a C to a U creates a termination codon, resulting in a shortened form of apolipoprotein B being synthesized in intestinal cells.
| Tissue | Target RNA | Change | Comments |
|---|---|---|---|
| Intestine | Apolipoprotein B mRNA | C→U | Converts a glutamine codon to a stop codon |
| Muscle | α-galactosidase mRNA | U→A | Converts a phenylalanine codon into a tyrosine codon |
| Testis, tumors | Wilms tumor-1 mRNA | U→C | Converts a leucine codon into a proline codon |
| Tumors | Neurofibromatosis type-1 mRNA | C→U | Converts an arginine codon into a stop codon |
| B lymphocytes | Immunoglobulin mRNA | Various | Contributes to the generation of antibody diversity |
| HIV-infected cells | HIV-1 transcript | G→A, C→U | Involved in regulation of the HIV-1 infection cycle |
| Brain | Glutamate receptor mRNA | A→inosine | Multiple positions leading to various codon changes |
Based on Smith and Sowden (1996), Scott (1997), Bourara et al. (2000) and Neuberger and Scott (2000). For more details about the generation of antibody diversity see Section 12.2.1.
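The codon changes listed in the table can be checked with a few lines of code. The sketch below is illustrative only: the codons CAA (glutamine) and CGA (arginine) are assumed examples chosen to be consistent with the table entries, the codon table is deliberately incomplete, and the helper `edit_base` is a hypothetical name.

```python
# Minimal codon table: only the codons needed for this illustration.
CODON_TABLE = {
    "CAA": "Gln",
    "UAA": "Stop",
    "CGA": "Arg",
    "UGA": "Stop",
}


def edit_base(codon, position, old, new):
    """Return the codon with the base at `position` (0-based) changed from old to new."""
    if codon[position] != old:
        raise ValueError(f"expected {old} at position {position} of {codon}")
    return codon[:position] + new + codon[position + 1:]


# C->U editing at the first position of a glutamine codon (apolipoprotein B row).
apob_before = "CAA"
apob_after = edit_base(apob_before, 0, "C", "U")
print(apob_before, CODON_TABLE[apob_before], "->", apob_after, CODON_TABLE[apob_after])

# C->U editing at the first position of an arginine codon (neurofibromatosis row).
nf1_before = "CGA"
nf1_after = edit_base(nf1_before, 0, "C", "U")
print(nf1_before, CODON_TABLE[nf1_before], "->", nf1_after, CODON_TABLE[nf1_after])
```

In both cases a single C→U change at the first codon position is enough to turn a sense codon into a termination codon, which is why editing of this kind shortens the protein product.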
So far this chapter has concentrated on synthesis of RNAs. Their degradation is equally important, especially with regard to mRNAs whose presence or absence in the cell determines which proteins will be synthesized. Degradation of specific mRNAs could be a powerful way of regulating genome expression.
The rate of degradation of an mRNA can be estimated by determining its half-life in the cell. The estimates show that there are considerable variations between and within organisms. Bacterial mRNAs are generally turned over very rapidly, their half-lives rarely being longer than a few minutes, a reflection of the rapid changes in protein synthesis patterns that can occur in an actively growing bacterium with a generation time of 20 minutes or so. Eukaryotic mRNAs are longer lived, with half-lives of, on average, 10–20 minutes for yeast and several hours for mammals. Within individual cells the variations are almost equally striking: some yeast mRNAs have half-lives of only 1 minute whereas for others the figure is more like 35 minutes (Tuite, 1996). These observations raise two questions: what are the processes for mRNA degradation, and how are these processes controlled?
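To see what these half-lives imply, it helps to treat degradation as simple first-order decay. This is an assumption made for illustration, not a claim about the kinetics of any particular mRNA, and the function names are hypothetical. Under that assumption the fraction of a transcript remaining after time t is 0.5 raised to the power t divided by the half-life.

```python
import math


def decay_constant(half_life_min):
    """First-order rate constant (per minute) for a given half-life."""
    return math.log(2) / half_life_min


def fraction_remaining(t_min, half_life_min):
    """Fraction of the starting mRNA still present after t_min minutes."""
    return 0.5 ** (t_min / half_life_min)


# The two yeast extremes quoted in the text, compared after 20 minutes.
for label, half_life in [("short-lived yeast mRNA", 1.0),
                         ("long-lived yeast mRNA", 35.0)]:
    left = fraction_remaining(20.0, half_life)
    print(f"{label}: half-life {half_life} min, "
          f"fraction left after 20 min = {left:.2e}")
```

With a 1-minute half-life essentially none of the transcript survives 20 minutes, whereas with a 35-minute half-life roughly two-thirds is still present, which illustrates how large the within-cell variation is.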
Studies of mutant bacteria whose mRNAs have extended half-lives have identified a range of ribonucleases and other RNA-degrading enzymes that are thought to be involved in mRNA degradation. These include (Carpousis et al., 1999):
RNase E and RNase III, which are endonucleases that make internal cuts in RNA molecules;
RNase II, which is an exonuclease that removes nucleotides in the 3′→5′ direction;
Polynucleotide phosphorylase (PNPase), which also removes nucleotides sequentially from the 3′ end of an mRNA but, unlike true nucleases, requires inorganic phosphate as a co-substrate.
In the cell, RNase E and PNPase are located within a multiprotein complex called the degradosome. Other components of the degradosome include an RNA helicase, which is thought to aid degradation by unwinding the double-helix structure of the stems of RNA stem-loops. Fragments of rRNA occasionally co-purify with the degradosome, suggesting that the complex might be involved in both rRNA and mRNA degradation. But the exact role of the degradosome is still not clear and a few researchers are sceptical about its actual existence. They point out that proteins not obviously involved in mRNA degradation, such as the glycolysis enzyme enolase, appear to be components of the degradosome, which may indicate that the complex is an artefact produced during extraction of proteins from bacterial cells. A more significant gap in our knowledge concerns the way in which degradation is specifically targeted at individual mRNAs. We know that specific degradation occurs because mRNA degradation has been implicated in the regulation of several sets of bacterial genes, such as the pap operon of E. coli, which codes for proteins involved in synthesis of the cell surface pili (Baga et al., 1988). Unfortunately, the process by which such control is exerted remains a mystery.
Among eukaryotes, most progress in understanding mRNA degradation has been made with yeast. At least four pathways have been identified. One of these involves a multiprotein complex called the exosome, which degrades transcripts in the 3′→5′ direction and contains nucleases related to the enzymes of the bacterial degradosome. Exosomes are probably also present in mammalian cells and are clearly important, but they are not particularly well studied. Their role may not be in mRNA degradation per se, but in monitoring polyadenylation and ensuring that transcripts that are about to leave the nucleus have an appropriate poly(A) tail (Hilleren et al., 2001).
Rather more is known about two other eukaryotic mRNA degradation processes. The first of these is deadenylation-dependent decapping (Figure 10.31).
The second well-studied system for degradation of eukaryotic mRNAs is called nonsense-mediated RNA decay (NMD) or mRNA surveillance. The first of these names gives a clue to its function, because in molecular biology jargon a ‘nonsense’ sequence is a termination codon. NMD results in the specific degradation of mRNAs that have a termination codon at an incorrect position, either because the gene has undergone a mutation or as a result of incorrect splicing. The incorrect codon is thought to be detected by a ‘surveillance’ mechanism that involves a complex of proteins which scans the mRNA and somehow is able to distinguish between the correct termination codon, located at the end of the coding region of the transcript, and one that is in the wrong place (Figure 10.32A).
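What "a termination codon at an incorrect position" means in sequence terms can be shown with a short sketch. This is not a model of the surveillance mechanism, which the text notes is not understood; it simply reads a coding region in frame and reports whether a stop codon appears before the final codon. The two short sequences and the function names are invented for illustration.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop_index(coding_region):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(coding_region) - 2, 3):
        if coding_region[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(coding_region):
    """True if an in-frame stop codon occurs before the final codon."""
    n_codons = len(coding_region) // 3
    idx = first_stop_index(coding_region)
    return idx is not None and idx < n_codons - 1


# Invented example sequences, five codons each.
normal = "AUGGCUCAAGGUUAA"   # AUG GCU CAA GGU UAA: stop only at the end
mutant = "AUGGCUUAAGGUUAA"   # AUG GCU UAA GGU UAA: CAA changed to UAA

print("normal has premature stop:", has_premature_stop(normal))
print("mutant has premature stop:", has_premature_stop(mutant))
```

The mutant differs from the normal sequence by a single base, yet its first termination codon now lies in the middle of the coding region, which is the kind of transcript that NMD is thought to remove.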
The systems described above represent the eukaryotic processes for controlled degradation of endogenous mRNAs. Eukaryotes also possess other RNA degradation mechanisms that have evolved largely to protect the cell from attack by foreign RNAs such as the genomes of viruses. An example is the pathway called RNA interference, a name that will be familiar because RNA interference has been adopted by genome researchers as a means of inactivating selected genes in order to study their function (Section 7.2.2). The target RNA for RNA interference must be double stranded, which excludes cellular mRNAs but encompasses many viral genomes. The double-stranded RNA is cleaved by a ribonuclease called Dicer into short interfering RNAs (siRNAs) of 21–25 nucleotides in length (Ambros, 2001). This inactivates the virus genome, but what if the virus genes have already been transcribed? If this has occurred then the harmful effects of the virus will already have been initiated and RNA interference would appear to have failed in its attempt to protect the cell from damage. One of the more remarkable discoveries of recent years has revealed a second stage of the interference process that is directed specifically at the viral mRNAs. The siRNAs produced by cleavage of the viral genome are separated into individual strands, one strand of each siRNA subsequently base-pairing to any viral mRNAs that are present in the cell. The double-stranded regions that are formed are target sites for the RDE-1 nuclease, which destroys the mRNAs (see Figure 7.16).
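The two stages of RNA interference can be sketched as string operations. This is a deliberately simplified illustration: the 45-nucleotide "viral" sequence is invented, fragments are cut at a fixed 21 nucleotides rather than the 21–25 range, only perfect base pairing is considered, and the functions `dice` and `target_sites` are hypothetical names rather than descriptions of how the enzymes work.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand that would base-pair with `rna`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def dice(strand, size=21):
    """Cut one strand into consecutive fragments of `size` nt, dropping a shorter tail."""
    return [strand[i:i + size] for i in range(0, len(strand) - size + 1, size)]


def target_sites(mrna, guide_strands):
    """Positions on the mRNA where a guide strand can pair perfectly."""
    sites = []
    for guide in guide_strands:
        pos = mrna.find(reverse_complement(guide))
        if pos != -1:
            sites.append(pos)
    return sorted(sites)


# Invented sense strand of a double-stranded viral RNA; the viral mRNA is
# assumed to have the same sequence.
sense = "AUGGCUAGCUAGGAUCCGAUU" + "ACGGCAUCGAUGCUAGCAUGC" + "AAU"
antisense = reverse_complement(sense)

# Stage 1: cleavage into short fragments (one strand of each siRNA shown).
guides = dice(antisense)

# Stage 2: single strands base-pair with the viral mRNA, marking it for cleavage.
sites = target_sites(sense, guides)
print("guide strands:", guides)
print("paired positions on mRNA:", sites)
```

Each antisense fragment finds a complementary stretch on the mRNA, and in the cell it is these double-stranded regions that are recognized by the nuclease.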
In a typical mammalian cell, about 14% of the total RNA is present in the nucleus (Alberts et al., 1994). About 80% of this nuclear fraction is RNA that is being processed before leaving for the cytoplasm. The other 20% is snRNAs and snoRNAs, playing an active role in the processing events, at least some of these molecules having already been to the cytoplasm where they were coated with protein molecules before being transported back into the nucleus. In other words, eukaryotic RNAs are continually being moved from nucleus to cytoplasm and possibly back to the nucleus again.
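The percentages above are fractions of the nuclear RNA, not of the total, so it is worth converting them to a common base. The short calculation below uses only the figures quoted in the text; the variable names are illustrative.

```python
# Figures quoted in the text (Alberts et al., 1994).
nuclear_fraction = 0.14          # share of total cellular RNA found in the nucleus
processing_share = 0.80          # share of nuclear RNA still being processed
sn_sno_share = 0.20              # share of nuclear RNA that is snRNA and snoRNA

# Express both nuclear classes as fractions of total cellular RNA.
being_processed = processing_share * nuclear_fraction
sn_sno = sn_sno_share * nuclear_fraction

print(f"RNA being processed: {being_processed:.1%} of total RNA")
print(f"snRNA and snoRNA:    {sn_sno:.1%} of total RNA")
```

On these figures, RNA in the course of processing accounts for about 11% of all cellular RNA, and the snRNAs and snoRNAs together for about 3%.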
In eukaryotes, rRNAs, tRNAs and mRNAs are transported from the nucleus to the cytoplasm, where these molecules carry out their cellular functions. At least some of the snRNAs and snoRNAs are also transported to the cytoplasm, where they are coated with proteins before returning to the nucleus to carry out their roles in RNA processing. The nuclear pore is not simply a hole in the nuclear membrane. It contains a protein assembly comprising a ring embedded in the pore, with structures radiating into both the nucleus and the cytoplasm. Not shown in this diagram is the central channel complex, a 12 kDa protein that is thought to reside in the channel that connects the cytoplasm to the nucleus.
Export of mRNAs is triggered by completion of the splicing pathway, possibly through the action of the protein called Yra1p in yeast and Aly in animals (Zhou et al., 2000; Keys and Green, 2001). Once outside the nucleus, there are mechanisms that ensure that mRNAs are transported to their appropriate places in the cell. It is not known to what extent protein localization within the cell is due to translation of an mRNA at a specific position or to movement of the protein after it has been synthesized, but it is clear that at least some mRNAs are translated at defined places. For example, those mRNAs coding for proteins that are to be transferred into a mitochondrion are translated by ribosomes located on the surface of the organelle. It is assumed that protein ‘address tags’ are attached to mRNAs in order to direct them to their correct locations after they are transported out of the nucleus, but very little is known about this process.
Give short definitions of the following terms:
Adenosine deaminase acting on RNA (ADAR)
AU-AC intron
Cleavage and polyadenylation specificity factor (CPSF)
Cleavage stimulation factor (CstF)
Exonic splicing enhancer (ESE)
Exonic splicing silencer (ESS)
GU-AG intron
Nonsense-mediated RNA decay (NMD)
Poly(A) polymerase
Rho-dependent terminator
Short interfering RNA (siRNA)
Small nuclear ribonucleoprotein (snRNP)
Outline the important features of the elongation phase of transcription in Escherichia coli.
Describe how transcription is terminated in Escherichia coli.
Using diagrams and specific examples, indicate how the processes called antitermination and attenuation influence transcription in bacteria.
Describe the series of events that result in capping of a eukaryotic mRNA.
Name and outline the functions of three different elongation factors for mammalian RNA polymerase II.
Draw a series of diagrams to illustrate how a eukaryotic mRNA becomes polyadenylated.
What are the key sequence features of a GU-AG intron?
Give a detailed description of the series of events involved in splicing a GU-AG intron.
What processes are thought to ensure that the correct splice sites are selected during splicing of a GU-AG intron?
Give two examples to illustrate the importance of alternative splicing in genome expression.
Why are AU-AC introns remarkable?
Outline our current knowledge regarding elongation and termination of transcription by RNA polymerases I and III.
Describe the cutting events involved in processing of bacterial and eukaryotic pre-rRNA and pre-tRNA.
What is meant by ‘self-splicing’? Give details of the types of intron that display self-splicing. In your answer, distinguish between those introns that self-splice in vivo and those that only display this property in vitro.
What is a ribozyme? Compile an annotated list of known ribozymes.
List six types of chemical modification that occur with nucleotides in rRNA and tRNA. In each case, draw the structure of an example of a nucleotide resulting from the modification.
Give details of two examples of mRNA editing that occur in mammals.
Outline the more complex forms of RNA editing that are known in various eukaryotes.
Describe the processes of mRNA degradation in bacteria. How does the bacterial degradosome compare with the eukaryotic exosome?
Distinguish between deadenylation-dependent decapping and nonsense-mediated RNA decay.
What is Dicer and what does it do?
Outline how eukaryotic RNAs are transported from the nucleus to the cytoplasm.
‘Current thinking views transcription as a stepwise nucleotide-by-nucleotide process, with the polymerase pausing at each position and making a “choice” between continuing elongation by adding another ribonucleotide to the transcript, or terminating by dissociating from the template. Which choice is selected depends on which alternative is more favorable in thermodynamic terms.’ Evaluate this view of transcription.
Explore the introns-early and introns-late hypotheses. Is it possible to devise an analysis that will distinguish which of these two hypotheses is correct?
To what extent has the study of AU-AC introns provided insights into the details of GU-AG intron splicing?
The existence of ribozymes is looked upon as evidence that RNA evolved before proteins and therefore at one time, during the earliest stages of evolution, all enzymes were made of RNA. Assuming that this hypothesis is correct, explain why some ribozymes persist to the present day.
Using the current information on RNA degradation, devise a hypothesis to explain how specific mRNAs could be individually degraded. Can your hypothesis be tested?