Section IV. SL2 Trans-Splicing and Operons

A. The Discovery of a Second Spliced Leader

In 1989, a second SL was discovered in C. elegans. Huang and Hirsh (1989) reported that the SL at the 5′ end of the gpd-3 gene, although it is the same length as the SL found at the 5′ ends of other mRNAs, has a different sequence, which they called SL2. Using this sequence, they identified a different SL RNA (SL2 RNA) with a potential secondary structure similar to that of the original SL (now called SL1). SL2 RNA is also present as an snRNP; it is bound to Sm antigen and has a TMG cap. Like SL1, SL2 is trans-spliced to a variety of mRNAs. Initial evidence suggested that gpd-3 mRNA receives SL2, but not SL1, whereas other mRNAs receive only SL1. How is this specificity achieved? If SL1 is specific for trans-splicing at the 3′ splice sites following outrons, and outrons are simply A + U-rich RNA of any sequence, where is the information to specify SL2?

B. The Discovery of Operons

A look at the organization of the gpd-3 genomic region (Huang et al. 1989) provides the answer (Fig. 6). The gpd-3 gene and all other genes whose mRNAs receive SL2 at their 5′ ends occur at downstream positions in closely spaced clusters of genes with the same 5′ to 3′ orientations (Spieth et al. 1993; Zorio et al. 1994; for review, see Blumenthal 1995). The gpd-3 gene is the third gene in a three-gene cluster, and both of the downstream genes receive SL2. The upstream gene in the cluster, mai-1, is not trans-spliced, although many first genes in such clusters are trans-spliced to SL1. Since the original discovery of this cluster, many more SL2-accepting mRNAs have been reported, and many more such clusters have also been found. In all cases (>30 are now known), mRNAs for genes that reside in downstream positions in clusters begin with SL2. Furthermore, every gene found to reside in such a position has turned out to encode mRNA that receives SL2. So it is now clear that the product of a gene in a downstream location in a closely spaced cluster is SL2 trans-spliced (Zorio et al. 1994).

Figure 6. The mai-1 operon. Boxes indicate exons, lines introns, and wavy lines intercistronic regions. This figure is based on the data in Huang et al. (1989) and Spieth ...

How might chromosomal position translate into specificity of trans-splicing? Spieth et al. (1993) hypothesized that the gene clusters are in fact similar to bacterial operons in the sense that the entire cluster is transcribed from a single promoter and regulatory region at the 5′ end of the cluster. However, whereas bacterial operon mRNA remains polycistronic and is translated by internal ribosome binding at the 5′ end of each cistron, the C. elegans polycistronic pre-mRNA is converted into monocistronic mRNAs. This occurs by cleavage and polyadenylation at the 3′ ends of the upstream genes, accompanied by SL2-specific trans-splicing at the 5′ ends of the downstream genes. Several experimental observations support this hypothesis. First, polycistronic cDNA clones were isolated that contained the entire coding sequences of mai-1 and gpd-2, the first two genes of the operon, joined by the 100-nucleotide sequence between them, which demonstrates that these two genes are cotranscribed. Second, in worms carrying a construct containing the gpd-2/gpd-3 gene pair downstream from the heat shock promoter, expression of the downstream gene gpd-3 is dependent on heat shock, and its product is trans-spliced to SL2. This shows that a controllable operon gives mature mRNA with correct splicing specificity. Third, SL2 specificity in this construct is dependent on the promoter being upstream of the first gene. Fourth, when the poly(A) site of the upstream gene is mutated, polycistronic precursor accumulates, indicating that expression of the mature downstream product is dependent on correct maturation of the upstream gene product. Finally, when a gene whose product normally receives SL1 is inserted between gpd-2 and gpd-3, such that the intercistronic DNA between gpd-2 and the inserted gene is composed of sequences from the outron of the inserted gene, its product receives primarily SL2, indicating that being a downstream gene in an operon is sufficient to result in SL2-specific trans-splicing.

C. Properties of C. elegans Operons

In C. elegans, an operon is a cluster of closely spaced genes, transcribed from a regulatory region at the 5′ end of the cluster, and whose monocistronic mRNAs are created from the polycistronic precursor RNA by conventional 3′-end formation (cleavage and polyadenylation) accompanied by trans-splicing to SL2 or a mixture of the two SLs. An analysis of more than 30 such operons has revealed some interesting properties. First, the genes are quite close together. Figure 7 shows the distances between the poly(A) sites of the upstream genes and the trans-splice sites of the downstream genes. In most cases, the genes are about 100 bp apart, whereas a few are about 300–400 bp apart. It is not yet clear what aspect of the evolution or function of the operons requires that they be spaced so uniformly. It also is not clear whether the data in Figure 7 represent a true bimodal distribution, which would suggest that there might be multiple constraints, resulting in two different favored lengths. The evolution of the operons and possible mechanistic constraints are discussed below.

Figure 7. Distance between genes in operons. Each bar represents the number of genes with the indicated distance between the 3′ end of the upstream gene and the 5′ end of the downstream gene ...
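The spacing criterion described above (same orientation, with roughly 100 bp or 300–400 bp between the poly(A) site of one gene and the trans-splice site of the next) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Gene` record, the function names, the 400-bp cutoff, and the gene names and coordinates are hypothetical and are not taken from the studies cited. The sketch only flags candidate clusters by spacing; it does not establish cotranscription or SL2 trans-splicing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    strand: str  # "+" or "-"
    start: int   # lower genomic coordinate of the mature mRNA
    end: int     # higher genomic coordinate of the mature mRNA


def candidate_operons(genes: List[Gene], max_gap: int = 400) -> List[List[Gene]]:
    """Group neighboring same-strand genes separated by at most max_gap bp.

    max_gap is a hypothetical cutoff chosen to cover the ~100-bp and
    300-400-bp spacings mentioned in the text.
    """
    clusters: List[List[Gene]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: g.start):
        close_enough = (
            bool(current)
            and gene.strand == current[-1].strand
            and 0 <= gene.start - current[-1].end <= max_gap
        )
        if close_enough:
            current.append(gene)
        else:
            if len(current) > 1:
                clusters.append(current)
            current = [gene]
    if len(current) > 1:
        clusters.append(current)
    # Report each cluster in 5'-to-3' transcription order.
    return [c if c[0].strand == "+" else c[::-1] for c in clusters]


def intercistronic_distances(operon: List[Gene]) -> List[int]:
    """Distance from each upstream 3' end to the next gene's 5' end."""
    distances = []
    for upstream, downstream in zip(operon, operon[1:]):
        if upstream.strand == "+":
            distances.append(downstream.start - upstream.end)
        else:
            distances.append(upstream.start - downstream.end)
    return distances


# Invented coordinates, for illustration only.
example_genes = [
    Gene("geneA", "+", 1000, 2000),
    Gene("geneB", "+", 2100, 3500),
    Gene("geneC", "+", 3850, 5000),
    Gene("geneD", "+", 9000, 10000),
    Gene("geneE", "-", 12000, 13000),
    Gene("geneF", "-", 13100, 14000),
]

operons = candidate_operons(example_genes)
for operon in operons:
    print([g.name for g in operon], intercistronic_distances(operon))
```

With these invented coordinates, the sketch groups geneA, geneB, and geneC into one candidate cluster (gaps of 100 and 350 bp) and geneF followed by geneE into another (gap of 100 bp), while geneD is left alone because it lies 4000 bp from its neighbor.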

Second, although some genes in downstream positions in operons receive SL2 exclusively, others receive a mixture of SL1 and SL2 (Spieth et al. 1993; Zorio et al. 1994). In contrast, trans-splice sites near promoters accept only SL1 (Zorio et al. 1994; Conrad et al. 1995). Why some downstream genes receive a mixture is not yet clear. One possibility is that these operons have internal promoters, so the SL1-containing message arises from pre-mRNA started at the internal promoter. A second possibility is that when 3′-end formation fails, the trans-splice site is no longer an SL2 substrate but can still be spliced to SL1. According to this idea, the entire upstream gene is read as an outron by the splicing machinery. So far, all of the genes following a 300−400-bp intercistronic region have been found to receive a mixture of the two SLs, whereas only a third of the genes following a 100-bp intercistronic region receive the mixture (Zorio et al. 1994). The reason for this difference is not known.

Although it was possible to isolate polycistronic cDNA clones for the mai-1/gpd-2 operon, in most cases, no evidence of polycistronic pre-mRNA has been found. Even when the sensitive reverse transcriptase−polymerase chain reaction (RT-PCR) technique has been used to search for RNA crossing the boundary between two genes, it has not been detected for most operons (D. Zorio et al., in prep.), suggesting that 3′-end formation and SL2 trans-splicing occur cotranscriptionally. It even suggests the possibility that 3′-end formation occurs before RNA polymerase has passed into the downstream gene. Alternatively, the two RNA processing events may occur in a concerted fashion, such that RNA containing sequences from both genes never exists. Only when 3′-end formation is, for some reason, inefficient can such RNA be detected.

D. Purpose of Operons in C. elegans

In bacteria, operons serve an important regulatory purpose: They allow coexpression from a single promoter/regulatory region of genes whose products function together. This both assures coordinate expression and results in efficient use of the cell's regulatory machinery. Do the C. elegans operons exist to assure coordinate regulation of genes whose products function together? Where it has been examined, the genes in C. elegans operons are indeed coexpressed (e.g., kup-1 and kin-13 [Land et al. 1994]). Because the functions of many operon genes are not known, no strong argument can be made for most of the operons (e.g., dpy-30 and rnp-1 [Hsu et al. 1995] and mes-3 and dom-3 [Paulsen et al. 1995]). However, there appear to be a few clear examples of purposeful coregulation. For instance, the two lin-15 genes are contained in an operon; although it is not known how these two unrelated proteins function, it is known that they collaborate in an aspect of signal transduction in formation of the vulva (Huang et al. 1994; Clark et al. 1994; see Greenwald, this volume). A second example is the deg-3 gene, which encodes an acetylcholine receptor subunit (see Treinin and Chalfie 1995). When it was discovered that this gene's product received SL2, the upstream DNA was sequenced and found to contain another gene, appropriately spaced and in the same orientation, that is likely to encode another subunit of the same receptor (M. Treinin and M. Chalfie, pers. comm.). Because these authors have shown that the two genes are expressed in the same cells, a good case can be made that their presence together in the operon serves the purpose of coexpression of proteins that function together. Many of the operons contain genes whose products would be expected to be expressed ubiquitously; for example, one operon contains the gene for fibrillarin, a protein needed for rRNA processing, and a gene for a ribosomal protein, rps-16 (Zorio et al. 1994). Another contains genes for a chromosomal protein and for topoisomerase II. In these cases, the genes may be in the same operon simply to take advantage of a single ubiquitously expressed promoter.

It seems likely that the C. elegans operons are not ancient but are instead an innovation, perhaps having evolved as a response to selection for a small genome (although it is not clear what aspect of C. elegans' lifestyle or development might require a small genome). Regions of DNA between genes might have been deleted such that the two genes are so close together that they can be cotranscribed from the upstream gene's promoter. If both of their products happen to be required together, or everywhere, an evolutionary advantage of reducing the genome size will accrue with no cost due to losing a regulatory region. Any new associations between genes will be tolerated as long as the benefits outweigh the costs. One of the benefits would be coregulation of genes whose products are needed in the same cells. Operons that fulfill this purpose should accumulate in the genome over time, whereas others might be lost.

It is not yet clear how widespread operons of the type found in C. elegans are in eukaryotes. So far, they have been observed only in free-living nematodes: C. elegans and its sibling species, C. briggsae, and a distantly related rhabditid nematode, CEW1 (D. Evans et al., unpubl.). In general, where C. briggsae homologs of C. elegans genes that occur in operons have been examined, the C. briggsae homologs have been shown to be present in the same genomic arrangement (Lee et al. 1992; Hengartner and Horvitz 1994b; Kuwabara and Shah 1994; D. Zorio et al., in prep.). In CEW1, an operon containing two ribosomal proteins has been discovered, with 87 bp between the genes, and the downstream gene is trans-spliced to a sequence quite similar to SL2 (D. Evans et al., unpubl.). The fact that 80−90% of gene products of the parasitic species, Ascaris lumbricoides, are trans-spliced to SL1 (Maroney et al. 1995), whereas others are not trans-spliced, suggests that if operons exist in this species, either a specialized spliced leader is not used at internal trans-splice sites or operons are much less common than they are in C. elegans. It should be emphasized, however, that no systematic search for operons outside of C. elegans has yet been undertaken.

E. Another Kind of Operon

Recently, a second type of operon has been discovered (Hengartner and Horvitz 1994b; I. Korf and S. Strome, pers. comm.) that is different in two significant ways: (1) The mRNA of the downstream gene is trans-spliced to SL1, rather than SL2, and (2) there is no intercistronic sequence. The site of polyadenylation of the upstream gene and the trans-splice site are at adjacent nucleotides. The first such operon to be reported contains the cyt-1 and ced-9 genes, and the second contains the mes-6 and cks-1 genes. Operons containing the U170K and SRP54 genes are also apparently of this type (L. Xu and T. Blumenthal, unpubl.). It seems likely that this kind of operon may be functionally no different from the more common SL2 type but that the mechanism of processing of its polycistronic precursor may be quite different (see below).
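The observations reported so far about which spliced leader a gene receives can be summarized as a small lookup, sketched below in Python. This is an illustrative summary of tendencies described in the text, not a predictive rule: the function name, the returned labels, and the exact numeric boundaries (for example, treating anything under 300 bp as the "about 100 bp" class) are assumptions introduced here.

```python
from typing import Optional


def expected_leader(intercistronic_bp: Optional[int]) -> str:
    """Summarize the spliced leader reported for a gene, given the length of
    the intercistronic region upstream of it (None for a first gene).

    The numeric boundaries are hypothetical; the text reports only
    "about 100 bp", "about 300-400 bp", and "no intercistronic sequence".
    """
    if intercistronic_bp is None:
        # First genes in operons: SL1, or not trans-spliced (as for mai-1).
        return "SL1 or not trans-spliced"
    if intercistronic_bp < 0:
        raise ValueError("intercistronic length cannot be negative")
    if intercistronic_bp == 0:
        # Second operon type: poly(A) site and trans-splice site adjacent.
        return "SL1"
    if intercistronic_bp < 300:
        # About a third of these genes were reported to receive a mixture.
        return "SL2, sometimes mixed with SL1"
    if intercistronic_bp <= 400:
        # All such genes examined so far received a mixture.
        return "mixture of SL1 and SL2"
    return "not described in the text"


for length in (None, 0, 100, 350):
    print(length, "->", expected_leader(length))
```

Taking the intercistronic length as the input, rather than raw coordinates of the poly(A) site and trans-splice site, avoids committing to a coordinate convention that the text does not specify.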

F. Mechanism of SL2-specific Trans-splicing and 3′-end Formation

How is SL2 trans-splicing specified? SL2-accepting trans-splice sites have the same consensus sequences as do intron 3′ splice sites and SL1-accepting trans-splice sites (see Table 2). However, they appear to have more mismatches to the consensus than do the others, suggesting a mechanistic difference. One obvious idea is that 3′-end formation, which occurs just upstream, is somehow directly or indirectly involved. The downstream product of the cleavage event in 3′-end formation is a free 5′ phosphate (for review, see Wahle and Keller 1992). One might expect this product to be rapidly degraded because it is not protected by a cap. Why is the RNA coding for the downstream gene not subject to such degradation? Perhaps it binds to the SL2 snRNP, which subsequently splices at the trans-splice site 100–400 nucleotides downstream. In this view, SL2 is attracted to the appropriate sites by the 5′ phosphate end. A somewhat more palatable idea is that the SL2 snRNP has affinity for the 3′-end formation machinery itself. According to this idea, a single complex involving the 3′-end formation machinery and the splicing machinery forms to accomplish both processes, perhaps simultaneously.

Both of these models suggest that the 3′-end formation machinery would be needed for SL2-specific splicing, and there is some experimental support for such a relationship. It has been shown in other systems that 3′-end formation is less efficient if the promoter is moved closer to the 3′ end (see, e.g., Iwasaki and Temin 1990). Spieth et al. (1993) inserted the heat shock promoter at various positions within the gpd-2 gene and found that the closer the promoter is to the 3′-end signals, the less SL2 splicing occurs. It was subsequently shown that SL1 trans-splicing occurs instead and that 3′-end formation is indeed less efficient when the promoter is positioned closer to the 3′ end (S. Kuersten et al., unpubl.), supporting the idea that 3′-end formation and SL2 trans-splicing are connected. The most direct experiment would be to inactivate the 3′-end formation signal. Spieth et al. (1993) mutated this signal, such that it was weakened but not eliminated, and found that polycistronic transcript accumulated. Thus, even though the trans-splice signal remained intact, it was only inefficiently used. This experiment suggests that the upstream 3′-end formation signal must remain intact for efficient SL2-specific trans-splicing. However, it does not answer whether 3′-end formation itself or the machinery that normally binds there is needed. In the rrs-1 mutant strain, which lacks SL1, SL2 is trans-spliced onto RNAs that normally accept SL1, indicating that the two SLs are in competition, presumably for use of all trans-splice sites (Ferguson et al. 1996). The fact that an extrachromosomal array carrying the SL2 RNA gene can suppress the embryonic lethality of the rrs-1 mutation lends further support to the idea that SL2 can be utilized in place of SL1 when SL1 is missing (Ferguson et al. 1996).