RNA. Aug 2005; 11(8): 1303–1316.
PMCID: PMC1370813

Genome-wide analyses of two families of snoRNA genes from Drosophila melanogaster, demonstrating the extensive utilization of introns for coding of snoRNAs

Abstract

Small nucleolar RNAs (snoRNAs) are an abundant group of noncoding RNAs mainly involved in the post-transcriptional modifications of rRNAs in eukaryotes. In this study, a large-scale genome-wide analysis of the two major families of snoRNA genes in the fruit fly Drosophila melanogaster has been performed using experimental and computational RNomics methods. Two hundred and twelve gene variants, encoding 56 box H/ACA and 63 box C/D snoRNAs, were identified, of which 57 novel snoRNAs have been reported for the first time. These snoRNAs were predicted to guide a total of 147 methylations and pseudouridylations on rRNAs and snRNAs, showing a more comprehensive pattern of rRNA modification in the fruit fly. With the exception of nine, all the snoRNAs identified to date in D. melanogaster are intron encoded. Remarkably, the genomic organization of the snoRNAs is characteristic of 8 dUhg genes and 17 intronic gene clusters, demonstrating that distinct organizations dominate the expression of the two families of snoRNAs in the fruit fly. Of the 267 introns in the host genes, more than half have been identified as host introns for coding of snoRNAs. In contrast to mammals, the variation in size of the host introns is mainly due to differences in the number of snoRNAs they contain. These results demonstrate the extensive utilization of introns for coding of snoRNAs in the host genes and shed light on further research of other noncoding RNA genes in the large introns of the Drosophila genome.

Keywords: snoRNA, ncRNA, intron, RNA modification, Drosophila melanogaster

INTRODUCTION

Small nucleolar RNAs (snoRNAs) represent an abundant group of noncoding RNAs (ncRNAs) mainly involved in rRNA biogenesis (Kiss 2002). With the exception of RNase MRP, all the snoRNAs fall into two major families, box C/D and box H/ACA snoRNAs, on the basis of common sequence motifs and structural features (Balakin et al. 1996). Box C/D snoRNAs share two conserved motifs, the 5′ end box C (RUGAUGA) and the 3′ end box D (CUGA), whereas the box H/ACA snoRNAs exhibit a common hairpin-hinge-hairpin-tail secondary structure with the H (ANANNA) motif in the hinge region and an ACA triplet 3 nt from the 3′ end of the molecule (Ganot et al. 1997). Several snoRNAs, such as U3, snR30, and RNase MRP, are required for specific cleavage of pre-rRNAs (Kiss 2002). However, the majority of box C/D snoRNAs function as guides for site-specific 2′-O-ribose methylation with most box H/ACA snoRNAs functioning as guides for pseudo-uridylation in the post-transcriptional processing of rRNAs (Smith and Steitz 1997; Bachellerie et al. 2000). Recent research has shown that some snoRNAs and scaRNAs participate in the modifications of snRNAs (Kiss 2002; Zhou et al. 2002; Tycowski et al. 1998, 2004) and even tRNAs by Archaea homologs (Clouet d’Orval et al. 2001). Moreover, an increasing number of orphan snoRNAs with unknown function have been identified from different eukaryotes, suggesting they play additional roles in cellular processes. The identification of snoRNA homologs in Archaea further demonstrates their ancient origin and preservation during the course of evolution (Bachellerie et al. 2002).
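The consensus motifs quoted above can be written as simple patterns. The following is a minimal sketch, not the search program used in this study: the function names and the 20-nt end window are assumptions made for illustration, and the hairpin-hinge-hairpin fold of box H/ACA snoRNAs is not checked.

```python
import re

# IUPAC codes: R = A or G, N = any base. Consensus motifs are taken from the text.
BOX_C = re.compile(r"[AG]UGAUGA")          # box C: RUGAUGA
BOX_D = re.compile(r"CUGA")                # box D: CUGA
BOX_H = re.compile(r"A[ACGU]A[ACGU]{2}A")  # box H: ANANNA


def looks_like_box_cd(rna: str, end_window: int = 20) -> bool:
    """True if box C lies near the 5' end and box D near the 3' end.

    end_window is a hypothetical tolerance, not a value from the paper.
    """
    rna = rna.upper().replace("T", "U")
    return bool(BOX_C.search(rna[:end_window])) and bool(BOX_D.search(rna[-end_window:]))


def looks_like_box_h_aca(rna: str) -> bool:
    """True if an ACA triplet sits 3 nt from the 3' end and a box H lies upstream."""
    rna = rna.upper().replace("T", "U")
    if len(rna) < 12 or rna[-6:-3] != "ACA":
        return False
    return bool(BOX_H.search(rna[:-6]))
```

A motif match of this kind only nominates a candidate; in the study, candidates were additionally judged on structural elements and validated experimentally.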

The Drosophila rRNAs possess numerous post-transcriptional modifications (Ofengand and Bakin 1997), which has made D. melanogaster a good model for studying the expression of snoRNA genes (Tycowski and Steitz 2001; Huang et al. 2004). A recent study of experimental RNomics in the fruit fly identified 66 small ncRNAs, of which 35 species belonged to the two families of snoRNAs (Yuan et al. 2003). Computational analyses of conserved structural and functional motifs have also been applied to identify typical box H/ACA and box C/D snoRNAs, resulting in the validation of 10 box H/ACA snoRNAs (Huang et al. 2004) and 26 box C/D snoRNAs from the Drosophila genome (Accardo et al. 2004; our unpubl. results). These studies, in addition to other reports (Tycowski and Steitz 2001; Zhou et al. 2002), have identified approximately one-third of the total snoRNAs from Drosophila, as estimated in proportion to the modifications on rRNAs. Furthermore, some interesting features in the expression of the snoRNA genes have also been observed, such as the discovery of dUhg (Tycowski and Steitz 2001) and intronic box H/ACA snoRNA gene clusters in the fruit fly (Huang et al. 2004). These results encouraged us to perform a large-scale genome-wide analysis of the two families of snoRNAs in Drosophila using both experimental and computational RNomics methods. In an attempt to identify all of the snoRNAs and give an overall view of their genomic organization, we have identified 212 gene variants encoding 119 snoRNAs from the Drosophila genome, including 57 novel snoRNAs. This analysis provides more comprehensive data on the pattern of rRNA modifications as well as a panorama of the snoRNA genomic organization in D. melanogaster. Moreover, a systematic study of the snoRNA host genes has revealed how extensively and efficiently intron regions have been used for ncRNA coding in the compact genome of the fruit fly.

RESULTS

Identification of 31 novel box C/D snoRNAs from Drosophila

A special cDNA library for box C/D snoRNAs was constructed with total D. melanogaster RNA. We focused on the cDNA fraction of 60–120 nt according to the size range of most box C/D snoRNAs. After screening to eliminate abundant fragments of rRNAs, 35 box C/D snoRNA variants were identified from the library. Most of the cDNA sequences were satisfactorily intact and exhibited typical features of box C/D snoRNA. Detailed examination of the functional elements of the cDNA sequences revealed 28 box C/D snoRNA species, including 12 novel snoRNAs (Table 1).

TABLE 1.
Box C/D snoRNA genes in D. melanogaster

We have previously used a eukaryotic box C/D snoRNA search program to identify snoRNA genes in Oryza sativa (Chen et al. 2003). This effective computational program was applied in the present study to search the D. melanogaster genome for box C/D snoRNAs. The searches were mainly based on primary structural elements of the C/D snoRNAs and the information on rRNA methylation sites in eukaryotes such as mammals and yeast before fruit fly data were available. Computational analysis identified 97 snoRNA candidates that accounted for most of our experimental results and recently reported data (Accardo et al. 2004). Northern blotting was used to verify the novel candidates identified in silico, and 19 novel box C/D snoRNAs were validated (Fig. 1A) and added to our snoRNA database (Table 1). All snoRNA sequences were then used to find their isoforms in the Drosophila genome.

FIGURE 1.
Northern blot analyses. Aliquots of 30 μg total cellular RNA were separated on a denaturing 8% polyacrylamide gel and hybridized with the labeled oligonucleotide probes described in Materials and Methods. (Lane M) molecular weight markers (pBR322 ...

In total, 102 gene variants encoding 63 box C/D snoRNAs were identified from the Drosophila genome. Note that the scaRNA U85 and three known snoRNAs, snoR684, snoR291, and U3, were not detected in this analysis; however, these snoRNAs have been included for further analysis in the discussion. Forty-eight snoRNAs were predicted to guide 54 methylated residues in 5.8S, 18S, and 28S rRNA. Four typical snoRNAs and the scaRNA U85 were guides for internal methylations in U2, U5, and U6 snRNAs. In addition to the guide snoRNAs, 12 snoRNAs with no target in either rRNA or snRNA were termed orphan snoRNAs, and their functions remain to be elucidated (Table 1; Supplementary Figure S1).

Identification of 26 novel box H/ACA snoRNAs from Drosophila

To study box H/ACA snoRNAs in Drosophila, another special cDNA library was constructed with total D. melanogaster RNA. In contrast to the box C/D snoRNA research, we focused on the cDNA fraction of 120–180 nt to evaluate the box H/ACA snoRNAs. After screening several thousand cDNA clones to eliminate heavy contamination of rRNA fragments, 31 box H/ACA candidates were identified. Sequence and structural analyses revealed that the candidates represented 29 box H/ACA snoRNA species, including 14 novel species when compared with known snoRNA data (Yuan et al. 2003; Huang et al. 2004).

Based on the conserved secondary structure and functional elements, a computational analysis for other possible box H/ACA snoRNA genes was performed on all the introns of known snoRNA host genes and ribosomal protein genes. Apart from known box H/ACA snoRNAs, 19 novel candidates were identified; 12 of them were further confirmed by Northern blotting analyses (Fig. 1B) and added to our database.

In total, 110 gene variants encoding 56 box H/ACA snoRNAs were identified from the Drosophila genome. When compared with the published data on Drosophila, 26 novel box H/ACA snoRNAs are reported here for the first time. Fifty-one snoRNAs were predicted to guide 85 pseudouridylations in 18S and 28S rRNA. Two snoRNAs, Y28S-1232 and Y28S-3186, may guide pseudouridylations for both rRNA and U4 or U5 snRNA. Moreover, five orphan snoRNAs that lack sequence complementarity to rRNA and snRNA were also determined (Table 2; Supplementary Figure S2).
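The per-family counts reported in the two sections above can be tallied against the overall figures given in the abstract. This is only a bookkeeping sketch; the dictionary layout and function name are illustrative choices, and the numbers are those stated in the text.

```python
# Counts reported in the text for each snoRNA family.
FAMILIES = {
    "box C/D":   {"variants": 102, "species": 63, "novel": 31},
    "box H/ACA": {"variants": 110, "species": 56, "novel": 26},
}


def family_totals(families: dict) -> dict:
    """Sum each reported count across the two families."""
    keys = ("variants", "species", "novel")
    return {k: sum(f[k] for f in families.values()) for k in keys}


def variants_per_species(families: dict) -> dict:
    """Average number of gene variants (isoforms) per snoRNA species."""
    return {name: f["variants"] / f["species"] for name, f in families.items()}


totals = family_totals(FAMILIES)       # 212 variants, 119 species, 57 novel
ratios = variants_per_species(FAMILIES)
```

The ratio of gene variants to species is higher for box H/ACA than for box C/D snoRNAs, which agrees with the observation in the Discussion that box H/ACA snoRNAs have more isoforms.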

TABLE 2.
Box H/ACA snoRNA genes in D. melanogaster

Two distinct organizations dominate the expression of the snoRNAs in Drosophila

Following the identification of numerous snoRNAs, their genomic organization and expression strategy were then investigated. A large proportion of the box C/D snoRNAs were clearly located in the introns of protein-coding genes such as ribosomal proteins and proteins associated with snoRNP and transcriptional factors. However, 33 gene variants encoding 16 box C/D snoRNAs in addition to two box H/ACA snoRNAs were found clustered in six chromosomal spots that had no annotation (Fig. 2), and were therefore regarded as intergenic spacers in the Drosophila genome. Interestingly, each interval between two individual snoRNA genes was ~150–200 nt, which resembles the situation of the two dUhg genes reported previously (Tycowski and Steitz 2001). By searching for typical intron splicing signals (5′GT and 3′AG with branching sequences) that flank each of the snoRNA genes, our computational analysis revealed that all the snoRNAs were indeed intron encoded (Table 3) and that it was unlikely that the spliced exons of the host genes encoded any protein. Therefore, six new UHG-like genes were identified and designated as dUhg 3–8. All six ncRNAs, except dUhg 7, were further supported by the presence of a perfect match with ESTs from the Flybase database (http://www.flybase.org, Database of the Drosophila Genome) (Fig. 3). As observed in the vertebrate UHG genes (Tycowski et al. 1996), the cDNA transcripts of the dUhg genes in Drosophila possessed a poly(A) tail at their 3′ end, suggesting that they are products of RNA polymerase II (Table 3). Together with the two previously identified dUhg genes (Tycowski and Steitz 2001), the eight dUhgs encode 53 snoRNA isoforms that represent half of the known box C/D snoRNA genes in Drosophila. In contrast to other eukaryotes, dUhg therefore represents a major gene organization for the expression of box C/D snoRNAs in the fruit fly. Of particular interest is that although each dUhg encodes multiple box C/D snoRNAs, the mode of one snoRNA per intron is strictly maintained.
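The search for splicing signals flanking a snoRNA gene can be illustrated as follows. This is a minimal sketch under stated assumptions, not the analysis pipeline used here: the function names and the 200-nt flank limit are hypothetical, coordinates are 0-based and end-exclusive on the sense strand, and branch-point sequences are not examined.

```python
def has_gt_ag(genomic: str, start: int, end: int) -> bool:
    """Check the canonical 5' GT ... AG 3' boundaries of a putative intron."""
    intron = genomic[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def candidate_introns(genomic: str, sno_start: int, sno_end: int, max_flank: int = 200):
    """List (start, end) pairs of GT...AG spans that fully contain the snoRNA.

    max_flank is an illustrative search limit, not a value from the paper.
    """
    seq = genomic.upper()
    lo = max(0, sno_start - max_flank)
    hi = min(len(seq), sno_end + max_flank)
    # Donor dinucleotides must lie entirely upstream of the snoRNA.
    donors = [i for i in range(lo, sno_start - 1) if seq[i:i + 2] == "GT"]
    # Acceptor dinucleotides must lie entirely downstream of the snoRNA.
    acceptors = [j + 2 for j in range(sno_end, hi - 1) if seq[j:j + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptors]
```

Because GT and AG dinucleotides are frequent, a scan like this typically returns several candidate spans; supporting evidence such as matching ESTs, as used in the text, is needed to decide among them.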

FIGURE 2.
Chromosomal mapping of snoRNA genes in D. melanogaster. The gray and the empty boxes represent euchromatin and hetero-chromatin, respectively. Centromere is denoted by an empty circle. Intronic snoRNA gene/gene clusters are indicated by their host genes ...
FIGURE 3.
Schematic diagram of eight dUhg genes in D. melanogaster. Exons and introns of host genes are indicated by empty boxes and solid lines, respectively. Box H/ACA and box C/D snoRNA genes within introns are indicated by black and gray boxes, respectively. ...
TABLE 3.
Eight dUhg host genes in D. melanogaster

Apart from intron-encoded snoRNAs, three box C/D snoRNAs, including U3 snoRNA, which is involved in rRNA processing, are transcribed independently from intergenic regions, showing a diverse gene organization. It is also worth noting that the U3 snoRNA gene is clustered with a tRNA(Leu) gene in an inverted repeat (Fig. 4).

FIGURE 4.
SnoRNA gene clusters and their duplication in D. melanogaster. (A) Seven novel intronic box H/ACA snoRNA gene clusters. (B) Inverted U3 snoRNA-tRNA clusters and repeated units of box C/D snoRNA gene in introns. Exons and introns of host genes are indicated ...

Most of the box H/ACA snoRNAs are intron encoded in protein-coding genes, particularly ribosomal protein genes. A distinguishing characteristic of the genomic organization for this family of snoRNAs is the prevalence of intronic clusters. Including the clusters reported previously (Huang et al. 2004), a total of 17 intronic clusters encoding 77 box H/ACA snoRNAs (70% of the gene variants identified) have been identified from the Drosophila genome (Fig. 4), highlighting the importance of this gene organization for the expression of box H/ACA snoRNAs. In general, the clusters are composed of isoforms of the same snoRNA genes, suggesting that local duplication has served as a major way to form the multiple snoRNA clusters in the introns of protein-coding genes. Interestingly, DmOr-aca4, an orphan snoRNA, was mapped entirely to an intron of a hypothetical pre-mRNA, but was transcribed in the opposite orientation to the protein-coding gene. This peculiar gene organization may suggest a possible antisense function of the orphan snoRNA in the regulation of pre-mRNA processing. In addition to the intron-encoded snoRNAs, five box H/ACA snoRNAs appear to be transcribed independently from intergenic regions.

Sixty host genes encoding 207 snoRNAs were scattered over the two large autosomes and the X chromosome of Drosophila, but none was found in chromosome 4, which is the shortest chromosome. All the host genes were mapped in the region of euchromatin, with the exception of two located in the heterochromatin where the expression of most genes is not active (Fig. 2). The overall genome-wide analysis of snoRNA genes in Drosophila shows a preferential transcription of polycistronic snoRNA from two distinct gene organizations: first, dUhgs, which mainly encode box C/D snoRNA, and second, intronic clusters, which dominate the expression of box H/ACA snoRNA.

Extensive utilization of introns for snoRNA-coding in the Drosophila host genes

It is evident that in D. melanogaster snoRNAs can usually be found in more than one intron of a variety of host genes. Among the 267 introns in the host genes, more than half (145 introns) have been identified as host introns for the snoRNAs and a small ncRNA (Dm184). The percentage of introns used for snoRNA coding is higher still among the large introns, that is, after excluding the 85 “empty” introns that are smaller than 150 nt, the minimum length required to accommodate one box C/D snoRNA. These empty introns can therefore be easily eliminated from the list of potential host introns. For example, nine of the 13 introns of the Dom gene are longer than 150 nt, and eight of them encode 15 snoRNAs and a small ncRNA. In splicing factor 3b, there are nine introns longer than 150 nt, all of which are host introns for snoRNAs. Remarkably, the dUhg genes, which are unlikely to encode any protein, seem to be perfectly designed for snoRNA coding. The eight dUhg genes possess 55 introns in total, only two of which are empty (Fig. 3).
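The intron accounting in this paragraph can be worked through explicitly. The sketch below uses only the counts given in the text; the function and key names are illustrative. The number of large empty introns obtained by subtraction (37) matches the count examined in the following paragraph.

```python
def intron_usage(total: int, host: int, short_empty: int) -> dict:
    """Summarize how many introns of the host genes carry snoRNAs.

    total       -- all introns in the host genes
    host        -- introns that encode a snoRNA or small ncRNA
    short_empty -- empty introns shorter than 150 nt (too short to host a snoRNA)
    """
    large = total - short_empty        # introns long enough to host a snoRNA
    large_empty = large - host         # large introns with no snoRNA found
    return {
        "large_introns": large,
        "large_empty": large_empty,
        "host_fraction_all": host / total,
        "host_fraction_large": host / large,
    }


usage = intron_usage(total=267, host=145, short_empty=85)
```

With these counts, host introns make up about 54% of all introns but about 80% of the introns that are at least 150 nt long.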

Thirty-seven empty introns larger than 150 nt were further examined in detail. Interestingly, 11 of them contained 50–120-nt-long conserved intronic sequences (LCISs) as compared to their counterparts among different species of Drosophila, suggesting the presence of putative cis-acting elements or possible small ncRNAs other than snoRNAs. LCISs were also observed in the undefined regions of some large introns that were hosts for snoRNAs. Further experiments were performed to validate some of these LCISs, and two of them were characterized as stable ncRNAs with unknown function (Fig. 5). The remaining 26 introns (about 10% of the total introns in this study) were devoid of any LCIS. However, 15 of these empty introns (57.7%) were the first intron of ribosomal protein genes or other housekeeping genes. This suggests that instead of snoRNA coding, these introns may have a specialized function, as it has been shown that first introns that contain binding sites for transcriptional factors are important for the expression of ribosomal protein genes in mammals (Antoine and Kiefer 1998).

FIGURE 5.
Experimental determination of two ncRNAs encoded by LCIS. (Lane M) molecular weight markers (pBR322 digested with HaeIII and 5′-end labeled with [γ-32P]ATP). Lane IN1 and IN2 are two samples of LCIS detected by specific probes in (A) Northern ...

The average size of the empty introns was ~60–70 bp (Fig. 6), which corresponds closely to the length of the minimal intron (61 bp) in Drosophila (Yu et al. 2002). Because they contain a snoRNA or snoRNA cluster, the host introns vary in length from 150 bp to 2 kb. However, after removing all snoRNA/snoRNA clusters and ncRNA-coding regions from the 145 host introns, the size distribution of the remaining sequence in these introns was reduced to 90–120 bp, which was slightly longer than the minimal intron length (Fig. 6). Taking account of the sequences necessary for snoRNA processing (Hirose and Steitz 2001), the host introns are only just long enough to hold a snoRNA gene or gene cluster without any redundant sequence. Interestingly, this compact structure of the host introns appears to be strictly maintained for both families of snoRNAs in different host genes. In fact, this compact structure results mainly from parsimonious spacers between the 5′ splice site and the snoRNA-coding region, most of which were centered around 30–40 bp despite the intron sizes varying to a large extent (Fig. 7). The distance from the snoRNA gene/gene cluster to the 3′ splice site averaged between 60 and 80 bp, which was very similar to the intronic positioning of box C/D snoRNA genes in mammals. This distance has been shown to be important for the effective processing of the snoRNAs from their host mRNA precursor (Hirose and Steitz 2001).
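The spacer measurements described above amount to simple coordinate arithmetic on each host intron. The following sketch assumes a hypothetical record type, HostIntron, with offsets measured from the 5′ splice site; a snoRNA cluster is treated as a single coding block, and the example values are invented to fall within the ranges reported in the text.

```python
from dataclasses import dataclass


@dataclass
class HostIntron:
    """Hypothetical record for one host intron (all values in bp)."""
    length: int        # full intron length
    coding_start: int  # 0-based offset of the first snoRNA (or cluster) nucleotide
    coding_end: int    # end-exclusive offset of the last snoRNA (or cluster) nucleotide

    @property
    def spacer_5p(self) -> int:
        """Distance from the 5' splice site to the snoRNA-coding region."""
        return self.coding_start

    @property
    def spacer_3p(self) -> int:
        """Distance from the snoRNA-coding region to the 3' splice site."""
        return self.length - self.coding_end

    @property
    def residual(self) -> int:
        """Intron sequence left after removing the snoRNA-coding region."""
        return self.spacer_5p + self.spacer_3p


# Invented example: a 200-bp intron hosting a 95-nt snoRNA.
example = HostIntron(length=200, coding_start=35, coding_end=130)
```

For this invented intron, the 5′ spacer is 35 bp, the 3′ spacer is 70 bp, and the residual sequence is 105 bp, comparable to the 90–120 bp distribution described in the text.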

FIGURE 6.
Distribution of lengths of spacer sequences for 267 introns from 60 snoRNA host genes in D. melanogaster. Empty intron denotes the introns in which no intronic ncRNA was found. Extra region of “carrier” intron refers to the remaining sequences ...
FIGURE 7.
Distribution of lengths of spacer sequences for 144 D. melanogaster snoRNA genes/gene clusters. The black and gray bars represent distances from 5′ and 3′ splice sites, respectively.

DISCUSSION

A more complete list of the two families of snoRNA genes in Drosophila

To obtain a comprehensive understanding of the genomic organization and expression strategy of ncRNA in D. melanogaster, we have performed a large-scale analysis of the two major families of snoRNA genes using both experimental and computational RNomics methods. In this study, 212 gene variants encoding 56 box H/ACA and 63 box C/D snoRNAs have been identified in the fruit fly. These data are consistent with previous works (Tycowski and Steitz 2001; Yuan et al. 2003; Accardo et al. 2004; Huang et al. 2004) and further include 57 novel snoRNAs. In addition, two novel small ncRNAs other than snoRNAs were also validated. This extensive study indicates the complexity of the snoRNA gene families in the Drosophila genome and, furthermore, supports the analytical strategies used in this research, such as size-fractionated cloning for specialized cDNA libraries that are enriched in snoRNAs. It is also evident that the computational analysis of the Drosophila genome is complementary to the cDNA cloning approach. In D. melanogaster, a high degree of pseudouridylation on rRNA has been reported (Ofengand and Bakin 1997), while rRNA methylation sites have not yet been mapped. In this study, we have added to the long list a number of novel box C/D snoRNAs that are absent from previous works (Tycowski and Steitz 2001; Yuan et al. 2003; Accardo et al. 2004), indicating that the level of rRNA methylation in Drosophila is not low. According to our computational analysis of the Drosophila genome, we have further predicted that the number of box C/D snoRNAs for rRNA is about 97, although dozens of them remain to be confirmed by experimental detection. Interestingly, most of the box C/D snoRNAs from Drosophila possess only one antisense sequence, which is predicted to guide a single methylation site in the RNAs. In contrast, more than half of the box H/ACA snoRNAs exhibit two functional elements guiding two pseudouridylations in the RNAs (some of which are predicted to guide two pseudouridylations in rRNA and snRNA). It is not clear whether the high proportion of bifunctional box H/ACA snoRNAs is related to the constraint of high rRNA pseudouridylation in Drosophila or to a selective advantage in the functional evolution of these snoRNAs. From the distribution of the modified residues predicted by the snoRNAs, we estimate that more than two-thirds of the guide snoRNAs for rRNA methylation and pseudouridylation in Drosophila have been described in this study. Although by no means exhaustive, the analysis has provided a more complete list of the two families of snoRNA genes in Drosophila.

Evolution of snoRNA gene organization and expression strategy

Identification of a large number of snoRNA genes provides a unique opportunity to investigate the genomic organization and expression of the ncRNAs in Drosophila. Similar to vertebrates (Maxwell and Fournier 1995), almost all snoRNAs in Drosophila are intron encoded. This organization not only emphasizes the utility of introns that have long been considered as junk DNA (Flam 1994), but also suggests an intriguing link between mRNA splicing and the expression of snoRNAs that are mainly involved in the post-transcriptional modifications of rRNAs in higher animals. In general, the host genes of snoRNAs are mostly protein-coding genes related to ribosomal biogenesis and nucleolar formation, showing expressional coordination among the various components of the protein translation machinery (Bachellerie et al. 2000). Some ncRNA genes, such as UHG, have also served as host genes specifically encoding box C/D snoRNA, which demonstrates that introns, rather than exons, of an RNA precursor can produce stable and functional RNAs. Since UHG was originally identified in mammals (Tycowski et al. 1996), only a few UHG-like genes have been further identified for snoRNA coding. In this study, we have shown at least eight dUhgs in the Drosophila genome. The abundance of these genes together with their powerful ability for snoRNA coding is a characteristic of snoRNA gene organization in Drosophila. Thus the importance of large ncRNAs, such as UHG, for small RNA coding, particularly in higher animals, may be underestimated because most of the noncoding genes in mammalian genomes remain to be annotated (News Staff 2004). For instance, recent studies have revealed that some imprinted ncRNA genes in mammals comprise a large number of repeated tandem introns that encode box C/D snoRNAs specifically expressed in brain (Cavaille et al. 2000, 2002). It remains to be answered why in both mammals and Drosophila, the UHG or dUhg, respectively, encode mainly box C/D snoRNAs, but rarely box H/ACA snoRNAs. For example, the eight dUhgs in Drosophila contained only two box H/ACA snoRNAs compared to 51 box C/D snoRNAs.

The majority of box H/ACA snoRNAs are also intron encoded in Drosophila, but they favor another gene organization, that is, the intronic cluster. SnoRNA gene clusters were originally found in higher plants (Leader et al. 1995) and the budding yeast Saccharomyces cerevisiae, in which most snoRNAs are independently transcribed from singletons and five polycistronic snoRNA clusters (Lowe and Eddy 1999; Qu et al. 1999). Gene clusters have also been found in some primordial organisms, such as trypanosomes (Dunbar et al. 2000), Euglena (Russell et al. 2004) and Giardia (our unpubl. results), implying that the clusters may be an ancient gene organization conserved through the course of evolution. Remarkably, both intronic and independently transcribed polycistrons have considerably developed and become the predominant genomic organizations of snoRNAs in flowering plants (Qu et al. 2001; Liang et al. 2002; Brown et al. 2003; Chen et al. 2003). An atypical dicistron, the tRNA-snoRNA gene cluster, was also identified in the rice genome (Kruszka et al. 2003). The intronic polycistron has not been reported in metazoans other than Drosophila, where it is recognized as a predominant genomic organization for the box H/ACA snoRNAs.

Among all the organisms analyzed so far, it is only in Drosophila that the two families of snoRNAs exhibit such an important divergence in gene organization and expression strategies. This may reflect intrinsic differences in the mechanisms by which the two families of snoRNA genes evolved respectively during the speciation of the fruit fly.

It is worth noting that box H/ACA snoRNAs have many more isoforms than C/D snoRNAs in Drosophila. Unlike C/D snoRNA isoforms, which are highly conserved, mutations occurred frequently in box H/ACA snoRNA isoforms. The conserved secondary structures and box elements in H/ACA isoforms remain unchanged, which guarantees the maturation of the snoRNA. Interestingly, the accumulation of mutations in the isoforms would lead to partial alteration of snoRNA function through the loss or gain of rRNA complementary sequences (Supplementary Table S1).

Intronic regions deserve more attention in searching for ncRNAs in Drosophila

As a type of intervening sequence of genes, spliceosomal introns with a large size variation are present in most (especially multicellular) eukaryotes (Logsdon 1998; Nixon et al. 2002). By comparison to the housefly, Drosophila possesses a very small genome; however, a large number of introns (about five introns per gene) have been estimated for the Drosophila genome (Yu et al. 2002). Interestingly, comparative genomic analysis has revealed similar constraints in the intergenic and intronic sequences of the Drosophila genome and about one-fourth of intronic sequences are conserved (Bergman and Kreitman 2001). A draft expression map of the Drosophila genome has also shown thousands of uncharacterized transcripts expressed from noncoding DNA in a developmentally coordinated manner (Stolc et al. 2004). Nevertheless, the variation of intron sizes and the function of most introns in the Drosophila genome still remain to be elucidated.

Recently, a systematic analysis of intron size has revealed the presence of minimal introns in most multicellular (and some unicellular) eukaryotes (Yu et al. 2002). The minimal introns showed a sharp peak in the distribution of intron size in all the organisms and were not randomly distributed among genes. They were therefore suggested to function as a type of cis-element for enhancing the export of spliced mRNA from the cell nucleus (Yu et al. 2002). The minimal introns in D. melanogaster have a sharp “spike” around 60 bp, and account for about half of the total introns in the genome.

In contrast to a small genome, the majority of the remaining introns in Drosophila genes are moderately large, falling between 0.1 kb and 2.0 kb. Numerous cis-elements were also found in the large introns, which function as regulatory sequences in alternative splicing of mRNA precursor (Standiford et al. 2001) or as enhancers for gene transcription (Meredith and Storti 1993). In addition, many ncRNAs, such as miRNAs, were frequently identified from these intronic sequences (Aravin et al. 2003; Lai et al. 2003; Yuan et al. 2003). In this work, we have systematically analyzed 267 introns from the 60 host genes and found a very high proportion of the large introns for snoRNA or other small ncRNA coding. The size variations of the snoRNA-containing introns are mainly due to different numbers of snoRNAs inside them, reflecting a compact and informative intron structure. These results demonstrate the highly efficient utilization of large introns and their sequences for functional RNA coding in the host genes and provide important clues for further research of other ncRNA genes in the large introns of the Drosophila genome.

MATERIALS AND METHODS

Computational search for snoRNA genes in the Drosophila genomic database

The D. melanogaster genome scaffolds available at the Flybase database were searched for potential box C/D snoRNAs in the following ways. (1) A eukaryote snoRNA search program (Chen et al. 2003) was used to identify putative snoRNA genes with box C/D, a terminal stem with at least three base pairings and, in most cases, an rRNA complementary sequence. (2) Flanking sequences (about 1 kb) of the snoRNA candidates and all intronic sequences of the Drosophila ribosomal protein genes, which are available at the Ribosomal Protein Gene Database (http://ribosome.miyazaki-med.ac.jp), were also examined for other possible box C/D snoRNAs. The sequences were also analyzed for additional noncanonical C/D candidates. (3) BLAST (Altschul et al. 1990) and FASTA (Pearson and Lipman 1998) programs were used to find gene variants of novel snoRNA genes and establish the sum of snoRNA isoforms. Sequence alignment of snoRNA isoforms was performed with Clustal X 1.8 and DNAstar packages.
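One of the criteria in step (1), a terminal stem with at least three base pairings, can be sketched as below. This is an illustration only and does not reproduce the search program of Chen et al. (2003): the function names are hypothetical, and counting G-U wobble pairs as valid is an assumption not stated in the text.

```python
# Watson-Crick pairs plus G-U wobble pairs (the wobble pairs are an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def terminal_stem_length(rna: str) -> int:
    """Count consecutive base pairs between the 5' and 3' termini, working inward."""
    rna = rna.upper().replace("T", "U")
    i, j = 0, len(rna) - 1
    pairs = 0
    while i < j and (rna[i], rna[j]) in PAIRS:
        pairs += 1
        i += 1
        j -= 1
    return pairs


def passes_stem_filter(rna: str, min_pairs: int = 3) -> bool:
    """Apply the 'at least three base pairings' criterion from step (1)."""
    return terminal_stem_length(rna) >= min_pairs
```

A filter like this would be combined with the box C and box D motifs and, in most cases, an rRNA complementary sequence, as listed in step (1).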

All intronic sequences of box H/ACA snoRNA host genes and ribosomal protein genes in the Drosophila genome were obtained from the Flybase database and the Ribosomal Protein Gene Database. The novel box H/ACA snoRNA candidates were identified using our computer program, which takes into account both the sequence motifs and secondary structures in the snoRNAs (Huang et al. 2004). Identification of the gene variants and sequence alignment were performed as above.

Construction and screening of cDNA libraries and RNA analyses

Fresh wild-type D. melanogaster larvae were cultured and collected for RNA extraction. Total cellular RNA was isolated and purified according to the method of guanidine thiocyanate/ phenol-chloroform (Chomczynski and Sacchi 1987).

An aliquot of 50 μg total cellular RNA was polyadenylated using poly(A) polymerase (Takara). Synthesis of the first strand of cDNA was performed with 25 μg of poly(A)+-tailed RNA in a 20-μL reaction mix containing 200 U of MMLV reverse transcriptase (Promega) and 0.5 μg of primer oligodT23 for 45 min at 42°C. The reaction mixture was separated on a denaturing 8% polyacrylamide gel (8 M urea, 1× TBE buffer). cDNAs with sizes ranging from 60 to 120 nt for the box C/D cDNA library and from 120 to 180 nt for the box H/ACA cDNA library were excised and eluted from the gel. cDNAs were tailed with poly(dG) at the 3′ end by using terminal deoxynucleotidyl transferase (Takara), then amplified by PCR with primers HindIII(T)16 and BamHI(C)16, and cloned into plasmid pTZ18 as described previously (Zhou et al. 2002). The two cDNA libraries were screened by PCR with the P47 and P48 universal primer pair. Only the recombinant plasmids carrying fragments of the expected size were selected for sequencing, which was performed with an automatic DNA sequencer (Applied Biosystems, 377) using the Big Dye Deoxy Terminator cycle-sequencing kit (Applied Biosystems).

An aliquot of 30 μg total RNA was analyzed by electrophoresis on 8% acrylamide/7 M urea gels. Electrotransfer onto nylon membrane (Hybond-N+; Amersham) was followed by UV irradiation for 5 min. Hybridization with 5′-labeled probes was performed as previously described (Zhou et al. 2002).

Oligodeoxynucleotides

Oligonucleotides were synthesized and purified by Sangon Co. The sequences of the oligonucleotide probes used for Northern blotting and reverse transcription, and of the oligonucleotide primers used for cDNA library construction and screening, are shown in Supplementary Table S2. The primers and probes used in reverse transcription and Northern blotting were 5′-end labeled with [γ-32P] ATP (Yahui Co.) and purified according to standard laboratory protocols as previously described (Sambrook et al. 1989).

Database accession codes

All novel snoRNA gene and dUHG host gene sequences identified in this study have been deposited in the EMBL and GenBank databases. Accession numbers are shown in Table 3 and Supplementary Table S1.

Supplementary materials

Supplementary materials are available upon request (send an e-mail message containing the keyword “Drosophila snoRNA” to lsbrc04@zsu.edu.cn).

Acknowledgments

We thank Xiao-Hong Chen for her technical assistance and Quan-Shen Du for the D. melanogaster culture. We thank Professor Mohsen Ghadessy and Dr. Roxana S. Ghadessy for improving the text. This research is supported by the National Natural Science Foundation of China (key project 30230200) and the Program for Changjiang Scholars and Innovative Research Team in University from the Ministry of Education of China.

Notes

Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.2380905.

REFERENCES

  • Accardo, M.C., Giordano, E., Riccardo, S., Digilio, F.A., Iazzetti, G., Calogero, R.A., and Furia, M. 2004. A computational search for box C/D snoRNA genes in the Drosophila melanogaster genome. Bioinformatics 20: 3293–3301. [PubMed]
  • Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410. [PubMed]
  • Antoine, M. and Kiefer, P. 1998. Functional characterization of transcriptional regulatory elements in the upstream region and intron 1 of the human S6 ribosomal protein gene. Biochem. J. 336: 327–335. [PMC free article] [PubMed]
  • Aravin, A.A., Lagos-Quintana, M., Yalcin, A., Zavolan, M., Marks, D., Snyder, B., Gaasterland, T., Meyer, J., and Tuschl, T. 2003. The small RNA profile during Drosophila melanogaster development. Dev. Cell 5: 337–350. [PubMed]
  • Bachellerie, J.P., Cavaille, J., and Qu, L.H. 2000. Nucleotide modifications of eukaryotic rRNAs: The world of small nucleolar RNA guides revisited. In The ribosome: Structure, function, antibiotics and cellular interactions (eds. R.A. Garrett et al.), pp. 191–203. ASM Press, Washington, DC.
  • Bachellerie, J.P., Cavaille, J., and Huttenhofer, A. 2002. The expanding snoRNA world. Biochimie 84: 775–790. [PubMed]
  • Balakin, A.G., Smith, L., and Fournier, M.J. 1996. The RNA world of the nucleolus: Two major families of small nucleolar RNAs defined by different box elements with related functions. Cell 86: 823–834. [PubMed]
  • Bergman, C.M. and Kreitman, M. 2001. Analysis of conserved non-coding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences. Genome Res. 11: 1335–1345. [PubMed]
  • Brown, J.W., Echeverria, M., and Qu, L.H. 2003. Plant snoRNAs: Functional evolution and new modes of gene expression. Trends Plant Sci. 8: 42–49. [PubMed]
  • Cavaille, J., Buiting, K., Kiefmann, M., Lalande, M., Brannan, C.I., Horsthemke, B., Bachellerie, J.P., Brosius, J., and Huttenhofer, A. 2000. Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. Proc. Natl. Acad. Sci. 97: 14311–14316. [PMC free article] [PubMed]
  • Cavaille, J., Seitz, H., Paulsen, M., Ferguson-Smith, A.C., and Bachellerie, J.P. 2002. Identification of tandemly-repeated C/D snoRNA genes at the imprinted human 14q32 domain reminiscent of those at the Prader-Willi/Angelman syndrome region. Hum. Mol. Genet. 11: 1527–1538. [PubMed]
  • Chen, C.L., Liang, D., Zhou, H., Zhou, M., Chen, Y.Q., and Qu, L.H. 2003. The high diversity of snoRNAs in plants: Identification and comparative study of 120 snoRNA genes from Oryza sativa. Nucleic Acids Res. 31: 2601–2613. [PMC free article] [PubMed]
  • Chomczynski, P. and Sacchi, N. 1987. Single-step method of RNA isolation by acid guanidinium thiocyanate-phenol-chloroform extraction. Anal. Biochem. 162: 732–735. [PubMed]
  • Clouet d’Orval, B., Bortolin, M.L., Gaspin, C., and Bachellerie, J.P. 2001. Box C/D RNA guides for the ribose methylation of archaeal tRNAs. The tRNATrp intron guides the formation of two ribose-methylated nucleosides in the mature tRNATrp. Nucleic Acids Res. 29: 4518–4529. [PMC free article] [PubMed]
  • Dunbar, D.A., Chen, A.A., Wormsley, S., and Baserga, S.J. 2000. The genes for small nucleolar RNAs in Trypanosoma brucei are organized in clusters and are transcribed as a polycistronic RNA. Nucleic Acids Res. 28: 2855–2861. [PMC free article] [PubMed]
  • Flam, F. 1994. Hints of a language in junk DNA. Science 266: 1320. [PubMed]
  • Ganot, P., Bortolin, M.L., and Kiss, T. 1997. Related site-specific pseudouridine formation in preribosomal RNA is guided by small nucleolar RNAs. Cell 89: 799–809. [PubMed]
  • Hirose, T. and Steitz, J.A. 2001. Position within the host intron is critical for efficient processing of box C/D snoRNAs in mammalian cells. Proc. Natl. Acad. Sci. 98: 12914–12919. [PMC free article] [PubMed]
  • Huang, Z.P., Zhou, H., Liang, D., and Qu, L.H. 2004. Different expression strategy: Multiple intronic gene clusters of box H/ACA snoRNA in Drosophila melanogaster. J. Mol. Biol. 341: 669–683. [PubMed]
  • Kiss, T. 2002. Small nucleolar RNAs: An abundant group of noncoding RNAs with diverse cellular functions. Cell 109: 145–148. [PubMed]
  • Kruszka, K., Barneche, F., Guyot, R., Ailhas, J., Meneau, I., Schiffer, S., Marchfelder, A., and Echeverria, M. 2003. Plant dicistronic tRNA-snoRNA genes: A new mode of expression of the small nucleolar RNAs processed by RNase Z. EMBO J. 22: 621–632. [PMC free article] [PubMed]
  • Lai, E.C., Tomancak, P., Williams, R.W., and Rubin, G.M. 2003. Computational identification of Drosophila microRNA genes. Genome Biol. 4: R42. [PMC free article] [PubMed]
  • Leader, D.J., Sanders, J.F., Turnbull-Ross, A., Waugh, R., and Brown, J.W. 1995. Genomic organisation of plant U14 snoRNA genes. Biochem. Soc. Trans. 23: 314S. [PubMed]
  • Liang, D., Zhou, H., Zhang, P., Chen, Y.Q., Chen, X., Chen, C.L., and Qu, L.H. 2002. A novel gene organization: Intronic snoRNA gene clusters from Oryza sativa. Nucleic Acids Res. 30: 3262–3272. [PMC free article] [PubMed]
  • Logsdon Jr., J.M. 1998. The recent origins of spliceosomal introns revisited. Curr. Opin. Genet. Dev. 8: 637–648. [PubMed]
  • Lowe, T.M. and Eddy, S.R. 1999. A computational screen for methylation guide snoRNAs in yeast. Science 283: 1168–1171. [PubMed]
  • Maxwell, E.S. and Fournier, M.J. 1995. The small nucleolar RNAs. Annu. Rev. Biochem. 35: 897–934. [PubMed]
  • Meredith, J. and Storti, R.V. 1993. Developmental regulation of the Drosophila tropomyosin II gene in different muscles is controlled by muscle-type-specific intron enhancer elements and distal and proximal promoter control elements. Dev. Biol. 159: 500–512. [PubMed]
  • News Staff. 2004. Breakthrough of the year: The runners-up. Science 306: 2013–2017. [PubMed]
  • Nixon, J.E., Wang, A., Morrison, H.G., McArthur, A.G., Sogin, M.L., Loftus, B.J., and Samuelson, J.A. 2002. Spliceosomal intron in Giardia lamblia. Proc. Natl. Acad. Sci. 99: 3701–3705. [PMC free article] [PubMed]
  • Ofengand, J. and Bakin, A. 1997. Mapping to nucleotide resolution of pseudouridine residues in large subunit ribosomal RNAs from representative eukaryotes, prokaryotes, archaebacteria, mitochondria and chloroplasts. J. Mol. Biol. 266: 246–268. [PubMed]
  • Pearson, W.R. and Lipman, D.J. 1998. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. 85: 2444–2448. [PMC free article] [PubMed]
  • Qu, L.H., Henras, A., Lu, Y.J., Zhou, H., Zhou, W.X., Zhu, Y.Q., Zhao, J., Henry, Y., Caizergues-Ferrer, M., and Bachellerie, J.P. 1999. Seven novel methylation guide small nucleolar RNAs are processed from a common polycistronic transcript by Rat1p and RNase III in yeast. Mol. Cell. Biol. 19: 1144–1158. [PMC free article] [PubMed]
  • Qu, L.H., Meng, Q., Zhou, H., and Chen, Y.Q. 2001. Identification of 10 novel snoRNA gene clusters from Arabidopsis thaliana. Nucleic Acids Res. 29: 1623–1630. [PMC free article] [PubMed]
  • Russell, A.G., Schnare, M.N., and Gray, M.W. 2004. Pseudouridine-guide RNAs and other Cbf5p-associated RNAs in Euglena gracilis. RNA 10: 1034–1046. [PMC free article] [PubMed]
  • Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular cloning: A laboratory manual, 2d ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
  • Smith, C.M. and Steitz, J.A. 1997. Sno storm in the nucleolus: New roles for myriad small RNPs. Cell 89: 669–672. [PubMed]
  • Standiford, D.M., Sun, W.T., Davis, M.B., and Emerson Jr., C.P. 2001. Positive and negative intronic regulatory elements control muscle-specific alternative exon splicing of Drosophila myosin heavy chain transcripts. Genetics 157: 259–271. [PMC free article] [PubMed]
  • Stolc, V., Gauhar, Z., Mason, C., Halasz, G., van Batenburg, M.F., Rifkin, S.A., Hua, S., Herreman, T., Tongprasit, W., Barbano, P.E., et al. 2004. A gene expression map for the euchromatic genome of Drosophila melanogaster. Science 306: 655–660. [PubMed]
  • Tycowski, K.T. and Steitz, J.A. 2001. Non-coding snoRNA host genes in Drosophila: Expression strategies for modification guide snoRNAs. Eur. J. Cell. Biol. 80: 119–125. [PubMed]
  • Tycowski, K.T., Shu, M.D., and Steitz, J.A. 1996. A mammalian gene with introns instead of exons generating stable RNA products. Nature 379: 464–466. [PubMed]
  • Tycowski, K.T., You, Z.H., Graham, P.J., and Steitz, J.A. 1998. Modification of U6 spliceosomal RNA is guided by other small RNAs. Mol. Cell 2: 629–638. [PubMed]
  • Tycowski, K.T., Alar, A., and Steitz, J.A. 2004. Guide RNAs with 5′ caps and novel box C/D snoRNA-like domains for modification of snRNAs in metazoa. Curr. Biol. 14: 1985–1995. [PubMed]
  • Yu, J., Yang, Z., Kibukawa, M., Paddock, M., Passey, D.A., and Wong, G.K. 2002. Minimal introns are not “junk.” Genome Res. 12: 1185–1189. [PMC free article] [PubMed]
  • Yuan, G., Klambt, C., Bachellerie, J.P., Brosius, J., and Huttenhofer, A. 2003. RNomics in Drosophila melanogaster: Identification of 66 candidates for novel non-messenger RNAs. Nucleic Acids Res. 31: 2495–2507. [PMC free article] [PubMed]
  • Zhou, H., Chen, Y.Q., Du, Y.P., and Qu, L.H. 2002. The Schizosaccharomyces pombe mgU6–47 gene is required for 2′-O-methylation of U6 snRNA at A41. Nucleic Acids Res. 30: 894–902. [PMC free article] [PubMed]

Articles from RNA are provided here courtesy of The RNA Society