Copyright © 2009 The Author(s)

Mining small RNA sequencing data: a new approach to identify small nucleolar RNAs in Arabidopsis

1 Institute of Plant and Microbial Biology, Academia Sinica, 2 Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, National Chung-Hsing University and Academia Sinica, Taipei, 11529 and 3 Graduate Institute of Biotechnology and Department of Life Sciences, National Chung-Hsing University, Taichung, 402, Taiwan

*To whom correspondence should be addressed. Tel/Fax: 886 2 27871178; Email: shuwu/at/gate.sinica.edu.tw

Received December 15, 2008; Revised February 20, 2009; Accepted March 23, 2009.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Small nucleolar RNAs (snoRNAs) are noncoding RNAs that direct 2′-O-methylation or pseudouridylation on ribosomal RNAs or spliceosomal small nuclear RNAs. These modifications are needed to modulate the activity of ribosomes and spliceosomes. A comprehensive repertoire of snoRNAs is needed to expand the knowledge of these modifications. The sequences corresponding to snoRNAs in 18–26-nt small RNA sequencing data have rarely been explored and remain a hidden treasure for snoRNA annotation. Here, we showed the enrichment of small RNAs at Arabidopsis snoRNA termini and developed a computational approach to identify snoRNAs on the basis of this characteristic. The approach successfully uncovered the full-length sequences of 144 known Arabidopsis snoRNA genes, including some snoRNAs with improved 5′- or 3′-end annotation. In addition, we identified 27 and 17 candidates for novel box C/D and box H/ACA snoRNAs, respectively.
Northern blot analysis and sequencing data from parallel analysis of RNA ends confirmed the expression and the termini of the newly predicted snoRNAs. Our study especially expanded the current knowledge of box H/ACA snoRNAs and snoRNA species targeting snRNAs. In this study, we demonstrated that the use of small RNA sequencing data can increase the complexity and the accuracy of snoRNA annotation.

INTRODUCTION

Modifications of two classes of noncoding RNAs, ribosomal RNAs (rRNAs) and spliceosomal small nuclear RNAs (snRNAs), are thought to influence RNA folding and/or interactions with proteins, thereby fine-tuning the activity of ribosomes and spliceosomes. rRNAs contain numerous modified nucleotides, of which some are conserved in eukaryotes (1). Two prevalent rRNA modifications are 2′-O-methylation at riboses and pseudouridylation of uridines. These two types of modifications are directed by two groups of small nucleolar RNAs (snoRNAs), box C/D and box H/ACA snoRNAs. Box C/D snoRNAs have two conserved motifs, boxes C and D, at the 5′ and 3′ ends, respectively. The two motifs are brought together by a 3- to 4-bp terminal stem. The motifs (box C/D) and the stem together form a kink-turn structure (2). In contrast, box H/ACA snoRNAs have two hairpins linked by a hinge and a short tail at the 3′ end. Box H is located at the hinge, and box ACA is usually 3 nt upstream of the 3′ terminus (3). Box C/D snoRNAs guide 2′-O-methylation, whereas box H/ACA snoRNAs direct pseudouridylation, both through site-specific base-pairing between rRNAs and antisense elements on snoRNAs (4). In addition to rRNAs, snoRNAs guide modifications of snRNAs. snoRNAs were named for their ability to guide modification of rRNA and U6 in the nucleolus. However, some reports have indicated that modifications of RNA polymerase II-transcribed snRNAs occur at the nucleoplasmic Cajal body in vertebrates by a group of small Cajal body-specific RNAs (scaRNAs) (5).
Whether plants also adopt scaRNAs for the modification of some snRNAs remains to be studied. To expand the understanding of the rRNA/snRNA modifications, a comprehensive snoRNA repertoire must be built by identifying additional snoRNAs. To date, the identification of snoRNAs has been largely achieved by conventional cloning-sequencing and computational prediction based on primary genomic sequences. The cloning-sequencing approach often lacks comprehensiveness and entails technical constraints. For example, snoRNAs identified by conventional sequencing sometimes lack well-defined termini (6). With the increasing availability of genome sequence information, multiple computational programs have been developed to predict box C/D and box H/ACA snoRNAs and have revealed many snoRNAs in diverse species (7–12). However, computational snoRNA prediction is usually restricted to snoRNAs with known or predicted targets, or with secondary structures or sequences conserved among species. A different approach is needed for the discovery of species-specific snoRNAs and snoRNAs with noncanonical targets. Next-generation sequencing technologies have become powerful tools for functional genomics research (13). High-throughput sequencing of short RNA fragments (18–26 nt) or RNA ends has greatly facilitated the discovery of Arabidopsis interfering RNAs and their targets (14–17). Although interfering RNAs account for most small RNA sequence data, a small proportion of small RNA fragments are derived from rRNAs, transfer RNAs, snRNAs and snoRNAs (17). Since these noncoding RNAs are usually longer than 60 bases, small RNA fragments from these transcripts likely result from RNA degradation processes and are usually discarded without further analysis. However, a recent discovery of a snoRNA-derived microRNA (miRNA) suggests that some small RNA fragments may be produced from these long noncoding RNAs through specific biogenesis pathways (18).
It is thus worthwhile to further explore the hidden information in these small RNA data.

Here, by mining next-generation sequencing data, we show enriched small RNA fragments at the snoRNA termini and describe a computational approach to identify both box C/D and box H/ACA snoRNAs on the basis of this feature. In addition to revising the sequences of 48 known snoRNA transcripts, we used this approach to identify 44 novel snoRNAs. Newly predicted snoRNAs are supported by their conserved structures, conserved target sites on rRNAs and snRNAs or their expression by alternative approaches. This work presents an additional application of small RNA sequencing data in the annotation of noncoding RNAs other than interfering RNAs and further reveals the complexity of snoRNAs in Arabidopsis.

MATERIALS AND METHODS

Sequence data sets used in this study

Small RNA sequencing data obtained from various Arabidopsis genotypes, tissues and platforms were collected from the following public databases (Supplementary Table 1). Small RNAs cloned from Col-0 and mutants defective in small regulatory RNA pathways were downloaded from the Arabidopsis Small RNA Project database (ASRP, http://asrp.cgrb.oregonstate.edu/) (16,19,20). Small RNA data generated by the studies of Rajagopalan et al. and Axtell et al. (17,21) were retrieved from NCBI Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/). Thirteen small RNA libraries contributed by three separate studies were obtained from the Arabidopsis SBS database (http://mpss.udel.edu/at_sbs/) (14,22,23). Small RNAs ≥17 nt were pooled and mapped to Arabidopsis genome sequences released by the Arabidopsis Information Resource in 2004 (TAIR, http://www.arabidopsis.org/).

Known Arabidopsis snoRNA sequences were collected from the Scottish Crop Research Institute Plant snoRNA database (http://bioinf.scri.sari.ac.uk/cgi-bin/plant_snorna/home) (24), TAIR and GenBank (http://www.ncbi.nlm.nih.gov/Genbank/).
The sequences overlapping with transposons annotated in TAIR8 were removed from the known snoRNA data set. Data of parallel analysis of RNA ends (PARE) for Col-0 were downloaded from the Arabidopsis PARE website (http://mpss.udel.edu/at_pare/) (14).

Box C/D snoRNA prediction

We searched for small RNAs containing a box C motif and looked for downstream small RNAs that contain a box D motif and can form a 3–4-bp terminal stem with the upstream small RNAs. The box C should be located 4–5 nt downstream of the 5′ start of small RNAs, whereas the box D should be located 3–5 nt upstream of the 3′ terminus of small RNAs. The region defined from the start of a box C-containing small RNA to the end of a box D-containing small RNA should range from 65 to 300 nt and was further analyzed by the following criteria. First, the numbers of distinct small RNAs mapped to the 5′ and 3′ regions were denoted as N5 and N3. Second, the sum of N5 and N3 should be ≥3. Third, the number of distinct small RNAs mapped to the antisense strand of this region was denoted as Nas. Fourth, to examine the enrichment of small RNAs at both termini, we calculated the reads (Xinner) of small RNAs mapped to positions ≤5 nt from each terminus and the reads (Xouter) of small RNAs mapped to positions >5 and <19 nt from each terminus. The enrichment index (E) was calculated as E = 0 − (Xouter/Xinner).

From our analysis of small RNAs, we observed two characteristics associated with most of the known snoRNAs. First, the number of antisense small RNAs (Nas) does not exceed that of small RNAs at termini (N5 + N3 − Nas ≥ 0). Second, small RNAs show at least 2-fold enrichment at their termini. Thus, E should be ≥ −0.5. When no weighting of E was applied, a cutoff score of −0.5 (score = N5 + N3 + E − Nas) recovered 114 of the known ‘subgroup I’ box C/D snoRNA loci (see below), as well as 14 transposon loci.
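The enrichment index defined above can be illustrated with a short Python sketch. The function and parameter names (`enrichment_index`, `passes_enrichment`, `inner_max`, `outer_max`) are hypothetical and are not taken from the authors' program. Returning negative infinity when no reads fall in the inner window is also an assumption, because the text does not say how that case was handled.

```python
def enrichment_index(reads, inner_max=5, outer_max=19):
    """Compute E = 0 - (Xouter / Xinner) for the termini of one candidate.

    reads: iterable of (distance_nt, read_count) pairs, where distance_nt is
    the absolute distance of a mapped small RNA from the nearer terminus.
    """
    reads = list(reads)
    # Xinner: reads mapped within inner_max nt of a terminus.
    x_inner = sum(count for dist, count in reads if dist <= inner_max)
    # Xouter: reads mapped more than inner_max but fewer than outer_max nt away.
    x_outer = sum(count for dist, count in reads if inner_max < dist < outer_max)
    if x_inner == 0:
        # Assumption: with no terminal reads the candidate cannot be enriched.
        return float("-inf")
    return 0 - (x_outer / x_inner)


def passes_enrichment(e, threshold=-0.5):
    """At least 2-fold enrichment at the termini corresponds to E >= -0.5."""
    return e >= threshold


# Illustrative (made-up) read counts: 49 terminal reads and 10 reads in the
# outer window; the read at distance 25 lies outside both windows.
example = [(0, 45), (1, 4), (8, 10), (25, 100)]
print(enrichment_index(example), passes_enrichment(enrichment_index(example)))
```

With this definition, E approaches 0 as reads concentrate at the termini and becomes more negative as reads spread into the flanking window.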
To discriminate snoRNA loci from transposons, which are also rich sources of small RNAs, we applied increased weighting for the enrichment index E. Weightings of E between 1 and 30 were empirically tested; with a weighting of 10, the 114 known snoRNA loci were retained, but transposon loci were reduced to two. Thus, a score calculated by the following equation was used to evaluate each candidate genomic region for its potential as a box C/D snoRNA locus.

ScoreCD = N5 + N3 + 10 × E − Nas

For overlapping snoRNA candidates, the one with the best score was selected for each region.
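The scoring and overlap-resolution steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the `Candidate` class, the function names and the greedy highest-score-first strategy for overlapping candidates are all hypothetical, and the subgroup-specific cutoff scores are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int   # genomic start of the candidate region
    end: int     # genomic end of the candidate region (inclusive)
    n5: int      # distinct small RNAs at the 5' region (N5)
    n3: int      # distinct small RNAs at the 3' region (N3)
    n_as: int    # distinct antisense small RNAs (Nas)
    e: float     # enrichment index E


def score_cd(c, weight=10.0):
    """N5 + N3 + weight * E - Nas, with the 10-fold weighting chosen in the text."""
    return c.n5 + c.n3 + weight * c.e - c.n_as


def passes_prefilters(c):
    """Criteria stated in the text: N5 + N3 >= 3, N5 + N3 - Nas >= 0, E >= -0.5."""
    return (c.n5 + c.n3 >= 3) and (c.n5 + c.n3 - c.n_as >= 0) and (c.e >= -0.5)


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def best_non_overlapping(candidates):
    """Keep the best-scoring candidate among each set of overlapping ones.

    Assumption: a greedy pass from highest to lowest score is used here.
    """
    kept = []
    for cand in sorted(candidates, key=score_cd, reverse=True):
        if not any(overlaps(cand, k) for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.start)


# Made-up candidates: the second overlaps the first and scores lower.
cands = [
    Candidate(100, 200, 3, 2, 0, -0.1),
    Candidate(150, 260, 2, 1, 1, -0.4),
    Candidate(500, 600, 4, 4, 2, -0.2),
]
selected = best_non_overlapping([c for c in cands if passes_prefilters(c)])
```

A subgroup-specific cutoff would then be applied to `score_cd` for each selected candidate.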
Box C/D snoRNAs were predicted separately in three subgroups on the basis of the motif sequences of boxes C and D described as follows. The sixth base of box C and the first base of box D in each subgroup are complementary.
The cutoff scores were determined by rounding the lowest score awarded for the known snoRNA that gave the best separation of known snoRNAs and transposons in each corresponding subgroup. R indicates a purine base and N indicates any base. Parentheses mean either one of the letters separated by vertical lines within the parentheses. Candidates overlapping with transposons or containing highly repeated small RNAs (>100 genomic hits) were filtered out. The computational program is available upon request.

Box H/ACA snoRNA prediction

A 3′ end supported by at least two distinct small RNAs containing an A(A|T|C)A motif located 3–4 nt upstream of the 3′ terminus was selected for further analysis. A region 110–300 nt upstream of a selected 3′ end was searched for a potential 5′ start supported by at least two distinct small RNAs. The definitions of N5, N3 and Nas are as described in the prediction of box C/D snoRNAs. To examine the enrichment of small RNAs at both ends, we calculated the reads (Xinner) of small RNAs mapped to positions ≤3 nt from each end and the reads (Xouter) of small RNAs mapped to positions >3 and <10 nt from each terminus. A score calculated by the following equation was used to evaluate each candidate genomic region for its potential as a box H/ACA snoRNA locus.

ScoreHACA = N5 + N3 + 30 × E − Nas
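The 3′-end requirement for box H/ACA candidates can be sketched in Python as follows. All names are hypothetical. The sketch reads "3–4 nt upstream of the 3′ terminus" as meaning that the A(A|T|C)A motif is followed by exactly 3 or 4 nucleotides, which is an interpretation. The score function assumes the same form as the box C/D score with the 30-fold weighting of E reported in the text; that form is inferred and is not confirmed by the source.

```python
import re

# A(A|T|C)A followed by exactly 3 or 4 nucleotides up to the 3' terminus.
ACA_AT_3PRIME = re.compile(r"A[ATC]A[ACGT]{3,4}$")


def normalize(seq):
    """Upper-case the sequence and write it in the DNA alphabet."""
    return seq.upper().replace("U", "T")


def has_aca_box(small_rna):
    """True if the small RNA carries the motif 3-4 nt upstream of its 3' end."""
    return bool(ACA_AT_3PRIME.search(normalize(small_rna)))


def supported_3prime_end(small_rnas_ending_here, min_distinct=2):
    """A 3' end needs at least two distinct motif-containing small RNAs."""
    distinct = {normalize(s) for s in small_rnas_ending_here}
    return sum(has_aca_box(s) for s in distinct) >= min_distinct


def score_haca(n5, n3, n_as, e, weight=30.0):
    """Assumed form: N5 + N3 + weight * E - Nas (see the hedge above)."""
    return n5 + n3 + weight * e - n_as


# Made-up small RNAs that share one 3' end.
print(supported_3prime_end(["TTGACATCG", "CTTGACATCG"]))
```

Candidates passing this step would then be scored and, as described in the text, examined manually for the hairpin-hinge-hairpin-tail structure.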
A weighting of E between 1 and 50 was empirically tested in conjunction with the cutoff ScoreHACA set between 0 and 6. A combination of 30-fold E weighting and a cutoff score ≥6 retained 20 of 43 known box H/ACA snoRNAs. This process also reduced candidate sequences to 67 for manual structure examination with use of mfold v. 3.2 (http://mfold.bioinfo.rpi.edu/cgi-bin/rna-form1.cgi) (25). Candidates for box H/ACA snoRNA loci should have the folded hairpin-hinge-hairpin-tail structures being the best (lowest) free energy ones and should contain box H motifs (ANANNR) at the hinge region. The computational program is available upon request.

snoRNA target prediction

25S rRNA and 5.8S rRNA sequences were extracted from GenBank accession X52320, and the 18S rRNA sequence was extracted from GenBank accession X16077. Except for U5, experimentally identified spliceosomal snRNA sequences were obtained from the Arabidopsis Splicing Related Gene database (ASRG, http://gremlin3dev.gdcb.iastate.edu/SRGD/ASRG/) (26). The U5 sequence was extracted from GenBank accession X13012. For box C/D snoRNAs, upstream sequences of box D or D′ were searched for complementarity to Arabidopsis rRNAs or snRNAs. Potential target sites should form at least 10-bp pairing with the 11-nt region located 1 nt upstream of box D or D′ of newly identified snoRNAs. No more than 1 G:U pair was allowed in the first 10 bp. Box D′ motifs could be CNGA or NTGA, and the distance of box D′ to both termini of box C/D snoRNAs had to be at least 25 nt. If more than one target site was predicted for one antisense element, only the best site was chosen for listing in Table 2. The presumptive nucleotides for 2′-O-methylation were those paired to the fifth nucleotide upstream of box D or D′ and were further examined for sequence conservation and experimental validation of 2′-O-methylation in humans and yeast.
Data for human and yeast snoRNAs and RNA modification sites were extracted from snoRNA-LBME-db (http://www-snorna.biotoul.fr/) and the yeast snoRNA database at the University of Massachusetts-Amherst (http://people.biochem.umass.edu/sfournier/fournierlab/snornadb/) and shown as ‘Homology’ in Table 2 (27,28).
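The complementarity search for box C/D antisense elements can be sketched in Python as follows. This is a hedged illustration, not the authors' program: the function names are hypothetical, the "first 10 bp" are assumed to be counted from the end of the element nearest box D or D′, and the exact offset of the element from the box is left to the caller as the parameter `element_3prime_pos`.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def to_rna(seq):
    return seq.upper().replace("T", "U")


def find_targets(element, target, min_bp=10, max_gu=1):
    """Find sites in target that pair with the antisense element.

    element: antisense element written 5'->3'; its 3' end is nearest box D/D'.
    target:  rRNA or snRNA sequence written 5'->3'.
    Returns (site_start, gu_pairs) tuples, where site_start is the 0-based
    target index paired with the 3'-most nucleotide of the element.
    """
    element, target = to_rna(element), to_rna(target)
    if len(element) < min_bp:
        raise ValueError("element is shorter than the required duplex")
    hits = []
    for i in range(len(target) - min_bp + 1):
        gu = 0
        for k in range(min_bp):
            # Antiparallel pairing: walk the element 3'->5', the target 5'->3'.
            pair = (element[-1 - k], target[i + k])
            if pair in WATSON_CRICK:
                continue
            if pair in WOBBLE and gu < max_gu:
                gu += 1
                continue
            break
        else:
            hits.append((i, gu))
    return hits


def methylated_index(site_start, element_3prime_pos):
    """Target index paired to the fifth nucleotide upstream of box D/D'.

    element_3prime_pos: how many nt upstream of the box the 3'-most element
    nucleotide lies (1 = immediately adjacent); supplied by the caller.
    """
    return site_start + (5 - element_3prime_pos)


# Made-up sequences for illustration only.
toy_target = "GGAUCCGAUACGUUAGCAUGG"
toy_element = "AUAACGUAUCG"
print(find_targets(toy_element, toy_target))
```

Ranking multiple hits to keep "only the best site" is not specified in detail in the text and is therefore omitted from this sketch.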
For box H/ACA snoRNAs, on the basis of their structures, the pairs of sequences from internal loops in which the top-most nucleotide was located 13–16 nt upstream of box H or ACA were extracted and searched for complementarity to rRNAs and snRNAs. A potential pseudouridylation site, together with a downstream nucleotide, should be located at the top of the internal loop and flanked by a bipartite duplex of snoRNA and target sequences. The bipartite duplex should total at least 9 bp, with at least 3 bp in each stem, and no more than 1 G:U pair was allowed. If more than one target site was predicted for one antisense element, only the best site was chosen for listing in Table 3. The presumptive nucleotides for pseudouridylation were further examined for sequence conservation and experimental validation of pseudouridylation in humans and yeast as described above.
snoRNA northern blot analyses

Total RNA was extracted from 10-day-old seedlings, rosette leaves of 4-week-old plants and flowers by use of TRIZOL reagent (Invitrogen, Carlsbad, CA, USA). Ten micrograms of total RNA was separated by 6% or 15% denaturing polyacrylamide TBE-Urea gels (Invitrogen) and transferred to Hybond-N+ membranes (GE Healthcare, Piscataway, NJ, USA) by use of a semidry transfer cell (Bio-Rad, Hercules, CA, USA). Membranes were UV-crosslinked and then baked at 80°C for 1 h. Antisense oligonucleotides complementary to predicted snoRNAs listed in Supplementary Table 2 were used as probes. The probes were end-labeled with [γ-32P]ATP by use of T4 polynucleotide kinase (New England Biolabs, Ipswich, MA, USA). Hybridization was performed at 42°C overnight after pre-hybridization with ULTRAhyb-Oligo buffer (Ambion, Austin, TX, USA) for at least 1 h. After two washes with 2× SSC and 0.1% SDS for 10 min each at room temperature and one wash with 0.1× SSC and 0.5% SDS for 1 min at 42°C, the membrane was exposed to Kodak BioMAX MS X-ray films for 1–3 days.

RESULTS

Enrichment of small RNA fragments at the termini of known snoRNAs

To establish the sequence relationship of snoRNAs and snoRNA-derived small RNAs, we analyzed the position of small RNAs on 204 known snoRNAs in the Arabidopsis genome. We divided known snoRNAs into 5′-end, body and 3′-end regions. The 5′- and 3′-ends are genomic regions spanning 11 bases (–5 to +6) of the previously annotated snoRNA 5′ and 3′ termini (Figure 1).
A close examination of the mapping positions of small RNAs revealed that those mapped to the 5′- and 3′-end regions of snoRNAs could accurately define the 5′ and 3′ termini of these snoRNAs. For instance, among 49 small RNA reads mapped to the 5′-end region of a box C/D snoRNA, U30, 45 reads began from the same position, which was 1 nt upstream (–1) from the reported U30 5′ terminus (Figure 1). Small RNAs also strongly supported the 3′ terminus of a known box H/ACA snoRNA, snoR74-2 (Figure 1).

Prediction of snoRNAs by small RNA data

The predominant occurrence of degraded small RNAs at the snoRNA termini prompted us to examine the possibility of using small RNA data for the annotation of snoRNAs, similar to the use of expressed sequence tags to annotate genes. For this purpose, we developed a computational approach to identify snoRNAs by integrating the enrichment of small RNAs at snoRNA ends as described above (Figure 1). We divided the box C/D snoRNA prediction into three subgroups based on specific combinations of box C and box D. Boxes C and D are brought together by a terminal stem, and the sixth nucleotide of box C is opposite to the first nucleotide of box D (2). The pairing of these two nucleotides, together with neighboring pairings, forms a short stem that is essential for the binding of nucleolar proteins (29). Current algorithms used to predict box C/D snoRNAs usually evaluate the box C and box D motifs independently, without the constraint that these two nucleotides base-pair. By considering the co-occurrence of these two base-pairing nucleotides in different subgroups (GC or AT pairs) as described in ‘Materials and Methods’ section, our algorithm may improve the sensitivity and specificity of box C/D snoRNA prediction.
With our approach, we could identify 124 of 161 known box C/D snoRNAs and 27 candidates for novel box C/D snoRNAs (Tables 1 and 2 and Supplementary Tables 3 and 4), which indicates the robustness of the small RNA-aided prediction methodology. Among the 37 known box C/D snoRNAs not identified by our approach, most lacked small RNA fragments mapped to their 5′ end and/or 3′ end. The under-representation of these small RNAs might be due to the higher stability or lower expression of these snoRNAs in the sequence libraries we used in the current analyses. The prediction results also amended the 5′ and/or 3′ termini of 30 known box C/D snoRNAs (Supplementary Table 3). The newly annotated sites deviated >3 nt from the termini reported previously.
Our small RNA-based approach also successfully identified 20 of 43 known box H/ACA snoRNAs (Table 1). We could not identify the remaining known box H/ACA snoRNAs by our approach mostly because of the absence or under-representation of small RNAs at snoRNA ends. The mapping of small RNAs successfully extended the 5′-end boundaries of 18 known H/ACA snoRNAs (Supplementary Table 3), which were originally found to lack complete 5′ ends (6). The resulting full-length snoRNA sequences are thus able to form intact hairpin-hinge-hairpin-tail structures (data not shown). Therefore, our method of defining the 5′ ends by small RNAs will greatly improve the current annotations of box H/ACA snoRNAs. Moreover, 17 candidates of novel box H/ACA snoRNAs were revealed by our prediction method (Supplementary Table 4) and showed structural resemblance to known box H/ACA snoRNAs (Figure 2). [...] nt protruding from the 5′ hairpin (3).
Our results indicate that the methodology we developed could effectively identify both known and novel snoRNAs and could efficiently improve the annotation of snoRNAs.

Validation of predicted snoRNAs by PARE data and northern blot analysis

Two distinct approaches were used to verify the authenticity of snoRNAs predicted by our methodology. We first cross-examined our results of the 5′ termini of snoRNAs with a high-throughput data set of 5′ ends sequenced by PARE (14). PARE revealed the 5′ ends of uncapped transcripts and was first developed to identify miRNA targets. Although most plant pre-snoRNAs are independently transcribed by RNA polymerase II or excised from introns (30), the activity of endonucleases and exonucleases is required for the production of mature snoRNA 5′ ends in yeast (31,32). Therefore, similar to the cleavage products of miRNA target mRNAs, snoRNAs are likely to be cap-free and could be targets of PARE analyses. Indeed, we found that the PARE reads corresponding to the 5′ termini of many known snoRNAs could surpass those of known miRNA targets (data not shown). To validate whether our prediction results really reflect the 5′ termini of endogenous snoRNAs, we then extracted PARE reads at positions of snoRNA 5′ termini predicted by small RNAs and their neighboring sequences. The frequency of occurrence at each position was then plotted against the relative positions to the predicted 5′ terminus for each snoRNA. As shown in Figure 3
We also examined the sizes and expression of several newly predicted snoRNAs by northern blot analysis. The sizes of eight box C/D snoRNAs are consistent with those predicted by our computational analyses. All these snoRNAs could be detected in the tissues examined, including 10-day-old seedlings, 4-week-old leaves and flowers (Figure 4).
Both PARE data and northern blot analysis verified the prediction of snoRNAs by small RNAs, which suggests that small RNA data can be used to precisely predict the 5′ end and full-length sizes of snoRNA species.

Genome organization of loci encoding novel snoRNAs

We next investigated whether the new snoRNA loci exhibit unique features in their genome organization. The results summarized in Tables 2 and 3 indicate that 37 of 44 novel snoRNAs are produced from intergenic regions or introns, as most known Arabidopsis snoRNAs are. In some cases, snoRNAs reside in genes with related functionality for the modification of rRNAs or snRNAs. For example, previous results showed that two box C/D snoRNAs, U60.1 and U60.2, are located in the intron of two genes encoding fibrillarins, which are nucleolar proteins associated with box C/D snoRNAs (33). In our study, snoR140 is clustered with a known box H/ACA snoRNA in the intron of a gene encoding an H/ACA ribonucleoprotein complex subunit 2-like protein (At5g08180). Moreover, a new box C/D snoRNA, snoR127, which was predicted to target the spliceosomal snRNA U2, is located in the intron of At5g27720, which encodes a protein similar to small nuclear ribonucleoprotein. In addition to the introns of genes involved in translation or splicing, novel snoRNAs are also located in introns of genes encoding a nuclear DNA-binding protein and three unknown proteins. Five novel snoRNAs are located in the regions annotated as the 5′ untranslated region (UTR), the 3′ UTR, the coding region or the antisense strand of a coding gene. All these genes are annotated to encode unknown proteins, except for the one producing snoR126. The transcript of snoR126 is part of the 5′ UTR or the alternatively spliced intron of a gene encoding an ankyrin repeat family protein (At5g65860).
Targets of novel snoRNAs

Among 27 newly predicted box C/D snoRNAs, 21 were predicted to target rRNAs or snRNAs on the basis of the target prediction criteria described in ‘Materials and Methods’ section (Table 2). The methylation sites of some target sites are evolutionarily conserved and have been experimentally validated in yeast and/or humans. Five new box C/D snoRNAs were predicted to target spliceosomal snRNAs. Among them, snoR126 has dual antisense elements that may target two neighboring sites on the spliceosomal snRNA U6. Several snoRNAs with dual targets on the same rRNAs were proposed to interact with both sites simultaneously (34). snoR126 may be another example of concurrent targeting of a snoRNA to a single snRNA species. Two sites individually targeted by snoR125 and snoR127 on the spliceosomal snRNA U2 are conserved in humans and Arabidopsis. The modifications of these two sites have been experimentally validated in humans (35). However, in humans, a scaRNA, mgU2-19/30, is the guide RNA responsible for the methylation of these two sites (36). That snoR125 and snoR127 target U2 implies that they are potential scaRNAs in Arabidopsis. All but one of the novel box H/ACA snoRNAs were predicted to target rRNAs or snRNAs (Table 3). Roughly 60% of box H/ACA snoRNAs have potential targets for both antisense elements. This is similar to the previous finding that, for 50% of Arabidopsis box C/D snoRNAs, both D and D′ have potential targets (34). The pseudouridylation of 13 predicted target sites has been experimentally validated in yeast and/or humans. Both newly predicted snoR103-2 and previously reported snoR103-1 were predicted to target two sites of snRNA U5 (Table 3) (6). Pseudouridylation of equivalent sites on human U5 was experimentally validated and predicted to be directed by three scaRNAs (37–39). Among them, human Ψ46/U5 is presumably directed by U85 and U89 (37,39), whereas Ψ43/U5 is directed by ACA57 (38).
Interestingly, human U85 and U89 are composed of both box C/D and box H/ACA motifs, whereas snoR103-1 and snoR103-2 contain only the box H/ACA motif. Arabidopsis snoR143 was predicted to direct U2 Ψ38 modification (Table 3). Modification of the equivalent site of human U2 is guided by the scaRNA ACA45 (38). A recent paper demonstrated that a small fraction of ACA45 is processed by Dicer to generate a small RNA with miRNA-like function (18). Such a phenomenon was not observed for snoR143 in the current data sets we analyzed.
DISCUSSION

Our analysis of more than 30 million reads of small RNA data revealed enriched small RNAs at termini of snoRNAs. By the use of a computational approach integrating this characteristic and conserved motifs/structures of snoRNAs, we could re-annotate 48 known snoRNAs and identify 44 novel snoRNAs in the Arabidopsis genome. The newly identified snoRNAs especially expand the knowledge of Arabidopsis box H/ACA snoRNAs, snoRNAs targeting snRNAs for modification and snoRNAs without canonical targets. This study describes the first application of small RNA data in the study of snoRNAs that are 70–300 nt long. With our successful application, small RNA data are new resources for the discovery and annotation of snoRNAs. Our work also demonstrates that, with appropriate mining tools, the analysis of small RNA data generated by next-generation sequencing will increase our understanding of longer noncoding RNAs in addition to miRNAs or siRNAs.

The observation of small RNA fragments enriched at snoRNA termini raises the questions of how these small RNAs are generated and whether they have biological functions. snoRNA ends might be protected from degradation because they and/or their neighboring sequences, which contain conserved motifs, are bound by nucleolar proteins. Alternatively, there might be unknown RNA degradation or RNA processing pathways that prefer to direct endonucleolytic cleavage near snoRNA ends. For example, the production of miRNA-like small RNAs from a human box H/ACA snoRNA depends on Dicer, the RNase III enzyme responsible for the biogenesis of miRNAs and siRNAs (18). Further studies will help to determine whether small RNAs from Arabidopsis snoRNA termini are generated from similar pathways and have silencing activity similar to the miRNA-like small RNA from the human snoRNA.

Similar to the use of expressed sequence tags to annotate genes, the use of small RNAs to identify snoRNAs may help uncover snoRNAs with atypical motifs or structures.
Nevertheless, the abundance of snoRNA-derived small RNAs may heavily depend on the tissues sampled and the depth of small-RNA sequencing. To decrease the false-positive rate, our approach required at least two distinct small RNAs to support the potential termini of snoRNAs. Some known snoRNAs missed by our approach have only single small RNAs at each terminus or only small RNAs for one of the termini. This drawback can be overcome by integrating small RNA data with other snoRNA-predicting computational programs based mainly on genomic sequences. For example, single small RNA fragments can be used as seeds to initiate the computational prediction of snoRNAs. The relaxation of the number requirement of small RNAs at termini will likely increase the prediction sensitivity of our algorithm. The presence of small RNAs may replace the knowledge of known targets and evolutionary conservation, which was usually required in previous prediction programs. This information will especially improve the discovery of snoRNAs that do not target rRNAs and snRNAs or are species specific. The utilization of small RNA data may also allow for the search for noncanonical snoRNAs with less stringent thresholds for conserved motifs/structures. Our method could be easily adapted for the annotation of snoRNAs in species other than Arabidopsis. The 5′ termini of snoRNAs identified by small RNA data were validated by PARE data, which were generated from high-throughput sequencing of 5′ termini of transcripts lacking the 5′ cap (14). According to the experimental procedure in generating PARE data, the 5′ termini sequenced by PARE technology theoretically should come from transcripts with poly(A) tails (14). As mature snoRNAs lack poly(A) tails, PARE reads from 5′ ends of snoRNAs might have been generated from low-efficiency annealing of the dT priming oligo to the 3′- end of the RNA transcripts. 
Since the 5′ termini of most snoRNAs have been sequenced more than 50 times in the PARE data set, PARE data provide a great opportunity to study the maturation of snoRNA 5′ termini (Supplementary Table 5). The analysis of PARE reads and snoRNA genes may reveal clues to understanding how the snoRNA 5′ termini are defined and how precise the maturation is. Although PARE was first developed to identify miRNA targets, PARE data also facilitate the annotation of other noncoding RNAs and, potentially, their maturation process. Our results increase the number of snoRNAs targeting snRNAs from 4 to 12 and thus provide more candidates for the study of this group of snoRNAs (Tables 2 and 3). The predicted target sites on Arabidopsis U2, U5 and U6 for newly identified snoRNAs have previously been validated in Vicia faba or Pisum sativum (40). Among these 12 snoRNAs, nine were predicted to target RNA polymerase II-transcribed snRNAs, U2 and U5, and are presumably localized in the Cajal body like human scaRNAs. However, the localization of plant scaRNAs and the determinants of their localization have not been well characterized. Moreover, our study did not identify any potential Arabidopsis scaRNAs with the signatures for both box C/D and box H/ACA scaRNAs as were previously described for U85, U87, U88 and U89 in humans (37,39). In contrast to the intergenic localization of most Arabidopsis snoRNAs targeting rRNAs, eight of the nine snoRNAs targeting U2 and U5 are located in genes (Tables 2 and 3). This finding suggests that snoRNAs targeting snRNAs may evolve differently from those targeting rRNAs in Arabidopsis. Further investigation of snoRNAs with snRNA targets in other plant species such as rice and poplar may shed light on snoRNA evolution. Our current target prediction failed to yield rRNA or snRNA targets for six box C/D snoRNAs and one box H/ACA snoRNA.
Although snoRNAs largely target rRNAs or snRNAs for modification, in rare cases, box C/D snoRNAs target tRNAs for methylation or pre-mRNA for alternative splicing (41,42). When tRNAs were included in the target search for the newly identified box C/D snoRNAs without identified rRNA/snRNA targets, none of these snoRNAs could target tRNAs on the basis of the criteria we applied (data not shown). We have also performed genome-wide target prediction for all new box C/D snoRNAs. With our current criteria, hundreds to thousands of potential target sites throughout the genome could be identified (data not shown). More stringent criteria should be applied, but with only one previous report of mRNA targeted by snoRNAs, deriving such criteria remains premature.

Because of the decreasing cost and increasing throughput, next-generation sequencing is becoming a popular or even a regular approach to profile mRNAs or small RNAs on a genome-wide scale. RNA degradation is a crucial process to regulate RNA homeostasis within a cell. Therefore, degraded RNA fragments will be seen in any RNA sequence data set to various degrees. As demonstrated in this study, the mining of degraded RNA sequence data can provide unforeseen opportunities to study noncoding RNAs.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

A research grant from Academia Sinica (to S-H Wu, Foresight Project L20-2). Funding for open access charge: Academia Sinica Grant 034002.

Conflict of interest statement. None declared.
ACKNOWLEDGEMENT We thank the Wu lab members for helpful discussions.
REFERENCES
1. Maden BE. The numerous modified nucleotides in eukaryotic ribosomal RNA. Prog. Nucleic Acid Res. Mol. Biol. 1990;39:241–303.
2. Watkins NJ, Segault V, Charpentier B, Nottrott S, Fabrizio P, Bachi A, Wilm M, Rosbash M, Branlant C, Luhrmann R. A common core RNP structure shared between the small nucleoar box C/D RNPs and the spliceosomal U4 snRNP. Cell. 2000;103:457–466.
3. Ganot P, Caizergues-Ferrer M, Kiss T. The family of box ACA small nucleolar RNAs is defined by an evolutionarily conserved secondary structure and ubiquitous sequence elements essential for RNA accumulation. Genes Dev. 1997;11:941–956.
4. Bachellerie JP, Cavaille J, Huttenhofer A. The expanding snoRNA world. Biochimie. 2002;84:775–790.
5. Jady BE, Darzacq X, Tucker KE, Matera AG, Bertrand E, Kiss T. Modification of Sm small nuclear RNAs occurs in the nucleoplasmic Cajal body following import from the cytoplasm. EMBO J. 2003;22:1878–1888.
6. Marker C, Zemann A, Terhorst T, Kiefmann M, Kastenmayer JP, Green P, Bachellerie JP, Brosius J, Huttenhofer A. Experimental RNomics: identification of 140 candidates for small non-messenger RNAs in the plant Arabidopsis thaliana. Curr. Biol. 2002;12:2002–2013.
7. Yang JH, Zhang XC, Huang ZP, Zhou H, Huang MB, Zhang S, Chen YQ, Qu LH. snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. Nucleic Acids Res. 2006;34:5112–5123.
8. Schattner P, Decatur WA, Davis CA, Ares M, Jr, Fournier MJ, Lowe TM. Genome-wide searching for pseudouridylation guide snoRNAs: analysis of the Saccharomyces cerevisiae genome. Nucleic Acids Res. 2004;32:4281–4296.
9. Schattner P, Barberan-Soler S, Lowe TM. A computational screen for mammalian pseudouridylation guide H/ACA RNAs. RNA. 2006;12:15–25.
10. Lowe TM, Eddy SR. A computational screen for methylation guide snoRNAs in yeast. Science. 1999;283:1168–1171.
11. Edvardsson S, Gardner PP, Poole AM, Hendy MD, Penny D, Moulton V. A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction. Bioinformatics. 2003;19:865–873.
12. Chen CL, Chen CJ, Vallon O, Huang ZP, Zhou H, Qu LH. Genomewide analysis of box C/D and box H/ACA snoRNAs in Chlamydomonas reinhardtii reveals an extensive organization into intronic gene clusters. Genetics. 2008;179:21–30.
13. Wold B, Myers RM. Sequence census methods for functional genomics. Nat. Methods. 2008;5:19–21.
14. German MA, Pillay M, Jeong DH, Hetawal A, Luo S, Janardhanan P, Kannan V, Rymarquis LA, Nobuta K, German R, et al. Global identification of microRNA-target RNA pairs by parallel analysis of RNA ends. Nat. Biotechnol. 2008;26:941–946.
15. Addo-Quaye C, Eshoo TW, Bartel DP, Axtell MJ. Endogenous siRNA and miRNA targets identified by sequencing of the Arabidopsis degradome. Curr. Biol. 2008;18:758–762.
16. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, et al. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE. 2007;2:e219.
17. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 2006;20:3407–3425.
18. Ender C, Krek A, Friedlander MR, Beitzinger M, Weinmann L, Chen W, Pfeffer S, Rajewsky N, Meister G. A human snoRNA with microRNA-like functions. Mol. Cell. 2008;32:519–528.
19. Howell MD, Fahlgren N, Chapman EJ, Cumbie JS, Sullivan CM, Givan SA, Kasschau KD, Carrington JC. Genome-wide analysis of the RNA-dependent RNA polymerase6/dicer-like4 pathway in Arabidopsis reveals dependency on miRNA- and tasiRNA-directed targeting. Plant Cell. 2007;19:926–942.
20. Kasschau KD, Fahlgren N, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Carrington JC. Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biol. 2007;5:e57.
21. Axtell MJ, Jan C, Rajagopalan R, Bartel DP. A two-hit trigger for siRNA biogenesis in plants. Cell. 2006;127:565–577.
22. Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell. 2008;133:523–536.
23. Gregory BD, O'Malley RC, Lister R, Urich MA, Tonti-Filippini J, Chen H, Millar AH, Ecker JR. A link between RNA metabolism and silencing affecting Arabidopsis development. Dev. Cell. 2008;14:854–866.
24. Brown JW, Echeverria M, Qu LH, Lowe TM, Bachellerie JP, Huttenhofer A, Kastenmayer JP, Green PJ, Shaw P, Marshall DF. Plant snoRNA database. Nucleic Acids Res. 2003;31:432–435.
25. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415.
26. Wang BB, Brendel V. The ASRG database: identification and survey of Arabidopsis thaliana genes involved in pre-mRNA splicing. Genome Biol. 2004;5:R102.
27. Piekna-Przybylska D, Decatur WA, Fournier MJ. New bioinformatic tools for analysis of nucleotide modifications in eukaryotic rRNA. RNA. 2007;13:305–312.
28. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 2006;34:D158–D162.
29. Watkins NJ, Dickmanns A, Luhrmann R. Conserved stem II of the box C/D motif is essential for nucleolar localization and is required, along with the 15.5K protein, for the hierarchical assembly of the box C/D snoRNP. Mol. Cell Biol. 2002;22:8342–8352.
30. Brown JW, Echeverria M, Qu LH. Plant snoRNAs: functional evolution and new modes of gene expression. Trends Plant Sci. 2003;8:42–49.
31. Chanfreau G, Legrain P, Jacquier A. Yeast RNase III as a key processing enzyme in small nucleolar RNAs metabolism. J. Mol. Biol. 1998;284:975–988.
32. Qu LH, Henras A, Lu YJ, Zhou H, Zhou WX, Zhu YQ, Zhao J, Henry Y, Caizergues-Ferrer M, Bachellerie JP. Seven novel methylation guide small nucleolar RNAs are processed from a common polycistronic transcript by Rat1p and RNase III in yeast. Mol. Cell Biol. 1999;19:1144–1158.
33. Barneche F, Steinmetz F, Echeverria M. Fibrillarin genes encode both a conserved nucleolar protein and a novel small nucleolar RNA involved in ribosomal RNA methylation in Arabidopsis thaliana. J. Biol. Chem. 2000;275:27212–27220.
34. Barneche F, Gaspin C, Guyot R, Echeverria M. Identification of 66 box C/D snoRNAs in Arabidopsis thaliana: extensive gene duplications generated multiple isoforms predicting new ribosomal RNA 2′-O-methylation sites. J. Mol. Biol. 2001;311:57–73.
35. Westin G, Lund E, Murphy JT, Pettersson U, Dahlberg JE. Human U2 and U1 RNA genes use similar transcription signals. EMBO J. 1984;3:3295–3301.
36. Tycowski KT, Aab A, Steitz JA. Guide RNAs with 5′ caps and novel box C/D snoRNA-like domains for modification of snRNAs in metazoa. Curr. Biol. 2004;14:1985–1995.
37. Darzacq X, Jady BE, Verheggen C, Kiss AM, Bertrand E, Kiss T. Cajal body-specific small nuclear RNAs: a novel class of 2′-O-methylation and pseudouridylation guide RNAs. EMBO J. 2002;21:2746–2756.
38. Kiss AM, Jady BE, Bertrand E, Kiss T. Human box H/ACA pseudouridylation guide RNA machinery. Mol. Cell Biol. 2004;24:5797–5807.
39. Jady BE, Kiss T. A small nucleolar guide RNA functions both in 2′-O-ribose methylation and pseudouridylation of the U5 spliceosomal RNA. EMBO J. 2001;20:541–551.
40. Massenet S, Mougin A, Branlant C. Posttranscriptional modifications in the U small nuclear RNAs. In: Grosjean H, Benne R, editors. Modification and Editing of RNA. Washington DC: ASM Press; 1998. pp. 201–227.
41. Clouet d'Orval B, Bortolin ML, Gaspin C, Bachellerie JP. Box C/D RNA guides for the ribose methylation of archaeal tRNAs. The tRNATrp intron guides the formation of two ribose-methylated nucleosides in the mature tRNATrp. Nucleic Acids Res. 2001;29:4518–4529.
42. Kishore S, Stamm S. The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C. Science. 2006;311:230–232.