Genome Res. Apr 2008; 18(4): 571–584.
PMCID: PMC2279245

Comparative analysis of the small RNA transcriptomes of Pinus contorta and Oryza sativa

Abstract

The diversity of microRNAs and small-interfering RNAs has been extensively explored within angiosperms by focusing on a few key organisms such as Oryza sativa and Arabidopsis thaliana. A deeper division of the plants is defined by the radiation of the angiosperms and gymnosperms, with the latter comprising the commercially important conifers. The conifers are expected to provide important information regarding the evolution of highly conserved small regulatory RNAs. Deep sequencing provides the means to characterize and quantitatively profile small RNAs in understudied organisms such as these. Pyrosequencing of small RNAs from O. sativa revealed, as expected, ~21- and ~24-nt RNAs. The former contained known microRNAs, and the latter largely comprised intergenic-derived sequences likely representing heterochromatin siRNAs. In contrast, sequences from Pinus contorta were dominated by 21-nt small RNAs. Using a novel sequence-based clustering algorithm, we identified sequences belonging to 18 highly conserved microRNA families in P. contorta as well as numerous clusters of conserved small RNAs of unknown function. Using multiple methods, including expressed sequence folding and machine learning algorithms, we found a further 53 candidate novel microRNA families, 51 appearing specific to the P. contorta library. In addition, alignment of small RNA sequences to the O. sativa genome revealed six perfectly conserved classes of small RNA that included chloroplast transcripts and specific types of genomic repeats. The conservation of microRNAs and other small RNAs between the conifers and the angiosperms indicates that important RNA silencing processes were highly developed in the earliest spermatophytes. Genomic mapping of all sequences to the O. sativa genome can be viewed at http://microrna.bcgsc.ca/cgi-bin/gbrowse/rice_build_3/.

In plants, small RNAs play an important role in transcriptional and post-transcriptional gene regulation (Bartel 2004), including viral defense (Wang and Metzlaff 2005), silencing of transposable elements, and general heterochromatin maintenance (Herr 2005). The small RNAs produced by angiosperms such as Arabidopsis thaliana can be broadly classified by the mechanism of their maturation and ultimate function. The microRNAs (miRNAs) are cleaved from a stem–loop precursor molecule (Park et al. 2002) by the endonuclease DCL1 and are ~19–24 nt long (Bartel 2004; Jones-Rhoades and Bartel 2004). The other endogenous small RNAs, collectively termed siRNAs, derive from double-stranded RNA precursors that are processed by homologs of DCL2, DCL3, and DCL4 (Vazquez 2006). The heterochromatin siRNAs are a diverse set of 24-nt-long small RNAs that are processed by DCL3 from double-stranded RNA precursors produced by RDR2 (Xie et al. 2004). These RNAs are involved in heterochromatin formation and maintenance by directing sequence-specific DNA and histone methylation of transposable elements and some larger genomic loci (Pontier et al. 2005). Other 24-nt-long siRNAs produced by DCL2 in A. thaliana can direct an initial cleavage of target transcripts, which are further cleaved into 21-nt siRNAs by DCL1 (Borsani et al. 2005). Finally, the trans-acting siRNAs (tasiRNAs), which are 21 nt long, are matured by a poorly understood mechanism involving DCL4. These tasiRNAs perform post-transcriptional gene silencing much like the miRNAs (Xie et al. 2004).

Identification of functional small RNAs in other plant species has, until recently, been accomplished by searching for homologous sequences in expressed sequence data (Zhang et al. 2006a) and genomic sequences (Bonnet et al. 2004) and has been, with a few exceptions (Williams et al. 2005; Talmor-Neiman et al. 2006), limited to the discovery of the more highly conserved families of miRNAs. Recent evidence suggests that the miRNA repertoire of any plant or animal species comprises a set of conserved ancient miRNAs as well as many recently evolved species-specific miRNAs (Lindow and Krogh 2005; Rajagopalan et al. 2006), which would elude detection by most comparative methods. As they are likely under relaxed selective constraint, the nonconserved miRNAs appear to be rapidly evolving (Rajagopalan et al. 2006; Fahlgren et al. 2007).

For this study, we chose to perform a deep sampling of small RNA sequences from the gymnosperm P. contorta accompanied by a lighter sampling of small RNA sequences from a previously studied angiosperm O. sativa to facilitate the direct comparison of small RNA populations between these two distantly related species. This choice was made based on a recent RNA silencing survey we performed across the vascular land plants (E.V. Dolgosheina, R.D. Morin, G. Aksay, S.C. Sahinalp, V. Magrini, E.R. Mardis, J. Mattsson, and P.J. Unrau, unpubl.). This survey indicated that the gymnosperms and in particular the conifers have an unusual RNA silencing signature relative to the angiosperms. Specifically, all conifers tested to date have failed to show appreciable amounts of 24-nt small RNA and instead produce substantial amounts of RNA that is exactly 21-nt long. The reason for this unusual change in RNA expression is not fully understood, but may be related to the presence of a new DCL family in the conifers that is suggested by an analysis of available ESTs from these plants. This difference together with the ~350 million years ago (Mya) divergence between conifers and angiosperms prompted us to perform a detailed comparison of the small RNAs in hope of elucidating evolutionarily important small RNA sequences in the plants.

Without experimental support for the existence of the mature miRNA molecule or direct homology with experimentally confirmed miRNAs, any in silico predictions of miRNAs generally do not qualify as candidates for submission to the microRNA registry miRBase (Griffiths-Jones 2006), thus remaining as predictions until they are ultimately sequenced or their expression confirmed by a hybridization-based method (Ambros et al. 2003). Some recent work in this field has attempted to improve miRNA annotation to better handle the type of data currently produced using high-throughput small RNA sequencing strategies (Rajagopalan et al. 2006; Johnson et al. 2007) accomplished by either massively parallel signature sequencing (Nakano et al. 2006), pyrosequencing (Yao et al. 2007), or Solexa sequencing (Morin et al. 2008). Rather than focusing on searching genomic and expressed sequences for miRNAs, the emerging challenge is to sort through diverse sequences from small RNA cDNA libraries and identify the functional classes including, but not limited to, miRNAs that are likely to be biologically significant.

Sequencing-based small RNA discovery produces hundreds of thousands of small RNA sequences with only a small fraction representing known miRNAs (Gustafson et al. 2005; Lu and Tei 2005; Rajagopalan et al. 2006; Johnson et al. 2007). The remaining sequences reveal complete or fragmented noncoding RNAs (ncRNAs; i.e., rRNA, tRNA, snoRNAs, and snRNAs) or messenger RNAs (mRNAs) in addition to diverse populations of siRNAs. With an annotated genome, the former can be readily identified based on their perfect alignment to genomic regions annotated as ncRNAs. Though little is known about the specific siRNAs of plants other than those of A. thaliana, conserved small RNAs that are observed in evolutionarily distant plant species will likely prove to be either siRNAs, miRNAs, or fragments of larger ncRNAs involved in important aspects of cellular regulation.

Results

Small RNA sequencing and sequence processing

A total of 142,493 and 11,436 sequence reads were obtained from the P. contorta and O. sativa libraries, respectively. After removing artifacts (see Methods), 130,998 (92%) and 11,329 (99%) sequences remained for analysis. The sequence artifacts include products of multiple adapter ligation or “empty” constructs, in which the two adaptors ligated to one another without containing a small RNA. There were a total of 58,466 (44.6% of data set) and 8615 (76.0% of data set) unique sequences in the P. contorta and O. sativa libraries, respectively. Of these, 11,375 P. contorta and 707 O. sativa sequences were counted at least twice in the libraries, leaving 47,091 and 7908 singletons (35.9% and 69.8% of the total sequences, respectively). This observation suggests that a huge diversity of small RNA sequences existed in each library.
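The percentages quoted above follow directly from the read counts. The short Python sketch below recomputes them as a consistency check; the dictionary keys and the function name are illustrative choices, not part of the original analysis pipeline.

```python
# Read counts reported in the text for each library.
libraries = {
    "P. contorta": {"raw": 142_493, "clean": 130_998, "unique": 58_466, "seen_twice_or_more": 11_375},
    "O. sativa": {"raw": 11_436, "clean": 11_329, "unique": 8_615, "seen_twice_or_more": 707},
}


def summarize(counts):
    """Derive the fractions quoted in the text from the reported counts."""
    singletons = counts["unique"] - counts["seen_twice_or_more"]
    return {
        "retained_pct": 100 * counts["clean"] / counts["raw"],
        "unique_pct": 100 * counts["unique"] / counts["clean"],
        "singletons": singletons,
        "singleton_pct": 100 * singletons / counts["clean"],
    }


for name, counts in libraries.items():
    stats = summarize(counts)
    print(name, {key: round(value, 1) for key, value in stats.items()})
```

The singleton percentages are expressed relative to the artifact-filtered reads, which is the denominator that reproduces the 35.9% and 69.8% figures.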

For P. contorta, the lengths of small RNAs ranged from <11 nt to >30 nt, and the distribution of unique small RNA length is summarized in Figure 1 (black bars). The O. sativa sequences spanned the same range of lengths; however, the overall distribution of these lengths was strikingly different (Fig. 1, light gray bars), with one major peak at 24 nt and another minor peak at 21 nt. Of all the length classes, the 24-nt fraction was the most diverse in O. sativa, with only 305 (16.5%) of the 24-nt sequences sharing high sequence similarity with at least one other sequence in that population based on a clustering analysis (see Methods). The relative representation of 24-nt RNA in P. contorta was small (2.5%) in comparison to O. sativa and had a lower diversity, with 36% of the 24-nt sequences sharing high similarity with another sequence. In contrast, the 21-nt RNAs obtained from P. contorta were more diverse than the O. sativa 21-nt sequences, comprising a total of 29,924 unique sequences. This suggests an expansion of miRNA families in the gymnosperms, diversification of other 21-nt RNA producing pathways, or a functional replacement of most of the heterochromatin siRNAs by 21-nt sequences. Whichever explanation applies, this observation reveals a significant difference in the small RNA biogenesis pathways of the angiosperms and gymnosperms. The near absence of 24-nt small RNA in the P. contorta sequencing data is entirely consistent with our survey of the vascular plants, which found no evidence for 24-nt RNA expression in the conifers, as judged by the direct 5′ end labeling of total RNA extracts (E. Dolgosheina, R.D. Morin, G. Aksay, S.C. Sahinalp, V. Magrini, E.R. Mardis, J. Mattsson, and P.J. Unrau, unpubl.).

Figure 1.
Lengths of unique small RNA sequences from P. contorta (black bars, 58,466 sequences) and O. sativa (gray bars, 8615 sequences). The bulk of P. contorta small RNAs are 21 nt long with low variance (σ = 8.1). The rice sequences have a major peak ...

Genome mapping and small RNA annotation—O. sativa small RNAs

Of 8615 unique O. sativa small RNA sequences, 3814 had at least one perfect alignment in the O. sativa genome. Small RNA sequences were annotated as one of six broad groups based on their overlap with O. sativa genome annotations (Itoh et al. 2007) or their alignment to sequences in Rfam (Griffiths-Jones et al. 2003) (miRNA, repeat-derived siRNAs, tRNA, rRNA, snRNA/snoRNA, or un-annotated). The smallest group, which was excluded from further analysis, contained the small nuclear and small nucleolar RNAs (snRNAs and snoRNAs). This group comprised 63 sequences from O. sativa and 646 from P. contorta. While the majority of annotated sequences mapped to the O. sativa genome only once, subsets of small RNAs when separated by type mapped to the genome with interesting and distinct distributions (Fig. 2; Supplemental Fig. 1). Our classification supports the notion that the 21-nt fraction of the O. sativa small RNA sequences (Fig. 2B) includes members of 18 conserved miRNA families (summarized in Supplemental Table 1), whereas the 24-nt fraction was dominated by small RNAs derived from genomic repeats and intergenic regions (Fig. 2A,E), suggesting they are mainly acting as heterochromatin siRNAs. Many of the sequences classified here as miRNAs could not be unambiguously assigned to one miRNA gene, as members of many miRNA families share a common, or highly similar, mature miRNA sequence. For example, the sequence TGAAGCTGCCAGCATGATC could belong to any of nine current members of the MIR167 family. Further, there was not a direct one-to-one relationship between RNA sequences and miRNAs, with most of the miRNA genes apparently producing multiple variants (termed isomiRs here) that resulted from slight variability in the DCL cleavage sites or subsequent processing/degradation leading to removal of terminal nucleotides (example in Supplemental Fig. 2b).
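The notion of an isomiR used here can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name, the `max_shift` threshold, and the flanking nucleotides of the toy precursor are all hypothetical, and the only sequence taken from the text is the 19-nt MIR167 example. A read is treated as an isomiR of a reference mature sequence when both lie in the same precursor and their 5′ and 3′ ends differ by no more than a few nucleotides.

```python
def is_isomir(read, reference, precursor, max_shift=2):
    """Return True if `read` is a terminal variant of `reference` within `precursor`.

    Both sequences must occur in the precursor, and their start and end
    coordinates may each differ by at most `max_shift` nucleotides.
    Only the first occurrence of each sequence is considered.
    """
    ref_start = precursor.find(reference)
    read_start = precursor.find(read)
    if ref_start < 0 or read_start < 0:
        return False
    ref_end = ref_start + len(reference)
    read_end = read_start + len(read)
    return abs(read_start - ref_start) <= max_shift and abs(read_end - ref_end) <= max_shift


# Sequence from the text; the flanks "GGCA" and "TAAC" are made up for illustration.
reference = "TGAAGCTGCCAGCATGATC"
precursor = "GGCA" + reference + "TAAC"

print(is_isomir(reference[1:], reference, precursor))    # 5' end trimmed by 1 nt
print(is_isomir("CA" + reference, reference, precursor)) # 5' end extended by 2 nt
print(is_isomir(reference + "TAA", reference, precursor)) # 3' end extended by 3 nt
```

With the default threshold, the first two reads qualify as isomiRs and the third does not, because its 3′ end lies three nucleotides beyond the reference end.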

Figure 2.
Length distribution of unique P. contorta (black bars) and O. sativa (gray bars) small RNAs sorted by class and that map perfectly to at least one genomic locus in the O. sativa genome. A total of 5129 unique P. contorta sequences mapped to the O. sativa ...

As the majority of O. sativa sequences were 24 nt in length, many of which were derived from genomic repeats, we assumed that the library contained a large proportion of siRNAs generated by a DCL3-mediated pathway similar to that in A. thaliana (Xie et al. 2004). In an attempt to identify genomic target sites of the O. sativa siRNAs, all small RNA sequences were aligned to the O. sativa genome allowing some degeneracy (Methods). This approach should highlight all sites in the genome to which a given small RNA could readily anneal; 6859 (80%) of the O. sativa sequences aligned at least once by this method. This suggested that many of the unmapped sequences were not the result of simple sequencing errors. Since this degenerate alignment method allowed for up to three mismatches, only reads with multiple errors or belonging to unsequenced regions of the genome should not be mapped. A histogram of small RNA alignment frequency for 21- and 24-nt small RNAs along the length of each of the 12 nuclear chromosomes is shown in Figure 3, demonstrating that the small RNA alignments were nonuniform. It is known that heterochromatin siRNAs can elicit DNA methylation and histone modification at partially complementary regions of chromosomal DNA at repetitive elements (Herr 2005) as well as rRNA loci (Xie et al. 2004). The alignment distribution allows a global visualization of the regions potentially targeted for modification by siRNAs. These sites include many of the rRNA clusters and most of the known centromeres, the latter of which are known to contain a specific repertoire of genomic repeats (Cheng et al. 2002).
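The idea of a degenerate alignment that tolerates up to three mismatches can be sketched in Python as a naive sliding-window scan over both strands. This is not the aligner used in the study (see Methods); the function names and the toy genome are assumptions made for illustration, and a scan of this kind would be far too slow for a real genome.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_mismatches(a, b, limit):
    """Count mismatches between equal-length strings, stopping once `limit` is exceeded."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            count += 1
            if count > limit:
                break
    return count


def degenerate_hits(read, genome, max_mismatches=3):
    """Return (start, strand, mismatches) for every ungapped hit within the mismatch limit."""
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for start in range(len(genome) - len(query) + 1):
            window = genome[start:start + len(query)]
            n = count_mismatches(query, window, max_mismatches)
            if n <= max_mismatches:
                hits.append((start, strand, n))
    return hits


# Toy example: a 21-nt read and a locus that differs from it at two positions.
read = "TGAAGCTGCCAGCATGATCTA"
locus = read[:5] + "A" + read[6:15] + "C" + read[16:]
genome = "GG" + locus + "GG"
print(degenerate_hits(read, genome))
```

Counting hits per chromosomal window from output of this form would give an alignment density comparable in spirit to the histogram described for Figure 3.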

Figure 3.
Degenerate alignment density of O. sativa 24- and 21-nt small RNAs to the nuclear genome. All distinct O. sativa small RNA sequences were aligned to all 12 chromosomes allowing degenerate alignments (see Methods). The 24-nt (positive axis) and 21-nt sequences ...

Partitioning small RNAs into clusters of similar sequences

Annotation of small RNAs from P. contorta was precluded by the lack of any genomic sequence. Further, the direct comparison of these sequences to known miRNAs allows the identification of only the highly conserved miRNA family members. Clustering of P. contorta and O. sativa small RNA sequences based on sequence similarity was performed to enable identification of related, but not identical, sequences. Also, we assume the observed low success rate in mapping O. sativa sequences to the parent O. sativa genome likely results from a combination of systematic and sporadic sequencing errors, polymorphisms, RNA editing, and errors in the reference genome. Clustering highly similar sequences provides a mechanism to resolve this issue. The most frequently observed sequences in a cluster are likely to represent real sequences, and the more rare variants likely represent sequencing errors or reads deriving from polymorphic regions. Any sequence that was not mapped to the O. sativa genome due to evolutionary divergence (for P. contorta sequences) or the aforementioned reasons (O. sativa sequences) could still be annotated based on its presence in a cluster with an annotated sequence. We performed sequence-based clustering of all O. sativa and P. contorta sequences along with those known miRNA sequences from miRBase (see Methods). The result was 4722 clusters, ranging in size from two sequences (2366 clusters) to 4386 sequences (one cluster). A total of 20,434 P. contorta (of 58,466 unique) and 3959 O. sativa (of 8615 unique) sequences reside in these clusters; 4511 of the clusters contained at least one P. contorta sequence, whereas only 547 contained at least one O. sativa sequence, with 373 clusters comprising sequences from both species. These clusters were therefore very likely to represent conserved classes of either small RNAs or recurrent degradation fragments of larger ncRNAs.

The quality of each cluster of sequences was assessed by calculating the mean information content of the multiple sequence alignment of its sequences (Schneider and Stephens 1990). The information content is a function of the entropy of each position in the alignment, with more insertions/deletions (indels) and discordant sites decreasing the mean information content of an aligned cluster. The mean information content of a cluster was roughly dependent on the number of sequences it contained (Supplemental Fig. 3), with some variability caused by incomplete overlap of aligned sequences and the number of insertions and deletions between sequences within a cluster. With this clustering method, sequences of known miRNAs generally clustered within their miRBase family. Supplemental Table 1 summarizes the conserved miRNA families represented by these clusters. In some cases, multiple families resided in the same cluster due to similarity between one or more sequences within the families. For example, one cluster contained sequences from MIR165 and MIR166, while another contained sequences from MIR319 and MIR159. This was not surprising as alignment of sequences between these families results in three or fewer mismatches. Homologous miRNA genes can result in the production of miRNAs that differ only at the 5′ or 3′ termini. As mentioned, many of the miRNAs have multiple isomiRs, reflecting variability in miRNA maturation or later processing steps. Our sequence-based clusters containing known miRNAs demonstrated overall higher information content (Supplemental Fig. 3b,c) than the clusters of the same size but comprising degradation products of larger ncRNAs. This is a reflection of the high sequence conservation of miRNAs within the same family as well as the presence of isomiRs with similar yet distinct sequences (Morin et al. 2008). 
These observations were the basis for our application of a machine learning method to predict candidate miRNA families from un-annotated clusters (discussed below).
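As a rough illustration of the cluster quality measure, the mean information content of an alignment can be computed per column as 2 bits minus the Shannon entropy of the observed nucleotides. The Python sketch below makes two simplifying assumptions that are not taken from the text: it omits the small-sample correction of Schneider and Stephens (1990), and it penalizes gaps by scaling each column's information by its fraction of non-gap characters.

```python
import math
from collections import Counter


def column_information(column):
    """Information (bits) of one alignment column: (2 - H) scaled by the non-gap fraction."""
    bases = [c for c in column if c in "ACGT"]
    if not bases:
        return 0.0
    counts = Counter(bases)
    entropy = -sum((n / len(bases)) * math.log2(n / len(bases)) for n in counts.values())
    return (2.0 - entropy) * len(bases) / len(column)


def mean_information(alignment):
    """Mean per-column information of equal-length aligned sequences ('-' marks a gap)."""
    columns = list(zip(*alignment))
    return sum(column_information(col) for col in columns) / len(columns)


# Toy alignments: an identical pair, and a cluster with one discordant site and one gap.
print(mean_information(["ACGT", "ACGT"]))
print(mean_information(["ACGT", "ACGA", "ACG-", "ACGT"]))
```

Under this scheme a cluster of identical sequences scores the maximum of 2 bits per position, and both discordant sites and indels lower the mean, in line with the behavior described above.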

Identifying miRNA candidates specific to P. contorta or O. sativa

A specific pre-miRNA secondary structure is necessary to establish a small RNA as a miRNA candidate. Owing to the lack of any Pinus genomic sequence, P. contorta small RNA sequences were aligned to EST sequences originating from Pinus taeda, a closely related species of pine, while O. sativa sequences were aligned to the publicly available O. sativa EST sequences. The number of UniGene clusters (Pontius et al. 2002) (~14,000) can be used as a rough estimate of the number of P. taeda genes currently represented by expressed sequences in GenBank. Hence, these cDNA and EST sequences (collectively referred to as ESTs) are not a full representation of the transcriptome as compared to some other EST libraries for gymnosperms (Quackenbush et al. 2001; Pavy et al. 2005). A total of 13,396 (10.2%) P. contorta small RNA sequences had perfect alignments to one or more P. taeda EST; 318 of the sequences had multiple semi-overlapping alignment regions with numerous distinct small RNAs, a signature that would suggest these derived from degradation of larger ncRNAs (see Supplemental Fig. 2a).

Some P. contorta and O. sativa small RNAs could be aligned to their corresponding EST sequences in only a few discrete regions, a characteristic shared with many of the known miRNA sequences in these libraries (see Supplemental Fig. 2b). Three hundred thirty-six EST sequences had small RNA alignment patterns that were deemed similar to pre-miRNAs and were carried forward for folding analysis (Methods). Along with candidate novel miRNA genes, this procedure correctly identified three sequence clusters belonging to known miRNA families (MIR166, MIR396, and MIR159). The novel miRNA sequences as well as the EST sequence folding results are summarized in Table 1 and in the Supplemental tables. Two examples of these novel miRNAs are also included in Figure 4, and the supplements include the folded structure of all candidate pre-miRNAs and a sequence logo representing a consensus of each cluster of novel miRNA sequences. Potential homologs, noted in Figure 4, were identified by searching miRBase (v. 10.0) using the SSearch tool.

Table 1.
Small RNA clusters annotated as miRNA families by EST alignments and folding properties
Figure 4.
Summary of four novel miRNA clusters identified by two separate methods. Structures and sequence logos are shown for four novel miRNA clusters identified by either EST folding (A) or support vector machine (SVM) techniques (B). Sequence logos were generated ...

Not surprisingly, the putative miRNA families (clusters) identified in this process typically displayed above-average information content (Supplemental Fig. 3), following the trend of the known miRNAs previously identified. Two of the clusters identified as comprising miRNAs by alignment to P. taeda sequences also included small RNA sequences obtained from O. sativa, suggesting the identification of two novel conserved miRNA families by this process (Table 1). From the alignment of our small RNAs to O. sativa EST sequences, a few novel O. sativa miRNAs not found in the P. contorta library were also identified. Because this analysis was based on the data from the previous release of miRBase, these novel miRNAs can be compared to a more recent set of known miRNAs. Strikingly, a search of the current miRBase (v. 10.0) sequences reveals potential homologs for all but two of these sequences. Based on these results, this method appears to provide a fast and reliable way to identify putative novel miRNAs from this type of data using the a priori assumption of variable mature sequence length (isomiRs) and degradation of the remaining structural fragments of pre-miRNAs (Ambros et al. 2003).

Many clusters of P. contorta specific sequences could not be aligned to any P. taeda EST. This can be partially attributed to the limiting number of P. taeda ESTs currently available in GenBank. Identification of nonconserved P. contorta miRNAs in those remaining clusters relied on other characteristics besides the standard method of annotation (i.e., sequence folding and hairpin characterization). Our observation that miRNAs cluster to one another in a distinctive way suggested that the characteristics of a cluster—for example, its size, length, and information content—would allow classification of unknown clusters as miRNAs, bypassing the necessity of full pre-miRNA sequences for annotation. A statistical machine learning approach known as a support vector machine (SVM) was used to test this hypothesis (see Methods).

When our SVM classifier was applied to all sequence clusters comprising sequences classified as “un-annotated,” 44 were predicted to represent novel miRNAs (Table 2); 19 of these clusters contained at least one sequence with an alignment to a P. taeda EST sequence, allowing validation using the aforementioned method of RNA folding and structure evaluation. Of these, five aligned perfectly to one or more ESTs that could fold into a signature hairpin structure with the nucleotides yielding the small RNA positioned within the stem region (Fig. 4; Table 2; Supplemental Tables), thus confirming they were miRNAs by current standards. One of these miRNAs was also found by the previous annotation method (based on alignment of small RNA sequences to ESTs), but the remaining four were missed by that approach because each had only a single representative small RNA that aligned to its EST; hence, no isomiRs were sequenced for these miRNAs. To validate the remaining putative miRNA sequences, one would require additional expressed sequence data (facilitating more folding analyses) or genomic sequences.

Table 2.
Small RNA clusters annotated as miRNA families by SVM classifier

Experimental verification of putative miRNA expression

While it is difficult to functionally verify a new miRNA, we sought to confirm the presence of a limited set of some of the small RNAs presented here using a hybridization-based method. We queried three RNA sequences (as indicated in Tables 1 and 2 and in Supplemental Table 1): miR396, which was found to be perfectly conserved between O. sativa and P. contorta, together with two sequences classified as novel miRNAs. All of these sequences were present in the small RNA fraction of P. contorta as judged by Northern analysis (Supplemental Fig. 6). This simple screen indicates that RNAs with sequences identical to, or highly similar to, those found in our sequencing analysis must exist in P. contorta and lends support to our mode of analysis.

Classifying other P. contorta/O. sativa conserved small RNAs

Though focus was placed on miRNAs in this study, the bulk of the sequences produced in this study, 57,619 from P. contorta and 8486 from O. sativa, cannot be shown to represent conserved or novel miRNAs by any of our annotation methods. Conservation of small RNA sequences between a gymnosperm and an angiosperm provides strong support that the sequences regulate highly conserved RNA-mediated processes. The P. contorta sequences that could be aligned perfectly to the O. sativa genome or to O. sativa small RNAs are of particular interest as they represent highly conserved small RNAs in two distantly related organisms. Using the perfect-match alignment approach described for the O. sativa sequences, all P. contorta small RNA sequences were aligned to the O. sativa genomic sequence. A total of 3567 (2.5%) P. contorta sequences >16 nt in length aligned perfectly to the O. sativa genome (5129 sequences if shorter sequences are included). Of these sequences, only 91 corresponded to known miRNAs, while 632 were deemed fragments of tRNAs, and 2117 fragments of rRNAs.

The remaining 727 P. contorta sequences with perfect matches in the O. sativa genome correspond to loci of unknown function, though 253 of them correspond to annotated genomic repeats, implicating them as possible conserved repeat-derived heterochromatin siRNAs. The apparent lack of 24-nt sequences in the P. contorta small RNA sequences provoked us to ask whether these heterochromatin siRNAs belong to a less diverse class of P. contorta 24-nt sequences masked by the overwhelming dominance of the 21-nt RNAs expressed in this species. The histogram of the lengths of the P. contorta small RNA sequences that mapped to O. sativa repetitive elements highlights that intergenic and repeat-derived sequences contained a higher fraction of perfectly conserved 24-nt RNAs than those either 23 or 25 nt in length (Fig. 2A,E). These P. contorta sequences with perfect matches in the O. sativa genome comprise at least 66 distinct repeat-derived siRNAs. For some of these, corresponding small RNAs were sequenced from O. sativa as well (see below). In contrast, the majority of the perfectly conserved P. contorta 24-nt sequences appeared to derive from both the sense and antisense strands of rRNA genes, supporting the notion that rRNA can be processed into functional siRNAs (see example in Supplemental Fig. 2a) and that this mechanism of small RNA-mediated regulation is conserved in the gymnosperms.

Those sequence-based clusters comprising small RNAs from both species facilitated further identification of conserved P. contorta/O. sativa 24-nt RNAs while allowing for some sequence divergence. Twenty-six of these clusters had median lengths of ~24 nt and contained sequences from both species. These clusters (Table 3) were mostly rRNA sequences, but a few were tRNA or repeat-derived. The P. contorta sequences within these clusters showed a slightly higher variability in sequence lengths (O. sativa mean length variance = 4.45, P. contorta mean length variance = 5.15). The median length of sequences amongst these clusters was slightly lower for the P. contorta sequences (22 nt) as compared to those from O. sativa (24.3 nt). This result suggests that these may represent homologs of O. sativa heterochromatin siRNAs, many of ribosomal RNA origin, that deviate from a strict 24-nt length criterion.

Table 3.
Small RNA sequence clusters comprising ~24-nt RNAs from O. sativa and P. contorta

The list of 727 P. contorta small RNA sequences perfectly conserved in O. sativa was semi-redundant. Many of these sequences overlapped the same genomic loci either partially or completely. By grouping sequences that align to at least one shared genomic site, this set could be further subdivided into six unclassified distinct groups of small RNAs. Each of these groups was of interest because all the sequences in them mapped to a common set of genomic loci and did not seem to derive from any known ncRNAs. In the first group, the small RNAs consistently aligned to a discrete position in a subset of the LTR-type transposable elements in the O. sativa genome. None of these perfectly conserved small RNAs appeared to be miRNAs by current classification standards; however, the consistent observation of sequences from a few loci within these structures also makes them distinct from the usual phased pattern of siRNAs. Intriguingly, the region of the SZ-37/Osr10 LTR repeat (McCarthy et al. 2002) that appears to encode this small RNA has a near-perfect match to many of the other known O. sativa repetitive elements (Supplemental Fig. 4). This suggests the possibility that this site may be a potential target of this siRNA in many of the O. sativa LTR elements, thus evolutionarily constraining this short region of LTR elements.
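Grouping sequences that align to at least one shared genomic site amounts to finding connected components, which can be sketched in Python with a union-find structure. The input format (a mapping from sequence identifier to a set of locus identifiers) and all names are hypothetical, and the sketch assumes that sharing is transitive, so that two sequences with no locus in common still fall into one group if a third sequence links them.

```python
from collections import defaultdict


def group_by_shared_loci(alignments):
    """Group sequence ids that share at least one locus, directly or transitively.

    `alignments` maps each sequence id to the set of locus ids it aligns to.
    Returns a sorted list of sorted groups.
    """
    parent = {seq: seq for seq in alignments}

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # locus id -> first sequence observed at that locus
    for seq, loci in alignments.items():
        for locus in loci:
            if locus in first_seen:
                parent[find(seq)] = find(first_seen[locus])
            else:
                first_seen[locus] = seq

    groups = defaultdict(set)
    for seq in alignments:
        groups[find(seq)].add(seq)
    return sorted(sorted(group) for group in groups.values())


# Toy example with made-up sequence and locus identifiers.
example = {
    "s1": {"locusA", "locusB"},
    "s2": {"locusB"},
    "s3": {"locusC"},
    "s4": {"locusC", "locusD"},
    "s5": {"locusE"},
}
print(group_by_shared_loci(example))
```

In practice a locus identifier would have to encode overlapping genomic coordinates rather than an exact label, which is a detail this sketch leaves out.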

In two separate groups of conserved small RNAs, the sequences always appeared to derive from a region upstream of a nuclear-inserted copy of a chloroplast tRNA gene, suggesting a role in, or byproduct of, tRNA maturation. A similar observation has recently been made in Arabidopsis, and it was suggested that these sequences are the sequence leaders cleaved during tRNA maturation (Rajagopalan et al. 2006). Notably, however, most of the small RNAs of this type appear to derive from the opposite strand of the tRNA transcript, as do many of the small RNAs that overlap with the neighboring tRNA annotation (Supplemental Fig. 5a). The three remaining groups of small RNAs also appear to derive from chloroplast genes or their nuclear-encoded counterparts. In one case, all small RNAs aligned to one discrete region of the dicistronic chloroplast transcript of the ndhB and rps7 genes. The position of these small RNAs was situated over the site of enzymatic cleavage of this transcript (Hashimoto et al. 2003) just upstream of the ndhB start codon (Supplemental Fig. 5b). This site is perfectly conserved in A. thaliana and Zea mays as well, suggesting its importance and a high likelihood of small RNA involvement in the processing of this transcript. In another example, the remaining two groups of sequences aligned to the chloroplast polycistronic transcript that includes psbB, psbT, psbH, petB, and petD in two separate regions. Multiple examples of each of these transcripts have been inserted into the O. sativa chromosome, and it is impossible to determine whether these sequences derive from their nuclear or chloroplast copies. However, as the small RNA processing machinery is generally thought to reside in the nucleus, it is likely that these small RNAs are of nuclear origin.

Discussion

In-depth analysis of small RNAs from organisms at strategically chosen phylogenetic distances is necessary to gain a better understanding of the evolution of plant cellular processes mediated by small RNAs. However, to date, few efforts have focused on small RNAs in any plant species outside of the angiosperms. In this global survey of small RNAs in P. contorta, many gymnosperm miRNAs with known angiosperm homologs have been identified (Supplemental Table 1). Though many of the so-called conserved miRNAs have been found in a variety of plant species, only direct sequencing of the small RNA molecules provides a definitive route to discover novel miRNAs outside these families. With our introduction of a novel sequence-based clustering method and a support vector machine that classifies the resultant clusters, this study has also provided an unprecedented view of the miRNAs present in P. contorta, a species with limited expressed sequence and no genomic sequence data.

Owing to our application of the miRNA annotation techniques developed in this work, these data have not been limited to the identification of only those miRNAs with homologs in A. thaliana or other plants (see Tables 1 and 2). Two methods for identifying novel miRNAs were applied to the P. contorta small RNA sequences. One of these methods considers putative pre-miRNA structure, and the other relies on identification of miRNA-like sequence clusters. These techniques have revealed a set of likely gymnosperm-specific novel miRNAs, a few novel O. sativa miRNAs, as well as two novel miRNAs that appear to be conserved between O. sativa and P. contorta (Table 1).

Though many of their pre-miRNA structures have not been validated here, the high expression of many of these predicted miRNAs as well as biases toward 5′ terminal uridine and 3′ terminal guanosine residues provides secondary support that these sequences are miRNA-like. Many (19) of the P. contorta small RNA clusters summarized in Table 2 had at least one small RNA sequence with a perfect alignment to an EST sequence, but only five of these folded into an acceptable fold-back structure. Though this may appear to reject these remaining candidates as true miRNAs, this is not necessarily the case. The mRNA targets of most plant miRNAs have perfect or near-perfect recognition sites. The observation that a small RNA sequence has perfect alignments to multiple unrelated mRNAs implicates those mRNAs as potential target transcripts. To support this, putative P. contorta target transcripts for all miRNA sequences in Tables 1 and 2 were predicted using miRU (Zhang 2005). If one method for miRNA identification performed better, one might expect a difference in the number of predicted targets for these miRNAs owing to a lack of complementarity of non-miRNA sequences within the transcriptome. The number of predicted target genes did not differ significantly between the miRNAs in Tables 1 and 2, supporting that our novel SVM method is comparable to the standard folding-based method for novel miRNA identification (P = 0.3409, Kruskal–Wallis rank-sum test). Still, the predicted novel miRNAs presented in Table 2 should be approached with caution until their pre-miRNA structure can be validated by another approach. Furthermore, even if some of these small RNA molecules cannot presently be defined as miRNAs by current standards, their high expression and reproducible endonuclease cleavage positions support that they perform an important function in P. contorta, perhaps acting as trans-acting siRNAs or potentially in some other uncharacterized processes. 
That the putative miRNAs provided here were identified by a signature of precise and reproducible maturation suggests that they are processed in a pathway separate from the apparently random degradation products of rRNA and tRNA that were observed in high numbers in this and similar studies.

Apart from known O. sativa miRNAs and novel gymnosperm miRNAs, other groups of small RNAs were identified that are highly conserved between these two species. The O. sativa 24-nt sequences showed evidence for heterochromatin siRNA activity as their predicted target sites corresponded to centromeric regions and rRNA gene clusters (Fig. 3). Though the P. contorta library contained a relatively low proportion of 24-nt small RNAs, the 24-nt sequences it did contain were more apt to align to the O. sativa genome than 23-nt or 25-nt small RNAs (Fig. 2A,E). However, very few of the O. sativa 24-nt heterochromatin siRNAs have identifiable P. contorta homologs based on sequence clustering. Also, those that do appear to have homologs are not strictly 24 nt in length, and many correspond to full-length or partial rRNA genes rather than transposable elements. Taken together, these observations suggest that the pathways producing heterochromatin siRNAs existed prior to the divergence of these species. Though this class of siRNAs may derive from related genomic repeats, the length difference of supposed siRNAs between P. contorta and O. sativa suggests that their exact mode of maturation may differ in the gymnosperms. A larger population of siRNAs responsible for controlling transposable elements would be expected in the gymnosperms since much of the difference in genome size between these plants is posited to be due to expansion of such entities (Bennett and Leitch 2005). The larger diversity of 21-nt RNAs in P. contorta is consistent with the hypothesis that ~21-nt small RNAs are fulfilling a substantial portion of this role.

A few other interesting groups of small RNAs that are conserved between O. sativa and P. contorta were found here. Small RNAs from a few of these groups matched only to genes from the chloroplast (or chloroplast-derived genes that have inserted into the nuclear genome). Some of these small RNA sequences have also been observed in similar studies of O. sativa (Chen et al. 2006) and Arabidopsis (Lu and Tei 2005; Rajagopalan et al. 2006). It is unclear whether these sequences are derived from these transcripts or target them for enzyme processing (or both). Two of these sequences map ~150 nt upstream of a number of chloroplast tRNA genes or their nuclear-encoded counterparts. The other three groups aligned to distinct regions of two polycistronic chloroplast transcripts. The site between the two genes on the ndhB/rps7 transcript is of particular interest because it is centered across the experimentally determined endonuclease cleavage site (Hashimoto et al. 2003) (Supplemental Fig. 5b). One of the two small RNA alignment sites within the polycistron containing psbB, psbT, psbH, petB, and petD has previously been identified, using comparative genomics, as a potential recognition site of a protein regulating mRNA processing (Seliverstov and Lyubetsky 2006). These data suggest that this site is conserved not for recognition by a protein but rather as a small RNA binding site that effects cleavage of the polycistron, perhaps by the enzyme(s) responsible for degrading transcripts targeted by the miRNA pathway. This may suggest another function of the miRNA pathway and, considering that it appears to be limited to chloroplast-derived transcripts and is deeply conserved, it may reflect a more ancient application of small RNA-directed transcript processing.

Comparison of the small RNAs expressed in P. contorta to O. sativa has revealed known and novel conserved miRNAs. As well, alignment of the P. contorta sequences to the O. sativa genome identifies a set of perfectly conserved small RNAs, likely ancient and diverse in origin and function. This result supports the view that miRNA and siRNA pathways were likely functional in the common ancestor of these two species. We also conclude, based on our discovery of up to 53 novel miRNA families in P. contorta (12 in Table 1 and 44 in Table 2, with 1 shared) and a plethora of remaining un-annotated small RNAs, that the diversity of small RNA-mediated processes in the gymnosperms, when combined with that of the angiosperms, will provide important context for understanding the evolution of RNA silencing in the spermatophytes.

Methods

Small RNA isolation (P. contorta)

Approximately 0.5 g of young needles was collected and ground into fine powder with a pestle and mortar in the presence of liquid nitrogen. The powder was transferred into a precooled tube and suspended quickly in 1 mL of RNA extraction buffer (100 mM LiCl, 1% SDS, 10 mM EDTA, 100 mM Tris at pH 9); ~1.5 mL of warm phenol was added, and after the tube cooled down to room temperature, ~1.5 mL of chloroform was added, and the mixture was extracted by inverting the tube for 20 min. The tube was centrifuged, and the supernatant was transferred to a fresh tube and extracted twice again in a 50:50 mixture of phenol/chloroform. Nucleic acids were precipitated by the addition of 0.1 volume of 3 M sodium acetate (pH 5) and 2 volumes of 100% ethanol (relative to the extracted sample volume), pelleted by centrifugation, air-dried, and dissolved in 0.2 mL of water. To precipitate small RNA molecules, the nucleic acid solution was mixed with NaCl at the final concentration of 300 mM, glycogen at the final concentration of 12 μg/mL, and 2.5 volumes of 100% ethanol. Nucleic acids were ultimately precipitated at −20°C overnight, which was followed by centrifugation at 4°C for 30 min at 13,200g. O. sativa small RNAs were extracted using the same protocol using leaf buds as a starting material.

Small RNA pool preparation

Small RNAs from the O. sativa and P. contorta samples with gel mobility in the 15- to 30-nt size range were gel purified and ligated to an adenylated DNA 3′ adaptor with the sequence 5′-AppGAAGAGCCTACGACGA (adaptor at 20 μM, 50 mM HEPES, pH 8.3, 10 mM MgCl2, 3.3 mM DTT, 10 μg/mL BSA, 8.3% glycerol) using 4 U/μL T4 RNA ligase (GE Amersham Biosciences) for 90 min at room temperature. The resulting RNA–DNA hybrids were gel purified using 10% PAGE and ligated to a 5′ RNA adaptor having the sequence 5′-rAUCGUAGGCACCUGAAA. This ligation utilized the adaptor at 30 μM. The resulting material was ethanol precipitated and reverse transcribed using a long primer containing sequence required for 454 Sequencing together with a 10-nt random sequence element, 5′-CCTATCCCCTGTGTGCCTTGCCTATCCCCTGTTGCGTGTCTCAG(N)10TCGTCGTAGGCTCTTC. Reverse transcription was performed with SuperScript II (Invitrogen) using the supplied protocol. The RNA template was destroyed by heating in the presence of 100 mM KOH, and the resulting cDNA was isolated on a 10% denaturing polyacrylamide gel. PCR was conducted using a biotinylated primer (5′-BCCTACCCCTGTGTGCCTTGCCTATCCCCTGTTGCGTGTCTCAG; B indicates the location of the biotin residue) and 5′-primers (O. sativa, 5′-CCATCTCATCCCTGCGTGTCCCATCTGTTCCCTCCCTGTCTCAGTGCTAAGCATCGTAGGCACCTGAAA; P. contorta, 5′-CCATCTCATCCCTGCGTGTCCCATCTGTTCCCTCCCTGTCTCAGTAGATACGATCGTAGGCACCTGAAA). Both primers were used at a final concentration of 0.5 μM.
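The relationship between the adaptors and primers listed above can be checked mechanically. The following is a minimal sketch, not part of the published protocol: it tests that the 3′ end of the reverse-transcription primer is the reverse complement of the 3′ adaptor and that both PCR 5′-primers end in the DNA form of the 5′ RNA adaptor. The sequences are copied from the text as continuous strings; the function and constant names are illustrative only.

```python
# Illustrative consistency check of the adaptor and primer sequences.
# All names here are hypothetical; only the sequences come from the text.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


ADAPTOR_3P = "GAAGAGCCTACGACGA"        # DNA 3' adaptor (without the 5' App)
ADAPTOR_5P_RNA = "AUCGUAGGCACCUGAAA"   # RNA 5' adaptor
RT_PRIMER_3P_END = "TCGTCGTAGGCTCTTC"  # 3' end of the RT primer, after (N)10

PCR_PRIMER_RICE = (
    "CCATCTCATCCCTGCGTGTCCCATCTGTTCCCTCCCTGTCTCAG"
    "TGCTAAGCATCGTAGGCACCTGAAA"
)
PCR_PRIMER_PINE = (
    "CCATCTCATCCCTGCGTGTCCCATCTGTTCCCTCCCTGTCTCAG"
    "TAGATACGATCGTAGGCACCTGAAA"
)

# The cDNA copy of the RNA adaptor reads T wherever the RNA has U.
ADAPTOR_5P_DNA = ADAPTOR_5P_RNA.replace("U", "T")

rt_primer_anneals = reverse_complement(ADAPTOR_3P) == RT_PRIMER_3P_END
pcr_primers_match = all(
    primer.endswith(ADAPTOR_5P_DNA)
    for primer in (PCR_PRIMER_RICE, PCR_PRIMER_PINE)
)
```

Under these checks the two PCR 5′-primers differ only in the 8 nt that immediately precede the adaptor-matching 3′ end, which is consistent with their use as library-specific primers.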

Northern blot detection of miRNAs in P. contorta

RNA was extracted from the green needles of cold acclimatized P. contorta seedlings. The Northern blot was produced by running 20 μg of RNA on a 15% denaturing polyacrylamide gel and transferring to Hybond-N+ (Amersham) membrane using a NovaBlot (Pharmacia) electrophoresis unit. The gel was stained with SYBR Green II to visualize a 21-nt size standard. All other steps in the protocol were according to those previously described (Lau et al. 2001). DNA probes were 5′-AAGTTCAAGAAAGCTGTGGA (for miR396), 5′-AACCATCGAGGACCTGACGT (for the first novel miRNA, likely homologous to miR950b), and 5′-TTGGGAAATAGTTGATAATA (for the novel miRNA with no EST support).

Small RNA sequencing and sequence processing

Small RNA samples were sequenced using the high-throughput pyrosequencing developed by 454 Life Sciences (454 Life Sciences, GS20 platform) (Margulies et al. 2005). Each sequence read is thought to represent a single adaptor-ligated small RNA molecule. The reads were searched for adaptor sequences and library identification tags. The small RNA sequences were assumed to be the sequence between the 5′ and 3′ adaptor sequences. Intervening sequences 10 nt or shorter were ignored. The random 10-nt sequence was assumed to be the first 10 bases following the 3′ adaptor sequence. All sequences were extracted and stored in our custom-built database named MyRNA, which will be made available for public use in the near future. Upon encountering the same small RNA sequence more than once, the sequence count was incremented only if the 10-nt-long random sequence tag was unique. For sequence comparisons, the MyRNA database was also loaded with all plant mature miRNA sequences from miRBase release 9.1 (http://microrna.sanger.ac.uk), which were included in the sequence-clustering pipeline (see below).
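The read-processing rules described above can be sketched as follows. This is an illustration under stated assumptions rather than the MyRNA implementation: reads are assumed to be in the sense orientation, adaptors are located by exact string matching (a real pipeline would probably tolerate sequencing errors), and all function names are hypothetical.

```python
# Sketch of insert extraction and tag-aware counting (assumptions noted above).
from collections import defaultdict

ADAPTOR_5P = "ATCGTAGGCACCTGAAA"  # DNA form of the 5' RNA adaptor
ADAPTOR_3P = "GAAGAGCCTACGACGA"   # 3' adaptor
TAG_LENGTH = 10                   # random tag following the 3' adaptor
MIN_INSERT = 11                   # inserts of 10 nt or shorter are ignored


def parse_read(read):
    """Return (small_rna, random_tag), or None if the read is unusable."""
    start = read.find(ADAPTOR_5P)
    if start == -1:
        return None
    insert_start = start + len(ADAPTOR_5P)
    insert_end = read.find(ADAPTOR_3P, insert_start)
    if insert_end == -1:
        return None
    small_rna = read[insert_start:insert_end]
    tag_start = insert_end + len(ADAPTOR_3P)
    tag = read[tag_start:tag_start + TAG_LENGTH]
    if len(small_rna) < MIN_INSERT or len(tag) < TAG_LENGTH:
        return None
    return small_rna, tag


def count_molecules(reads):
    """Count each small RNA once per distinct random tag."""
    tags_seen = defaultdict(set)
    for read in reads:
        parsed = parse_read(read)
        if parsed is not None:
            small_rna, tag = parsed
            tags_seen[small_rna].add(tag)
    return {small_rna: len(tags) for small_rna, tags in tags_seen.items()}
```

Counting distinct tags rather than reads means that two reads carrying the same small RNA and the same tag are treated as copies of one ligated molecule, which limits the effect of PCR duplication on the expression estimates.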

Partitioning small RNAs into clusters of related sequences

Small RNA sequences in the MyRNA database were compared in an all-against-all comparison using a heuristic implementation of the Needleman–Wunsch global alignment algorithm (Needleman and Wunsch 1970). Rather than producing alignments, this implementation returns an approximate edit distance between two sequences. As sequences differing in length would be missed by this approach, the algorithm was modified to perform global alignments between all equal-length subsequences of the longer sequence to the shorter sequence of a given pair. Only alignments of 19-nt or longer were kept, since shorter alignments increased the clustering of unrelated sequences. Edit distances of less than four in any alignment were deemed significant and stored in the database. The sequences were considered nodes, and the calculated edit distances were used to weight the edges of the graph. All connected components were extracted from this graph, revealing highly similar groupings of small RNA sequences within the MyRNA database. This method mimics the use of single-linkage hierarchical clustering for three iterations (Gower and Ross 1969). Sequences within each cluster were aligned using ClustalW. The information content was calculated at each position of the alignment and averaged across the total length of the consensus (Schneider and Stephens 1990).
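The clustering procedure described above can be illustrated with a short sketch. It departs from the published method in one stated respect: the edit distance is computed exactly by dynamic programming, whereas the text describes a heuristic that returns an approximate distance. The thresholds (at least 19 nt aligned, edit distance below four) are taken from the text; the function names and the toy input are hypothetical.

```python
# Sketch: window-based edit distance plus connected components.
MIN_ALIGNED = 19   # alignments shorter than this are discarded
MAX_DISTANCE = 4   # edit distances below this value create an edge


def edit_distance(a, b):
    """Exact Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def best_window_distance(s, t):
    """Align the shorter sequence to every equal-length window of the longer."""
    short, long_ = sorted((s, t), key=len)
    n = len(short)
    if n < MIN_ALIGNED:
        return None
    return min(edit_distance(short, long_[k:k + n])
               for k in range(len(long_) - n + 1))


def cluster(sequences):
    """Return connected components of the sequence-similarity graph."""
    parent = list(range(len(sequences)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(sequences)):
        for j in range(i + 1, len(sequences)):
            d = best_window_distance(sequences[i], sequences[j])
            if d is not None and d < MAX_DISTANCE:
                parent[find(i)] = find(j)

    groups = {}
    for i, seq in enumerate(sequences):
        groups.setdefault(find(i), []).append(seq)
    return sorted(groups.values(), key=len, reverse=True)
```

Because any single edge is enough to join two sequences, clusters can chain together through intermediates, which is the behavior expected of single-linkage clustering. The all-against-all loop is quadratic in the number of sequences, so this sketch is only practical for small inputs.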

Genome mapping and small RNA annotation

Sequences were mapped to the O. sativa genome using MegaBlast (Zhang et al. 2000) with low-complexity filtering disabled. Only alignments with perfect identity across the length of the query sequence were stored in the MyRNA database. P. contorta small RNA sequences were aligned both to the O. sativa genome and to the set of P. taeda ESTs and cDNAs available from GenBank as of July 2006. Perfect matches of P. contorta small RNA sequences to P. taeda ESTs were also stored in the database.

O. sativa genome annotations (build 3) were downloaded from the RAP1 website (http://rapdownload.lab.nig.ac.jp/index.html). Ribosomal RNA gene coordinates were obtained separately from this group. Coordinates of genomic repeats were determined using RepeatMasker (www.repeatmasker.org), RepBase version 9.04. The known positions of O. sativa miRNA genes were obtained from miRBase. All annotations and small RNA sequence positions were loaded into a local database and accessed through our instance of the GBrowse genome browser (Stein et al. 2002), available at our website (http://microrna.bcgsc.ca/cgi-bin/gbrowse/rice_build_3/).

All small RNA sequences with known genomic positions were annotated based on any overlap with these annotations. Because a small RNA sequence could have more than one genomic position, and thus could overlap with multiple annotations, a priority was assigned to each annotation type. The highest priority was given to overlap with known miRNA genes, followed by overlap with rRNA and tRNA genes. Any un-annotated small RNAs were then annotated as “genomic repeat” if they overlapped at least one RepeatMasker annotation. The remaining small RNAs were further annotated by BLASTN search against Rfam (release 8.0) using an E-value threshold of 0.01. Sequences with no hits in Rfam were labeled “un-annotated” in the database.
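The priority scheme described above amounts to taking the first matching label from an ordered list. The sketch below makes two assumptions that the text does not settle: rRNA is ranked ahead of tRNA, and the result of the Rfam search is passed in as an optional label rather than computed. The names are illustrative.

```python
# Sketch of priority-based annotation; the ordering of rRNA before tRNA
# is an assumption, as is the representation of the Rfam result.
PRIORITY = ["miRNA", "rRNA", "tRNA", "genomic repeat"]
UNANNOTATED = "un-annotated"


def annotate(overlapping_types, rfam_family=None):
    """Pick one label for a small RNA from all annotation types it overlaps.

    overlapping_types: set of annotation types found at any genomic position.
    rfam_family: label from a separate Rfam search, or None if there was no hit.
    """
    for label in PRIORITY:
        if label in overlapping_types:
            return label
    return rfam_family if rfam_family is not None else UNANNOTATED
```

For example, a sequence that maps to both a tRNA gene and a RepeatMasker repeat would be labeled "tRNA", and a sequence with no positional overlap and no Rfam hit would remain "un-annotated".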

There were many sequences that did not map to the O. sativa genome perfectly or were still classified as “un-annotated” after the positional annotation process. Further annotation of these sequences was accomplished by employing the small RNA sequence-based clusters described above. In general, all sequences in the same cluster share high sequence identity. The appropriate annotation was first added to small RNAs residing in a cluster that contained annotated sequences (from above) following the same priority to annotations as before. Sequences in clusters containing more than 100 sequences were ignored in this process, as these clusters were generally low in information content.
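A minimal sketch of this cluster-based propagation is given below. It assumes that each cluster is available as a mapping from sequence to its current label and that only the four positional annotation types take part; as before, the relative order of rRNA and tRNA is an assumption and the names are hypothetical.

```python
# Sketch: copy the highest-priority annotation in a cluster to its
# un-annotated members, skipping clusters larger than 100 sequences.
PRIORITY = ["miRNA", "rRNA", "tRNA", "genomic repeat"]
UNANNOTATED = "un-annotated"
MAX_CLUSTER_SIZE = 100


def propagate_annotation(cluster):
    """cluster: dict mapping sequence -> label. Returns an updated copy."""
    if len(cluster) > MAX_CLUSTER_SIZE:
        return dict(cluster)  # large clusters are ignored
    present = set(cluster.values())
    best = next((label for label in PRIORITY if label in present), None)
    if best is None:
        return dict(cluster)  # nothing to propagate
    return {seq: (best if label == UNANNOTATED else label)
            for seq, label in cluster.items()}
```

Existing labels are left untouched, so only sequences that were previously "un-annotated" change.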

Degenerate mapping of O. sativa siRNAs to genomic sequence

The modified alignment algorithm described in the clustering section was altered to allow searches against genomic sequence. Alignments were kept if they were no shorter than L − 3, where L is the query sequence length. No more than a total of three mismatches or insertions/deletions were allowed within the aligned region.
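The two acceptance criteria can be written as a single predicate. The sketch assumes that an upstream aligner reports the aligned length and the numbers of mismatches and insertions/deletions for each hit; the function and parameter names are hypothetical.

```python
# Sketch of the acceptance rule for degenerate genome alignments.
MAX_EDITS = 3


def keep_alignment(query_length, aligned_length, mismatches, indels):
    """Keep hits spanning at least L - 3 nt with at most three edits in total."""
    long_enough = aligned_length >= query_length - MAX_EDITS
    few_edits = mismatches + indels <= MAX_EDITS
    return long_enough and few_edits
```

For a 24-nt query, a 21-nt alignment with two mismatches and one indel would be kept, whereas a 20-nt alignment or one with four edits in total would be discarded.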

Discovery of nonconserved P. contorta miRNA families using ESTs

Owing to the lack of genomic sequence, annotation of small RNAs as miRNAs could only employ publicly available EST and cDNA sequences as well as miRNAs from other plant species. Because most of the pre-miRNA sequences are rapidly degraded after DCL cleavage, only the mature miRNA and sometimes the miRNA* sequence are obtained. Using this information, the small RNAs with cDNA/EST alignments were checked for miRNA-like alignment patterns. An EST/cDNA with a miRNA-like alignment was defined here as having no more than three clusters of small RNA sequences on it, with each cluster no longer than 30 nt. Folding these sequences with a standard free energy minimization folding algorithm (Hofacker 2003) further improved the identification of true miRNAs. It is known that plant pre-miRNAs vary from ~80 to ~160 nt in length (Zhang et al. 2006b). Rather than folding the entire EST sequence, only a region of 150 nt on either side of the sequence clusters was folded. Of these folded flanking sequences, the one with the lower minimum free energy (MFE) was considered the putative pre-miRNA structure. The MFE of the folded sequences was considered additional support for the potential of many of these sequences to form stable pre-miRNA structures. Sequences producing structures with MFE > −25 kcal/mol were ignored, as were those with fewer than half of the small RNA nucleotides paired in the most stable structure.
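The windowing and filtering steps can be sketched as below. Folding itself is left to an external program: the sketch assumes that each window has already been folded and that the result is available as an MFE in kcal/mol and a dot-bracket structure string. It also assumes that each window includes the small RNA together with its 150-nt flank, which the text does not state explicitly. All names are hypothetical.

```python
# Sketch of flank extraction and the MFE / pairing filters.
FLANK = 150          # nt folded on either side of the small RNA cluster
MFE_CUTOFF = -25.0   # kcal/mol; structures above this are ignored
MIN_PAIRED = 0.5     # minimum paired fraction of small RNA nucleotides


def candidate_windows(est, start, end, flank=FLANK):
    """Return [(window, offset)] for the upstream and downstream windows.

    start and end are 0-based, end-exclusive coordinates of the small RNA
    on the EST; offset is where the small RNA begins inside the window.
    """
    up_start = max(0, start - flank)
    down_end = min(len(est), end + flank)
    return [(est[up_start:end], start - up_start),
            (est[start:down_end], 0)]


def paired_fraction(structure, offset, length):
    """Fraction of small RNA positions that are base-paired."""
    region = structure[offset:offset + length]
    return sum(ch in "()" for ch in region) / length


def passes_filters(mfe, structure, offset, length):
    """Apply the MFE cutoff and the pairing requirement."""
    return (mfe <= MFE_CUTOFF
            and paired_fraction(structure, offset, length) >= MIN_PAIRED)


def pick_precursor(folds):
    """folds: list of (mfe, structure, offset, length), one per window.

    The window with the lower MFE is taken as the putative pre-miRNA and
    returned only if it also passes both filters.
    """
    best = min(folds, key=lambda fold: fold[0])
    return best if passes_filters(*best) else None
```

Choosing the lower-MFE window before filtering follows the order of steps in the text; applying the filters to both windows first and then choosing would be a different, more permissive rule.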

SVM-aided discovery of novel nonconserved P. contorta miRNAs

Trends of information content and cluster size suggested that clusters of miRNAs might be distinguished based on various parameters derived from the clusters. A support vector machine (SVM) was employed as a secondary approach to classify small RNA clusters that may lack EST sequences. The SMO support vector machine implementation in the Weka software package was used (Witten and Frank 2005). The positive training data comprised all clusters containing fewer than 100 sequences and at least one known (conserved) miRNA sequence. Negative training examples were those clusters that could be classified as other types of small RNAs (tRNA, repeat-derived or rRNA). The following traits were computed for each cluster as parameters for the classifier: number of sequences from P. contorta and O. sativa, mean information content, number of genomic loci (rice genome), mean length of sequences in the cluster, and the variance of these lengths. Suitable parameters were chosen by applying 10-fold cross validation to the training set. The non-default parameters used for prediction were as follows: exponent = 3, c = 1.5. When the classifier was trained as described, it could classify clusters of known miRNAs with modest sensitivity (0.645) but high specificity (0.952). This suggested it was a reliable method for the prediction of putative novel miRNA clusters from these data with a very low rate of false-positive predictions.
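Several of the cluster traits listed above can be computed directly from an aligned cluster. The sketch below covers the mean information content, mean length, and length variance; it does not reproduce the Weka classifier, the per-species counts, or the number of genomic loci. It assumes gap characters are written as "-", that information content is 2 minus the Shannon entropy of each column without a small-sample correction, and that the variance is the population variance. The function names are hypothetical.

```python
# Sketch of per-cluster feature computation (assumptions noted above).
import math
from collections import Counter
from statistics import mean, pvariance

GAP = "-"


def column_information(column):
    """Information content in bits of one alignment column, ignoring gaps."""
    bases = [base for base in column if base != GAP]
    if not bases:
        return 0.0
    total = len(bases)
    entropy = -sum((count / total) * math.log2(count / total)
                   for count in Counter(bases).values())
    return 2.0 - entropy  # maximum of 2 bits for a four-letter alphabet


def mean_information_content(aligned):
    """Average information content over all columns of the alignment."""
    return mean(column_information(column) for column in zip(*aligned))


def cluster_features(aligned):
    """aligned: equal-length, gapped sequences belonging to one cluster."""
    lengths = [len(seq.replace(GAP, "")) for seq in aligned]
    return {
        "n_sequences": len(aligned),
        "mean_information": mean_information_content(aligned),
        "mean_length": mean(lengths),
        "length_variance": pvariance(lengths),
    }
```

A cluster of near-identical sequences of similar length gives a mean information content close to 2 bits and a small length variance, which is the pattern the text associates with precisely processed miRNAs.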

Acknowledgments

The P. contorta samples were provided by Jim Mattsson, Department of Biology, Simon Fraser University, Canada. Rice small RNA extracts were a generous gift from M.B. Wang. This project was funded in part by the Natural Sciences and Engineering Research Council of Canada (H.A.E. and P.J.U.). We thank Marco Marra of the BC Genome Sciences Centre for critical evaluation of the manuscript. P.J.U. is a Senior Michael Smith scholar. S.C.S. is a Michael Smith scholar and a Canadian Research Chair. R.D.M. and G.A. receive stipends from the Canadian Institutes for Health Research and the Michael Smith Foundation for Health Research.

Footnotes

[Supplemental material is available online at www.genome.org.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6897308.

References

  • Ambros V., Bartel B., Bartel D.P., Burge C.B., Carrington J.C., Chen X., Dreyfuss G., Eddy S.R., Griffiths-Jones S., Marshall M., et al. A uniform system for microRNA annotation. RNA. 2003;9:277–279.
  • Bartel D.P. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297.
  • Bennett M.D., Leitch I.J. Nuclear DNA amounts in angiosperms: Progress, problems and prospects. Ann. Bot. 2005;95:45–90.
  • Bonnet E., Wuyts J., Rouze P., de Peer Y.V. Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc. Natl. Acad. Sci. 2004;101:11511–11516.
  • Borsani O., Zhu J., Verslues P.E., Sunkar R., Zhu J.K. Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell. 2005;123:1279–1291.
  • Chen Z., Zhang J., Kong J., Li S., Fu Y., Li S., Zhang H. Diversity of endogenous small non-coding RNAs in Oryza sativa. Genetica. 2006;128:21–31.
  • Cheng Z., Dong F., Langdon T., Ouyang S., Buell C.R., Gu M., Blattner F.R., Jiang J. Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell. 2002;14:1691–1704.
  • Fahlgren N., Howell M.D., Kasschau K.D., Chapman E.J., Sullivan C.M., Cumbie J.S., Givan S.A., Law T.F., Grant S.R., Dangl J.L., et al. High-throughput sequencing of Arabidopsis microRNAs: Evidence for frequent birth and death of miRNA genes. PLoS ONE. 2007;2:e219. doi: 10.1371/journal.pone.0000219.
  • Gower J.C., Ross G.J.S. Minimum spanning trees and single linkage cluster analysis. Appl. Stat. 1969;18:54–64.
  • Griffiths-Jones S. miRBase: The microRNA sequence database. Methods Mol. Biol. 2006;342:129–138.
  • Griffiths-Jones S., Bateman A., Marshall M., Khanna A., Eddy S.R. Rfam: An RNA family database. Nucleic Acids Res. 2003;31:439–441.
  • Gustafson A.M., Allen E., Givan S., Smith D., Carrington J.C., Kasschau K.D. ASRP: The Arabidopsis Small RNA Project Database. Nucleic Acids Res. 2005;33:D637–D640.
  • Hashimoto M., Endo T., Peltier G., Tasaka M., Shikanai T. A nucleus-encoded factor, CRR2, is essential for the expression of chloroplast ndhB in Arabidopsis. Plant J. 2003;36:541–549.
  • Herr A.J. Pathways through the small RNA world of plants. FEBS Lett. 2005;579:5879–5888.
  • Hofacker I.L. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31:3429–3431.
  • Itoh T., Tanaka T., Barrero R.A., Yamasaki C., Fujii Y., Hilton P.B., Antonio B.A., Aono H., Apweiler R., Bruskiewich R., et al. Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res. 2007;17:175–183.
  • Johnson C., Bowman L., Adai A.T., Vance V., Sundaresan V. CSRDB: A small RNA integrated database and browser resource for cereals. Nucleic Acids Res. 2007;35:D829–D833.
  • Jones-Rhoades M.W., Bartel D.P. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol. Cell. 2004;14:787–799.
  • Lau N.C., Lim L.P., Weinstein E.G., Bartel D.P. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science. 2001;294:858–862.
  • Lindow M., Krogh A., Krogh A. Computational evidence for hundreds of non-conserved plant microRNAs. BMC Genomics. 2005;6:119. doi: 10.1186/1471-2164-6-119. [PMC free article] [PubMed] [Cross Ref]
  • Lu C., Tei S.S., Tei S.S. Elucidation of the small RNA component of the transcriptome. Science. 2005;309:1567–1569. [PubMed]
  • Margulies M., Egholm M., Altman W.E., Attiya S., Bader J.S., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z., Egholm M., Altman W.E., Attiya S., Bader J.S., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z., Altman W.E., Attiya S., Bader J.S., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z., Attiya S., Bader J.S., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z., Bader J.S., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z., Bemben L.A., Berka J., Braverman M.S., Chen Y.J., Chen Z., Berka J., Braverman M.S., Chen Y.J., Chen Z., Braverman M.S., Chen Y.J., Chen Z., Chen Y.J., Chen Z., Chen Z., et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376–380. [PMC free article] [PubMed]
  • McCarthy E.M., Liu J., Lizhi G., McDonald J.F., Liu J., Lizhi G., McDonald J.F., Lizhi G., McDonald J.F., McDonald J.F. Long terminal repeat retrotransposons of Oryza sativa. Genome Biol. 2002;3:RESEARCH0053. doi: 10.1186/gb-2002-3-10-research0053. [PMC free article] [PubMed] [Cross Ref]
  • Morin R.D., O’Connor M.D., Griffith M., Kuchenbauer F., Delaney A., Prabhu A.-L., Zhao Y., McDonald H., Zeng T., Hirst M., O’Connor M.D., Griffith M., Kuchenbauer F., Delaney A., Prabhu A.-L., Zhao Y., McDonald H., Zeng T., Hirst M., Griffith M., Kuchenbauer F., Delaney A., Prabhu A.-L., Zhao Y., McDonald H., Zeng T., Hirst M., Kuchenbauer F., Delaney A., Prabhu A.-L., Zhao Y., McDonald H., Zeng T., Hirst M., Delaney A., Prabhu A.-L., Zhao Y., McDonald H., Zeng T., Hirst M., Prabhu A.-L., Zhao Y., McDonald H., Zeng T., Hirst M., Zhao Y., McDonald H., Zeng T., Hirst M., McDonald H., Zeng T., Hirst M., Zeng T., Hirst M., Hirst M., et al. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res. 2008;(this issue) doi: 10.1101/gr.7179508. [PMC free article] [PubMed] [Cross Ref]
  • Nakano M., Nobuta K., Vemaraju K., Tej S.S., Skogen J.W., Meyers B.C., Nobuta K., Vemaraju K., Tej S.S., Skogen J.W., Meyers B.C., Vemaraju K., Tej S.S., Skogen J.W., Meyers B.C., Tej S.S., Skogen J.W., Meyers B.C., Skogen J.W., Meyers B.C., Meyers B.C. Plant MPSS databases: Signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res. 2006;34:D731–D735. [PMC free article] [PubMed]
  • Needleman S.B., Wunsch C.D., Wunsch C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970;48:443–453. [PubMed]
  • Park W., Li J., Song R., Messing J., Chen X., Li J., Song R., Messing J., Chen X., Song R., Messing J., Chen X., Messing J., Chen X., Chen X. CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr. Biol. 2002;12:1484–1495. [PubMed]
  • Pavy N., Paule C., Parsons L., Crow J.A., Morency M.J., Cooke J., Johnson J.E., Noumen E., Guillet-Claude C., Butterfield Y., Paule C., Parsons L., Crow J.A., Morency M.J., Cooke J., Johnson J.E., Noumen E., Guillet-Claude C., Butterfield Y., Parsons L., Crow J.A., Morency M.J., Cooke J., Johnson J.E., Noumen E., Guillet-Claude C., Butterfield Y., Crow J.A., Morency M.J., Cooke J., Johnson J.E., Noumen E., Guillet-Claude C., Butterfield Y., Morency M.J., Cooke J., Johnson J.E., Noumen E., Guillet-Claude C., Butterfield Y., Cooke J., Johnson J.E., Noumen E., Guillet-Claude C., Butterfield Y., Johnson J.E., Noumen E., Guillet-Claude C., Butterfield Y., Noumen E., Guillet-Claude C., Butterfield Y., Guillet-Claude C., Butterfield Y., Butterfield Y., et al. Generation, annotation, analysis and database integration of 16,500 white spruce EST clusters. BMC Genomics. 2005;6:144. [PMC free article] [PubMed]
  • Pontier D., Yahubyan G., Vega D., Bulski A., Saez-Vasquez J., Hakimi M.A., Lerbs-Mache S., Colot V., Lagrange T., Yahubyan G., Vega D., Bulski A., Saez-Vasquez J., Hakimi M.A., Lerbs-Mache S., Colot V., Lagrange T., Vega D., Bulski A., Saez-Vasquez J., Hakimi M.A., Lerbs-Mache S., Colot V., Lagrange T., Bulski A., Saez-Vasquez J., Hakimi M.A., Lerbs-Mache S., Colot V., Lagrange T., Saez-Vasquez J., Hakimi M.A., Lerbs-Mache S., Colot V., Lagrange T., Hakimi M.A., Lerbs-Mache S., Colot V., Lagrange T., Lerbs-Mache S., Colot V., Lagrange T., Colot V., Lagrange T., Lagrange T. Reinforcement of silencing at transposons and highly repeated sequences requires the concerted action of two distinct RNA polymerases IV in Arabidopsis. Genes & Dev. 2005;19:2030–2040. [PMC free article] [PubMed]
  • Pontius J.U., Wagner L., Schuler G.D. The NCBI handbook. National Library of Medicine; Bethesda, MD: 2002. UniGene: A unified view of the transcriptome.
  • Quackenbush J., Cho J., Lee D., Liang F., Holt I., Karamycheva S., Parvizi B., Pertea G., Sultana R., White J. The TIGR gene indices: Analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Res. 2001;29:159–164. [PMC free article] [PubMed]
  • Rajagopalan R., Vaucheret H., Trejo J., Bartel D.P. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes & Dev. 2006;20:3407–3425. [PMC free article] [PubMed]
  • Schneider T.D., Stephens R.M. Sequence logos: A new way to display consensus sequences. Nucleic Acids Res. 1990;18:6097–6100. [PMC free article] [PubMed]
  • Seliverstov A., Lyubetsky V. Translation regulation of intron-containing genes in chloroplasts. J. Bioinform. Comput. Biol. 2006;4:783–792. [PubMed]
  • Stein L.D., Mungall C., Shu S., Caudy M., Mangone M., Day A., Nickerson E., Stajich J.E., Harris T.W., Arva A., et al. The generic genome browser: A building block for a model organism system database. Genome Res. 2002;12:1599–1610. [PMC free article] [PubMed]
  • Talmor-Neiman M., Stav R., Klipcan L., Buxdorf K., Baulcombe D.C., Arazi T. Identification of trans-acting siRNAs in moss and an RNA-dependent RNA polymerase required for their biogenesis. Plant J. 2006;48:511–521. [PubMed]
  • Vazquez F. Arabidopsis endogenous small RNAs: Highways and byways. Trends Plant Sci. 2006;11:460–468. [PubMed]
  • Wang M.B., Metzlaff M. RNA silencing and antiviral defense in plants. Curr. Opin. Plant Biol. 2005;8:216–222. [PubMed]
  • Williams L., Carles C.C., Osmont K.S., Fletcher J.C. A database analysis method identifies an endogenous trans-acting short-interfering RNA that targets the Arabidopsis ARF2, ARF3, and ARF4 genes. Proc. Natl. Acad. Sci. 2005;102:9703–9708. [PMC free article] [PubMed]
  • Witten I.H., Frank E. Data mining: Practical machine learning tools and techniques. Morgan Kaufmann; San Francisco, CA: 2005.
  • Xie Z., Johansen L.K., Gustafson A.M., Kasschau K.D., Lellis A.D., Zilberman D., Jacobsen S.E., Carrington J.C. Genetic and functional diversification of small RNA pathways in plants. PLoS Biol. 2004;2:e104. doi: 10.1371/journal.pbio.0020104. [PMC free article] [PubMed] [Cross Ref]
  • Yao Y., Guo G., Ni Z., Sunkar R., Du J., Zhu J.K., Sun Q. Cloning and characterization of microRNAs from wheat (Triticum aestivum L.). Genome Biol. 2007;8:R96. doi: 10.1186/gb-2007-8-6-r96. [PMC free article] [PubMed] [Cross Ref]
  • Zhang B., Pan X., Cannon C.H., Cobb G.P., Anderson T.A. Conservation and divergence of plant microRNA genes. Plant J. 2006a;46:243–259. [PubMed]
  • Zhang B., Pan X.P., Cox S.B., Cobb G.P., Anderson T.A. Evidence that miRNAs are different from other RNAs. Cell. Mol. Life Sci. 2006b;63:246–254. [PubMed]
  • Zhang Y. miRU: An automated plant miRNA target prediction server. Nucleic Acids Res. 2005;33:W701–W704. [PMC free article] [PubMed]
  • Zhang Z., Schwartz S., Wagner L., Miller W. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 2000;7:203–214. [PubMed]

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press