Genome Res. Dec 2007; 17(12): 1865–1879.
PMCID: PMC2099594

Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes

Abstract

MicroRNAs (miRNAs) are short regulatory RNAs that inhibit target genes by complementary binding in 3′ untranslated regions (3′ UTRs). They are one of the most abundant classes of regulators, targeting a large fraction of all genes, making their comprehensive study a requirement for understanding regulation and development. Here we use 12 Drosophila genomes to define structural and evolutionary signatures of miRNA hairpins, which we use for their de novo discovery. We predict >41 novel miRNA genes, which encompass many unique families, and 28 of which are validated experimentally. We also define signals for the precise start position of mature miRNAs, which suggest corrections of previously known miRNAs, often leading to drastic changes in their predicted target spectrum. We show that miRNA discovery power scales with the number and divergence of species compared, suggesting that such approaches can be successful in human as dozens of mammalian genomes become available. Interestingly, for some miRNAs sense and anti-sense hairpins score highly and mature miRNAs from both strands can indeed be found in vivo. Similarly, miRNAs with weak 5′ end predictions show increased in vivo processing of multiple alternate 5′ ends and have fewer predicted targets. Lastly, we show that several miRNA star sequences score highly and are likely functional. For mir-10 in particular, both arms show abundant processing, and both show highly conserved target sites in Hox genes, suggesting a possible cooperation of the two arms, and their role as a master Hox regulator.

Regulation of gene expression in response to developmental and environmental stimuli is central to animal development. In addition to regulation at the transcriptional level, it is increasingly recognized that an important fraction of regulation occurs post-transcriptionally, and to a large extent by microRNAs (miRNAs) (Lai 2003; Ambros 2004; Bartel 2004; Alvarez-Garcia and Miska 2005; Plasterk 2006; Chen and Rajewsky 2007). These are short RNA genes that direct the inhibition of target messenger-RNA expression via complementary binding sites in the 3′ untranslated region (3′ UTR) (for reviews, see Zamore and Haley 2005; Valencia-Sanchez et al. 2006). miRNAs are an integral part of animal gene regulatory networks. First, they are one of the most abundant classes of regulators, currently estimated to comprise 1%–5% of animal genes (Bartel 2004; Bentwich et al. 2005a; Berezikov et al. 2005). Second, they target a large fraction of all genes, a typical miRNA regulating hundreds of target genes (Brennecke et al. 2005; Krek et al. 2005; Lewis et al. 2005; Xie et al. 2005). Third, as many genes contain target sites for several miRNAs, combinatorial control similar to that known for transcription factors is likely to increase miRNA regulatory versatility (Hobert 2004). Finally, several miRNAs are deeply conserved in the animal kingdom (e.g., let-7; Pasquinelli et al. 2000), suggesting an ancient role similar to some transcription factors in the core of developmental regulatory kernels (Davidson and Erwin 2006).

Thus, a comprehensive understanding of all miRNAs and their targets in an animal genome presents a necessary milestone in our understanding of gene regulation in animal genomes. This is especially true, as knowledge of the miRNA sequence alone can allow the identification of the physiologically relevant target genes (e.g., mir-iab-4 [Stark et al. 2003; Ronshaugen et al. 2005]; bantam [Brennecke et al. 2003]; mir-9a [Li et al. 2006]). Although the first miRNA-target pair was identified genetically more than a decade ago (Lee et al. 1993; Wightman et al. 1993), most miRNAs have since been discovered by small RNA cloning, starting in 2001 (Lagos-Quintana et al. 2001; Lau et al. 2001; Lee and Ambros 2001). Recent advances in massively parallel sequencing technologies have greatly increased the sensitivity of such approaches, leading to the discovery of many novel miRNAs and suggesting that miRNA discovery in well-studied organisms might be reaching saturation (Berezikov et al. 2006b; Ruby et al. 2006). Nevertheless, miRNAs that are expressed at low levels, only in a few cells, or under highly specific conditions remain difficult to detect experimentally (e.g., lsy-6 in Caenorhabditis elegans; Johnston and Hobert 2003; Ruby et al. 2006). Ideally, computational approaches should be able to complement these studies and discover miRNAs directly from their sequence features in complete genomes. In contrast to experimental approaches, which are inherently limited by the developmental stages and tissues surveyed, computational studies should provide a global view of miRNAs regardless of their expression. Several structural features of miRNA hairpins have been defined and used to predict novel miRNA genes (for review, see Berezikov et al. 2006a). However, in the absence of sufficiently specific miRNA hairpin characteristics (Ng Kwang Loong and Mishra 2007; Ritchie et al. 2007), these are insufficient to recognize the small number of true miRNA genes amidst the enormous number of nonbiological miRNA-like hairpins in large animal genomes (see below and Berezikov et al. 2006a).

Comparative genomics provides an opportunity to discover functional miRNAs systematically, making use of their conservation across multiple species. Comparative approaches have been applied to discover a wide range of functional elements, including protein-coding genes, RNA genes, various classes of regulatory elements or motifs (e.g., for review, see Miller et al. 2004), and have also been used for the prediction of miRNA target genes (for review, see Lai 2004; Rajewsky 2006). For miRNA identification, comparative approaches using a small number of species have led to the discovery of novel miRNAs in flies, worms, and mammals (Lai et al. 2003; Lim et al. 2003a, b; Bentwich et al. 2005b; for review, see Berezikov et al. 2006a). It is still unclear, however, to what extent a purely de novo approach can be used to identify novel miRNAs with high specificity and whether it can define their precise boundaries. It is also unclear how discovery power scales with the number of sequenced species, and whether comparative genomics can lead to functional insights on miRNA function beyond their discovery. The recent availability of 12 fully sequenced Drosophila genomes spanning over 40 million years of evolution (Drosophila 12 Genomes Consortium 2007; Stark et al. 2007) provides a unique opportunity to address these questions, providing a rich comparative genomics data set, at a range of evolutionary distances.

Drosophila melanogaster presents perhaps the most important model organism for understanding the basic principles and molecular mechanisms of animal development. Similarly, Drosophila genetics has played an important role in understanding the functional roles of animal miRNAs. Several miRNA loci were first discovered in the 1990s by means of gain-of-function screens, and only later recognized to be miRNAs (e.g., mir-7, Li and Carthew 2005; bantam, Brennecke et al. 2003; mir-278, Teleman and Cohen 2006). Gain-of-function alleles also led to the discovery of regulatory motifs in the 3′ UTRs of several genes that were later identified as miRNA target sites (Lai and Posakony 1997; Lai et al. 1998; Lai 2002). The sequencing of Drosophila pseudoobscura (Richards et al. 2005) enabled the first comparative genomics approaches in Drosophila, which proved successful in the identification of novel miRNAs (Lai et al. 2003), the prediction of miRNA target genes (Enright et al. 2003; Stark et al. 2003; Grun et al. 2005), the refinement of rules for miRNA-target recognition (Brennecke et al. 2005), and the description of global patterns of miRNA regulatory networks (Stark et al. 2005). Nevertheless, our picture of Drosophila miRNAs is far from complete. In fact, with 60 cloned miRNAs in the Rfam miRNA registry (Griffiths-Jones et al. 2006), only half of the conservative upper estimate of 120 loci are known (Aravin et al. 2003; Lai et al. 2003).

In this paper, we use 12 Drosophila genomes (Drosophila 12 Genomes Consortium 2007; Stark et al. 2007) to undertake a systematic de novo discovery of miRNA genes and gain insights into miRNA biology. We define distinguishing properties of known miRNA genes, both structural and evolutionary. We use these properties for the de novo discovery of miRNA hairpins and the prediction of the precise start position of mature miRNAs. Our methods predict >41 novel miRNA genes, of which 28 are validated experimentally, and 19 more extend known miRNA families or cluster with known miRNAs. In eight cases, our prediction and/or validation correct the current Rfam annotation of the mature miRNAs by shifts of one or more bases, leading to drastic differences in the set of predicted target genes. The novel predicted miRNAs lead to 37 novel miRNA families with unique 5′ seed sequences, doubling the number of miRNA families and, thus, the diversity of miRNA targeting in Drosophila. Our results also lead to several new insights into miRNA biology. We find that, in some cases, both sense and anti-sense sequences have miRNA-like characteristics and can be processed into mature miRNAs. We also show that a single hairpin can give rise to multiple mature miRNAs, whose abundance correlates with the strength of our computational signals. We also show that miRNA star sequences can be functional (miRNA* denotes the small RNA processed from the hairpin arm opposite of the mature miRNA). For example, for mir-10, a miRNA in the Hox cluster, both miR-10 and miR-10* appear functional, with highly conserved target sites in multiple Hox genes. In fact, miR-10* shows stronger evolutionary signals and is more abundant, suggesting it may be the primary product. Lastly, we show that the newly discovered miRNAs target an overall similar gene set to known miRNAs, albeit in different combinations stemming from distinct seeds. This implies a much denser miRNA regulatory network than previously thought, with increased potential for combinatorial control.

Results

Structural and evolutionary properties of known miRNA hairpins

We studied the structural and evolutionary properties of the known Drosophila miRNAs to derive discriminating features, which distinguish them from nonfunctional miRNA-like hairpins that occur frequently throughout the genome (Fig. 1). We compared the cloned miRNAs in Rfam release 9.0 (Griffiths-Jones et al. 2006) with hairpins in randomly chosen genomic sequences, and with a subset of the random hairpins, selected to resemble known miRNAs in their length and pairing characteristics (see Methods).

Figure 1.
Evolutionary and structural properties of miRNA hairpins. (A) Typical miRNA hairpin showing mature miRNA (red) and miRNA* (blue). (B) Structural properties of miRNA hairpins for cloned miRNAs (blue), random hairpins of similar lengths and arm pairing ...

We found several distinct structural characteristics of miRNA hairpins (Fig. 1A,B), partly also reported and used in previous prediction efforts (e.g., Lai et al. 2003; Lim et al. 2003b; Bentwich et al. 2005b; for review, see Berezikov et al. 2006a). First, the length of miRNA hairpins is much more precisely defined than that of random hairpins: 90% of all miRNA hairpins are between 73 and 102 nucleotides (nt) long, whereas 90% of random hairpins have lengths between 47 and 117 (2.4× larger range). Second, the lengths of the arms and the hairpin loop are more restricted for miRNAs and show a clear trend to longer arms and shorter loops on average: 90% of all miRNAs have arm lengths between 31 and 47 and loops between 4 and 26 nt, while random hairpins are much less defined. Third, miRNA precursor sequences fold into significantly more stable secondary structures than random hairpins, between 3 and 6 standard deviations above what is expected given their lengths and sequence composition. Fourth, miRNA hairpins have more symmetric loops but fewer asymmetric or bulged loops than random hairpins of similar length. Such loop structures in the arms may direct hairpin cleavage, facilitate asymmetric strand separation, or allow for efficient loading of mature miRNAs into effector complexes, processes which are not yet fully understood (Khvorova et al. 2003; Schwarz et al. 2003; Han et al. 2006; Seitz and Zamore 2006).
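The stability comparison in the third point can be read as a z-score of the hairpin's folding energy against a null distribution. The following is a minimal sketch under stated assumptions: the null energies are taken to come from composition-matched shuffled sequences folded by an external program (not shown), and the function name and all numbers are hypothetical illustrations, not values from this study.

```python
from statistics import mean, stdev

def stability_zscore(hairpin_energy, control_energies):
    """Standard deviations by which a hairpin is more stable than its controls.

    Energies are free energies in kcal/mol, where more negative means more
    stable, so the sign is flipped to give stable hairpins a positive score.
    """
    mu = mean(control_energies)
    sigma = stdev(control_energies)  # sample standard deviation
    if sigma == 0:
        raise ValueError("control energies have no spread")
    return (mu - hairpin_energy) / sigma

# Hypothetical energies: one candidate hairpin and five shuffled controls.
controls = [-20.0, -22.0, -18.0, -21.0, -19.0]
z = stability_zscore(-28.0, controls)
```

With these illustrative inputs the candidate lies about five standard deviations from the control mean, which falls within the 3 to 6 range described for known miRNA precursors.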

We next assessed the evolutionary constraints of miRNAs in 12 Drosophila genomes (Drosophila 12 Genomes Consortium 2007; Stark et al. 2007). We aligned each of the 60 cloned miRNAs and their flanking regions across all 12 species (we determined the corresponding sequence in each of the 12 genomes by BLAST, as existing whole-genome alignments were not found to be reliable). These alignments show a characteristic conservation profile, which closely follows the hairpin structure (Fig. 1C), as previously reported in flies and mammals (Lai et al. 2003; Berezikov et al. 2005). The loop and the flanking regions show abundant mutations, insertions, and deletions, while the arms are very highly conserved. In particular, while compensatory mutations are frequently observed in other RNA genes, they are not found in miRNA hairpins, especially within the mature miRNA. In fact, for all but eight cloned miRNAs, the mature sequence shows a 100% conservation in all species where the corresponding hairpin sequence can be found (Supplemental Table 1). For four of these eight (mir-277, mir-2b-1, mir-305, mir-309), the only sequence difference is found in the closely related Drosophila simulans or persimilis, which have been sequenced at lower coverage, suggesting the differences may represent sequencing errors rather than evolutionary divergence. Perfect conservation of the entire mature miRNAs might reflect additional functional constraints on the arm sequences beyond simply stabilizing an RNA secondary structure (e.g., compensatory pairing of the miRNA 3′ end; Brennecke et al. 2005) and has been found to extend across species as distant as worms and vertebrates in some cases (e.g., let-7; Pasquinelli et al. 2000). In contrast, the high sequence divergence of the loop suggests that it may simply be a linker with no additional functions.
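The conservation profile described above (highly conserved arms, divergent loop) can be illustrated with a per-column identity score over a multiple alignment. This is a simplified sketch, not the scoring used in this study: it assumes a gap-free reference sequence in the first row, counts gaps in other species as mismatches, and uses a made-up three-species alignment with hypothetical region boundaries.

```python
def column_conservation(alignment):
    """Fraction of sequences that match the reference (first row) per column."""
    reference = alignment[0]
    if any(len(seq) != len(reference) for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    return [sum(seq[i] == reference[i] for seq in alignment) / n
            for i in range(len(reference))]

def region_mean(profile, start, end):
    """Mean conservation over the half-open column range [start, end)."""
    values = profile[start:end]
    return sum(values) / len(values)

# Toy alignment laid out as 5' arm (6 nt) + loop (4 nt) + 3' arm (6 nt).
toy_alignment = [
    "UGAGGU" + "AUCC" + "ACCUCA",   # reference species
    "UGAGGU" + "GU-C" + "ACCUCA",   # substitution and deletion in the loop
    "UGAGGU" + "AAGC" + "ACCUCA",   # substitutions in the loop
]
profile = column_conservation(toy_alignment)
arm_5p = region_mean(profile, 0, 6)
loop = region_mean(profile, 6, 10)
arm_3p = region_mean(profile, 10, 16)
```

In the toy data both arms are perfectly conserved while the loop averages two thirds identity, mirroring the qualitative pattern in Figure 1C.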

Taken together, these structural and evolutionary features set real miRNA hairpins apart from random miRNA-like hairpins. We next use combinations of these features for the genome-wide identification of Drosophila miRNAs.

Distinguishing Drosophila miRNA hairpins from random genomic hairpins

Discovering novel miRNA genes computationally is an enormous challenge: Cloned miRNAs represent only 60 of 760,355 hairpins in the fly genome with miRNA-like properties (total length between ~60 and 120 nt, arm length >20 nt, with at least 70% paired bases). Therefore, in addition to selecting for miRNA-like hairpins, several additional features must be defined in order to achieve the needed discriminatory power of >99.99% specificity. For example, a 0.5% false-positive rate would result in 3500 spurious predictions.
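The miRNA-like criteria quoted above can be sketched as a filter over a secondary structure in dot-bracket notation. The thresholds come from the text; everything else is an assumption made for illustration: the input is taken to be a balanced dot-bracket string from a folding program, an arm is defined as the span from its first to its last paired base, and the paired fraction is computed over both arms. The function name is hypothetical.

```python
def is_mirna_like(structure, min_len=60, max_len=120, min_arm=21, min_paired=0.70):
    """Return True if a dot-bracket structure is a single miRNA-like hairpin."""
    if not (min_len <= len(structure) <= max_len):
        return False
    if "(" not in structure or ")" not in structure:
        return False
    first_open, last_open = structure.find("("), structure.rfind("(")
    first_close, last_close = structure.find(")"), structure.rfind(")")
    # A single hairpin has every opening bracket before every closing bracket.
    if last_open > first_close:
        return False
    arm_5p = structure[first_open:last_open + 1]
    arm_3p = structure[first_close:last_close + 1]
    # "arm length >20 nt" is encoded as a minimum of 21 nt.
    if len(arm_5p) < min_arm or len(arm_3p) < min_arm:
        return False
    paired = arm_5p.count("(") + arm_3p.count(")")
    return paired / (len(arm_5p) + len(arm_3p)) >= min_paired

# Hypothetical structures: a tight hairpin, a short one, and a loosely paired one.
tight = "(" * 30 + "." * 10 + ")" * 30
short = "(((....)))"
loose = "(.." * 10 + "(" + "." * 10 + ")" + "..)" * 10
results = [is_mirna_like(s) for s in (tight, short, loose)]
```

Only the first structure passes: the second fails the length bounds and the third has too few paired bases in its arms.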

We tested how well each of the structural and evolutionary properties above discriminates between the known miRNA hairpins and all miRNA-like hairpins in the genome (Fig. 1E). For example, scoring the miRNA conservation profile (Fig. 1C) alone shows >300-fold discrimination between miRNAs (70% pass) and other hairpins (only 0.2% pass) across 12 genomes. Our conservation profile score is related to the metric of arm vs. loop conservation ratio, used for miRNA prediction in two Drosophila genomes (Lai et al. 2003), albeit much more precise—for comparison, the previous metric shows an enrichment of only 19-fold across 12 genomes.
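The fold discrimination quoted above is the ratio of two pass rates. A short sketch using only the two percentages given in the text (70% of known miRNAs and 0.2% of other hairpins passing the conservation profile); the function name is a hypothetical helper.

```python
def fold_discrimination(true_pass_rate, background_pass_rate):
    """Ratio of the pass rate among known miRNAs to that among other hairpins."""
    if background_pass_rate <= 0:
        raise ValueError("background pass rate must be positive")
    return true_pass_rate / background_pass_rate

# Rates quoted for the conservation profile score across 12 genomes.
conservation_fold = fold_discrimination(0.70, 0.002)
```

These rounded rates give a ratio of about 350, consistent with the reported >300-fold discrimination.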

Amongst structural features, the free energy score of the hairpin, corrected for hairpin length and sequence composition, is most discriminative with a 39-fold discrimination. The folding energy of the consensus sequence, which measures structural conservation (Washietl et al. 2005), shows a sixfold enrichment, followed by the overall loop-symmetry (threefold) and the extent of base-pairing in the hairpin arms (2.3-fold). Remaining features such as the length of the entire hairpin, the arms, or the loop show only moderate contribution (less than twofold). We conclude that, after selection for hairpin length and stem pairing, the remaining contribution of structural features is only moderate.

Although several individual properties provide a significant discrimination, the specificity of even the best of them is not sufficient to reliably predict miRNAs. Given the large number of genomic hairpins, many would be selected that score well for that property but lack other miRNA characteristics and are likely false. For example, most of the 1625 miRNA hairpins selected by the conservation profile above (99.8% rejection rate for random hairpins) are likely false: 40% alone fall into exons or repeats or have accumulated mutations in their mature sequence. Therefore, several features need to be combined to reach a sufficient discrimination power. To identify optimal combinations and weightings of these features, we used a machine learning approach, using tallied votes of 500 independently constructed decision trees (see Methods; Breiman 2001). The combined approach achieves >4,500-fold discrimination after cross-validation, recovering 85% of known miRNAs in <0.02% of genomic hairpins.
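The idea of tallying votes from many independently constructed trees can be sketched in plain Python. This is not the classifier used in this study: for brevity each tree is reduced to a single split (a decision stump) on one randomly chosen feature, trained on a bootstrap sample, and the two features and all training values are hypothetical toy data.

```python
import random

def train_stump(rows, labels, rng):
    """Fit a one-split tree on a randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for sign in (1, -1):
            # The stump predicts "miRNA" when sign * (value - threshold) >= 0.
            errors = sum((sign * (row[feature] - threshold) >= 0) != bool(label)
                         for row, label in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, sign)
    return best[1:]

def train_ensemble(rows, labels, n_trees=500, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    while len(stumps) < n_trees:
        picks = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in picks]
        if len(set(sample_labels)) < 2:
            continue  # redraw samples that contain only one class
        stumps.append(train_stump([rows[i] for i in picks], sample_labels, rng))
    return stumps

def vote_score(stumps, row):
    """Fraction of stumps voting that the row is a miRNA hairpin."""
    votes = sum(sign * (row[feature] - threshold) >= 0
                for feature, threshold, sign in stumps)
    return votes / len(stumps)

# Toy features per hairpin: (conservation score, stability z-score).
training_rows = [(0.90, 5.0), (0.95, 4.0), (0.85, 6.0), (0.80, 3.5),
                 (0.20, 0.5), (0.30, 1.0), (0.10, -0.5), (0.40, 0.0)]
training_labels = [1, 1, 1, 1, 0, 0, 0, 0]
ensemble = train_ensemble(training_rows, training_labels)
score_strong = vote_score(ensemble, (0.99, 7.0))
score_weak = vote_score(ensemble, (0.05, -1.0))
```

The resulting score is a vote fraction between 0 and 1, so a cutoff such as the 0.95 used later in the text corresponds to requiring agreement from 95% of the trees.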

Recovering known miRNA genes

To discover Drosophila miRNAs in the genome, we ranked all 760,355 miRNA-like hairpins in the entire genome according to this combined score. The top ranking hairpins are strongly enriched in cloned Rfam miRNAs, whose discovery rate plateaus at a score of 0.95 (Fig. 2B). At this cutoff, our method results in 101 hairpins in the fly genome, including 51 of the 60 cloned miRNAs (85%) and novel miRNA candidates with structural and conservation scores similar to, and sometimes higher than, known miRNAs (Fig. 2; Table 1). Upon inspection, of the remaining nine cloned miRNAs which are not recovered, mir-9b ranks only slightly below the cutoff (0.938; rank 111), mir-278 ranks 278th with significantly lower conservation and structural scores, four appear conserved only in very close species (mir-303, mir-309, mir-311, mir-312), and two contain very long insertions or deletions in at least one species (mir-6-2 and mir-31b); finally, mir-314 shows a notable D. melanogaster-specific sequence difference, substituting a highly conserved U with a C (Supplemental Fig. S1).

Table 1.
Predicted miRNAs (score ≥ 0.95)
Figure 2.
Novel Drosophila miRNAs. (A) Prediction and validation of miRNA mir-190. mir-190 (black) is predicted in the intron of the cytoskeleton anchor protein rhea (blue, UCSC browser screen-shot) in the direction of transcription, sequence alignment of mir-190 ...

Our top predictions also contain nine of the 18 previously predicted but not yet cloned Rfam miRNAs (Lai et al. 2003). Of the nine Rfam miRNAs which we do not recover, two are poorly conserved and found only in close species (mir-310, mir-313), two have a highly conserved hairpin loop, which is very unusual for Drosophila miRNAs (mir-100 and mir-125), and one has an unusually large hairpin loop (mir-2c). The remaining four appear to be spurious (mir-280, mir-287, mir-288, mir-289) and, in contrast to all other Rfam miRNAs (see below), none of them could be verified in a large scale sequencing effort for Drosophila miRNAs (Ruby et al. 2007, this issue).

Strong support for our method and score cutoff stems from the genomic positions of the candidate hairpins, a feature not used for scoring. We plotted the fraction of all hairpins residing in exons, introns, repeats, and intergenic regions as a function of the hairpin score (Fig. 2C). Strikingly, while the occurrence of miRNA-like hairpins in the genome is indistinguishable from a random distribution according to the overall region lengths, the top scoring hairpins lie almost exclusively in introns and intergenic regions. Indeed, all known miRNA genes in animals lie in introns of protein coding genes or as separate, intergenic transcription units. For scores <0.95, the fraction of exonic and repeat-derived hairpins increases, and, for scores <0.80, it becomes indistinguishable from random, suggesting that any remaining miRNAs in this score range represent only an extreme minority amidst mostly spurious candidates. We conclude that our approach successfully discriminates real miRNA hairpins from abundant spurious hairpins based on patterns of evolutionary conservation and structural features.

Experimental validation of hairpin predictions

The top 101 hairpins include 41 novel miRNA candidates, for which several measures suggest that they represent functional, novel miRNA genes, nearly doubling the number of miRNA genes in the fly. To experimentally validate our predictions, we obtained 763,111 Solexa sequencing reads corresponding to 1524 distinct sequences from Drosophila ovaries and testes that match our predictions (see Methods and Supplemental data). We required multiple independent reads of the mature miRNA and manually inspected the reads for miRNA-like processing, providing a very stringent validation, essentially free of false positives (Ruby et al. 2006). Sixty-nine of our 101 predictions with scores ≥0.95 were supported by ≥10 reads each, including 17 of the 41 novel predictions. In addition, when we required three or more reads per hairpin position, our data supported 19 novel and 73 hairpins in total. In contrast, none of 500 random hairpins with scores <0.1 were supported by three or more reads. Indeed, only two matched to spurious reads that were each cloned only once. As additional predicted miRNAs might be valid yet not expressed in ovaries or testes, we also intersected our predictions with curated reads from 454 sequencing of small RNA libraries from several developmental stages and tissues (Table 1; Fig. 2A,B; intersection kindly performed by Graham Ruby, Eric Lai, and David Bartel [Ruby et al. 2007]). In total, we validate 84 of our 101 predictions (83%) with scores ≥0.95, including 24 of the 41 novel predictions (59%). At a less stringent cutoff of 0.90, an additional 49 novel miRNA candidates are predicted, of which a total of four are validated (Fig. 2B). This increases the number of cloned Drosophila miRNAs by 28 (47%) and suggests that our top predictions include the vast majority of conserved miRNAs.
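The percentages in this paragraph follow from the stated counts. A brief check, assuming the increase of 28 is the sum of the 24 validated novel predictions at scores ≥0.95 and the four validated at the 0.90 cutoff, relative to the 60 previously cloned miRNAs; the helper name is hypothetical.

```python
def percent(part, whole):
    """Percentage of part in whole, rounded to the nearest integer."""
    return round(100 * part / whole)

validated_all = percent(84, 101)      # validated predictions with scores >= 0.95
validated_novel = percent(24, 41)     # validated novel predictions
cloned_increase = percent(24 + 4, 60) # newly validated relative to 60 cloned miRNAs
```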

Additional evidence that novel miRNAs are functional

We next evaluated the 61 candidate miRNAs that did not intersect with sufficient sequencing reads (17 with scores ≥0.95 and an additional 44 with scores ≥0.90; Table 1). We evaluated these using additional properties of known miRNA genes, which were not used as features during the prediction of these hairpins. We found that, indeed, numerous candidates show several striking properties. For example, 17 of the 61 miRNAs are family members of known Drosophila miRNA families, four have orthologs in mosquito, six in worm, and five in human. An additional five candidates do not belong to known families but have predicted 5′ ends (see below) with 7-mers that scored equally high or higher than most miRNAs (MCS ≥ 40; see below). We found that three of the 61 hairpins are clustered in the genome, a property of many known miRNAs. Overall, 19 of the 61 hairpins are found in introns, common among miRNAs and suggestive of transcription. Indeed, five of these and an additional three intergenic hairpins are expressed during embryogenesis, as revealed by a recent genome-wide expression study using tiling arrays (Manak et al. 2006). Even though this is not significantly more than expected given the overall high prevalence of transcription (~1/6 intergenic probes), it demonstrates the presence of hairpin transcripts necessary for miRNA processing.

We conclude that several of these candidate miRNAs may be functional. These are likely to be of low abundance (similar to worm lsy-6; Johnston and Hobert 2003; Ruby et al. 2006), or expressed in tissues or under conditions that were not surveyed. We did not experimentally validate hairpin candidates with scores <0.9, but several metrics suggest they no longer include meaningful miRNAs at a reasonable rate. Amidst 386 hairpins scoring between 0.8 and 0.9, a considerable fraction (25%) falls within protein-coding exons, UTRs, and repeat elements, which are unlikely to contain functional miRNA genes. Below 0.8, the fraction of annotated exons and repeat elements is indistinguishable from random (Fig. 2C). We conclude that few conserved miRNAs are missed by our analysis.

Overall, our analysis resulted in 28 novel miRNA genes with strong evolutionary conservation and experimental support, and additional candidates. Given the recovery rate of 85% of cloned miRNAs, we estimate a total of ~120 conserved miRNAs, which is similar to previous estimates based on comparative information (Lai et al. 2003) and the number of known miRNAs in worm (Ruby et al. 2006). Note, however, that the observation that less deeply conserved miRNAs are expressed at lower levels (see below and Ruby et al. 2006) does not allow an accurate estimation of the number of nonconserved miRNAs.

High accuracy stems from feature set and number of species

Overall, our method recovers 85% of known miRNAs, and at least 83% of all predictions are correct. This high accuracy stems from the features we define, the increased number of species in our comparison, and the large evolutionary distances they span.

To study the effect of evolutionary distance on our performance, we applied our methods to different subsets of species and assessed the number of true miRNAs that were found among the top 100 predictions (Supplemental Fig. S3). For pairwise comparisons, the discovery rate increases approximately linearly with the evolutionary distance of the two species: Close species in the melanogaster subgroup (D. simulans, sechellia, yakuba, and erecta) allow the discovery of up to 49 valid miRNAs, those in the sophophora clade (D. persimilis, pseudoobscura, and willistoni) up to 73, and the most remote species (D. mojavensis, virilis, and grimshawi) up to 76 among the top 100 predictions. For multispecies comparisons, inclusion of all species at the same evolutionary distance cutoffs led to 49, 78, and 84 recovered miRNAs, respectively (Supplemental Fig. S3).

To study the effect of our feature set and methodology, we compared the number of recovered miRNAs to those predicted by Lai et al. (2003), by using D. pseudoobscura as our only informant species. Lai and colleagues reported 208 predicted miRNA loci, including 60 Rfam miRNA loci, and 13 novel miRNA loci reported by Ruby et al. (2007). By comparison, using the same rank cutoff, we recover a similar number of previously known miRNAs (n = 61), but nearly twice as many newly cloned ones (n = 23). The difference is likely due to the new features we define, the new methods for combining them, and also the new Rfam miRNAs available for training.

Finally, we asked whether clade-specific miRNAs could be discovered in clade-specific searches, evaluated using a set of 28 miRNAs that are not conserved outside the melanogaster species (i.e., beyond D. ananassae) and a set of 32 miRNAs that are not conserved outside the sophophora species (i.e., beyond D. willistoni). We tried 21 species combinations, but we found only one case of a discovered clade-specific miRNA that was not found using all 12 species, suggesting that clade-specific miRNAs are generally missed by this approach. Two factors contribute to this: First, the currently sequenced species do not provide sufficient discovery power for clade-specific miRNAs, due to insufficient genome sequencing at short branch lengths; second, the conservation properties of clade-specific miRNAs are likely to differ from the conservation properties of Rfam miRNAs, which are generally much more deeply conserved. Although we retrained our features for each species combination, we only used Rfam miRNAs as our training set, likely biasing against clade-specific structural and conservation patterns. It is likely that an improved understanding of the structural requirements for precursor processing, a larger training set including clade-specific miRNAs, and additional sequencing of closely related species will enable de novo discovery of additional miRNAs. However, it is currently unclear if structural properties in the absence of evolutionary signatures can be sufficient to reliably detect truly species-specific miRNAs that are diverged even between very close species (for such miRNAs in primates, see Bentwich et al. 2005a; Zhang et al. 2007).

In summary, we found that discovery power continued to increase with additional species and additional evolutionary distance, without reaching saturation. Indeed, using 12 species always performed best, whether we considered all miRNAs, or the clade-specific sets. We thus expect that additional species will lead to continued increase in discovery power, especially in the ability to discover clade-specific genes. Overall, for conserved miRNA genes, the 12 Drosophila species provided an ideal substrate for miRNA discovery, resulting in highly accurate de novo predictions.

Genomic clustering of novel miRNA genes and relationship with protein-coding genes

Many of the novel candidates show genomic clustering, a feature frequently observed among animal miRNA genes. Among the 28 validated novel miRNAs, six are clustered with known or novel miRNA genes (Fig. 2D,E). These include three novel miRNA genes within 4 kb of mir-318, mir-279, and mir-210, respectively, one novel miRNA gene 500 nt from mir-11 in the E2F intron, and two novel miRNA genes within 1 kb of each other in consecutive introns of CG31646. Among the 61 nonvalidated candidates with scores ≥0.90, three additional hairpins are within 3 kb of each other. Given the abundance of directed knockout experiments in Drosophila, recognizing clustered miRNAs is essential, as neighboring miRNAs will undoubtedly influence knockout phenotypes, and sometimes may be primarily responsible for them.

We find that 14 of the 28 novel verified miRNAs are within introns of protein-coding genes (Fig. 2D; Table 1), also a common feature of animal miRNAs. For 10 of these, the miRNA is in the same strand as the host gene, suggesting common transcriptional regulation (Baskerville and Bartel 2005). We particularly point out mir-1000, which lies within the intron of musashi (msi), an RNA-binding protein that has been reported to down-regulate proteins translationally (Okabe et al. 2001). This novel miRNA might be involved in, or responsible for, the previously reported musashi functions. Similarly, mir-998 and mir-995 reside in the introns of the cell-cycle regulators E2F and cdc2c, respectively (Fig. 2D), and are likely involved in the regulation of cell division. Notably, their vertebrate homolog mir-29 has been implicated in cancer (Pekarsky et al. 2006).

In the remaining four cases, intronic miRNAs lie on the opposite strand of annotated genes, suggesting expression as independent transcription units (Aboobaker et al. 2005). For example, mir-964 and mir-959, both validated, lie on the opposite strand of CG31646. Indeed, these are supported by reverse-strand cDNAs and ESTs, which are correlated to each other in expression, but uncorrelated with CG31646.

Finally, two novel miRNAs overlap exons of previously annotated protein-coding genes. Candidate Novel-60 overlaps the annotated coding region of CG33311, and validated mir-996 resides in the annotated 5′ UTR of CG31044 (Fig. 2E). In both cases, an independent analysis of protein-coding evolutionary constraint had in fact rejected these genes as unlikely to encode proteins (Lin et al. 2007): 12-species alignments for both genes are littered with stop codons, frame-shifting insertions and deletions, and nonconservative codon substitutions, suggesting they are not under protein-coding selection. Consequently, the newly discovered miRNA genes provide an explanation for the previously observed transcripts, which we conclude are not encoding proteins. This finding highlights the importance of systematic high-quality annotation of both protein-coding and non-protein-coding genes, based on their specific evolutionary constraints. It also illustrates the power of our unbiased genome-wide prediction of miRNAs that, unlike previous methods, did not explicitly exclude exons from the regions searched.

Both strands can be transcribed and processed

Intuitively, one may expect that the reverse complementary sequence of a miRNA hairpin would also fold into an equivalent hairpin. However, due to GU base pairs that translate into incompatible AC base pairs, and sequence-specific energy terms, this is generally not the case. In fact, we found large differences between sense and anti-sense hairpins for most cloned miRNAs: while 51 sense hairpins scored ≥0.95, only 21 of the complementary reverse-strand hairpins reached that score, and only four of 51 reverse complements scored more highly than the correct strand (Supplemental Fig. S2). While the contribution of hairpin conservation is independent of the strand, high scores, when considering a wide variety of structural features including loop lengths, structure, and symmetry, suggest that these anti-sense hairpins would constitute bona fide miRNAs if transcribed.
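
The strand asymmetry caused by G-U pairs can be illustrated with a minimal Python sketch. It is not part of the original analysis, and the function names are hypothetical. For a base pair on the sense strand, it returns the two bases that face each other at the same position of the reverse-complement hairpin: a G-U wobble pair becomes an A-C mismatch, whereas Watson-Crick pairs remain pairable.

```python
# Illustrative sketch only (names are hypothetical, not from the paper).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Watson-Crick pairs plus the G-U wobble pair tolerated in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(five_prime: str, three_prime: str) -> bool:
    """True if the two bases can form a Watson-Crick or G-U wobble pair."""
    return (five_prime, three_prime) in PAIRS


def antisense_pair(five_prime: str, three_prime: str) -> tuple:
    """Bases facing each other on the reverse-complement strand.

    If base `five_prime` pairs with base `three_prime` in the sense hairpin,
    the complement of `three_prime` ends up on the 5' side of the anti-sense
    hairpin and the complement of `five_prime` on the 3' side.
    """
    return COMPLEMENT[three_prime], COMPLEMENT[five_prime]


for pair in [("G", "C"), ("A", "U"), ("G", "U")]:
    anti = antisense_pair(*pair)
    print(pair, "->", anti, "pairs" if can_pair(*anti) else "does not pair")
```

The sketch covers only the base-pairing argument; the sequence-specific energy terms mentioned above are not modeled.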

Expression data confirmed that both strands of a miRNA gene can be correctly processed into mature miRNAs and that they may be independently regulated. In four cases (mir-iab-4, mir-307, mir-124, and mir-305), sequencing reads were found that corresponded to a correctly processed mature miRNA product for the reverse strand (Ruby et al. 2007), albeit at a much lower level than the forward-strand product. In all four cases, both forward and reverse strand showed scores >0.97 based on our evolutionary and structural metrics. For one case in particular (mir-iab-4), both sense and anti-sense transcripts have been detected by in situ analyses in fly embryos (Bae et al. 2002), indicating robust transcription of both strands. Interestingly, the two strands of mir-iab-4 are expressed in distinct and mutually exclusive embryonic segments, suggesting independent regulation of two distinct miRNAs, likely with distinct functions.

We conclude that anti-sense transcription, whether fortuitous or regulated, can lead to functional processing of anti-sense pre-miRNAs that resemble canonical miRNA hairpins into distinct mature miRNAs. The reverse-strand miRNAs are likely to have distinct promoters and enhancers, and also a distinct target spectrum, since their 5′ ends differ from the forward strand, leading to increased versatility for miRNA regulation.

Accurate prediction of mature miRNAs

In addition to the discovery of miRNA genes themselves, comparative genomics can help pinpoint the exact position of cleavage, allowing accurate prediction of the mature miRNA sequence. Knowledge about the precise 5′ end is particularly important because it dictates the core of the target recognition sequence (seed) (Lai 2002; Lewis et al. 2003; Stark et al. 2003; Doench and Sharp 2004; Kiriakidou et al. 2004; Kloosterman et al. 2004; Brennecke et al. 2005). As a likely consequence, miRNA 5′ ends are under strong selection and exhibit higher processing accuracy than miRNA star 5′ ends or the two respective 3′ ends (Ruby et al. 2006). Several distinctive evolutionary and structural features can be used to identify the precise 5′ end of mature miRNAs (Fig. 3A). First, almost all mature miRNAs are perfectly conserved (see above), such that the completely conserved sequence following the miRNA 5′ end is much longer than for most other positions in the hairpin. Second, as observed previously (Lau et al. 2001), >78% of Drosophila miRNA genes start with a 5′ uridine (perhaps reflecting binding affinities of Argonaute proteins or preferred cleavage sites for RNaseIII enzymes), while the overall frequency of uridine in miRNA hairpins is only ~30%. Third, at the structural level, we found that the number of paired bases in a 7-nt window centered at the mature start is highly constrained, requiring at least two to three paired bases, but typically excluding perfect pairing (100% of miRNAs had at least two base pairs in that window; 95% had three or more; but only 25% had perfect pairing). This suggests that miRNA processing cannot occur within large loops, and also that a certain thermodynamic instability is required between the miRNA 5′ ends and the star sequence, which is likely linked to the asymmetric strand loading of miRNAs into RISC (Khvorova et al. 2003; Schwarz et al. 2003).
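
As an illustration of the structural feature described above, the following Python sketch counts paired bases in a 7-nt window centered on a candidate start position. It assumes that the hairpin fold is given in dot-bracket notation (parentheses for paired bases, dots for unpaired bases) and that positions are 0-based; both are assumptions of this sketch, as are the function names and the toy structure.

```python
# Illustrative sketch; the input format and names are assumptions.

def paired_in_window(structure: str, start: int, width: int = 7) -> int:
    """Count paired bases in a window of `width` nt centered on `start`.

    `structure` is a dot-bracket string; `start` is a 0-based index.
    Windows that would extend past either end are truncated.
    """
    half = width // 2
    lo = max(0, start - half)
    hi = min(len(structure), start + half + 1)
    return sum(1 for symbol in structure[lo:hi] if symbol in "()")


def starts_with_u(hairpin: str, start: int) -> bool:
    """True if the candidate 5' nucleotide is a uridine (T in DNA notation)."""
    return hairpin[start].upper() in ("U", "T")


# Toy structure, not a real miRNA hairpin.
toy_structure = "((((.((..((((....))))..)).))))"
# The window around index 5 covers indices 2..8; 4 of its 7 bases are paired.
print(paired_in_window(toy_structure, 5))
```

A candidate with too few paired bases in the window, or with a perfectly paired window, would be less typical of the cloned miRNAs according to the percentages reported above.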

Figure 3.
Properties of mature miRNAs. (A) Properties of mature miRNA 5′ ends. 7-mers complementary to the start of mature miRNAs show a characteristic profile of 3′ UTR motif conservation scores (MCS) and avoidance in 3′ UTR of anti-target ...

In addition to these direct signals, computational prediction of mature miRNAs can use indirect signals, stemming from the relationship of miRNAs with their target genes (Fig. 3A). For example, the 7-mers complementary to miRNA 5′ ends (seed matches) are abundant in 3′ UTRs and highly preserved throughout evolution (Lewis et al. 2003, 2005; Stark et al. 2003; Xie et al. 2005). In contrast, 7-mers complementary to other parts of mature miRNAs are generally under no selective constraint. Moreover, these 7-mer seed matches are generally avoided in 3′ UTRs of ubiquitously expressed genes, thus preventing their targeting by miRNAs, while 7-mers starting at other miRNA positions are generally not avoided (Farh et al. 2005; Stark et al. 2005). Because overlapping 7-mers starting at adjacent nucleotide positions share a large fraction of their occurrences, these conservation and avoidance properties showed a larger characteristic profile, peaking at the true 5′ end and extending with moderate scores over multiple positions surrounding it. Interestingly the overall profile was more discriminative than the measures at individual positions (Fig. 3A).

Using these direct and indirect features of miRNA 5′ ends, we developed a computational framework to predict mature miRNAs (Fig. 3B). We combined each of these properties and adjusted their relative contributions using a support vector machine trained on a nonredundant set of cloned Drosophila miRNAs using cross-validation. We found that the resulting score is much more accurate than either of the scores alone and can pinpoint precise 5′ ends for known and novel miRNA genes: When evaluated based on previously cloned miRNAs (see Methods), we found that our method pinpointed the exact start position in 47/60 cases (78%) and was within 1 bp in 51/60 (85%).

Refined annotation of known miRNA genes leads to refined target spectrum

Among the 14 annotated but not previously cloned Rfam miRNAs, our predictions disagreed with the previous 5′ annotation in nine cases, often by several nucleotides (Fig. 3C). Comparison with our sequencing reads and curated miRNAs from an accompanying paper (Ruby et al. 2007) revealed that, in fact, for six of nine cases our predictions provided a more accurate view of the 5′ end than the previous annotation, pinpointing their exact position in four cases and falling within one nucleotide for the other two (in the five cases where our predictions agreed with the previous annotation, sequencing confirmed four were exactly correct, and one was shifted by 1 nt).

The revised 5′ end annotation of Rfam miRNAs leads to dramatic changes in the inferred target spectrum, according to our published method, which combines seed matches, extended pairing information, and evolutionary conservation (Brennecke et al. 2005; Stark et al. 2005). Strikingly, we found that the overlap in predicted targets between the old and new annotation is as low as 40% for an adjustment of +1 nt, and it drops sharply to 1%–5% target overlap when the discrepancy increases to +2, +3, or more nucleotides (Fig. 3C).
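
The sensitivity of the target spectrum to the 5′ end follows directly from how the seed is defined. The Python sketch below is only an illustration and is not the published target prediction method. It extracts the 7-mer at positions 2–8 of a mature sequence, derives the complementary 7-mer that would be sought in 3′ UTRs, and repeats this for 5′ ends shifted by +1 and +2 nt. The function names are hypothetical and the RNA string is merely an example input.

```python
# Illustrative sketch; names are hypothetical and the input is an example string.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed(mature: str, offset: int = 0) -> str:
    """7-mer at positions 2-8 (1-based) of a mature miRNA.

    `offset` shifts the assumed 5' end downstream by that many nucleotides.
    """
    first = 1 + offset  # 0-based index of position 2
    return mature[first:first + 7]


def seed_match(seed_7mer: str) -> str:
    """Reverse complement of the seed, i.e., the 7-mer expected in a 3' UTR."""
    return "".join(COMPLEMENT[base] for base in reversed(seed_7mer))


mature = "UGAGGUAGUAGGUUGUAUAGU"
for shift in (0, 1, 2):
    s = seed(mature, shift)
    print(f"+{shift} nt: seed {s}, seed match {seed_match(s)}")
```

Each shift yields a different 7-mer, and therefore a different set of exact seed matches in 3′ UTRs. Overlapping 7-mers share some occurrences, which is consistent with the partial target overlap observed for a +1 shift.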

The revised 5′ end annotations also result in several miRNAs now being dissimilar in sequence, family membership, and targeting properties from their currently annotated family members. In particular, miR-2c is no longer a K-box miRNA, and miR-263a no longer matches miR-263b.

Our results emphasize the importance of high-confidence annotations of miRNA 5′ ends to understand the target spectrum, biological function, and family membership of a miRNA. They also illustrate the power of comparative methods to reveal such information, solely based on the genome sequence, when experimental data are missing.

Novel miRNAs show fewer conserved targets and lower miRNA expression

Our prediction accuracy for mature miRNA gene boundaries relies on both structural features and 7-mer-based scores derived from target 3′ UTRs. Thus, for miRNAs with few or poorly conserved targets, our accuracy may diminish. Indeed, for the 28 novel miRNAs, only 43% of our mature 5′ end predictions were confirmed and a total of 61% were within 1 nt. These show significantly fewer predicted targets on average than previously known miRNAs (100 vs. 154, P < 10⁻², counting 5′ dominant and 3′ compensatory sites; Brennecke et al. 2005) and in fact show much lower 7-mer-based scores of motif conservation (MCS of 15 vs. 36; P < 10⁻⁵).

We reasoned that these miRNAs may be of overall lower abundance, perhaps explaining why they have been missed by earlier cloning attempts. In fact, when examining the sequencing data of Ruby et al. (2007), these 28 novel miRNAs showed significantly fewer reads on average than previously cloned miRNAs (700 vs. 4337; P < 10⁻⁴). Overall, we found a strong correlation (Pearson coefficient: 0.72) between the number of reads supporting a mature miRNA, and the genome-wide motif conservation score of the corresponding 3′ UTR motifs (Fig. 3E).

This strong correlation suggests that the impact of miRNAs on average target 3′ UTRs increases with their abundance, stemming from either their overall expression levels, or range of expression across tissues or developmental stages. miRNAs that are expressed at low levels in restricted domains are likely only able to interact with few genes such that the number of targets but also the number of anti-targets are small. This scenario has, for example, been proposed as a likely evolutionary mechanism for the emergence of novel miRNAs and their targets (Chen and Rajewsky 2007).

Novel miRNA families lead to increased potential for combinatorial regulation

To assess the impact of the new miRNA genes on the miRNA regulatory network, we determined targets of all known and novel miRNAs, applying our previously published algorithm (Brennecke et al. 2005; Stark et al. 2005) and using the validated 5′ end for each miRNA. We find that, despite the relatively smaller number of targets, the novel miRNAs have a significant impact on the overall miRNA regulatory network.

First, they introduce many new miRNA families, as defined by shared 7-mer seeds in positions 2–8 of the mature miRNA. The novel miRNAs were discovered by their structural and evolutionary signatures, rather than by their (sequence) similarity to existing miRNAs, and this is reflected in the many unique seed sequences they introduce. Most novel validated miRNAs (n = 28) have unique 5′ ends, an additional six form new families, and only three extend existing families. Thus, although the novel miRNAs introduce 50% more miRNA genes (from 61 to 94), they result in an even larger increase of >70% in the number of families (from 43 to 73).

Second, although the novel miRNAs introduce 3300 new target sites, these are heavily biased toward genes already targeted by miRNAs. On one hand, the new miRNAs target known target categories, such as developmental genes, and exclude anti-target categories, such as ribosomal genes (Stark et al. 2005). On the other hand, they show great overlap with individual genes already targeted by miRNAs, leading to a denser regulatory network. It is already known that the existing miRNA network is dense, with most target genes targeted by multiple miRNAs (Enright et al. 2003; Grun et al. 2005; Krek et al. 2005; Lewis et al. 2005; Stark et al. 2005). The novel miRNAs reinforce this dense network, leading to >3000 new sites, but only a small number of novel target genes. Overall, the number of miRNA-targeted genes increases only by a relatively modest 16%, from 3000 genes (21% of the genome) to 3500 genes (25% of the genome; note that the accompanying paper by Ruby et al. [2007] extends this analysis with an enlarged set of novel Drosophila miRNAs).

This effect is not additive but combinatorial. Since the new miRNA genes introduce many new miRNA families, they introduce new sets of target genes, which can be differentially regulated in response to developmental and environmental stimuli; these sets are distinct from previous sets of targets and cut across them. The result is a significantly higher potential for combinatorial regulation, giving opportunity for the precise regulation of individual subsets of genes using different combinations of miRNAs, resulting in a much more versatile and precise network of miRNA regulation.

A single miRNA gene can lead to multiple functional mature products

Our ability to score structural and evolutionary properties of the novel miRNAs, and correlate them with sequencing reads, led to new insights into miRNA biogenesis and function. Interestingly, when our algorithm did not predict a precise miRNA 5′ end, the sequencing reads also showed several alternate mature products, with different 5′ ends (Fig. 3D). For example, the predicted alternate 5′ end of miR-964, which is shifted by 1 nt, is supported by more than half of the reads supporting the annotated 5′ end (3461 vs. 6393 in ovaries/testes and 29 vs. 57 otherwise). Overall, when our algorithm predicted the correct 5′ end, 90% of reads supported that start; when our algorithm did not predict the correct start, only 78% of reads supported it (P = 6 × 10⁻³), and a significant number of hairpins were in fact processed into alternate mature products. Interestingly, more highly expressed miRNAs showed overall more accurate processing (Pearson correlation: 0.39), suggesting that inaccuracies may be tolerated for less abundant miRNAs (or young miRNAs, which are not yet evolutionarily fixed; Chen and Rajewsky 2007), while highly expressed miRNAs are under strong selection for accurate processing. Overall, our results suggest that, when the evolutionary signal is mixed, the signals for processing are also less accurate. It is currently unclear if the alternate forms are regulated and functional, or arise solely due to processing inaccuracies.

Similarly, in several cases, our algorithm yielded high scores for the miRNA* arm, suggesting these may also have functional targets. While in most cases, the miRNA* arm showed no significant signal (Fig. 3B), we found that, for 10 miRNAs, the 5′ end of the star sequence scored as highly as some miRNAs. Since our score is based on 7-mer conservation and avoidance of target sites, these high scores suggest that the miRNA* may be functional. In fact, in four cases the miRNA* scored even more highly than the primary arm: Two of these star products showed convincing similarity to known miRNAs (miR-5 and miR-4; Lai et al. 2004), likely to contribute to the high scores. However, seven of the 10 have unique 5′ ends such that high 7-mer scores cannot be attributed to a known miRNA, suggesting a distinct role of the star arm in targeting 3′ UTRs.

The sequencing reads strongly support these findings. The 10 miRNAs with high-scoring star sequences also showed abundant reads from the star arm. For example, miR-5* is supported by 1142 reads, far exceeding the abundance of many miRNAs and accounting for greater than one-fourth of the mature reads obtained in the mir-5 locus (Fig. 3D). On average, the number of reads increases from 19 to 92 and the fraction of star reads among all reads of the hairpin increases from 2% to 9%, suggesting a specifically increased preference of the star arm (Fig. 3D). The abundance of star sequences alone could be explained by RISC incorporation of star arms, due to the thermodynamic stability of the miRNA–miRNA* duplex (Khvorova et al. 2003; Schwarz et al. 2003), and thus star sequences have not received much attention in miRNA target prediction (Lai 2004; Rajewsky 2006). However, the observation that abundant miRNA star sequences also show strong signals stemming from their interaction with target gene 3′ UTRs suggests they may in fact have physiologically relevant targets.

mir-10 as a master Hox regulator

The miRNA mir-10, which lies in the Hox cluster, presents a particularly striking case of miRNA* functionality (Fig. 4). On one hand, the miRNA* sequence for mir-10 shows an even higher score than the mature miRNA itself, in terms of structure, conservation, and also 7-mer motif conservation and avoidance (Fig. 4A). Moreover, we found 34 times as many products of miR-10* as of miR-10 in our reads from ovaries and testes (306 vs. 9; Fig. 4B) and this trend also holds for other tissues (1319 vs. 189; Ruby et al. 2007). Further, an independent analysis of miRNA–miRNA* duplex energy showed that both miR-10 and miR-10* can be detected and suggested that both are likely incorporated into RISC (Schwarz et al. 2003). We thus reason that miR-10* may be functional and is in fact likely to represent the primary product.

Figure 4.
miR-10 and miR-10* target Hox genes. (A) SVM Z-scores indicate that miR-10 and miR-10* are both functional and that miR-10* is likely the major miRNA (green and red are positive and negative scores, respectively). (B) Cloning confirms that both sequences ...

To infer the potential functional role of miR-10*, we studied its target spectrum, revealing striking insights. First, miR-10* has nearly 10 times more targets than the annotated miR-10, suggesting it plays a major role in gene regulation (Fig. 4C). Moreover, these targets include several Hox genes, in particular Abdominal-B (Abd-B) and Ultrabithorax (Ubx), which show highly conserved canonical target sites (Fig. 4D). This provides another example of regulatory relationships between Hox miRNAs and Hox genes, highly reminiscent of two additional Hox miRNAs, mir-iab-4 in Drosophila, and mir-196 in vertebrates (Stark et al. 2003; Yekta et al. 2004; Ronshaugen et al. 2005).

Unique to mir-10, however, both arms appear to be functional and appear to be targeting Hox genes: The originally annotated miR-10 has a highly conserved compensatory target site in the 3′ UTR of Sex-combs-reduced (Scr; Brennecke et al. 2005; Fig. 4D). To our knowledge, this is the first demonstration of functional relevance of both miRNA arms, and their potential cooperative action, targeting similar genes.

Overall, our results suggest that mir-10 may in fact play the role of a master Hox regulator, which was previously missed likely due to its incomplete annotation. The observation that both arms of a miRNA are expressed and have highly conserved targets in similar processes suggests that miRNA-star arms might frequently be functional, with major implications for the identification of functional miRNA targets and overall role and versatility of miRNA regulation.

Discussion

A complete knowledge of all miRNAs is extremely important, especially in Drosophila, where the study of mutants from genetic screens heavily relies on accurate genome annotation and where powerful reverse genetics tools allow the systematic analysis of miRNA functions. The recent availability of 12 closely related Drosophila genomes enables the use of comparative genomics for recognizing miRNAs systematically, based on their structural and evolutionary signatures. We report >41 novel predicted miRNAs and validate 28 experimentally using sequencing reads from small RNA library sequencing (see Supplemental material; Ruby et al. 2007). In addition, the predictions show several miRNA-like properties, which were not used in their discovery, including genomic clustering, family membership, intronic occurrences, and transcription.

The newly discovered miRNAs significantly increase the number of miRNA genes known in the fly, from 60 prior to this work to 89 experimentally confirmed genes (a 50% increase) and 17 additional candidates. This increase is even more drastic when considering miRNA families: The new miRNAs introduce 30 new miRNA families (a 70% increase), resulting in increased versatility by forming new groups of potentially coregulated target genes. Finally, we found that the genes targeted by these new miRNAs heavily overlap with existing miRNA targets, resulting in a denser miRNA network.

In addition to miRNA annotation, we find that comparative genomics can lead to a deeper understanding of the functions and biogenesis of miRNAs. We show that a single miRNA gene can give rise to several mature sequences: First, both strands can be expressed and correctly processed; second, multiple mature products can be produced at small offsets from the primary miRNA product; third, both arms of the miRNA hairpin can lead to mature miRNAs, with many potential functional targets. These alternate mature forms are sometimes of surprisingly similar abundance, share similarly strong evolutionary and structural signatures, and show similar relationships with target 3′ UTRs. As these alternate forms often have drastically different target gene spectra, they could be employed to increase the number of targets for each miRNA gene and may constitute a powerful evolutionary mechanism for the emergence of new miRNAs.

Particularly striking was the relationship between miRNAs and miRNA stars. For 13 miRNAs, both hairpin arms scored highly, reflecting their interaction with miRNA targets and anti-targets and indicating that star arms might have previously overlooked functional roles. In the case of mir-10, located in the Hox cluster, we find highly conserved target genes for both arms, targeting multiple distinct genes in the Hox cluster, suggesting mir-10 is a master regulator of Hox genes, while, previously, Hox miRNAs were assumed to only have a single dominant Hox target gene each (Stark et al. 2003; Yekta et al. 2004; Ronshaugen et al. 2005). As miRNA and miRNA* are coexpressed, the potential of sharing a functional target spectrum is intriguing and has major implications for miRNA targeting and biology.

Conversely, mature products from opposite DNA strands have the potential of distinct regulatory domains. For example, for the mir-iab-4 locus, we predict high scores for both strands, and both are validated by sequencing reads as expressed and correctly processed. These two strands are known to be transcribed in adjacent and nonoverlapping domains in fly embryos (Bae et al. 2002), suggesting distinct developmental functions for two miRNAs from a single locus. In our study, we find four such examples of opposite-strand high-scoring miRNAs that are confirmed by sequencing reads. The extent of this mechanism is still unclear, but reverse-strand expression may represent a novel biologically relevant regulatory principle for miRNA genes and RNAs more generally.

The methods presented here are general and applicable for the systematic annotation of any species. In particular, this study can serve as a powerful model for the analysis of the human genome: The total evolutionary distance across the 12 flies is comparable to the total evolutionary distance of the mammalian and vertebrate genomes (Stark et al. 2007) and, given the much larger number of 474 annotated human miRNAs (Griffiths-Jones et al. 2006), comparative approaches may reveal many new insights into the biology, regulation, and targeting of human miRNAs. By measuring specific evolutionary constraints, comparative studies can complement ongoing sequencing efforts by recognizing miRNA genes with low abundance or tissue-specific expression, and help indicate if sequencing reads matching hairpin structures constitute noise or functional miRNAs (Berezikov et al. 2006a). More importantly, though, they can reveal biological insights that are not directly accessible from sequencing reads alone. More generally outside the realm of miRNAs, the ability to define precise evolutionary and structural signatures for specific classes of RNA genes represents a powerful approach for the systematic discovery of functional RNA genes and structures in any species.

Methods

miRNA training sets

From miRBase release 9.0 (Griffiths-Jones et al. 2006), we selected all 60 D. melanogaster miRNAs that have been cloned. For both hairpin and mature prediction, miRNAs that may bias the score due to overfitting are excluded when scoring the known miRNAs (see the appropriate section below for details).

Collecting all melanogaster hairpins

To identify miRNA-like hairpins, we ran RNAfold from the Vienna package (Hofacker et al. 1994) on 120-nt windows (overlap of 90 nt) in the D. melanogaster genome (rel 4). We considered all hairpins in each window (including branching hairpins) and trimmed them to the end of the stem. We use these folds to infer the arms and loop of each hairpin. As a lenient prescreening, we removed all hairpins <63 nt, with an arm of <20 nt or with <70% arm base-pairing. We were left with all the known hairpins and an additional list of 760,000 potentially overlapping putative miRNA hairpins.
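
The windowing and prescreening steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: folding with RNAfold is omitted, the function names are hypothetical, and the prescreen takes precomputed values for the hairpin length, the length of the shorter arm, and the fraction of paired arm bases, since the text does not specify exactly how these were measured. A window size of 120 nt with a 90-nt overlap implies a step of 30 nt.

```python
# Illustrative sketch; names and input conventions are assumptions.

def window_starts(seq_len: int, size: int = 120, overlap: int = 90) -> list:
    """0-based start coordinates of overlapping windows along a sequence.

    Consecutive windows are offset by `size - overlap` nt. A sequence
    shorter than `size` yields a single (truncated) window at position 0.
    """
    step = size - overlap
    return list(range(0, max(seq_len - size, 0) + 1, step))


def passes_prescreen(hairpin_len: int, shortest_arm_len: int,
                     arm_paired_fraction: float) -> bool:
    """Lenient prescreen: reject hairpins <63 nt, with an arm <20 nt,
    or with <70% of arm bases paired."""
    return (hairpin_len >= 63
            and shortest_arm_len >= 20
            and arm_paired_fraction >= 0.70)


print(window_starts(300))               # one start every 30 nt
print(passes_prescreen(80, 24, 0.85))   # retained
print(passes_prescreen(62, 24, 0.85))   # rejected: hairpin too short
```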

Hairpin sequence alignments

For each melanogaster hairpin sequence, we selected the best BLAST (Altschul et al. 1997) match with E-value ≤ 1 × 10⁻⁵ in each of the 11 other genomes (CAF1 assemblies). We performed a multiple alignment of the corresponding sequences plus 50 nt flanking sequence on each side using ClustalW (Thompson et al. 1994).

miRNA hairpin discovery

For each hairpin we derived several structural and conservation features. The most important of these features are summarized in Figure 1E and the complete list of features is available in Supplemental Table 2. We scored the list of 760,355 putative miRNAs with a method similar to Random Forests (Breiman 2001). Using the combined conservation and structural feature set, 500 decision trees were trained on the positive training set of 60 miRNAs and a different randomly selected negative set of 250 of the remaining putative miRNAs. The final score for a hairpin was derived through cross validation (the score of each hairpin is evaluated only by trees that exclude it, all redundant sequence similar miRNAs and overlapping miRNAs). From all hairpins that overlap on the same strand, only the hairpin with the highest score is kept. This can lead to known miRNAs having a slightly revised hairpin selected.
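
The final selection among overlapping hairpins can be sketched as follows. The text states only that the highest-scoring hairpin among same-strand overlaps is kept; the greedy procedure below (visit hairpins by decreasing score and keep each one that does not overlap an already kept hairpin) is one possible reading and is an assumption of this sketch, as are the `Hairpin` type and the half-open coordinate convention.

```python
# Illustrative sketch; the greedy strategy and all names are assumptions.
from typing import List, NamedTuple


class Hairpin(NamedTuple):
    strand: str   # "+" or "-"
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    score: float  # classifier score


def overlaps(a: Hairpin, b: Hairpin) -> bool:
    """True if both hairpins lie on the same strand and share at least 1 nt."""
    return a.strand == b.strand and a.start < b.end and b.start < a.end


def keep_best(hairpins: List[Hairpin]) -> List[Hairpin]:
    """Keep only the best-scoring hairpin among same-strand overlaps."""
    kept: List[Hairpin] = []
    for hairpin in sorted(hairpins, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hairpin, other) for other in kept):
            kept.append(hairpin)
    return sorted(kept, key=lambda h: (h.strand, h.start))


candidates = [
    Hairpin("+", 0, 80, 0.90),
    Hairpin("+", 50, 130, 0.95),   # overlaps the first one and scores higher
    Hairpin("-", 60, 140, 0.50),   # opposite strand, so kept independently
]
for hairpin in keep_best(candidates):
    print(hairpin)
```

Because strands are treated independently, a sense hairpin and its anti-sense counterpart can both be retained.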

miRNA mature 5′ identification

For each position in the hairpin, we computed several features indicative of the start (5′ end) of mature miRNAs. 7-mer scores are determined for the sequence-complementary 7-mers at each position in the hairpin. 7-mer conservation scores are motif-conservation scores (MCS) of 7-mers calculated in all annotated 3′ UTRs (FlyBase release 4.3) as described in Kellis et al. (2003, 2004) and Xie et al. (2005). Additionally, we assessed the avoidance of the 7-mers in 3′ UTRs of global anti-target genes (Stark et al. 2005) by computing the deviation relative to all genes by Z-scores. In addition, we considered the nucleotide to account for the U-bias often observed in mature miRNAs, the number of paired bases in a window of 7 around the position, and others (see Supplemental Table 2). We excluded potential start positions for which the corresponding miRNA would fall outside the hairpin or span the hairpin loop region (for positions in the left arm, we required at least 15 nt before the start of the loop; for positions in the right arm, we allowed no more than a 3-nt overlap with the loop and required at least 18 nt before the end of the hairpin). Within each hairpin, we linearly normalized each feature to be from 0 to 1 and marked each known mature site as a positive and all remaining sites as negative. We augmented the features for each position with the features of the positions to its left and right. We used the SVMlight (Joachims 1999) package to train an SVM with default parameters (linear kernel and positive gain 1) on all the permissible locations from all the known hairpins. The SVM scores for each hairpin are linearly normalized so that the scores of the permissible regions have mean 0 and standard deviation 1. We predict the mature location by taking the permissible location in each hairpin with the highest SVM score. Each hairpin is only scored by models trained on cloned Rfam hairpins (Rfam 9.0), excluding itself and all family members.
For evaluation (not training), we use the partly corrected 5′ end annotation of Ruby et al. (2007). To test if we predicted the star sequences, we determined the star sequence based on the fold-back structure as a 2-nt 5′ overhang of the mature miRNA sequence.
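
The normalization and selection steps described above can be sketched in Python. The sketch leaves out feature computation and SVM training entirely; it shows only the per-hairpin scaling of a feature to the range 0 to 1, the augmentation of each position with its neighbors' features, the standardization of scores to mean 0 and standard deviation 1, and the choice of the top-scoring permissible position. The function names, the handling of edge positions, and the use of the population standard deviation are assumptions made for illustration.

```python
# Illustrative sketch; names, edge handling, and the choice of population
# standard deviation are assumptions, not details given in the text.
from statistics import mean, pstdev


def minmax(values):
    """Linearly rescale one feature within a hairpin to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def with_neighbors(rows):
    """Append the features of the left and right neighbor to each position.

    Edge positions reuse their own features for the missing neighbor.
    """
    n = len(rows)
    return [rows[i] + rows[max(i - 1, 0)] + rows[min(i + 1, n - 1)]
            for i in range(n)]


def zscores(scores):
    """Standardize scores of permissible positions to mean 0, s.d. 1."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sd for s in scores]


def predict_start(positions, scores):
    """Return the permissible position with the highest standardized score."""
    z = zscores(scores)
    best = max(range(len(z)), key=z.__getitem__)
    return positions[best], z[best]


print(minmax([2, 4, 6]))
print(with_neighbors([[1], [2], [3]]))
print(predict_start([10, 11, 12], [1.0, 2.0, 3.0]))
```

Standardizing within each hairpin makes scores comparable across hairpins, which is what allows a star-arm position to be described as scoring "as highly as some miRNAs".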

Validation of novel miRNAs

To validate our predictions experimentally, we obtained 763,111 Solexa reads corresponding to 1524 distinct sequences that matched to our predictions. These were cloned from adult Drosophila ovaries and testes as described previously (Brennecke et al. 2007). We excluded short reads (<15 nt) and those that matched the genome more than three times and aligned the remaining reads to the predicted hairpins. For validation, we required that at least one position in the hairpin was supported by at least 10 reads and manually inspected the alignments for miRNA-like processing patterns (e.g., dominant sequence, presence of star sequence, no sign of degradation products). In addition, we intersected the predicted hairpins with curated sequencing reads of several Drosophila libraries kindly provided by Graham Ruby, David Bartel, and Eric Lai. To validate mature miRNA 5′ end predictions, we used the curated mature miRNAs reported by Ruby et al. (2007) and refer to them by their newly assigned Rfam names (Griffiths-Jones et al. 2006). In Figures 2, 3, and 4, we report the sum of reads from Solexa and 454 sequencing when showing individual miRNAs. However, given the large number of Solexa reads stemming from only ovaries and testes, we report only the 454 read-count when comparing several miRNAs.
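
The automated part of these validation criteria can be sketched as follows. The sketch assumes that each aligned read is available as a tuple of its sequence, its number of genome matches, and its start coordinate within the hairpin, and that "support" for a position is counted by read start; these input conventions and the function names are assumptions. The manual inspection of alignments is not captured.

```python
# Illustrative sketch; input conventions and names are assumptions.
from collections import Counter


def keep_read(sequence: str, genome_hits: int,
              min_len: int = 15, max_hits: int = 3) -> bool:
    """Discard short reads (<15 nt) and reads matching the genome >3 times."""
    return len(sequence) >= min_len and genome_hits <= max_hits


def has_read_support(reads, min_reads: int = 10) -> bool:
    """True if at least one hairpin position is supported by >=10 kept reads.

    `reads` is an iterable of (sequence, genome_hits, start_in_hairpin).
    """
    starts = Counter(start for sequence, hits, start in reads
                     if keep_read(sequence, hits))
    return any(count >= min_reads for count in starts.values())


toy_reads = [("A" * 21, 1, 5)] * 10
print(has_read_support(toy_reads))                          # 10 reads at position 5
print(has_read_support(toy_reads[:9] + [("A" * 14, 1, 5)])) # tenth read too short
```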

miRNA recovery using different species sets

We investigated the dependency of genome-wide miRNA discovery on the number and evolutionary distance of the contributing species. For this, we obtained all novel miRNAs defined by Ruby et al. (2007) and tested how many we recovered with our protocol when using selected subsets of species. In each case, we allowed for the optimal re-weighting of feature contributions (e.g., to down-weight conservation features when comparing only close species, if appropriate).

Acknowledgments

We thank Graham Ruby and David Bartel (Whitehead Institute/HHMI, MA) and Eric Lai (Sloan-Kettering Institute, NY) for providing access to their sequencing data prior to publication and for helpful comments on the final manuscript. We thank Matt Rasmussen, Mike Lin (CSAIL, Broad Institute, MA), and other members of the Kellis lab for helpful discussions and for sharing unpublished data. A.S. and J.B. thank the Schering AG/Ernst Schering Foundation for postdoctoral fellowships. P.K. was supported in part by a National Science Foundation Graduate Research Fellowship. L.P. thanks Jaak Vilo (University of Tartu, Estonia) for support and helpful discussions.

Footnotes

[Supplemental material is available online at www.genome.org. All data and predictions are available at http://compbio.mit.edu/fly/mirnas/.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6593807

References

  • Aboobaker A.A., Tomancak P., Patel N., Rubin G.M., Lai E.C. Drosophila microRNAs exhibit diverse spatial expression patterns during embryonic development. Proc. Natl. Acad. Sci. 2005;102:18017–18022. [PMC free article] [PubMed]
  • Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
  • Alvarez-Garcia I., Miska E.A. MicroRNA functions in animal development and human disease. Development. 2005;132:4653–4662. [PubMed]
  • Ambros V. The functions of animal microRNAs. Nature. 2004;431:350–355. [PubMed]
  • Aravin A.A., Lagos-Quintana M., Yalcin A., Zavolan M., Marks D., Snyder B., Gaasterland T., Meyer J., Tuschl T. The small RNA profile during Drosophila melanogaster development. Dev. Cell. 2003;5:337–350. [PubMed]
  • Bae E., Calhoun V.C., Levine M., Lewis E.B., Drewell R.A. Characterization of the intergenic RNA profile at abdominal-A and Abdominal-B in the Drosophila bithorax complex. Proc. Natl. Acad. Sci. 2002;99:16847–16852. [PMC free article] [PubMed]
  • Bartel D.P. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297. [PubMed]
  • Baskerville S., Bartel D.P. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA. 2005;11:241–247. [PMC free article] [PubMed]
  • Bentwich I., Avniel A., Karov Y., Aharonov R., Gilad S., Barad O., Barzilai A., Einat P., Einav U., Meiri E., et al. Identification of hundreds of conserved and nonconserved human microRNAs. Nat. Genet. 2005a;37:766–770. [PubMed]
  • Bentwich I., Avniel A., Karov Y., Aharonov R., Gilad S., Barad O., Barzilai A., Einat P., Einav U., Meiri E., et al. Identification of hundreds of conserved and nonconserved human microRNAs. Nat. Genet. 2005b;37:766–770. [PubMed]
  • Berezikov E., Guryev V., van de Belt J., Wienholds E., Plasterk R.H., Cuppen E. Phylogenetic shadowing and computational identification of human microRNA genes. Cell. 2005;120:21–24. [PubMed]
  • Berezikov E., Cuppen E., Plasterk R.H. Approaches to microRNA discovery. Nat. Genet. 2006a;38:S2–S7. [PubMed]
  • Berezikov E., Thuemmler F., van Laake L.W., Kondova I., Bontrop R., Cuppen E., Plasterk R.H. Diversity of microRNAs in human and chimpanzee brain. Nat. Genet. 2006b;38:1375–1377. [PubMed]
  • Breiman L. Random forests. Mach. Learn. 2001;45:5–32.
  • Brennecke J., Hipfner D.R., Stark A., Russell R.B., Cohen S.M. bantam encodes a developmentally regulated microRNA that controls cell proliferation and regulates the proapoptotic gene hid in Drosophila. Cell. 2003;113:25–36. [PubMed]
  • Brennecke J., Stark A., Russell R.B., Cohen S.M. Principles of microRNA-target recognition. PLoS Biol. 2005;3:e85. doi: 10.1371/journal.pbio.0030085. [PMC free article] [PubMed] [Cross Ref]
  • Brennecke J., Aravin A.A., Stark A., Dus M., Kellis M., Sachidanandam R., Hannon G.J. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell. 2007;128:1089–1103. [PubMed]
  • Chen K., Rajewsky N. The evolution of gene regulation by transcription factors and microRNAs. Nat. Rev. Genet. 2007;8:93–103. [PubMed]
  • Davidson E.H., Erwin D.H. Gene regulatory networks and the evolution of animal body plans. Science. 2006;311:796–800. [PubMed]
  • Doench J.G., Sharp P.A. Specificity of microRNA target selection in translational repression. Genes & Dev. 2004;18:504–511. [PMC free article] [PubMed]
  • Drosophila 12 Genomes Consortium. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007 doi: 10.1038/nature06341. (in press) [PubMed] [Cross Ref]
  • Enright A.J., John B., Gaul U., Tuschl T., Sander C., Marks D.S. MicroRNA targets in Drosophila. Genome Biol. 2003;5:R1. doi: 10.1186/gb-2003-5-1-r1. [PMC free article] [PubMed] [Cross Ref]
  • Farh K.K., Grimson A., Jan C., Lewis B.P., Johnston W.K., Lim L.P., Burge C.B., Bartel D.P. The widespread impact of mammalian microRNAs on mRNA repression and evolution. Science. 2005;310:1817–1821. [PubMed]
  • Griffiths-Jones S., Grocock R.J., van Dongen S., Bateman A., Enright A.J. miRBase: MicroRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–D144. doi: 10.1093/nar/gkj112. [PMC free article] [PubMed] [Cross Ref]
  • Grun D., Wang Y.L., Langenberger D., Gunsalus K.C., Rajewsky N. microRNA target predictions across seven Drosophila species and comparison to mammalian targets. PLoS Comput. Biol. 2005;1:e13. doi: 10.1371/journal.pcbi.0010013. [PMC free article] [PubMed] [Cross Ref]
  • Han J., Lee Y., Yeom K.H., Nam J.W., Heo I., Rhee J.K., Sohn S.Y., Cho Y., Zhang B.T., Kim V.N. Molecular basis for the recognition of primary microRNAs by the Drosha–DGCR8 complex. Cell. 2006;125:887–901. [PubMed]
  • Hobert O. Common logic of transcription factor and microRNA action. Trends Biochem. Sci. 2004;29:462–468. [PubMed]
  • Hofacker I.L., Fontana W., Stadler P.F., Bonhoeffer L.S., Tacker M., Schuster P. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie / Chemical Monthly. 1994;125:167–188.
  • Hofacker I.L., Fekete M., Stadler P.F. Secondary structure prediction for aligned RNA sequences. J. Mol. Biol. 2002;319:1059–1066. [PubMed]
  • Joachims T. Making large-scale SVM learning practical. In: Schölkopf B., et al., editors. Advances in kernel methods—Support vector learning. MIT Press; Cambridge, MA: 1999. pp. 41–56.
  • Johnston R.J., Hobert O. A microRNA controlling left/right neuronal asymmetry in Caenorhabditis elegans. Nature. 2003;426:845–849. [PubMed]
  • Kellis M., Patterson N., Endrizzi M., Birren B., Lander E.S. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003;423:241–254. [PubMed]
  • Kellis M., Patterson N., Birren B., Berger B., Lander E.S. Methods in comparative genomics: Genome correspondence, gene identification and regulatory motif discovery. J. Comput. Biol. 2004;11:319–355. [PubMed]
  • Khvorova A., Reynolds A., Jayasena S.D. Functional siRNAs and miRNAs exhibit strand bias. Cell. 2003;115:209–216. [PubMed]
  • Kiriakidou M., Nelson P.T., Kouranov A., Fitziev P., Bouyioukos C., Mourelatos Z., Hatzigeorgiou A. A combined computational-experimental approach predicts human microRNA targets. Genes & Dev. 2004;18:1165–1178. [PMC free article] [PubMed]
  • Kloosterman W.P., Wienholds E., Ketting R.F., Plasterk R.H. Substrate requirements for let-7 function in the developing zebrafish embryo. Nucleic Acids Res. 2004;32:6284–6291. doi: 10.1093/nar/gkh968. [PMC free article] [PubMed] [Cross Ref]
  • Krek A., Grun D., Poy M.N., Wolf R., Rosenberg L., Epstein E.J., Macmenamin P., da Piedade I., Gunsalus K.C., Stoffel M., et al. Combinatorial microRNA target predictions. Nat. Genet. 2005;37:495–500. [PubMed]
  • Lagos-Quintana M., Rauhut R., Lendeckel W., Tuschl T. Identification of novel genes coding for small expressed RNAs. Science. 2001;294:853–858. [PubMed]
  • Lai E.C. Micro RNAs are complementary to 3′ UTR sequence motifs that mediate negative post-transcriptional regulation. Nat. Genet. 2002;30:363–364. [PubMed]
  • Lai E.C. microRNAs: Runts of the genome assert themselves. Curr. Biol. 2003;13:R925–R936. [PubMed]
  • Lai E.C. Predicting and validating microRNA targets. Genome Biol. 2004;5:115. doi: 10.1186/gb-2004-5-9-115. [PMC free article] [PubMed] [Cross Ref]
  • Lai E.C., Posakony J.W. The Bearded box, a novel 3′ UTR sequence motif, mediates negative post-transcriptional regulation of Bearded and Enhancer of split Complex gene expression. Development. 1997;124:4847–4856. [PubMed]
  • Lai E.C., Burks C., Posakony J.W. The K box, a conserved 3′ UTR sequence motif, negatively regulates accumulation of enhancer of split complex transcripts. Development. 1998;125:4077–4088. [PubMed]
  • Lai E.C., Tomancak P., Williams R.W., Rubin G.M. Computational identification of Drosophila microRNA genes. Genome Biol. 2003;4:R42. doi: 10.1186/gb-2003-4-7-r42. [PMC free article] [PubMed] [Cross Ref]
  • Lai E.C., Wiel C., Rubin G.M. Complementary miRNA pairs suggest a regulatory role for miRNA:miRNA duplexes. RNA. 2004;10:171–175. [PMC free article] [PubMed]
  • Lau N.C., Lim L.P., Weinstein E.G., Bartel D.P. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science. 2001;294:858–862. [PubMed]
  • Lee R.C., Ambros V. An extensive class of small RNAs in Caenorhabditis elegans. Science. 2001;294:862–864. [PubMed]
  • Lee R.C., Feinbaum R.L., Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–854. [PubMed]
  • Lewis B.P., Shih I.H., Jones-Rhoades M.W., Bartel D.P., Burge C.B. Prediction of mammalian microRNA targets. Cell. 2003;115:787–798. [PubMed]
  • Lewis B.P., Burge C.B., Bartel D.P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20. [PubMed]
  • Li X., Carthew R.W. A microRNA mediates EGF receptor signaling and promotes photoreceptor differentiation in the Drosophila eye. Cell. 2005;123:1267–1277. [PubMed]
  • Li Y., Wang F., Lee J.A., Gao F.B. MicroRNA-9a ensures the precise specification of sensory organ precursors in Drosophila. Genes & Dev. 2006;20:2793–2805. [PMC free article] [PubMed]
  • Lim L.P., Glasner M.E., Yekta S., Burge C.B., Bartel D.P. Vertebrate microRNA genes. Science. 2003a;299:1540. [PubMed]
  • Lim L.P., Lau N.C., Weinstein E.G., Abdelhakim A., Yekta S., Rhoades M.W., Burge C.B., Bartel D.P. The microRNAs of Caenorhabditis elegans. Genes & Dev. 2003b;17:991–1008. [PMC free article] [PubMed]
  • Lin M.F., Carlson J.W., Crosby M.A., Matthews B.B., Yu C., Park S., Wan K.H., Schroeder A.J., Gramates L.S., St Pierre S.E., et al. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes. Genome Res. 2007;(this issue) doi: 10.1101/gr.6679507. [PMC free article] [PubMed] [Cross Ref]
  • Manak J.R., Dike S., Sementchenko V., Kapranov P., Biemar F., Long J., Cheng J., Bell I., Ghosh S., Piccolboni A., et al. Biological function of unannotated transcription during the early development of Drosophila melanogaster. Nat. Genet. 2006;38:1151–1158. [PubMed]
  • Miller W., Makova K.D., Nekrutenko A., Hardison R.C. Comparative genomics. Annu. Rev. Genomics Hum. Genet. 2004;5:15–56. [PubMed]
  • Mitchell T.M. Machine learning. McGraw-Hill; New York: 1997.
  • Ng Kwang Loong S., Mishra S.K. Unique folding of precursor microRNAs: Quantitative evidence and implications for de novo identification. RNA. 2007;13:170–187. [PMC free article] [PubMed]
  • Okabe M., Imai T., Kurusu M., Hiromi Y., Okano H. Translational repression determines a neuronal potential in Drosophila asymmetric cell division. Nature. 2001;411:94–98. [PubMed]
  • Pasquinelli A.E., Reinhart B.J., Slack F., Martindale M.Q., Kuroda M.I., Maller B., Hayward D.C., Ball E.E., Degnan B., Muller P., et al. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature. 2000;408:86–89. [PubMed]
  • Pekarsky Y., Santanam U., Cimmino A., Palamarchuk A., Efanov A., Maximov V., Volinia S., Alder H., Liu C.G., Rassenti L., et al. Tcl1 expression in chronic lymphocytic leukemia is regulated by miR-29 and miR-181. Cancer Res. 2006;66:11590–11593. [PubMed]
  • Plasterk R.H. Micro RNAs in animal development. Cell. 2006;124:877–881. [PubMed]
  • Rajewsky N. microRNA target predictions in animals. Nat. Genet. 2006;38:S8–S13. [PubMed]
  • Richards S., Liu Y., Bettencourt B.R., Hradecky P., Letovsky S., Nielsen R., Thornton K., Hubisz M.J., Chen R., Meisel R.P., et al. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution. Genome Res. 2005;15:1–18. [PMC free article] [PubMed]
  • Ritchie W., Legendre M., Gautheret D. RNA stem-loops: To be or not to be cleaved by RNAse III. RNA. 2007;13:457–462. [PMC free article] [PubMed]
  • Ronshaugen M., Biemar F., Piel J., Levine M., Lai E.C. The Drosophila microRNA iab-4 causes a dominant homeotic transformation of halteres to wings. Genes & Dev. 2005;19:2947–2952. [PMC free article] [PubMed]
  • Ruby J.G., Jan C., Player C., Axtell M.J., Lee W., Nusbaum C., Ge H., Bartel D.P. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell. 2006;127:1193–1207. [PubMed]
  • Ruby J.G., Stark A., Johnston W.K., Kellis M., Bartel D.P., Lai E.C. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. 2007 doi: 10.1101/gr.6597907. (this issue) [PMC free article] [PubMed] [Cross Ref]
  • Schwarz D.S., Hutvagner G., Du T., Xu Z., Aronin N., Zamore P.D. Asymmetry in the assembly of the RNAi enzyme complex. Cell. 2003;115:199–208. [PubMed]
  • Seitz H., Zamore P.D. Rethinking the microprocessor. Cell. 2006;125:827–829. [PubMed]
  • Stark A., Brennecke J., Russell R.B., Cohen S.M. Identification of Drosophila microRNA targets. PLoS Biol. 2003;1:e60. doi: 10.1371/journal.pbio.0000060. [PMC free article] [PubMed] [Cross Ref]
  • Stark A., Brennecke J., Bushati N., Russell R.B., Cohen S.M. Animal microRNAs confer robustness to gene expression and have a significant impact on 3′UTR evolution. Cell. 2005;123:1133–1146. [PubMed]
  • Stark A., Lin M.F., Kheradpour P., Pedersen J.S., Parts L., Carlson J.W., Crosby M.A., Rasmussen M.D., Roy S., Deoras A.N., et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature. 2007 doi: 10.1038/nature06340. (in press) [PMC free article] [PubMed] [Cross Ref]
  • Teleman A.A., Cohen S.M. Drosophila lacking microRNA miR-278 are defective in energy homeostasis. Genes & Dev. 2006;20:417–422. [PMC free article] [PubMed]
  • Thompson J.D., Higgins D.G., Gibson T.J. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673. [PMC free article] [PubMed] [Cross Ref]
  • Valencia-Sanchez M.A., Liu J., Hannon G.J., Parker R. Control of translation and mRNA degradation by miRNAs and siRNAs. Genes & Dev. 2006;20:515–524. [PubMed]
  • Washietl S., Hofacker I.L., Stadler P.F. Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. 2005;102:2454–2459. [PMC free article] [PubMed]
  • Wightman B., Ha I., Ruvkun G. Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans. Cell. 1993;75:855–862. [PubMed]
  • Xie X., Lu J., Kulbokas E.J., Golub T.R., Mootha V., Lindblad-Toh K., Lander E.S., Kellis M. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature. 2005;434:338–345. [PMC free article] [PubMed]
  • Yekta S., Shih I.H., Bartel D.P. MicroRNA-directed cleavage of HOXB8 mRNA. Science. 2004;304:594–596. [PubMed]
  • Zamore P.D., Haley B. Ribo-gnome: The big world of small RNAs. Science. 2005;309:1519–1524. [PubMed]
  • Zhang R., Peng Y., Wang W., Su B. Rapid evolution of an X-linked microRNA cluster in primates. Genome Res. 2007;17:612–617. [PMC free article] [PubMed]

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press