Principles of MicroRNA–Target Recognition

European Molecular Biology Laboratory, Heidelberg, Germany
Corresponding author: Stephen M Cohen (cohen/at/embl.de)
Received September 21, 2004; Accepted January 4, 2005.

Copyright: © 2005 Brennecke et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

MicroRNAs (miRNAs) are short non-coding RNAs that regulate gene expression in plants and animals. Although their biological importance has become clear, how they recognize and regulate target genes remains less well understood. Here, we systematically evaluate the minimal requirements for functional miRNA–target duplexes in vivo and distinguish classes of target sites with different functional properties. Target sites can be grouped into two broad categories. 5′ dominant sites have sufficient complementarity to the miRNA 5′ end to function with little or no support from pairing to the miRNA 3′ end. Indeed, sites with 3′ pairing below the random noise level are functional given a strong 5′ end. In contrast, 3′ compensatory sites have insufficient 5′ pairing and require strong 3′ pairing for function. We present examples and genome-wide statistical support to show that both classes of sites are used in biologically relevant genes. We provide evidence that an average miRNA has approximately 100 target sites, indicating that miRNAs regulate a large fraction of protein-coding genes and that miRNA 3′ ends are key determinants of target specificity within miRNA families.

Introduction

MicroRNAs (miRNAs) are small non-coding RNAs that serve as post-transcriptional regulators of gene expression in plants and animals. 
They act by binding to complementary sites on target mRNAs to induce cleavage or repression of productive translation (reviewed in [1,2,3,4]). The importance of miRNAs for development is highlighted by the fact that they comprise approximately 1% of genes in animals, and are often highly conserved across a wide range of species (e.g., [5,6,7]). Further, mutations in proteins required for miRNA function or biogenesis impair animal development [8,9,10,11,12,13,14,15]. To date, functions have been assigned to only a few of the hundreds of animal miRNA genes. Mutant phenotypes in nematodes and flies led to the discovery that the lin-4 and let-7 miRNAs control developmental timing [16,17], that lsy-6 miRNA regulates left–right asymmetry in the nervous system [18], that bantam miRNA controls tissue growth [19], and that bantam and miR-14 control apoptosis [19,20]. Mouse miR-181 is preferentially expressed in bone marrow and was shown to be involved in hematopoietic differentiation [21]. Recently, mouse miR-375 was found to be a pancreatic-islet-specific miRNA that regulates insulin secretion [22]. Prediction of miRNA targets provides an alternative approach to assign biological functions. This has been very effective in plants, where miRNA and target mRNA are often nearly perfectly complementary [23,24,25]. In animals, functional duplexes can be more variable in structure: they contain only short complementary sequence stretches, interrupted by gaps and mismatches. To date, specific rules for functional miRNA–target pairing that capture all known functional targets have not been devised. This has created problems for search strategies, which apply different assumptions about how to best identify functional sites. 
As a result, the number of predicted targets varies considerably with only limited overlap in the top-ranking targets, indicating that these approaches might only capture subsets of real targets and/or may include a high number of background matches ([19,26,27,28,29,30]; reviewed by [31]). Nonetheless, a number of predicted targets have proven to be functional when subjected to experimental tests [19,26,27,29]. A better understanding of the pairing requirements between miRNA and target would clearly improve predictions of miRNA targets in animals. It is known that defined cis-regulatory elements in Drosophila 3′ UTRs are complementary to the 5′ ends of certain miRNAs [32]. The importance of the miRNA 5′ end has also emerged from the pairing characteristics and evolutionary conservation of known target sites [26], and from the observation of a non-random statistical signal specific to the 5′ end in genome-wide target predictions [27]. Tissue culture experiments have also underscored the importance of 5′ pairing and have provided some specific insights into the general structural requirements [29,33,34], though different studies have conflicted to some degree with each other, and with known target sites (reviewed in [31]). To date, no specific role has been ascribed to the 3′ end of miRNAs, despite the fact that miRNAs tend to be conserved over their full length. Here, we systematically evaluate the minimal requirements for a functional miRNA–target duplex in vivo. These experiments have allowed us to identify two broad categories of miRNA target sites. Targets in the first category, “5′ dominant” sites, base-pair well to the 5′ end of the miRNA. Although there is a continuum of 3′ pairing quality within this class, it is useful to distinguish two subtypes: “canonical” sites, which pair well at both the 5′ and 3′ ends, and “seed” sites, which require little or no 3′ pairing support. 
Targets in the second category, “3′ compensatory” sites, have weak 5′ base-pairing and depend on strong compensatory pairing to the 3′ end of the miRNA. We present evidence that all of these site types are used to mediate regulation by miRNAs and show that the 3′ compensatory class of target sites is used to discriminate among individual members of miRNA families in vivo. A genome-wide statistical analysis allows us to estimate that an average miRNA has approximately 100 evolutionarily conserved target sites, indicating that miRNAs regulate a large fraction of protein-coding genes. Evaluation of 3′ pairing quality suggests that seed sites are the largest group. Sites of this type have been largely overlooked in previous target prediction methods. Results The Minimal miRNA Target Site To improve our understanding of the minimal requirements for a functional miRNA target site, we made use of a simple in vivo assay in the Drosophila wing imaginal disc. We expressed a miRNA in a stripe of cells in the central region of the disc and assessed its ability to repress the expression of a ubiquitously transcribed enhanced green fluorescent protein (EGFP) transgene containing a single target site in its 3′ UTR. The degree of repression was evaluated by comparing EGFP levels in miRNA-expressing and adjacent non-expressing cells. Expression of the miRNA strongly reduced EGFP expression from transgenes containing a single functional target site (Figure 1
In a first series of experiments we asked which part of the RNA duplex is most important for target regulation. A set of transgenic flies was prepared, each of which contained a different target site for miR-7 in the 3′ UTR of the EGFP reporter construct. The starting site resembled the strongest bantam miRNA site in its biological target hid [19] and conferred strong regulation when present in a single copy in the 3′ UTR of the reporter gene (Figure 1 We next determined the minimal 5′ sequence complementarity necessary to confer target regulation. We refer to the core of 5′ sequence complementarity essential for target site recognition as the “seed” (Lewis et al. [27]). All possible 6mer, 5mer, and 4mer seeds complementary to the first eight nucleotides of the miRNA were tested in the context of a site that allowed strong base-pairing to the 3′ end of the miRNA (Figure 2
To determine the minimal lengths of 5′ seed matches that are sufficient to confer regulation alone, we tested single sites that pair with eight, seven, or six consecutive bases to the miRNA's 5′ end, but that do not pair to its 3′ end (Figure 2).

We took care in designing the miRNA 3′ ends to exclude any 3′ pairing to nearby sequence according to RNA secondary structure prediction. However, we cannot rule out the possibility that extensive looping of the UTR sequence might allow the 3′ end to pair to sequences further downstream in our reporter constructs. Note, however, that even if remote 3′ pairing was occurring and required for function of 8- and 7mer seeds, it is not sufficient for 5′ matches with fewer than seven complementary bases (all test sites are in the same sequence context; Figure 2).

From these experiments we conclude that (1) complementarity of seven or more bases to the 5′ end of the miRNA is sufficient to confer regulation, even if the target 3′ UTR contains only a single site; (2) sites with weaker 5′ complementarity require compensatory pairing to the 3′ end of the miRNA in order to confer regulation; and (3) extensive pairing to the 3′ end of the miRNA is not sufficient to confer regulation on its own without a minimal element of 5′ complementarity.

The Effect of G:U Base-Pairs and Bulges in the Seed

Several confirmed miRNA target genes contain predicted binding sites with seeds that are interrupted by G:U base-pairs or single nucleotide bulges [17,19,26,36,37,38,39]. In most cases these mRNAs contain multiple predicted target sites and the contributions of individual sites have not been tested. In vitro tests have shown that sites containing G:U base-pairs can function [29,34], but that G:U base-pairs contribute less to target site function than would be expected from their contribution to the predicted base-pairing energy [34]. We tested the ability of single sites with seeds containing G:U base-pairs and bulges to function in vivo. 
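To make the notions of an uninterrupted seed match and a seed containing G:U base-pairs concrete, the following minimal Python sketch may help. It is an illustration only, not the procedure used in this study: the function names, the choice to count the seed from the first miRNA nucleotide, and the toy sequences are all assumptions made for the example.

```python
# Illustrative sketch only; names and sequences are hypothetical.
WC = {"A": "U", "U": "A", "G": "C", "C": "G"}   # Watson-Crick partners
WOBBLE = {("G", "U"), ("U", "G")}               # G:U wobble pairs


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return "".join(WC[base] for base in reversed(rna))


def seed_match_positions(mirna, utr, k):
    """0-based UTR positions with a perfect Watson-Crick match to the
    first k nucleotides of the miRNA (assumed seed definition)."""
    motif = reverse_complement(mirna[:k])
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == motif]


def seed_pair_counts(mirna, site):
    """Count (Watson-Crick, G:U, mismatch) pairs between the miRNA 5' end
    and a target site of equal length, paired antiparallel, no bulges."""
    k = len(site)
    wc = gu = mismatch = 0
    for i in range(k):
        m, t = mirna[i], site[k - 1 - i]
        if WC[m] == t:
            wc += 1
        elif (m, t) in WOBBLE:
            gu += 1
        else:
            mismatch += 1
    return wc, gu, mismatch


# Toy example: a made-up miRNA and a UTR carrying one 8mer seed match.
mirna = "UGGAAGACUAGUGAUUUUGUUG"
utr = "AAAA" + "GUCUUCCA" + "AAAA"
print(seed_match_positions(mirna, utr, 8))    # 8mer match
print(seed_match_positions(mirna, utr, 7))    # 7mer match within the 8mer
print(seed_pair_counts(mirna, "GUCUUCCG"))    # same site with one G:U pair
```

The sketch deliberately ignores bulges and 3′ pairing; it only separates perfect seed matches from seeds that contain wobble pairs or mismatches.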
One, two, or three G:U base-pairs were introduced into single target sites with 8mer, 7mer, or 6mer seeds (Figure 3
Single nucleotide bulges in the seed are found in the let-7 target lin-41 and in the lin-4 target lin-14 [17,36,37]. Recent tissue culture experiments have led to the proposal that such bulges are tolerated if positioned symmetrically in the seed region [29]. We tested a series of sites with single nucleotide bulges in the target or the miRNA (Figure 3).

Functional Categories of Target Sites

While recognizing that there is a continuum of base-pairing quality between miRNAs and target sites, the experiments presented above suggest that sites that depend critically on pairing to the miRNA 5′ end (5′ dominant sites) can be distinguished from those that cannot function without strong pairing to the miRNA 3′ end (3′ compensatory sites). The 3′ compensatory group includes seed matches of four to six base-pairs and seeds of seven or eight bases that contain G:U base-pairs, single nucleotide bulges, or mismatches. We consider it useful to distinguish two subgroups of 5′ dominant sites: those with good pairing to both 5′ and 3′ ends of the miRNA (canonical sites) and those with good 5′ pairing but with little or no 3′ pairing (seed sites). We consider seed sites to be those where there is no evidence for pairing of the miRNA 3′ end to nearby sequences that is better than would be expected at random. We cannot exclude the possibility that some sites that we identify as seed sites might be supported by additional long-range 3′ pairing. Computationally, this is always possible if long enough loops in the UTR sequence are allowed. Whether long loops are functional in vivo remains to be determined.

Canonical sites have strong seed matches supported by strong base-pairing to the 3′ end of the miRNA. Canonical sites can thus be seen as an extension of the seed type (with enhanced 3′ pairing in addition to a sufficient 5′ seed) or as an extension of the 3′ compensatory type (with improved 5′ seed quality in addition to sufficient 3′ pairing). 
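The categories described above can be summarized as a small decision rule. The Python sketch below is a simplification under stated assumptions: the function name and its three inputs are hypothetical, "strong 3′ pairing" is reduced to a yes/no flag whose threshold is left to the user, and the continuum of pairing quality emphasized in the text is ignored. It should not be read as a target predictor.

```python
# Illustrative sketch only; the function and its inputs are hypothetical.
def classify_site(seed_len, seed_interrupted, strong_3p_pairing):
    """Assign a target site to one of the categories described in the text.

    seed_len          -- length of the 5' seed match in bases
    seed_interrupted  -- True if the seed contains a G:U pair, bulge, or mismatch
    strong_3p_pairing -- True if pairing to the miRNA 3' end is judged strong
                         (how "strong" is scored is an assumption left open here)
    """
    # 5' dominant sites: an uninterrupted seed of seven or more bases.
    if seed_len >= 7 and not seed_interrupted:
        return "canonical" if strong_3p_pairing else "seed"
    # 3' compensatory sites: weaker seeds (4-6 bases, or interrupted 7-8mers)
    # that are rescued by strong pairing to the miRNA 3' end.
    if seed_len >= 4 and strong_3p_pairing:
        return "3' compensatory"
    # 3' pairing alone, or a weak seed without it, is not expected to suffice.
    return "not expected to function"


print(classify_site(8, False, True))
print(classify_site(7, False, False))
print(classify_site(5, False, True))
print(classify_site(5, False, False))
```

Treating the 3′ pairing as a boolean keeps the rule readable; in practice it would be derived from a predicted duplex energy and a cutoff.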
Individually, canonical sites are likely to be more effective than other site types because of their higher pairing energy, and may function in one copy. Due to their lower pairing energies, seed sites are expected to be more effective when present in more than one copy. Figure 4
Most currently identified miRNA target sites are canonical. For example, the hairy 3′ UTR contains a single site for miR-7, with a 9mer seed and a stretch of 3′ complementarity. This site has been shown to be functional in vivo [26], and it is strikingly conserved in the seed match and in the extent of complementarity to the 3′ end of miR-7 in all six orthologous 3′ UTRs. Although seed sites have not been previously identified as functional miRNA target sites, there is some evidence that they exist in vivo. For example, the Bearded (Brd) 3′ UTR contains three sequence elements, known as Brd boxes, that are complementary to the 5′ region of miR-4 and miR-79 [32,40]. Brd boxes have been shown to repress expression of a reporter gene in vivo, presumably via miRNAs, as expression of a Brd 3′ UTR reporter is elevated in dicer-1 mutant cells, which are unable to produce any miRNAs [14]. All three Brd box target sites consist of 7mer seeds with little or no base-pairing to the 3′ end of either miR-4 or miR-79 (see below). The alignment of Brd 3′ UTRs shows that there is little conservation in the miR-4 or miR-79 target sites outside the seed sequence, nor is there conservation of pairing to either miRNA 3′ end. This suggests that the sequences that could pair to the 3′ end of the miRNAs are not important for regulation as they do not appear to be under selective pressure. This makes it unlikely that a yet unidentified Brd box miRNA could form a canonical site complex. The 3′ UTR of the HOX gene Sex combs reduced (Scr) provides a good example of a 3′ compensatory site. Scr contains a single site for miR-10 with a 5mer seed and a continuous 11-base-pair complementarity to the miRNA 3′ end [28]. The miR-10 transcript is encoded within the same HOX cluster downstream of Scr, a situation that resembles the relationship between miR-iab-5p and Ultrabithorax in flies [26] and miR-196/HoxB8 in mice [41]. 
The predicted pairing between miR-10 and Scr is perfectly conserved in all six drosophilid genomes, with the only sequence differences occurring in the unpaired loop region. The site is also conserved in the 3′ UTR of the Scr genes in the mosquito, Anopheles gambiae, the flour beetle, Tribolium castaneum, and the silk moth, Bombyx mori. Conservation of such a high degree of 3′ complementarity over hundreds of millions of years of evolution suggests that this is likely to be a functional miR-10 target site. Extensive 5′ and 3′ sequence conservation is also seen for other 3′ compensatory sites, e.g., the two let-7 sites in lin-41 or the miR-2 sites in grim and sickle [17,26,36].

The miRNA 3′ End Determines Target Specificity within miRNA Families

Several families of miRNAs have been identified whose members have common 5′ sequences but differ in their 3′ ends. In view of the evidence that 5′ ends of miRNA are functionally important [26,27,29,42], and in some cases sufficient (present study), it can be expected that members of miRNA families may have redundant or partially redundant functions. According to our model, 5′ dominant canonical and seed sites should respond to all members of a given miRNA family, whereas 3′ compensatory sites should differ in their sensitivity to different miRNA family members depending on the degree of 3′ complementarity. We tested this using the wing disc assay with 3′ UTR reporter transgenes and overexpression constructs for various miRNA family members. miR-4 and miR-79 share a common 5′ sequence that is complementary to a single 8mer seed site in the bagpipe 3′ UTR (Figure 5
To test whether miRNA family members can also have non-overlapping targets, we used 3′ UTR reporters of the pro-apoptotic genes grim and sickle, two recently identified miRNA targets [26]. Both genes contain K boxes in their 3′ UTRs that are complementary to the 5′ ends of the miR-2, miR-6, and miR-11 miRNA family [26,32]. These miRNAs share residues 2–8 but differ considerably in their 3′ regions (Figure 5 The sickle 3′ UTR contains two K boxes and provides an opportunity to test whether weak sites can function synergistically. The first site is similar to the grim 3′ UTR in that it contains a 6mer seed for all three miRNAs but extensive 3′ complementarity only to miR-2. The second site contains a 7mer seed for miR-2 and miR-6 but only a 6mer seed for miR-11 (Figure 5 To show that endogenous miRNA levels regulate all three 3′ UTR reporters, we compared EGFP expression in wild-type cells and dicer-1 mutant cells, which are unable to produce miRNAs [14]. dicer-1 clones did not affect a control reporter lacking miRNA binding sites, but showed elevated expression of a reporter containing the 3′ UTR of the previously identified bantam miRNA target hid (Figure 5 Genome-Wide Occurrence of Target Sites Experimental tests such as those presented above and the observed evolutionary conservation suggest that all three types of target sites are likely to be used in vivo. To gain additional evidence we examined the occurrence of each site type in all Drosophila melanogaster 3′ UTRs. We made use of the D. pseudoobscura genome, the second assembled drosophilid genome, to determine the degree of site conservation for the three different site classes in an alignment of orthologous 3′ UTRs. From the 78 known Drosophila miRNAs, we selected a set of 49 miRNAs with non-redundant 5′ sequences. We first investigated whether sequences complementary to the miRNA 5′ ends were better conserved than would be expected for random sequences. 
For each miRNA, we constructed a cohort of ten randomly shuffled variants. To avoid a bias for the number of possible target matches, the shuffled variants were required to produce a number of sequence matches comparable (±15%) to the original miRNAs for D. melanogaster 3′ UTRs. 7mer and 8mer seeds complementary to real miRNA 5′ ends were significantly better conserved than those complementary to the shuffled variants. This is consistent with the findings of Lewis et al. [27] but was obtained without the need to use a rank and energy cutoff applied to the full-length miRNA target duplex, as was the case for vertebrate miRNAs. Conserved 8mer seeds for real miRNAs occur on average 2.8 times as often as seeds complementary to the shuffled miRNAs (Figure 6
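The shuffled-control design described here can be sketched in Python as follows. This is a minimal illustration under assumptions, not the code used for the analysis: the function names are hypothetical, matches are counted as 7mer seed matches to the first seven nucleotides, and the `max_tries` bound and the injectable random generator are additions made so that the example terminates and is reproducible.

```python
# Illustrative sketch only; names, the 7mer definition, and max_tries are assumptions.
import random

WC = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, k=7):
    """Target sequence (5'->3') complementary to the first k miRNA bases."""
    return "".join(WC[base] for base in reversed(mirna[:k]))


def count_matches(mirna, utrs, k=7):
    """Total number of seed matches for this miRNA across all UTRs."""
    site = seed_site(mirna, k)
    return sum(
        1
        for utr in utrs
        for i in range(len(utr) - k + 1)
        if utr[i:i + k] == site
    )


def shuffled_cohort(mirna, utrs, n=10, tol=0.15, k=7, max_tries=10000, rng=None):
    """Collect up to n shuffled variants whose match count lies within
    +/- tol of the real miRNA's match count."""
    rng = rng or random.Random()
    target = count_matches(mirna, utrs, k)
    cohort = []
    for _ in range(max_tries):
        if len(cohort) == n:
            break
        bases = list(mirna)
        rng.shuffle(bases)
        variant = "".join(bases)
        if variant == mirna or variant in cohort:
            continue
        if abs(count_matches(variant, utrs, k) - target) <= tol * target:
            cohort.append(variant)
    return cohort


# Toy usage with made-up sequences.
toy_mirna = "UGGAAGACUAGUGAUUUUGUUG"
toy_utrs = ["AAAAGUCUUCCAAAAA", "CCCCUCUUCCAGGGG"]
controls = shuffled_cohort(toy_mirna, toy_utrs, n=3, rng=random.Random(0))
```

Shuffling preserves base composition, and the tolerance filter keeps the number of candidate sites comparable, so that differences in conservation are not simply a consequence of differences in match frequency. With very small toy inputs the function may return fewer than `n` variants.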
3′ compensatory and canonical sites depend on substantial pairing to the miRNA 3′ end. For these sites, we expect UTR sequences adjacent to miRNA 5′ seed matches to pair better to the miRNA 3′ end than to random sequences. However, unlike 5′ complementarity, 3′ base-pairing preference was not detected in previous studies looking at sequence complementarity and nucleotide conservation because UTR sequences complementary to the miRNA 3′ end were not better conserved than would be expected at random [27]. On this basis, we decided to treat the 5′ and 3′ ends of the miRNA separately. For the 5′ end, seed matches were required to be fully conserved in an alignment of orthologous D. melanogaster and D. pseudoobscura 3′ UTRs (we expected one-half to two-thirds of these matches to be real miRNA sites). We first investigated the overall conservation of UTR sequences adjacent to the conserved seed matches and found that overall the sequences are not better conserved than a random control with shuffled miRNAs (Figure 6 We therefore chose to evaluate the quality of 3′ pairing by the stability of the predicted RNA–RNA duplex. We assessed predicted pairing energy between the miRNA 3′ end and the adjacent UTR sequence for both Drosophila species and used the lower score. Use of the lower score measures conservation of the overall degree of pairing without requiring sequence identity. Figure 6 The average of the highest 1% of 3′ pairing energies of each of 58 3′ non-redundant miRNAs was divided by that of its 50 3′ shuffled controls. This ratio is one if the averages are the same, and increases if the real miRNA has better 3′ pairing than the shuffled miRNAs. To test whether a signal was specific for real miRNAs, we repeated the same protocol with a mutant version of each miRNA. 
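The ratio described above can be written out as a short Python sketch. Several details are assumptions made for the example rather than statements about the original analysis: pairing quality is represented as a positive stability score (for instance the negated predicted duplex energy), so that larger values mean better pairing and the "lower score" of the two species is the minimum; the shuffled controls are combined by averaging their individual top-1% means; and all function names are hypothetical.

```python
# Illustrative sketch only; the scoring convention and pooling are assumptions.
from statistics import mean


def conserved_score(score_mel, score_pse):
    """Use the weaker of the two species' 3' pairing scores, so that a site
    only scores well if pairing is good in both species."""
    return min(score_mel, score_pse)


def top_mean(scores, frac=0.01):
    """Average of the best-scoring fraction of sites (at least one site)."""
    n = max(1, int(len(scores) * frac))
    return mean(sorted(scores, reverse=True)[:n])


def pairing_ratio(real_scores, shuffled_score_sets, frac=0.01):
    """Ratio of the real miRNA's top-fraction mean to the mean of the
    same statistic over its shuffled controls."""
    control = mean(top_mean(s, frac) for s in shuffled_score_sets)
    return top_mean(real_scores, frac) / control


# Toy usage with made-up scores.
real = [float(x) for x in range(1, 201)]
shuffled = [[float(x) for x in range(1, 101)]]
ratio = pairing_ratio(real, shuffled)
```

As in the text, the ratio equals one when the real and shuffled miRNAs have the same top-scoring average, and rises above one when the real miRNA pairs better at its 3′ end.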
The altered 5′ sequence in the mutant miRNA selects different seed matches than the real miRNA and permits a comparison of sequences that have not been under selection for complementarity to miRNA 3′ ends with those that may have been. Figure 6 A small fraction of sites show exceptionally good 3′ pairing. If we use 3′ pairing energy cutoffs to examine site quality for all miRNAs, we expect sites of this type to be distinguishable from random matches. The ratio of the number of sites above the cutoff for real versus 3′ shuffled miRNAs was plotted as a function of the 3′ pairing cutoff (Figure 6 We also tested whether sequences forming 7mer or 8mer seeds containing G:U base-pairs, mismatches, or bulges were better conserved if complementary to real miRNAs. We did not find any statistical evidence for these seed types. Analysis of 3′ pairing also failed to show any non-random signal for these sites. This suggests that such sites are few in number genome-wide and are not readily distinguished from random matches. Nonetheless, our experiments do show that sites of this type can function in vivo. The let-7 sites in lin-41 provide a natural example. Most Sites Lack Substantial 3′ Pairing The experimental and computational results presented above provide information about 5′ and 3′ pairing that allows us to estimate the number of target sites of each type in Drosophila. 
The number of 3′ compensatory sites cannot be estimated on the basis of 5′ pairing, because seed matches of four, five, or six bases cannot be distinguished from random matches, reflecting that a large number of randomly conserved and non-functional matches predominate (Figure 6 The overrepresentation of conserved 5′ seed matches (see Figure 6 Again, we note the caveat that some of sites that we identify as seed could in principle be supported by 3′ pairing to more distant upstream sequences, but also that such sites would be difficult to distinguish from background computationally and that it is unclear whether large loops are functional. If there were statistical evidence for 3′ pairing that is lower than would be expected at random for some sites, this would be one line of argument for a discrete functional class that does not use 3′ pairing and would therefore suggest selection against 3′ pairing. Although the overall distribution of 3′ pairing energies for real miRNA 3′ ends adjacent to 8mer seed matches is very similar to the random control with 3′ shuffled sequences (Figure 7
Overall, these estimates suggest that there are over 80 5′ dominant sites and 20 or fewer 3′ compensatory sites per miRNA in the Drosophila genome. As estimates of the number of miRNAs in Drosophila range from 96 to 124 [44], this translates to 8,000–12,000 miRNA target sites genome-wide, which is close to the number of protein-coding genes. Even allowing for the fact that some genes have multiple miRNA target sites, these findings suggest that a large fraction of genes are regulated by miRNAs.

Discussion

We have provided experimental and computational evidence for different types of miRNA target sites. One key finding is that sites with as little as seven base-pairs of complementarity to the miRNA 5′ end are sufficient to confer regulation in vivo and are used in biologically relevant targets. Genome-wide, 5′ dominant sites occur 2- to 3-fold more often in conserved 3′ UTR sequences than would be expected at random. The majority of these sites have been overlooked by previous miRNA target prediction methods because their limited capacity to base-pair to the miRNA 3′ end cannot be distinguished from random noise. Such sites rank low in search methods designed to optimize overall pairing energy [16,17,26,27,28,30,35]. Indeed, we find that few seed sites scored high enough to be considered seriously in these earlier predictions, even when 5′ complementarity was given an additional weighting (e.g., [28,43]). We thus suspect that methods with pairing cutoffs would exclude many, if not all, such sites.

In a scenario in which protein-coding genes acquire miRNA target sites in the course of evolution [4], it is likely that seed sites with only seven or eight bases complementary to a miRNA would be the first functional sites to be acquired. Once present, a site would be retained if it conferred an advantage, and sites with extended complementarity could also be selected to confer stronger repression. 
In this scenario, the number of sites might grow over the course of evolution so that ancient miRNAs would tend to have more targets than those more recently evolved. Likewise, genes that should not be repressed by the miRNA milieu in a given cell type would tend to avoid seed matches to miRNA 5′ ends (“anti-targets” [4]). Although a 7- to 8mer seed is sufficient for a site to function, additional 3′ pairing increases miRNA functionality. The activity of a single 7mer canonical site is expected to be greater than an equivalent seed site. Likewise, the magnitude of miRNA-induced repression is reduced by introducing 3′ mismatches into a canonical site. Genome-wide, there are many sites that appear to show selection for conserved 3′ pairing and, interestingly, many sites that appear to show selection against 3′ pairing. In vivo, canonical sites might function at lower miRNA concentrations and might repress translation more effectively, particularly when multiple sites are present in one UTR (e.g., [42]). Efficient repression is likely to be necessary for genes whose expression would be detrimental, as illustrated by the genetically identified miRNAs, which produce clear mutant phenotypes when their targets are not normally repressed (“switch targets” [4]). Prolonged expression of the lin-14 and lin-41 genes in Caenorhabditis elegans mutant for lin-4 or let-7 causes developmental defects, and their regulation involves multiple sites [17,36,37]. Similarly, multiple target sites allow robust regulation of the pro-apoptotic gene hid by bantam miRNA in Drosophila [19]. More subtle modulation of expression levels could be accomplished by weaker sites, such as those lacking 3′ pairing. Sites that cannot function efficiently alone are in fact a prerequisite for combinatorial regulation by multiple miRNAs. Seed sites might thus be useful for situations in which the combined input of several miRNAs is used to regulate target expression. 
Depending on the nature of the target sites, any single miRNA might not have a strong effect on its own, while being required in the context of others. 3′ Complementarity Distinguishes miRNA Family Members 3′ compensatory sites have weak 5′ pairing and need substantial 3′ pairing to function. We find genome-wide statistical support for 3′ compensatory sites with 5mer and 6mer seeds and show that they are used in vivo. Furthermore, these sites can be differentially regulated by different miRNA family members depending on the quality of their 3′ pairing (e.g., regulation of the pro-apoptotic genes grim and sickle by miR-2, miR-6, and miR-11). Thus, members of a miRNA family may have common targets as well as distinct targets. They may be functionally redundant in regulation of some targets but not others, and so we can expect some overlapping phenotypes as well as differences in their mutant phenotypes. Following this reasoning, it is likely that the let-7 miRNA family members differentially regulate lin-41 in C. elegans [17,45]. The seed matches in lin-41 to let-7 and the related miRNAs miR-48, miR-84, and miR-241 are weak, and only let-7 has strong 3′ pairing. On this basis, it seems likely that lin-41 is regulated only by let-7. In contrast, hbl-1 has four sites with strong seed matches [38,39], and we expect it to be regulated by all four let-7 family members. As all four let-7-related miRNAs are expressed similarly during development [6], their role as regulators of hbl-1 may be redundant. let-7 must also have targets not shared by the other family members, as its function is essential. lin-41 is likely to be one such target. The idea that the 3′ end of miRNAs serves as a specificity factor provides an attractive explanation for the observation that many miRNAs are conserved over their full length across species separated by several hundreds of millions of years of evolution. 
3′ compensatory sites may have evolved from canonical sites by mutations that reduce the quality of the seed match. This could confer an advantage by allowing a site to become differentially regulated by miRNA family members. In addition, sites could retain specificity and overall pairing energy, but with reduced activity, perhaps permitting discrimination between high and low levels of miRNA expression. This might also allow a target gene to acquire a dependence on inputs from multiple miRNAs. These scenarios illustrate a few ways in which more complex regulatory roles for miRNAs might arise during evolution. A Large Fraction of the Genome Is Regulated by miRNAs Another intriguing outcome of this study is evidence for a surprisingly large number of miRNA target sites genome-wide. Even our conservative estimate is far above the numbers of sites in recent predictions, e.g., seven or fewer per miRNA [27,28,29]. Our estimate of the total number of targets approaches the number of protein-coding genes, suggesting that regulation of gene expression by miRNAs plays a greater role in biology than previously anticipated. Indeed, Bartel and Chen [46] have suggested in a recent review that the earlier estimates were likely to be low, and a recent study by John et al. [43], published while this manuscript was under review, predicts that approximately 10% of human genes are regulated by miRNAs. We agree with these authors' suggestion that this is likely an underestimate, because their method identifies an average of only 7.1 target genes per miRNA, with few that we would classify as seed sites lacking substantial 3′ pairing. A large number of target sites per miRNA is also consistent with combinatorial gene regulation by miRNAs, analogous to that by transcription factors, leading to cell-type-specific gene expression [47]. Sites for multiple miRNAs allow for the possibility of cell-type-specific miRNA combinations to confer robust and specific gene regulation. 
Our results provide an improved understanding of some of the important parameters that define how miRNAs bind to their target genes. We anticipate that these will be of use in understanding known miRNA–target relationships and in improving methods to predict miRNA targets. We have limited our evaluation to target sites in 3′ UTRs. miRNAs directed at other types of targets or with dramatically different functions (e.g., in regulation of chromatin structure) might well use different rules. Accordingly, there may prove to be more targets than we can currently estimate. Further, there may be additional features, such as overall UTR context, that either enhance or limit the accessibility of predicted sites and hence their ability to function. For example, the rules about target site structure cannot explain the apparent requirement for the linker sequence observed in the let-7/lin-41 regulation [48]. Further efforts toward experimental target site validation and systematic examination of UTR features can be expected to provide new insight into the function of miRNA target sites. Materials and Methods Fly strains ptcGal4; EP miR278 was provided by Aurelio Teleman. The control, hid, grim, and sickle 3′ UTR reporter transgenes, and UAS-miR-2b are described in [19,26]. For UAS constructs for miRNA overexpression, genomic fragments including miR-4 (together with miR-286 and miR-5) and miR-11 were amplified by PCR and cloned into UAS-DSred as described for UAS-miR-7 [26]. Details are available on request. UAS-miR-79 (also contains miR-9b and miR-9c) and UAS-miR-6 (miR-6–1, miR-6–2, and miR-6–3) were kindly provided by Eric Lai. dcr-1Q1147X is described in [14]. Clonal analysis Clones mutant for dcr-1Q1147X were induced in HS-Flp;dcr-1 FRT82/armadillo-lacZ FRT82 larvae by heat shock for 1 h at 38 °C at 50–60 h of development. 
Wandering third-instar larvae were dissected and labeled with rabbit anti-GFP (Torrey Pines Biolabs, Houston, Texas, United States; 1:400) and anti-β-Gal (rat polyclonal, 1:500).

Reporter constructs

The bagpipe 3′ UTR was PCR amplified from genomic DNA (using the following primers [enzyme sites in lower case]: AAtctagaAGGTTGGGAGTGACCATGTCTC and AActcgagTATTTAGCTCTCGGGTAGATACG) and cloned downstream of the tubulin promoter and EGFP (Clontech, Palo Alto, California, United States) in Casper4 as in [26].
Single target site constructs

Oligonucleotides containing the target site sequences shown in the figures were annealed and cloned downstream of tub>EGFP and upstream of SV40polyA (XbaI/XhoI). Clones were verified by DNA sequencing. Details are available on request.

EGFP intensity measurements

NIH image 1.63 was used to quantify intensity levels in miRNA-expressing and non-expressing cells from confocal images. Depending on the variation, between three and five individual discs were analyzed.

3′ UTR alignments

For each D. melanogaster gene, we identified the D. pseudoobscura ortholog using TBlastn as described in [26]. We then aligned the D. melanogaster 3′ UTR obtained from the Berkeley Drosophila Genome Project to the D. pseudoobscura 3′ adjacent sequence (Human Genome Sequencing Center at Baylor College of Medicine) using AVID [49]. For individual examples, we manually mapped the D. melanogaster coding region to genomic sequence traces (National Center for Biotechnology Information trace archive) of D. ananassae, D. virilis, D. simulans, and D. yakuba by TBlastn and extended the sequences by Blastn-walking. These 3′ UTR sequences were then aligned to the D. melanogaster and D. pseudoobscura 3′ UTRs using AVID.

miRNA sequences

Drosophila miRNA sequences were from [44,50,51] downloaded from Rfam (http://www.sanger.ac.uk/Software/Rfam/mirna/index.shtml). The 5′ non-redundant set (49 miRNAs) comprised bantam, let-7, miR-1, miR-10, miR-11, miR-100, miR-124, miR-125, miR-12, miR-133, miR-13a, miR-14, miR-184, miR-210, miR-219, miR-263b, miR-275, miR-276b, miR-277, miR-278, miR-279, miR-281, miR-283, miR-285, miR-287, miR-288, miR-303, miR-304, miR-305, miR-307, miR-309, miR-310, miR-314, miR-315, miR-316, miR-317, miR-31a, miR-33, miR-34, miR-3, miR-4, miR-5, miR-79, miR-7, miR-87, miR-8, miR-92a, miR-9a, and miR-iab-4–5p. Additional miRNAs in the 3′ non-redundant set were miR-2b, miR-286, miR-306, miR-308, miR-311, miR-312, miR-313, miR-318, and miR-6.
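As a rough illustration of how a miRNA sequence from the sets above translates into a 3′ UTR search pattern: a site pairs with the miRNA 5′ end when the UTR contains the reverse complement of the miRNA's 5′ nucleotides. The Python sketch below shows only this step. The function names, the default window (nucleotides 1–8), and the toy sequences are assumptions made for illustration; they are not taken from the scripts used in this work, and the sketch does not include the conservation or 3′ pairing-energy evaluation.

```python
from typing import List

# Illustrative sketch only: names, default window, and sequences are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def seed_site(mirna: str, start: int = 1, length: int = 8) -> str:
    """Return the DNA motif (5'->3') that pairs perfectly with miRNA
    nucleotides start..start+length-1 (1-based, counted from the 5' end)."""
    seed = mirna.upper().replace("U", "T")[start - 1:start - 1 + length]
    # Reverse complement: the target strand runs antiparallel to the miRNA.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(utr: str, mirna: str, start: int = 1, length: int = 8) -> List[int]:
    """Return 0-based UTR positions where the seed-complementary motif begins."""
    motif = seed_site(mirna, start, length)
    utr = utr.upper().replace("U", "T")
    hits = []
    pos = utr.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(motif, pos + 1)  # step by one so overlapping hits are kept
    return hits


# Made-up sequences, not real Drosophila miRNAs or UTRs.
toy_mirna = "UCAGGCAUACGGUUAGCCAUGA"
toy_utr = "GGTTATGCCTGAACCATGCCTGA"

print(seed_site(toy_mirna))                   # motif for nucleotides 1-8
print(find_seed_matches(toy_utr, toy_mirna))  # start positions of 8mer matches
print(find_seed_matches(toy_utr, toy_mirna, start=2, length=7))  # a 7mer window
```

Shorter seed windows can be explored by changing `start` and `length`; which windows count as a given seed type is left to the caller here.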
miRNA shuffles and mutants

For the completely shuffled miRNAs, we shuffled the miRNA sequence over the entire length and required all possible 8mer and 7mer seeds within the first nine bases to have an equal frequency (±15%) to the D. melanogaster 3′ UTRs (i.e., same single genome count). For the 3′ shuffled miRNAs, we shuffled the 3′ end starting at base 10 and required the shuffles to have equal (±15%) pairing energy to a perfect complement and to 10,000 randomly chosen sites. For each miRNA we created all possible 2-nt mutants (exchanging A to T or C, C to A or G, G to C or T, and T to A or G) within the seed (nucleotides 3–6) and chose the one with the closest alignment frequencies to the real miRNA in D. melanogaster 3′ UTRs and in the conserved sequences in D. melanogaster and D. pseudoobscura 3′ UTRs.

Seed matching and site evaluation

For each miRNA and seed type we found the 5′ match in the D. melanogaster 3′ UTRs and required it to be 100% conserved in an alignment to the D. pseudoobscura ortholog allowing for positional alignment errors of ±2 nt. When searching 7mer to 4mer seeds we masked all longer seeds to avoid identifying the same site more than once. For each matching site we extracted the 3′ adjacent sequence for both genomes, aligned it to the miRNA 3′ end starting at nucleotide 10 using RNAhybrid [35], and took the worse energy.

Accession Numbers

The miRNA sequences discussed in this paper can be found in the miRNA Registry (http://www.sanger.ac.uk/Software/Rfam/mirna/index.shtml). NCBI RefSeq (http://www.ncbi.nlm.nih.gov/RefSeq/) accession numbers: bagpipe (NM_169958), Brd (NM_057541), grim (NM_079413), hairy (NM_079253), hid (NM_079412), lin-14 (NM_077516), lin-41 (NM_060087), and Scr (NM_206443). GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) accession numbers: sickle (AF460844) and D. simulans hairy (AY055843).

Acknowledgments

We thank Ann-Mari Voie for cheerfully producing the large number of transgenic strains used in this work.
We are grateful to Marc Rehmsmeier for providing us with the RNAhybrid program prior to publication, to Eric Lai for providing unpublished fly strains, to Aurelio Teleman for comments on the manuscript, and to Lars Juhl Jensen for helpful discussions on the statistics.

Competing interests. The authors have declared that no competing interests exist.

Footnotes

Author contributions. JB, AS, and SMC conceived and designed the experiments. JB and AS performed the experiments and analyzed the data. JB, AS, RBR, and SMC wrote the paper.

Citation: Brennecke J, Stark A, Russell RB, Cohen SM (2005) Principles of microRNA–target recognition. PLoS Biol 3(3): e85.