MicroRNA-mediated Feedback and Feedforward Loops are Recurrent Network Motifs in Mammals

1 Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
2 Graduate Program in Biophysics, Harvard University, Cambridge, MA, USA
3 Institute for Genome Sciences & Policy and Department of Cell Biology, Duke University, Durham, NC, USA
* Co-corresponding authors: Email: avano/at/mit.edu (A.v.O.) or Email: jun.zhu/at/duke.edu (J.Z.)

SUMMARY

MicroRNAs (miRNA) are regulatory molecules that participate in diverse biological processes in animals and plants. While thousands of mammalian genes are potentially targeted by miRNAs, the functions of miRNAs in the context of gene networks are not well understood. Specifically, it is unknown whether miRNA-containing networks have recurrent circuit motifs, as has been observed in regulatory networks of bacteria and yeast. Here we develop a computational method that utilizes gene expression data to show that two classes of circuits, corresponding to positive and negative transcriptional co-regulation of a miRNA and its targets, are prevalent in the human and mouse genomes. Additionally, we find that neuronal-enriched miRNAs tend to be coexpressed with their target genes, suggesting that these miRNAs could be involved in neuronal homeostasis. Our results strongly suggest that coordinated transcriptional and miRNA-mediated regulation is a recurrent motif to enhance the robustness of gene regulation in mammalian genomes.

INTRODUCTION

MicroRNAs (miRNA) are post-transcriptional regulatory molecules recently discovered in animals and plants (reviewed in Bartel, 2004). They have been shown to regulate diverse biological processes ranging from embryonic development to the regulation of synaptic plasticity (Carthew, 2006; Kloosterman and Plasterk, 2006). Primary miRNA transcripts are predominantly transcribed by RNA Polymerase II.
After multiple steps of transcript processing, the mature miRNA (~22 nt) is incorporated into the RNA-induced silencing complex (RISC) in the cytoplasm. Mature miRNAs suppress gene expression via imperfect base pairing to the 3′ untranslated region (3′UTR) of target mRNAs, leading to repression of protein production and, in some cases, mRNA degradation (Bartel, 2004; Carthew, 2006; Valencia-Sanchez et al., 2006). Hundreds of miRNA genes have been identified in mammalian genomes (Griffiths-Jones et al., 2006), and computational predictions indicate that thousands of genes could be targeted by miRNAs in mammals (John et al., 2004; Krek et al., 2005; Lewis et al., 2005; Rajewsky, 2006). These findings suggest that miRNAs play an integral role in genome-wide regulation of gene expression.

Similar to electronic circuits, gene regulatory networks (GRN) are made up of basic subcircuits, such as feedback and feedforward loops. Pioneering work in E. coli has shown that certain subcircuits are favored by evolution and hence are significantly more abundant than others (Shen-Orr et al., 2002). The identification of these recurring subcircuits, called network motifs (Milo et al., 2002), has offered key insights into gene regulation. For instance, ~35% of E. coli transcription factors repress their own transcription, and such negative auto-regulatory circuits can significantly accelerate the transcriptional response (Rosenfeld et al., 2002) and dampen protein expression fluctuations (Becskei and Serrano, 2000). Like transcriptional repressors, miRNAs are likely embedded in a large number of GRNs, in which certain miRNA-containing circuits may be recurrent. While all miRNAs operate through a repressive mechanism, their functions in networks need not be simply repressive; they could have diverse functions depending on the unique GRN context of individual miRNA-target interactions.
Hence, the identification of recurring miRNA-containing motifs in GRNs would greatly increase our understanding of the functional roles of miRNAs in gene regulation.

Only a few studies have experimentally explored miRNA function in the context of a GRN. They suggest that a key recurring function of miRNAs in networks is to reinforce the gene expression program of differentiated cellular states. For instance, the secondary vulva cell fate in C. elegans is promoted by Notch signaling, which also activates miR-61; miR-61 in turn post-transcriptionally represses an inhibitory factor of Notch signaling, thereby stabilizing the secondary vulva fate (Yoo and Greenwald, 2005). Networks of similar architecture can also be found in the asymmetric differentiation of left-right neurons in C. elegans (Johnston et al., 2005), eye and sensory organ precursor development in Drosophila (Li and Carthew, 2005; Li et al., 2006), and granulocytic differentiation in human (Fazi et al., 2005).

The repressive effect of miRNAs on target expression is modest and is often limited to the level of translation, with little effect on transcript abundance (Bartel, 2004). Thus, an important question is whether miRNAs act in concert with other regulatory processes, such as transcriptional control, to regulate target gene expression at multiple levels and with greater strength. One possibility is that the transcription of the miRNAs and their targets is oppositely regulated by common upstream factor(s) (Type II circuits, Figure 1).
While individual examples of Type I and II circuits exist in mammalian GRNs, our goal is to determine whether these circuits are recurrent (i.e. more prevalent than would be expected by chance). Although existing experimental data suggests that Type II circuits are prevalent and Type I circuits are not, the number of examples is far too few to be conclusive. It is possible that the apparent lack of evidence for the prevalence of Type I circuits is due to the bias in the choice of experimental systems, i.e., most existing studies used cellular differentiation systems where Type II circuits function to reinforce differentiation decisions. Given the dearth of known miRNA-containing networks, it is infeasible to directly determine whether Type I/II circuits are recurrent. However, if a miRNA is involved in a larger number of Type I (Type II) circuits than expected by chance, one would expect the transcription of the miRNA and a significant number of its targets to be positively (negatively) correlated across diverse conditions. There are three challenges that complicate the identification of such correlation signatures. The first challenge is that large-scale expression data sets containing both miRNAs and protein-coding genes are lacking. We address this challenge by taking advantage of the large number of miRNAs that are embedded in the introns of protein-coding genes in human and mouse (Rodriguez et al., 2004). With few exceptions (e.g. miR-7 during Drosophila embryogenesis (Aboobaker et al., 2005)), the expression profiles of embedded miRNAs examined thus far are highly correlated to their host genes at both the tissue and individual cell levels (Aboobaker et al., 2005; Baskerville and Bartel, 2005; Li and Carthew, 2005), suggesting that they tend to be co-transcribed at identical rates from the same promoter(s) (Kim and Kim, 2007). 
Hence, the relative level of host-gene transcription across conditions can accurately serve as a proxy for that of the embedded miRNA(s), even though the steady-state levels of host-gene mRNA and that of the embedded miRNA(s) may be different. The second challenge is that only a few miRNA targets have been verified in vivo and computational target predictions can be noisy (Rajewsky, 2006). We address this challenge by developing a robust method that does not rely on target prediction to detect significant over-abundance of Type I and/or II circuits. The third challenge is that most existing mammalian expression data sets tend to study tissues, not individual cell types. While expression correlation over tissue conditions is likely due to transcriptional co-regulation by common upstream factors, cell-type heterogeneity in tissues can complicate the analysis. For example, some miRNAs and their targets could be expressed in distinct cell types within a tissue even though their averaged expression at the tissue level may suggest that their expression is correlated. To address this challenge, we analyze expression data from homogeneous neuronal cell populations (Arlotta et al., 2005; Sugino et al., 2006). We consistently observe that Type I and/or II circuits are prevalent for a significant fraction of the embedded miRNAs we analyzed, independent of the gene expression data sets used in the analysis, suggesting that these two circuit types are recurrent motifs in mammalian genomes. Strikingly, brain-enriched miRNAs tend to target brain-enriched genes and Type I circuits are especially prevalent in mature neurons. Our findings not only confirm that Type II circuits are abundant, but reveal the surprising genome-wide prevalence of Type I circuits, suggesting that miRNAs are employed in recurrent gene regulatory circuits to perform important biological functions in mammals. 
RESULTS Genes Highly Correlated in Expression with a miRNA are More likely to be Predicted as Targets We first sought to determine whether embedded miRNAs tend to correlate in expression with their putative targets, a phenomenon we term “targeting bias”, by analyzing the Novartis human expression atlas (Su et al., 2004) that comprises 79 distinct tissues/cell types (Figure 2A
To assess targeting bias, we first made a ranked list of genes for each embedded miRNA based on the extent of their expression correlation and compiled a list of targets for each miRNA using the TargetScanS algorithm (Lewis et al., 2005). To examine whether a miRNA's predicted target set is enriched in genes that are highly correlated or anti-correlated in expression with the miRNA, we devised a statistical test based on the hypergeometric distribution (see Methods). Of the 60 miRNAs we analyzed, 75% have a significantly higher number of predicted targets (P < 0.05) in the top- or bottom-10 percentile of the ranked expression-correlation list. For instance, the number of predicted targets of miR-153 that fall in the top-10 percentile of its ranked list is twice the number expected (P < 10^-30). In contrast, the predicted target sets of only 8% of the miRNAs we analyzed show significant enrichment for genes in the middle-10 percentile of the ranked list (Figure 2B).

A Computational Method for Predicting whether a miRNA is biased for Type I/II Circuits given an Expression Data Set

Although our observation of targeting bias is encouraging, one potential concern is that the analysis requires miRNA target prediction, which can be noisy (Lewis et al., 2005; Rajewsky, 2006). Furthermore, genes with different expression patterns might have different 3′UTR length distributions (Stark et al., 2005). In principle, these problems might have contributed to the targeting bias we observed. Therefore, we developed an alternative method that avoids target prediction and uses a measure that is independent of 3′UTR lengths. The new measure stems from the observation that putative miRNA binding sites have a higher probability of being evolutionarily conserved across mammalian genomes (Lewis et al., 2005; Xie et al., 2005).
We reasoned that if a miRNA (m) is enriched in Type I (II) networks, a higher than expected proportion of putative binding sites in the 3′UTR of genes positively (negatively) correlated in expression with m should be functional in vivo, and hence evolutionarily conserved. In essence, given a group of genes (G) and a miRNA seed, our method counts the number of seed matches (S) in the 3′UTRs of G and the number of those matches that are conserved (C) (gray box in Figure 3
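The counting step just described (seed matches S in the 3′UTRs of a gene group G, and the conserved subset C) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the input layout (one dict of gap-aligned sequences per UTR), the function names, and the rule that a site is "conserved" when the identical 7-mer occupies the same alignment columns in every species are all hypothetical. The exact CE score formula is not recoverable from the text here, so `ce_like_zscore` is only an illustrative stand-in: a binomial z-score against an assumed background conservation rate `p0`.

```python
from math import sqrt

def seed_sites(mirna):
    """Return the 7-mer target sites (DNA alphabet) complementary to
    miRNA positions 1-7 and 2-8 (seed definitions taken from the text)."""
    comp = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}
    m = mirna.upper()

    def revcomp(s):
        return "".join(comp[b] for b in reversed(s))

    return {revcomp(m[0:7]), revcomp(m[1:8])}

def count_matches(alignments, sites, ref="human"):
    """Count seed matches S in the reference 3'UTRs and the subset C that is
    identical at the same alignment columns in every species.

    alignments: list of dicts, species -> aligned UTR string ('-' for gaps).
    Hits at adjacent offsets are counted separately in this sketch.
    """
    S = C = 0
    for aln in alignments:
        ref_seq = aln[ref].upper()
        for i in range(len(ref_seq) - 6):
            window = ref_seq[i:i + 7]
            if window in sites:  # sites contain no '-', so gapped windows never match
                S += 1
                if all(seq[i:i + 7].upper() == window for seq in aln.values()):
                    C += 1
    return S, C

def ce_like_zscore(S, C, p0):
    """Illustrative stand-in (an assumption, not the paper's formula):
    binomial z-score of C conserved matches out of S, given background rate p0."""
    return (C - S * p0) / sqrt(S * p0 * (1 - p0))
```

For example, `seed_sites("UAAGGCACGCGGUGAAUGCC")` yields the two 7-mers `GTGCCTT` and `TGCCTTA`, which can then be passed to `count_matches` together with the aligned UTRs of a gene group.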
To determine whether a miRNA m exhibits bias for Type I and/or II circuits given an expression data set, we first rank genes by their expression correlation with m and slide a fixed-size window across the ranked list to generate a series of gene groups (G) with decreasing degrees of expression correlation to m (Figure 3 Many Human and Mouse Embedded miRNAs Exhibit Bias for Type I/II Circuits We applied CE analysis to human embedded miRNAs, again using the Novartis atlas. Of the 60 miRNAs we analyzed, 67% have a significant CE score in their top-10 or bottom-10 percentile sets (P<0.05) (Figure 4A
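The ranking and sliding-window step used in CE analysis (rank genes by expression correlation with the miRNA host, then slide a fixed-size window along the ranked list to form gene groups G) can be sketched as follows. This is a hedged illustration: the function names, the use of Pearson correlation for ranking, and the `size` and `step` parameters are assumptions made for the example, since the text does not state the window size or step used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Raises ZeroDivisionError if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranked_genes(host_profile, profiles):
    """Rank genes from most positively to most negatively correlated
    with the host-gene profile (the proxy for the embedded miRNA)."""
    scored = [(pearson(host_profile, p), gene) for gene, p in profiles.items()]
    return [gene for _, gene in sorted(scored, reverse=True)]

def sliding_windows(ranked, size, step):
    """Yield consecutive fixed-size gene groups G along the ranked list."""
    for start in range(0, len(ranked) - size + 1, step):
        yield ranked[start:start + size]
```

Each window returned by `sliding_windows` would then be scored for conservation enrichment; the first windows correspond to genes positively correlated with the miRNA (candidate Type I signal) and the last windows to negatively correlated genes (candidate Type II signal).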
To assess whether the abundance of Type I/II biases is a general feature of mammalian gene regulation, we conducted CE analysis on embedded miRNAs in mouse using the Novartis mouse expression atlas comprising 61 tissues/cell types (Su et al., 2004). The overall trends are consistent with those of human: of the 45 miRNAs analyzed, 69% exhibit bias for either type, with 42% and 36% displaying bias for Type I and II (9% showing both), respectively (Figure 4E The fact that brain-enriched miRNAs tend to target brain-enriched genes suggests that some of them may be positively co-regulated by transcription factors that specify neuronal fates (e.g. NRSF/REST (Ballas et al., 2005); see Discussion). However, due to the heterogeneity of cell types in brain tissues, it is possible that the expression of some of these miRNAs and their targets does not overlap at the single-cell level. This suggests that some brain-enriched miRNAs may also be enriched in Type II circuits, even though tissue-level analysis only shows Type I bias due to the lack of coverage of individual cell types. Therefore, expression data on homogeneous neuronal cell populations are needed to confirm whether Type I circuits are indeed prevalent in the adult brain (see below). Non-brain-enriched miRNAs with Type I bias are less likely to be confounded by cell type heterogeneity, because unlike brain-enriched miRNAs, they tend to be differentially regulated across diverse tissue and homogeneous cultured-cell conditions (e.g. 
miR-198; Figure 4B The Prevalence of Type I/II Biases Persists in Homogeneous Neuronal Cell Population Expression Profiles To address the concern that the human and mouse data sets consist mostly of tissues but are lacking in homogeneous cell-types, we conducted CE analysis on embedded miRNAs using two additional mouse expression data sets: the developmental time-course of three types of motor neurons (MDEV) (Arlotta et al., 2005) and profiles of 12 homogeneous neuronal cell types (NCELL) from five different brain regions (Sugino et al., 2006). Both data sets were obtained by careful isolation of homogeneous neuronal cell populations. The MDEV profiles comprise three motor neuron types over four developmental stages (E18, P3, P6, P14). The samples were isolated from the mouse neocortex using a combination of anatomical and cell sorting techniques. The expression data are highly reproducible across biological replicates isolated from different mice (Arlotta et al., 2005), indicating that noise resulting from the cross-contamination of other cell types is minimal. Of the 42 embedded miRNAs analyzed, 45% exhibit bias, with 29% and 24% having significant CE scores in their top-10 and bottom-10 percentile sets, respectively; while only 12% show significant CE scores in their middle-10 percentile sets (Figure 5
Similar trends were observed in the NCELL data set, though a higher proportion (61%) of the miRNAs analyzed exhibit bias, consistent with the fact that this data set consists of more diverse conditions than the MDEV profiles (Figure 5D While some miRNAs consistently exhibit bias across all data sets we analyzed, others only show bias in some data sets (see Supplemental Data for examples). This is to be expected as the bias detected by our method largely depends on the conditions profiled in each data set. As with protein-coding genes, the transcription of a miRNA is likely regulated by multiple cis regulatory modules (Howard and Davidson, 2004); thus a miRNA can be involved in a large number of Type I or II circuits via different cis modules (see Discussion). A particular set of profiled conditions in a given data set may only reveal transcription patterns due to the regulation of a subset of these modules. In essence, two key factors determine whether a miRNA would exhibit bias as detected by CE analysis: first, whether the expression of the miRNA is differentially regulated across the profiled conditions; and second, whether a higher than expected number of functional targets (functional implies the seed-matches of the miRNA have a higher probability of being conserved) are positively or negatively co-regulated with the miRNA. As more expression data containing both miRNAs and protein-coding genes become available, CE analysis can readily be applied to further dissect the prevalence of Type I and II circuits in diverse biological contexts. Identification of miRNAs with Neuronal-enriched Expression Patterns Motivated by the observation that brain-enriched miRNAs tend to target brain-enriched genes, we reasoned that brain-enriched miRNAs can be identified by searching for miRNAs with significant CE scores in a group of genes with a brain-enriched expression signature. 
To test this idea, we applied CE analysis to a group of mouse genes whose expression is upregulated across all neuronal tissues profiled in the Novartis atlas (Figure S1). Table 1 lists all miRNA seed-matches with a significant CE score (CE > 2.33, P < 0.01). Strikingly, 9 out of 10 seeds at the top of the list correspond to miRNAs known to be brain-specific or brain-enriched. For instance, miR-124a is one of the most abundant and ubiquitously expressed miRNAs in the brain (Lagos-Quintana et al., 2002). Other notable brain-enriched miRNAs in the list include miR-9, -125, -153, and -218 (Cao et al., 2006). This result further supports our conclusion that brain-enriched miRNAs tend to target brain genes, and provides direct evidence that the CE score is a biologically sensitive measure of whether a miRNA functionally interacts with a statistically significant number of targets in a gene group. High-scoring miRNAs in Table 1 not previously shown to be neuronal warrant experimental confirmation of their brain-enriched expression pattern.
DISCUSSION

In this study, we found that the expression of embedded miRNA host genes and their predicted miRNA targets tend to be positively or negatively correlated (Figure 2).

Negative expression correlation between a miRNA and its putative targets has been reported for several tissue-specific mammalian miRNAs (e.g. miR-133), where an increase in the miRNA level often coincides with a decrease in the levels of its target transcripts at the tissue level (Farh et al., 2005; Sood et al., 2006). However, this phenomenon cannot be solely attributed to the repressive nature of miRNA-mediated gene regulation, because targeting by miRNAs often leads to translational inhibition but has limited effects on target mRNA levels (Bartel, 2004; Doench et al., 2003; Doench and Sharp, 2004). A plausible explanation is that the observed negative correlation is due to Type II circuits, where the suppression of target mRNA levels is mainly driven by transcriptional control and miRNAs play a modulatory role to reinforce such decisions (Bartel and Chen, 2004). This notion is further supported by the observed positive expression correlation between a miRNA and its targets in Type I circuits: elevated miRNA levels do not necessarily lead to lower target mRNA abundance at the tissue and individual cell type levels.

The prevalence of positive expression correlation in miRNA-target pairs is surprising and counterintuitive given the repressive nature of miRNAs. Although some positively correlated miRNA-target pairs may result from localized expression of a miRNA and its target in distinct cell types within a tissue (Stark et al., 2005), mutually exclusive expression cannot account for all positively correlated miRNA-target pairs.
Indeed, Type I signatures remained prevalent when CE analysis was applied to expression data sets obtained from homogeneous neuronal cell populations in the adult mouse forebrain and at distinct stages of cortical motor neuron development (Figure 5 The finding that brain-enriched miRNAs tend to target brain-enriched genes suggests that the primary function of these miRNAs is not to reinforce the suppression of genes specific to other tissues. However, our result does not imply that neuronal-enriched miRNAs tend to only participate in Type I circuits in the adult brain. Indeed, we found that these miRNAs could exhibit bias for either network type when data from homogeneous neuronal cell populations were used in our analysis. Interestingly, the number of miRNAs exhibiting Type I bias is still significantly higher than those showing Type II bias in the NCELL data set, even though this phenomenon was not observed in the MDEV data set. This suggests that Type I circuits may be more prevalent in networks operating in homeostasis—perhaps partly due to the need for such circuits to maintain protein steady-state and regulate local translation in neurons (see below)—as NCELL consists of mature neurons whereas MDEV only covers developing neurons. The following aspects might complicate the interpretation of our results. First, the prevalence of Type I/II circuits could be a special feature of embedded miRNAs. This is unlikely, however, because more than 80% of all known miRNAs in human and mouse reside in introns of coding or non-coding genes (Kim and Kim, 2007; Rodriguez et al., 2004). In this sense, there are no obvious features that distinguish the miRNAs we analyzed from the rest. In addition, embedded miRNAs are not homogeneous by any measure: they have diverse expression patterns and likely function in disparate biological processes. 
Second, post-transcriptional control of the host gene and/or the miRNA could uncouple their expression (Obernosterer et al., 2006; Thomson et al., 2006; Wulczyn et al., 2007). However, this phenomenon is likely condition-specific (e.g. early development (Thomson et al., 2006)), as the steady-state expression of host genes and their embedded miRNA(s) tends to be correlated (Aboobaker et al., 2005; Baskerville and Bartel, 2005; Li and Carthew, 2005; Rodriguez et al., 2004). Importantly, the topology of Type I and II circuits does not preclude the possibility of post-transcriptional control of the miRNA. For example, in both circuit types the miRNA need not be active immediately after transcription (e.g. there may be a delay between transcription and maturation).

Negative Auto-regulatory Feedback by Embedded miRNAs

The prevalence of Type I signatures suggests that miRNAs may often be employed in negative feedback circuits (Figure 1).

Type I and II Circuits in Neural Development

The positively correlated expression of miRNAs and targets in Type I circuits could be due to their sharing of cis regulatory modules regulated by common upstream transcription factors. In neuronal development, NRSF is a master transcriptional repressor that inhibits the expression of neuronal genes in non-neuronal cells and in neuronal progenitors prior to differentiation (Chong et al., 1995). Experimental and computational studies have identified hundreds of protein-coding genes and a handful of miRNAs (e.g. miR-9, -29, and -135b) under the control of NRSF (Conaco et al., 2006; Mortazavi et al., 2006; Wu and Xie, 2006). Notably, miR-29 and miR-135b have statistically significant CE scores (2.27 and 2.68, respectively) in genes with NRSF binding sites and brain-enriched expression patterns (data not shown), suggesting that a larger than expected number of functional targets of miR-29 and miR-135b are co-repressed by NRSF, and that these miRNAs and their targets may be co-activated as the NRSF level goes down during neuronal development.
Interestingly, the 3′UTR of NRSF has conserved putative binding sites for both miR-29 and miR-135b, likely forming Type II circuits (Wu and Xie, 2006). Indeed, in principle a miRNA can be involved in a large number of Type I and/or II circuits because a miRNA can target multiple genes (Figure 6A
Potential Functions of Type I Circuits Recurrent network motifs are likely a result of convergent evolution at the network level, presumably because certain circuit topologies are particularly versatile in carrying out important functions in cells. A plausible function of the miRNA-mediated negative feedback/feedforward loop (MNFL) in Type I circuits is to define and maintain target-protein steady-states. The eukaryotic cell is a noisy environment in which transcription often occurs in a bursting manner (Blake et al., 2006; Golding et al., 2005; Raj et al., 2006), causing the number of mRNAs per cell—which can go as low as fractions of a copy when averaged over a population—to fluctuate significantly. Since other processes in gene expression, such as mRNA degradation and protein translation, are also stochastic in nature, protein levels in turn may fluctuate considerably over time (Kaern et al., 2005). Importantly, such fluctuations can propagate through the network, e.g. fluctuations in the level of an upstream transcription factor can contribute significantly to expression fluctuations of downstream genes (Pedraza and van Oudenaarden, 2005; Rosenfeld et al., 2005). In Type I circuits (Figure 1 Noise buffering by Type I circuits may be especially common in circuits with positive feedback loops where fluctuations in any component can be amplified, driving the system to switch states (Figure 6C Another potential function of Type I circuits is the regulation of local translation in neurons. It has been proposed that the translation of neuronal mRNAs is often repressed in transport granules and at local synapses, and that the inhibition can be released in response to synaptic activity (reviewed in Kiebler and Bassell, 2006). Neuronal miRNAs may be transcriptionally co-activated with their targets and constitutively lower their targets' translation rate to facilitate activity-dependent local translation. 
Consistent with this notion, a brain-specific microRNA, miR-134, has been shown to function directly as a repressor of LimK1 translation under basal conditions. This inhibition can be relieved by the BDNF signaling pathway, thereby allowing for local translation and spine growth (Schratt et al., 2006). Although the detailed molecular mechanisms remain to be identified, the enrichment of Type I bias in the NCELL data set supports the idea that miRNAs have important functions in local translation and synaptic plasticity.

Potential Functions of Type II Circuits

In Type II circuits, a miRNA regulates its targets coherently with transcriptional control, thereby reinforcing transcriptional logic at the post-transcriptional level. Under conditions where the target genes are transcriptionally suppressed, such circuits can serve as a surveillance mechanism to suppress “leaky” transcription of target genes (Bartel and Chen, 2004; Hornstein and Shomron, 2006; Stark et al., 2005). While the basic function of Type II circuits is intuitive, they can have sophisticated functions in networks. For instance, the miRNA-mediated repression in Type II circuits can be part of a positive feedback loop, in which the target gene encodes a transcription factor that can down-regulate the miRNA's expression. Such an example can be found in Drosophila eye development, where the reciprocal repression between miR-7 and Yan ensures their mutually exclusive expression pattern: Yan is expressed in progenitor cells and miR-7 in photoreceptor cells (Li and Carthew, 2005). This circuit can be switched by EGFR signaling, which transiently triggers Yan degradation. A decrease in Yan levels relieves miR-7 from transcriptional repression, subsequently leading to the depletion of Yan in photoreceptor cells (Li and Carthew, 2005). Positive feedback loops are often employed in such toggle-switch circuits, where a transient signal can be converted into a long-lasting cellular response (Ferrell, 2002).
Since most putative targets have only one binding site for a miRNA, it is likely that miRNAs would act in concert with other miRNAs and/or regulatory processes to increase the feedback strength. We speculate that miRNAs may be involved in a large number of similar positive feedback loops to enhance the robustness of irreversible cellular differentiation. Conclusion In conclusion, our results provide strong evidence that coordinated transcriptional and post-transcriptional regulation via miRNAs is a recurrent motif to enhance the robustness of gene regulation in mammalian genomes. If, as suggested by our findings, miRNA-mediated repression tends to play modulatory and/or reinforcing roles in networks, miRNA loss-of-function phenotypes may be subtle and quantitative experimentation at the single-cell level (Acar et al., 2005), for instance, may be necessary to reveal their functions. Further exploration of miRNA-mediated repression in the context of gene regulatory networks will provide a comprehensive view on how gene expression is regulated at the systems level. EXPERIMENTAL PROCEDURES Embedded microRNAs Human and mouse RefSeq (Wheeler et al., 2006) genes and miRNA genomic coordinates were extracted from the UCSC 2004/May human and 2005/March mouse databases, respectively (http://genome.ucsc.edu/). We picked embedded miRNAs that reside on the same strand as the host, either in introns or UTRs of Refseq genes. Gene Expression Data Processing and Host Transcript Mapping The pre-normalized Novartis human/mouse atlas data, along with the probe annotations were downloaded from Novartis (http://wombat.gnf.org/index.html). The pre-normalized MDEV and NCELL data were downloaded from the NCBI GEO database (http://www.ncbi.nlm.nih.gov/projects/geo/). We assigned an integer id to probes on each microarray for ease of reference (Table S7-9). Probes for individual miRNA host genes were mapped and erroneous probes were removed before further analysis (see Supp. Data). 
To obtain relative expression levels, the log-transformed expression of individual probes on an array was further normalized by subtracting the probe median across conditions and dividing by the corresponding standard deviation.

Hierarchical Clustering and Gene Groups

The hierarchical clustering module in GenePattern (Reich et al., 2006) was used. The parameters used were: average linkage, row centered with median, and distance using Pearson correlation. Spearman rank correlation was also tested as the distance measure, but the results were essentially the same.

Target Predictions

We implemented the TargetScanS (Lewis et al., 2005) algorithm, searching for 7-mer seed matches in the 3′UTRs of RefSeq genes. 3′UTRs were downloaded from the UCSC human/mouse database (see above), along with their multiz alignments. Given the multiple alignment of each UTR, we searched for 7-mer (no gaps) human miRNA seed-matches (m1-m7 and m2-m8) that are perfectly conserved across the human, mouse, rat, and dog genomes.

Targeting Bias Calculations

For each embedded miRNA host probe, we compiled a ranked list of genes based on their expression correlation to the probe. We then counted the number of genes (K_X) in a miRNA's predicted target set (T) that overlap with each of the X percentile sets (where X can be top-10, middle-10, or bottom-10) of the ranked list. If miRNA target prediction is random, on average we expect 10% of the predicted target set to overlap each of these sets. Since target prediction corresponds to sampling without replacement, the sampling distribution is hypergeometric with parameters (N, 0.1N, |T|), where N is the total number of genes; and the sampling variance (s) can be computed exactly as:

$$s = |T| \cdot 0.1 \cdot 0.9 \cdot \frac{N - |T|}{N - 1}$$

We then computed the z-score for each of the X percentile sets to indicate the degree of enrichment:

$$z_X = \frac{K_X - 0.1\,|T|}{\sqrt{s}}$$

The exact P value for enrichment (or depletion) can be obtained by computing the cumulative distribution of the above hypergeometric distribution.
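The targeting-bias test described above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code: the function name and arguments are hypothetical, and the variance used is the standard hypergeometric variance for a population of N genes, a percentile set of 0.1N genes, and |T| draws.

```python
from math import comb, sqrt

def targeting_bias(N, T, K, frac=0.1):
    """Enrichment of predicted targets in one percentile set.

    N: total number of genes in the ranked list
    T: size of the predicted target set, |T|
    K: number of predicted targets falling in the percentile set
    frac: fraction of the ranked list covered by the set (0.1 in the text)
    Returns (z-score, exact upper-tail P value P(X >= K)).
    """
    M = round(frac * N)                    # genes in the percentile set
    mean = T * M / N                       # expected overlap under random prediction
    var = T * (M / N) * (1 - M / N) * (N - T) / (N - 1)  # hypergeometric variance
    z = (K - mean) / sqrt(var)
    # Exact enrichment P value: sum the hypergeometric pmf from K upward.
    total = comb(N, T)
    tail = sum(comb(M, k) * comb(N - M, T - k) for k in range(K, min(M, T) + 1))
    return z, tail / total
```

As a usage example, `targeting_bias(1000, 100, 20)` compares an observed overlap of 20 targets against the 10 expected by chance; a depletion P value can be obtained analogously by summing the lower tail.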
Conservation Enrichment Analysis

For each miRNA host probe, we generated a ranked list of genes based on their expression correlation to the host probe. Since some genes can be represented by multiple probes on the microarray, we removed such redundancies so that each entry in the ranked list is unique. The rest of the procedure is as described in the main text (Figure 3).

Predicting Auto-feedback Loops

Conserved loops came directly from TargetScanS predictions. For non-conserved loops, the 3′UTRs of host genes were scanned for 7-mer seed-matches of the respective embedded miRNA. We used Miranda (Enright et al., 2003; John et al., 2004) to scan for energetically favorable binding sites. The parameters used were: score cutoff = 50, free energy cutoff = −20, scale = 4 (to bias 5′ matches with a factor of 4), gap extension penalty = −4 (to allow loops towards the 3′ end).

Identifying Brain-enriched miRNAs

Standard k-means (k = 40) clustering was applied to the Novartis mouse atlas using the Cluster tool (Eisen et al., 1998). We computed the CE scores of all known miRNA seeds for one of the resulting clusters, which contains 1012 RefSeq genes that are expressed in the brain tissues profiled.

Supplemental Data

Supplemental Data include Figures S1-6, Tables S1-S11, additional details on probe selection, and comparison of results among the four data sets we analyzed.

Acknowledgments

This work was supported by an NSERC PGS Scholarship to J.T., a Basil O'Connor Starter Scholar award to J.Z., and grants from the NIH and NSF to A.v.O. We thank Mike Hu for introducing us to mammalian embedded miRNAs; Dale Muzzey, Margaret Ebert, Arjun Raj, Scott Rifkin, Phillip Sharp, Hunt Willard, and Han Wu for comments on the manuscript.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.