Copyright © 2007 by The National Academy of Sciences of the USA

Genetics

Evidence of spatially bound gene regulation in Mus musculus: Decreased gene expression proximal to microRNA genomic location

*School of Biomedical Science and †Institute of Biomaterials and Biomedical Engineering, Tokyo Medical and Dental University, Chiyoda-ku, Tokyo 101-0062, Japan; ‡Informatics Program, Children's Hospital, Center for Biomedical Informatics, and Partners Center for Genetics and Genomics, Harvard Medical School, Boston, MA 02115; and §Division of Health Sciences and Technology, Harvard University and Massachusetts Institute of Technology, Cambridge, MA 02139

¶To whom correspondence should be addressed at: Children's Hospital Informatics Program, 300 Longwood Avenue, Boston, MA 02115. E-mail: isaac_kohane/at/harvard.edu

Communicated by Louis M. Kunkel, Harvard Medical School, Boston, MA, December 13, 2006.

Author contributions: H.I. and Y.F. contributed equally to this work; I.S.K. designed research; H.I., Y.F., and I.S.K. performed research; H.I., Y.F., and I.S.K. analyzed data; and I.S.K. wrote the paper.

Received August 29, 2006.

Freely available online through the PNAS open access option.

Abstract

The extent, spatially and in time, of the phenomenon of localized decreased expression in the chromosomal vicinity of microRNA (miRNA) previously described in Caenorhabditis elegans is reproduced in Mus musculus across a wide range of tissues in several independent experiments. Computationally predicted miRNA targets are enriched in the vicinity of miRNAs, and transcription factors are identified as the class of genes that systematically exhibit this localized decrease. Also, those mRNA with AT-rich UTRs, particularly those that are not in the vicinity of CpG islands, most often exhibit this localized decrease. This localization broadens with the shift from developing to mature/differentiated tissues and suggests a developmentally controlled and spatially bound regulation.
Keywords: cis-regulation, development, transcription

The regulatory properties of colocation of genetic elements in chromosomal linear space (1, 2) and in the three-dimensional space of chromosomes and the nucleus (3) have become increasingly apparent in eukaryotes. Coexpression bias of proximal genes has been documented across a range of organisms, from Saccharomyces cerevisiae (yeast) (4), Arabidopsis thaliana (mustard plant), Caenorhabditis elegans (worm) (5), Drosophila melanogaster (fly), and Mus musculus (mouse) to Homo sapiens (6–9). The role, if any, of noncoding RNAs, and microRNAs (miRNAs) in particular, in this localized control of expression has yet to be determined.

miRNAs are central gene expression regulators, functioning as posttranscriptional suppressors (10, 11). In metazoa, these short noncoding RNAs induce translational repression and mRNA decay by binding to target mRNAs and therefore affect mRNA levels directly (12–15). Understanding of what determines the specificity of mRNA targeting by miRNA is incomplete: It is known that recognition of a target mRNA occurs in the context of a protein complex (RNA-induced silencing complex) that requires neither perfect sequence complementarity nor thermodynamic stability between the miRNA and its target. Two broad classes of miRNA-binding sites in animal mRNA have been identified: 5′ dominant and 3′ compensatory (16). Targets that are 5′ dominant have consecutive Watson–Crick base pairings with the 5′ end of the miRNA (often called the miRNA “seed”) and have lesser complementarity to the rest of the molecule. Targets that are 3′ compensatory have a weak 5′ base pairing and rely on strong compensatory pairing to the 3′ end of the miRNA.

There is also mounting evidence that small noncoding RNAs have an important role in the control of the dynamics of localized gene expression through heterochromatin formation (17–19). Furthermore, analysis of C.
elegans expression data sets provides evidence that gene expression tends to be decreased locally in the neighborhood of miRNAs throughout the genome, at least in that organism (20). Consequently we have hypothesized that miRNAs may have a role in the localized expression of genes in their neighborhood in “higher” organisms, including mammals.

As a preliminary exploration of this hypothesis, we have determined the extent, spatially along the chromosome and in developmental time, of the phenomenon of localized decreased expression in M. musculus across a wide range of tissues in independent experiments. We defined a series of increasingly large chromosomal “windows” centered on each documented miRNA location. For each tissue, the expression of mRNA within each successive window was normalized by the overall expression of mRNA in that tissue as measured by expression microarrays. We compared these levels with those calculated in windows selected from 49,800 random locations in the murine genome. Furthermore, we determined how the composition of the proximal genome might affect the observed expression levels, including the presence of CpG islands, AT enrichment of mRNA targets, and proximal distribution of mRNA targets of each miRNA.

Results

The phenomenon of lower expression in the neighborhood of miRNAs is marked (approximately two-fold decrease) and widespread. We found extensive evidence of localized decreased gene expression in the neighborhood of miRNAs in many murine developmental models, including eight tissue-specific developmental models (see Methods). Localized decreased expression was observed in normal mouse cerebellar development (21) and in the ptch mutant mouse (21), lung development (22), thymic T cell (23), preimplantation embryo development (24), oocyte development (25), and two models of skeletal muscle cell growth (26, 27). These decreases in expression were not found in the neighborhood of randomly selected chromosomal locations.
This finding is illustrated in Fig. 1
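The windowed comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the gene records as `(chromosome, position, expression)` tuples, the function names `window_ratio` and `random_centers`, and the toy numbers are all hypothetical. The sketch also assumes that a gene falling inside several windows is counted once, which the text does not specify.

```python
import random
from statistics import mean


def window_ratio(genes, centers, length):
    """Mean expression of genes inside windows of `length` bp centered on each
    (chrom, pos) in `centers`, divided by the mean over all measured genes."""
    overall = mean(expr for _, _, expr in genes)
    half = length / 2
    in_window = [
        expr
        for chrom, pos, expr in genes
        # Assumption: a gene is counted once even if several windows cover it.
        if any(c == chrom and abs(pos - p) <= half for c, p in centers)
    ]
    if not in_window:
        return None  # no measured gene falls inside any window
    return mean(in_window) / overall


def random_centers(chrom_lengths, n, seed=0):
    """Pick n random (chrom, pos) control locations, weighting each
    chromosome by its length (hypothetical helper)."""
    rng = random.Random(seed)
    names = list(chrom_lengths)
    weights = [chrom_lengths[name] for name in names]
    centers = []
    for _ in range(n):
        chrom = rng.choices(names, weights=weights, k=1)[0]
        centers.append((chrom, rng.randrange(chrom_lengths[chrom])))
    return centers


# Toy data, made up for illustration: (chromosome, position in bp, expression)
genes = [
    ("chr1", 100_000, 2.0),
    ("chr1", 150_000, 4.0),
    ("chr1", 900_000, 12.0),
    ("chr2", 50_000, 14.0),
]
mirnas = [("chr1", 120_000)]

for length in (100_000, 2_000_000):
    print(length, window_ratio(genes, mirnas, length))
```

With these toy numbers the ratio is 0.375 for the 100-kb window and 0.75 for the 2-Mb window, so the local decrease fades as the window grows. The same function can be applied to the output of `random_centers` to obtain a control profile.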
Of the 349 genes that showed lower expression in at least two experiments from either developmental or mature tissue data sets (to limit the number of genes included whose expression was not systematically affected in the miRNA neighborhood) and are in a 400-kb window of any miRNA, the only molecular function category that showed enrichment by the Fisher exact test (P < 0.0001) was transcription factor activity/DNA binding (see Methods). One hundred and fifty-four genes that had higher expression in two or more experiments were enriched only for translational initiation factor activity, as shown in Table 1. The full list of genes is shown in supporting information (SI) Tables 2 and 3. To determine to what extent the genomic distribution of transcription factors might contribute to this finding, we determined which gene ontology (GO) (28) categories were overrepresented in the vicinity of miRNA, irrespective of gene expression level. Within 100 kb of miRNA, no category reached significance after Bonferroni correction, although the categories of nucleic acid binding/DNA binding and translation/translation initiation were top-ranked. The elimination of another possible confounder of these findings, miRNA clustering (29, 30), did not alter the above findings (see Methods).
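The category enrichment test used here reduces to an upper-tail hypergeometric probability, which is the one-sided Fisher exact test. The following Python sketch shows that calculation together with a Bonferroni adjustment. It is an illustration only: the function names and the counts in the usage lines are hypothetical and are not taken from the study, and the actual analysis used the GOHyperG program in Bioconductor.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric P value: probability of seeing at least k
    category members among n selected genes, when K of the N background
    genes belong to the category."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


# Made-up counts for illustration: 10 background genes, 4 in the category,
# 3 genes selected, 2 of which are in the category.
p = enrichment_p(k=2, n=3, K=4, N=10)
print(p, bonferroni(p, n_tests=5))
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that factorial-based floating-point formulas run into for array-sized backgrounds.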
The gene expression profile around miRNA was repeated for mature tissues to determine whether the effect noted in development persisted. The analysis was repeated for the following mature tissues obtained in one set of experiments described by Su et al. (31): mouse myocardium, kidney, brain, liver, lung, skeletal muscle, spleen, and thymus. Again, localized decreased gene expression was noted in all eight tissues but was even broader than in the developing tissues. Fig. 1
To determine whether targeting of mRNA transcripts by the local/cis miRNA might account for the observed decreased local expression, the number of putative mRNA targets for each miRNA was calculated in windows ranging from 10 kb to 200 Mb centered on the miRNA. With computationally predicted targets (TargetscanS; ref. 32), the enrichment of targets in the larger windows was found to be lower than in the more proximal windows when normalized for gene count per window. The peak enrichment for targets as calculated by TargetscanS appears between 100 and 400 kb with a steady decrease to 15% of the peak value. The trend was not observed with the random location-centered data and is illustrated in Fig. 4
DNA methylation at CpG-rich sites is a component of one of the well documented mechanisms of transcriptional silencing (35). If the decreased expression levels in the neighborhood of miRNA were mediated in part by this mechanism, then we would expect a difference in the expression profiles in the vicinity of those miRNA in CpG-rich chromosomal regions versus those that were CpG-poor. The developmental data sets were reanalyzed by splitting them into two roughly equally sized sets: those miRNA that had at least one CpG island identified in silico within 100 kb (130 miRNA) and those without such an island (109 miRNA). As shown in Fig. 5
Discussion

In our broadening understanding of the role of small RNAs in genetic regulation, their role in gene silencing and particularly epigenetic regulation, including the initiation of heterochromatin formation, has only recently become apparent (18, 19). We provide here evidence of a modest but systematic and widespread decreased expression of coding gene mRNA that is localized to and centered on the position of miRNA throughout the murine genome. Furthermore, this localized decrease is robustly reproduced. It is noted both in late development (i.e., after apparent tissue fate determination) in eight independent experiments and in eight different mature differentiated tissues. The extent of this effect appears to extend to ≈10^5 bp in the developing tissues but tends to have a larger extent (approximately 10^6 bp) in the differentiated tissues. This trend is reproducible within single-tissue developmental time series. These results are consistent with earlier results described in the worm transcriptome (20).

This localized decreased expression effect is a subtle one in that several genes are expressed at high levels in proximity to miRNAs. It only becomes apparent across thousands of genes, which speaks to the dominance of other mechanisms. Nevertheless, the persistence of this effect genome-wide does suggest a robust control mechanism. The observation that the only functional category of gene that is statistically enriched in this localized effect is that of DNA-binding proteins or transcription factors suggests that the primary consequence of this phenomenon is indirect, through the modulation of the expression of the regulators. Across the aforementioned multiple independent experiments, this hypothesis is further supported by the highly significant enrichment of miRNA targets within the vicinity of each miRNA that are specific to that miRNA.
Due to the relatively low specificity and sensitivity of current computational predictors of miRNA targets, the quantification of this enrichment can only be tentative at present. Given the role of miRNA in epigenetic silencing (17), the interaction between CpG islands, gene expression, development and proximity to miRNAs that we observe here is perhaps not surprising even if the genome-wide nature of this phenomenon is previously unreported. The observed increased AT richness of 3′ UTRs of the genes in the miRNA neighborhoods without proximal CpG islands vs. those that do have proximal CpG islands is consistent with the observation described in ref. 36 of transcriptional and translational function of the genes with such AT-rich UTRs. Nonetheless, as noted above, AT enrichment alone does not identify those genes that are likely to have lower expression levels in the vicinity of a miRNA. Unaccounted for is the enrichment for translation initiation factors among those genes up-regulated in the vicinity of miRNA. Whether this phenomenon is a primary effect of miRNA targeting or a secondary regulatory effect remains to be determined. Nonetheless, this observation is intriguing because many of these genes are members of the Argonaute family. Argonaute genes have been implicated in miRNA-mediated translation repression and miRNA-directed mRNA degradation and chromatin modification (14, 37, 38). As to the function of this large-scale organization of miRNA-centered domains of decreased expression increasing with maturation, two possibilities suggest themselves. First, beyond the initiation of heterochromatin formation, miRNA may be involved (as are other noncoding RNAs; refs. 39 and 40) in the extension of heterochromatic domains as the developmental program progresses (41). 
Second, as chromatin unwinds to allow miRNA expression, neighboring transcription factors (TFs) may have coevolved 3′ UTR targets for these miRNA so as to avoid large-scale trans-acting effects of these TFs across the genome in the absence of specific TF enhancers. Determination of how these localized effects change with experimentally modified levels of specific miRNAs may be illuminating in this regard.

Methods

The position p_i of each miRNA documented in the murine genome was obtained from the Sanger database (http://microrna.sanger.ac.uk/sequences/ftp.shtml). Gene expression within each data set was mean and unit SD normalized. Successively larger windows w_j of length l were centered on each p_i. All of the coding genes g_k within w_j were identified, and the average expression of all of the g_k within w_j was calculated and expressed as a quotient with respect to the expression of all measured genes in a particular data set (“normalized expression level”). For comparison, an equal number of windows of length l were identified centered on randomly picked locations in the mouse genome, and the same window-specific average expression levels were calculated. For this purpose, 49,800 random locations were picked. For the chromosomal plots of Fig. 3

Gene Category Enrichment. For each data set, the expression levels were normalized by the overall average. The genes in a 400-kb window were extracted as the subject of the analysis. Genes with a normalized value of ≥1.1 were categorized as increased. Similarly, genes with a normalized value of ≤0.9 were regarded as decreased. Then for each gene, the number of conditions (across all of the eight developmental tissue data sets) in which the gene was increased (or decreased) was counted. The enrichment of each category of gene annotation was calculated by using the GOHyperG program in Bioconductor (43), with the microarray platform used for the expression studies as the source of background annotation frequencies.
GOHyperG calculates the P values from the hypergeometric distribution for the GO categories on the basis of the background frequency of each GO category on a specified expression microarray. This procedure is equivalent to using Fisher's exact test (44).

mRNA Targets in miRNA Vicinities.

CpG Island Determination. We downloaded a list of CpG islands (file seq_1020;cpg_islands.md.gz) from the National Center for Biotechnology Information web site (ftp.ncbi.nlm.nih.gov/genomes/M_musculus/mapview). We used CpG islands that met the “strict” conditions of (i) 500-bp minimum length, (ii) ≥50% GC content, and (iii) ≥0.60 observed CpG/expected CpG (45). The list includes the chromosomal position of each CpG island (if known). We then calculated a distance between every possible combination of CpG island and miRNA on the same chromosome. Because approximately half of the miRNAs have at least one CpG island within 100-kb proximity, we considered an miRNA that has a CpG island located <100 kb away as an miRNA with CpG and the others as miRNAs without CpG. The average normalized expression level of the genes near each group of miRNAs was calculated in differently sized windows for the two groups. Results are plotted in Fig. 5

miRNA Clusters. Clusters of miRNA were defined as contiguous groups of miRNAs in which no miRNA is more distant than 10 kb from another miRNA. The average expression levels were recalculated across the same successive windows as before but with and without the miRNA clusters. As shown in SI Fig. 8, there was no significant difference in the pattern of locally decreased expression of coding genes around clustered vs. unclustered miRNA.
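The two positional bookkeeping steps in the Methods, flagging miRNAs with a CpG island closer than 100 kb and grouping miRNAs into clusters with a 10-kb rule, can be sketched as follows. This is a hedged illustration rather than the authors' implementation. It assumes that each miRNA and each CpG island is represented by a single `(chromosome, position)` point, and it reads the cluster definition as chaining: an miRNA joins a cluster when it lies within 10 kb of its nearest upstream neighbor on the same chromosome. Both function names are hypothetical.

```python
def has_cpg_nearby(mirna, islands, max_dist=100_000):
    """True if any CpG island on the same chromosome lies closer than
    max_dist bp to the miRNA. Islands are treated as points (assumption)."""
    chrom, pos = mirna
    return any(c == chrom and abs(pos - p) < max_dist for c, p in islands)


def cluster_mirnas(mirnas, max_gap=10_000):
    """Group miRNAs into clusters by chaining neighbors on the same
    chromosome that are at most max_gap bp apart (assumed reading)."""
    clusters = []
    for chrom, pos in sorted(mirnas):
        last = clusters[-1][-1] if clusters else None
        if last is not None and last[0] == chrom and pos - last[1] <= max_gap:
            clusters[-1].append((chrom, pos))
        else:
            clusters.append([(chrom, pos)])
    return clusters


# Toy coordinates, made up for illustration
mirnas = [("chr1", 1_000), ("chr1", 9_000), ("chr1", 18_000),
          ("chr1", 50_000), ("chr2", 52_000)]
islands = [("chr1", 90_000)]

print([len(c) for c in cluster_mirnas(mirnas)])
print([has_cpg_nearby(m, islands) for m in mirnas])
```

Under the chaining reading, the first and third toy miRNAs end up in the same cluster even though they are 17 kb apart, because each is within 10 kb of the one between them. A stricter reading that bounds the total span of a cluster would need a different rule.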
Acknowledgments

We thank Prof. Simon Kasif, Prof. Joseph Majzoub, Prof. Louis Kunkel, Alal Eran, and the reviewers for valuable suggestions for improving the manuscript. I.S.K. was supported in part by the National Institutes of Health National Center for Biomedical Computing Grant 5U54LM008748-02.

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0611078104/DC1.

References

1. Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, van Sluis P, Hermus MC, van Asperen R, Boon K, Voute PA, et al. Science. 2001;291:1289–1292.
2. Lawrence JG. Cell. 2002;110:407–413.
3. Stein GS, Lian JB, Montecino M, Stein JL, van Wijnen AJ, Javed A, Pratap J, Choi J, Zaidi SK, Gutierrez S, et al. Chromosome Res. 2003;11:527–536.
4. Cohen BA, Mitra RD, Hughes JD, Church GM. Nat Genet. 2000;26:183–186.
5. Roy PJ, Stuart JM, Lund J, Kim SK. Nature. 2002;418:975–979.
6. Lee JM, Sonnhammer EL. Genome Res. 2003;13:875–882.
7. Megy K, Audic S, Claverie JM. Genome Biol. 2003;4:P1.
8. Fukuoka Y, Inaoka H, Kohane IS. BMC Genomics. 2004;5:4.
9. Thygesen HH, Zwinderman AH. BMC Bioinformatics. 2005;6:10.
10. Bartel DP. Cell. 2004;116:281–297.
11. Ambros V. Nature. 2004;431:350–355.
12. Wu L, Fan J, Belasco JG. Proc Natl Acad Sci USA. 2006;103:4034–4039.
13. Bagga S, Bracht J, Hunter S, Massirer K, Holtz J, Eachus R, Pasquinelli AE. Cell. 2005;122:553–563.
14. Jing Q, Huang S, Guth S, Zarubin T, Motoyama A, Chen J, Di Padova F, Lin SC, Gram H, Han J. Cell. 2005;120:623–634.
15. Wu L, Belasco JG. Mol Cell Biol. 2005;25:9198–9208.
16. Brennecke J, Stark A, Russell RB, Cohen SM. PLoS Biol. 2005;3:e85.
17. Bernstein E, Allis CD. Genes Dev. 2005;19:1635–1655.
18. Verdel A, Jia S, Gerber S, Sugiyama T, Gygi S, Grewal SI, Moazed D. Science. 2004;303:672–676.
19. Grewal SI, Moazed D. Science. 2003;301:798–802.
20. Inaoka H, Fukuoka Y, Kohane IS. BMC Bioinformatics. 2006;7:112.
21. Kho AT, Zhao Q, Cai Z, Butte AJ, Kim JY, Pomeroy SL, Rowitch DH, Kohane IS. Genes Dev. 2004;18:629–640.
22. Mariani TJ, Reed JJ, Shapiro SD. Am J Respir Cell Mol Biol. 2002;26:541–548.
23. Hoffmann R, Bruno L, Seidl T, Rolink A, Melchers F. J Immunol. 2003;170:1339–1353.
24. Zeng F, Baldwin DA, Schultz RM. Dev Biol. 2004;272:483–496.
25. Pan H, O'Brien MJ, Wigglesworth K, Eppig JJ, Schultz RM. Dev Biol. 2005;286:493–506.
26. Tomczak KK, Marinescu VD, Ramoni MF, Sanoudou D, Montanaro F, Han M, Kunkel LM, Kohane IS, Beggs AH. FASEB J. 2004;18:403–405.
27. Palidwor G. (National Center for Biotechnology Information). Gene Expression Omnibus. 2005. [Accessed July 2006]. Available at www.ncbi.nlm.nih.gov/geo/query/acc.cgi. Accession code GSE3787.
28. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Nat Genet. 2000;25:25–29.
29. Seitz H, Royo H, Bortolin ML, Lin SP, Ferguson-Smith AC, Cavaille J. Genome Res. 2004;14:1741–1748.
30. Tanzer A, Stadler PF. J Mol Biol. 2004;339:327–335.
31. Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, et al. Proc Natl Acad Sci USA. 2002;99:4465–4470.
32. Lewis BP, Burge CB, Bartel DP. Cell. 2005;120:15–20.
33. Cochran WG. Biometrics. 1954;10:417–451.
34. Armitage P. Biometrics. 1955;11:375–386.
35. Fahrner JA, Baylin SB. Genes Dev. 2003;17:1805–1812.
36. Robins H, Press WH. Proc Natl Acad Sci USA. 2005;102:15557–15562.
37. Zilberman D, Cao X, Jacobsen SE. Science. 2003;299:716–719.
38. Meister G, Landthaler M, Peters L, Chen PY, Urlaub H, Luhrmann R, Tuschl T. Curr Biol. 2005;15:2149–2155.
39. Petrie VJ, Wuitschick JD, Givens CD, Kosinski AM, Partridge JF. Mol Cell Biol. 2005;25:2331–2346.
40. Chow JC, Yen Z, Ziesche SM, Brown CJ. Annu Rev Genomics Hum Genet. 2005;6:69–92.
41. Eshed Y, Bowman JL. Dev Cell. 2004;7:629–630.
42. Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, Rudnev D, Lash AE, Fujibuchi W, Edgar R. Nucleic Acids Res. 2005;33:D562–D566.
43. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al. Genome Biol. 2004;5:R80.
44. Adams WT, Skopek TR. J Mol Biol. 1987;194:391–396.
45. Takai D, Jones PA. Proc Natl Acad Sci USA. 2002;99:3740–3745.