Copyright Mainguy et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Extensive Polycistronism and Antisense Transcription in the Mammalian Hox Clusters

1 Hubrecht Laboratory, Center for Biomedical Genetics, Utrecht, The Netherlands
2 Department of Human Genetics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

Thomas Zwaka, Academic Editor, Baylor College of Medicine, United States of America

* To whom correspondence should be addressed. E-mail: gaell.mainguy/at/institut.veolia.org (GM); E-mail: A.J.Durston/at/biology.leidenuniv.nl (AD)

Conceived and designed the experiments: GM. Performed the experiments: GM JW HJ. Analyzed the data: JK AD GM JW HJ. Contributed reagents/materials/analysis tools: JK AD. Wrote the paper: AD GM. Other: Co-initiator and planner of the project (but biggest contribution by Mainguy), Head of Lab, Contributed to analysis, Provided all facilities, Co-writer: AD. Main author: GM.

¤a Current address: Institut Veolia Environnement, Paris, France
¤b Current address: Institute of Biology, IBL, Faculty of Science and Mathematics, University of Leiden, Leiden, The Netherlands

Received January 31, 2007; Accepted February 26, 2007.

Abstract

The Hox clusters play a crucial role in body patterning during animal development. They encode both Hox transcription factor and micro-RNA genes that are activated in a precise temporal and spatial sequence that follows their chromosomal order. These remarkable collinear properties confer functional unit status for Hox clusters. We developed the TranscriptView platform to establish high resolution transcriptional profiling and report here that transcription in the Hox clusters is far more complex than previously described in both human and mouse.
Unannotated transcripts can represent up to 60% of the total transcriptional output of a cluster. In particular, we identified 14 non-coding Transcriptional Units antisense to Hox genes, 10 of which (70%) have a detectable mouse homolog. Most of these Transcriptional Units in both human and mouse present conserved sizeable sequences (>40 bp) overlapping Hox transcripts, suggesting that these Hox antisense transcripts are functional. Hox clusters also display at least seven polycistronic clusters, i.e., different genes being co-transcribed on long isoforms (up to 30 kb). This work provides a reevaluated framework for understanding Hox gene function and dysfunction. Such extensive transcription may provide a structural explanation for Hox clustering.

Introduction

Hox clusters are among the most remarkable genomic objects. Understanding their structure and function is crucial, as Hox clusters are implicated in a growing number of diseases, from cancers to congenital malformations [1]. Mammals possess four similar Hox clusters, HoxA, HoxB, HoxC and HoxD, located on different chromosomes, each consisting of 9 to 11 Hox genes arranged in tandem. The order of Hox genes along the chromosome corresponds to the order in which they act along the body axes. This collinear property links clustering to function, emphasizing that Hox clusters are functional units [2]. The Hox clusters also contain 5 microRNA (miRNA) genes intercalated at two homologous positions [3], [4].

The organization of Hox complexes is highly conserved in vertebrates: Hox and mir genes not only stay clustered but also remain in close proximity to each other despite their very complex and dynamic expression patterns. This property is in apparent contradiction with the observation that the more complex the expression pattern of a gene is, the larger its flanking non-coding DNA [5]. This apparent paradox raises the question of the selective pressure(s) at work for maintaining Hox and mir genes clustered.
Current models propose that clustering is maintained via the sharing of cis-regulatory elements that control several Hox genes either locally or globally [2], [6], [7]. Other aspects of transcriptional structure could also be important. First, a case of polycistronism has been reported where Hoxc6, Hoxc5 and Hoxc4 are co-transcribed and gene-specific transcripts result from alternative splicing [8]. Notably, polycistronic Hox transcripts have also been reported in a number of crustaceans [9], indicating their importance in diverse metazoa. Second, a Hoxa11 antisense RNA is transcribed immediately 5′ to HoxA11 and is involved in its regulation [10]. Thus, Hox clusters present unusual transcriptional characteristics that may play an important role in Hox gene expression. The transcriptional complexity of mammalian genomes is increasingly recognized [11], and data mining provides a suitable way to establish the transcriptional structure of poorly expressed genes. Here we present a thorough analysis of the best described vertebrate (human and murine) Hox clusters.

Results and Discussion

The majority of the transcriptional activity of the Hox Clusters is not annotated

As the gene is a misleading concept, we follow the unambiguous definitions proposed by the FANTOM consortium: a transcriptional unit (TU) is a segment of the genome flanked by the most distal exons from which transcripts are generated [12]. Transcripts sharing any exon are merged into a single TU. If two transcripts do not share a single exon, they constitute two different TUs, even if they overlap or if one is localized in the intron of the other. In particular, two transcripts on opposite strands always constitute two different TUs. Aligning the genome with all of the ESTs and mRNAs provides a reliable method to delineate exons and deduce TU structures in the entire organism, independently of time and space and throughout its life cycle [13].
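The TU definition above can be illustrated with a short sketch. This is not the TranscriptView implementation, which is not described at this level of detail; it is a minimal Python illustration assuming that "sharing an exon" means two exons on the same strand overlap by at least one base, and that exons are given as half-open (start, end) genomic intervals. The function name `build_tus` and the input layout are hypothetical.

```python
from collections import defaultdict

def build_tus(transcripts):
    """Group transcripts into transcriptional units (TUs).

    transcripts: dict mapping a transcript name to (strand, exons),
    where exons is a list of half-open (start, end) intervals.
    Assumption for this sketch: two transcripts "share an exon" when
    any of their exons overlap on the same strand.
    """
    parent = {name: name for name in transcripts}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(transcripts)
    for i, a in enumerate(names):
        strand_a, exons_a = transcripts[a]
        for b in names[i + 1:]:
            strand_b, exons_b = transcripts[b]
            if strand_a != strand_b:
                continue  # opposite strands are always different TUs
            if any(s1 < e2 and s2 < e1
                   for s1, e1 in exons_a for s2, e2 in exons_b):
                parent[find(b)] = find(a)  # merge the two groups

    groups = defaultdict(set)
    for name in names:
        groups[find(name)].add(name)
    return sorted(groups.values(), key=sorted)

example = {
    "t1": ("+", [(100, 200), (500, 600)]),
    "t2": ("+", [(550, 650), (900, 1000)]),  # shares an exon with t1
    "t3": ("+", [(300, 400)]),               # lies in an intron of t1
    "t4": ("-", [(100, 200)]),               # opposite strand
}
tus = build_tus(example)
```

In this toy input, `t1` and `t2` form one TU, while `t3` (intronic, no shared exon) and `t4` (opposite strand) each form their own, matching the three cases spelled out in the definition.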
We computationally mined public mouse and human databases using a dedicated software platform, TranscriptView (see Methods), and found that Hox cluster profiles are far more complex than annotated (fig 1a
In an effort to re-annotate the Hox clusters, we used the following strategy to establish TUs and discriminate functional RNAs. First, we restricted our analysis to spliced transcripts, as splicing is evidence against genomic contamination and splice site asymmetry allows transcript orientation to be determined. As most of these transcripts are non-coding (see below), protein conservation could not serve as a criterion. To categorize the TUs along a scale of degree of confidence, we therefore focused on the exon-intron structure and nucleotide sequence conservation. In our analysis, Transcript existence (1) is defined by the presence of multiple spliced transcripts in human databases, Sequence conservation (2) is observed when transcripts from two different species share a conserved sequence, and Exon-Exon structure conservation (3) characterizes transcripts from two different species displaying the same intron boundaries. Our findings are summarized in tables 1 and 2 and are depicted in figure 2
Polycistronic clusters

A polycistronic cluster designates two or more genes co-transcribed from a single promoter, sharing a non-coding exon, and whose products are generated by alternative splicing [18]. An operon is a particular case where the mRNA retains the different products after splicing. In mammals, both operons and polycistronic clusters are scarcely documented [18]. One clear example, nonetheless, is the case of the Hoxc4, Hoxc5 and Hoxc6 genes, which can be co-transcribed from a common promoter [8]. We found 22 Hox transcripts whose introns appear to encompass other genes. In three cases we could identify a rodent homolog that presented a conserved exon-exon boundary (>85% identity over at least 60 nucleotides, see supporting online material). In total, multiple alignment and identification of orthologs provided support for the existence of seven polycistronic clusters, which concern 38% (15/39) of the Hox genes (table 1). Remarkably, the five miRNAs are located within introns of atypical transcripts and are therefore co-transcribed with Hox genes (figure 2

Widespread antisense transcription

Our analysis also revealed the existence of 15 TUs distinct from the Hox and mir genes that are poly-adenylated and alternatively spliced like genuine products of RNA Polymerase II. Most of these TUs (14/15) are transcribed antisense (AS) to Hox genes (see fig 2
To date, AS RNAs have been implicated in aspects of eukaryotic gene expression as diverse as genomic imprinting, RNA interference, translational regulation, alternative splicing, and RNA editing [21], [22], [23]. AS transcripts frequently originate from the same locus as sense transcripts and are called cis-encoded antisenses. They are thought to control sense RNA expression by sense-antisense (SAS) pairing [21], [22]. We searched for potential SAS contacts (>40 bp) and found that nine AS TUs (65%) have sequences reverse-complementary to twelve Hox mRNAs (table 2). This proportion is rather high as, on a genomic scale, natural AS transcription has been estimated to target from 2 to 8% of human genes [21], [22]. Similarly, in mouse, eight Hox genes are subject to cis-antisense interactions, three of which, Hoxa3, Hoxb3 and Hoxb5, present the same SAS in both human and mouse (table 2). The conservation of SAS sequences between human and mouse strongly supports the hypothesis that these AS TUs are functional. Moreover, all of the SAS overlap sequences are remarkably conserved in the other species, suggesting that cis-encoded antisenses could target as many as 22 Hox genes (table 2). Besides these interactions, trans-encoded AS RNAs have also been reported, where the AS transcript originates from a different locus and displays only partial complementarity with the sense transcript [23]. We identified 6 and 5 potential trans-interactions in human and mouse respectively (SAS contact; >40 nucleotides, >85% identity) (table 2). These SAS interactions usually occur within a paralog group (A1/B1, A3/B3 or A11/C11/D11), but there are three noteworthy exceptions (B4/B5, B2/A9 and B2/D3). Remarkably, antisense transcripts with the potential to recognize Hoxb4 and Hoxd3 in trans are present in both human and mouse.
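The search for cis SAS contacts described above can be sketched in a few lines. The study itself used BLAST alignments (see Methods); the Python below is only a simplified stand-in that looks for an exact reverse-complementary stretch longer than 40 bases between a sense and an antisense transcript. The function names and the exact-match criterion are assumptions of this sketch, and imperfect (trans-type) contacts at >85% identity would need a real aligner.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def longest_common_substring(a, b):
    """Length of the longest exact substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the match that ended at the previous positions.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best

def has_sas_contact(sense, antisense, min_len=41):
    """True if the antisense transcript can pair with the sense
    transcript over at least min_len consecutive bases.

    min_len=41 encodes the ">40 bp" threshold; exact pairing is a
    simplifying assumption of this sketch.
    """
    target = reverse_complement(antisense).upper()
    return longest_common_substring(sense.upper(), target) >= min_len

# Toy sequences: a 49-base core shared in reverse-complement form.
core = "ACGGTCA" * 7
sense_mrna = "TTTT" + core + "GGGG"
antisense_rna = reverse_complement("CC" + core + "AA")
contact = has_sas_contact(sense_mrna, antisense_rna)
```

The quadratic dynamic programme is adequate for transcript-sized sequences; for database-scale searches an indexed aligner such as BLAST, as used in the study, is the practical choice.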
Functional clustering and extensive transcription correlate with absence of transposons

Our analysis suggests that, in addition to the sharing of cis-regulatory elements, the existence of operons, polycistronism and antisense-sense pairing provide additional constraints for maintaining Hox clusters as functional units. If this were the case, exogenous start and stop transcription signals would be highly counter-selected. Indeed, the four Hox clusters are by far the most repeat-poor regions of the genome in both human and mouse, and the current explanation is that insertions would interfere with the dense network of cis-interactions [24]. We analyzed the repeat distribution and found that transposons are virtually absent from transcribed regions but that they can accumulate within the clusters at untranscribed regions. The HoxB cluster provides a threefold example of this mutual exclusiveness between transposons and transcription (see figure 3
Concluding Remarks

Our analysis confers on Hox clusters the status of the most complex objects reported to date in mammals in terms of both polycistronism and antisense transcription, and suggests that, in addition to enhancer sharing, these mechanisms provide additional constraints for maintaining Hox clusters as functional units. There is increasing recognition that the production of RNA transcripts from both orientations can produce coordinate regulation. Since mammalian mRNAs that form sense-antisense pairs frequently exhibit reciprocal expression patterns [21], it is tempting to speculate that antisense transcription in the Hox Clusters is instrumental in establishing limits of gene expression. In conclusion, by unraveling the complex transcriptional organisation of the Hox clusters, our analysis blurs the traditional view of Hox genes and provides a reevaluated framework for understanding Hox gene function and dysfunction.

Methods

The TranscriptView software platform

We used the TranscriptView software platform to obtain and manipulate clusters of human expressed sequences aligned to genomic DNA. TranscriptView makes use of public genome alignment data for EST and mRNA sequences generated with BLAT by the UCSC genome consortium (http://genome.ucsc.edu/). The BLAT program is specifically designed for transcript-to-genome alignments, making it possible to align large collections of sequences to the genome [25]. Expressed sequences are compared to the human genome to find high quality hits, and are then aligned to it using a spliced alignment model that allows long gaps, for modeling introns. The maximum intron length allowed by BLAT is 500,000 bases. When a single EST aligns in multiple places, the alignment having the highest base identity is identified. Low-quality sequence ends that disagree with the DNA are trimmed.
Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are kept (http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=est). Further, expressed sequences aligning to two or more chromosomes are discarded as suspected chimeras. Overlapping expressed sequences and corresponding genomic sequences are multiply aligned. Positions on the genomic sequence at which at least one expressed sequence opens or closes a long gap are considered splice sites. The exact position of the splice sites is determined taking the GT...AG rule into consideration as described in [22]. The list of all of the alignment boundaries is generated, allowing a quantitative determination of the transcriptional status of any genomic segment at the base pair level. The deduced exon-intron organization and the orientation, when available, are also accessible through the TranscriptView graphical interface.

Datasets

Using BLAT we retrieved 2630 EST and mRNA sequences that aligned with the human Hox clusters (HoxA 837, HoxB 807, HoxC 441, HoxD 545). The distribution of this primary set is described in figure 1

TU annotations and transcript analysis

Among this secondary set, 96 sequences displayed at least one intron longer than 7 kb (see the list in supporting online material for references and characteristics). These sequences were then merged with ‘classical’ Hox transcripts and grouped according to cluster and orientation, and TUs were constructed using the Contig Assembly Program (http://www.infobiogen.fr/services/analyseq/cgi-bin/cap_in.pl) [26]. CAP-generated contigs were then checked for misalignments. To identify putative homologous TUs, non-redundant representative sequences for each TU were selected on the basis of the CAP contigs and blasted against vertebrate transcription databases.
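The alignment filtering rules stated in the TranscriptView description above (at least 96% base identity, within 0.5% of the best alignment for the same sequence, and rejection of sequences hitting two or more chromosomes) can be sketched as follows. This is an illustrative Python reconstruction, not the UCSC or TranscriptView code: the function name `filter_alignments`, the tuple layout, and the choice to apply the chimera test after the identity filter are all assumptions.

```python
from collections import defaultdict

def filter_alignments(alignments, min_identity=96.0, near_best=0.5):
    """Apply the identity and chimera filters to transcript alignments.

    alignments: iterable of (seq_id, chrom, identity_percent) tuples,
    one per genomic alignment of an EST or mRNA (hypothetical layout).
    Returns a dict seq_id -> list of retained (chrom, identity) hits.
    """
    by_seq = defaultdict(list)
    for seq_id, chrom, identity in alignments:
        by_seq[seq_id].append((chrom, identity))

    kept = {}
    for seq_id, hits in by_seq.items():
        best = max(identity for _, identity in hits)
        # Keep hits with >= 96% identity that are within 0.5% of the best.
        good = [(chrom, identity) for chrom, identity in hits
                if identity >= min_identity and best - identity <= near_best]
        if not good:
            continue
        # Assumption: the chimera test is applied to the retained hits.
        if len({chrom for chrom, _ in good}) > 1:
            continue  # aligns to two or more chromosomes: suspected chimera
        kept[seq_id] = good
    return kept

hits = [
    ("est1", "chr7", 99.0), ("est1", "chr7", 98.6), ("est1", "chr2", 97.0),
    ("est2", "chr17", 95.0),
    ("est3", "chr12", 98.0), ("est3", "chr2", 97.8),
]
retained = filter_alignments(hits)
```

In the toy data, `est1` keeps its two chr7 hits (the chr2 hit is more than 0.5% below the best), `est2` falls under the 96% floor, and `est3` is dropped as a suspected chimera.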
In the case of the 14 antisense TUs, we collected a representative set of 52 sequences to identify putative homologs and to evaluate the coding potential. Using the Diogenes ORF prediction program (http://web.ahc.umn.edu/cgi-bin/diogenes/diogenes.cgi), eight different sequences presented a score compatible with an ORF (p > 10⁻³), but subsequent BLAST analysis failed to detect any conserved pattern outside human. These 52 sequences were systematically blasted against the human database, and alignments with sense Hox transcripts were reported as an indication of putative SAS contacts. Imperfect alignment and inconsistency in the genomic locations were taken as signs of putative trans-SAS contacts. Conservation of the SAS sequences was assessed by cross-species blasting. We undertook a similar procedure for the mouse Hox antisense TUs. The results are summarized in table 2.

Acknowledgments

http://genome.ucsc.edu/ UCSC genome consortium: home page
http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=est UCSC genome consortium: BLAT database
http://www.infobiogen.fr/services/analyseq/cgi-bin/cap_in.pl Contig Assembly Program
http://web.ahc.umn.edu/cgi-bin/diogenes/diogenes.cgi Diogenes ORF prediction program

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work was supported by the European Union FP5 Marie Curie and Biotech programmes, and also by an EMBO long term fellowship to GM and a Marie Curie individual fellowship to GM. These sponsors supported all phases of this work up until the final writing phase.

References

1. Grier DG, Thompson A, Kwasniewska A, McGonigle GJ, Halliday HL, et al. The pathophysiology of HOX genes and their role in cancer. J Pathol. 2005;205(2):154–71.
2. Duboule D. Vertebrate hox gene regulation: clustering and/or colinearity? Curr Opin Genet Dev. 1998;8(5):514–8.
3. Kosman D, Mizutani CM, Lemons D, Cox WG, McGinnis W, et al. Multiplex detection of RNA expression in Drosophila embryos. Science. 2004;305(5685):846.
4. Yekta S, Shih IH, Bartel DP. MicroRNA-directed cleavage of HOXB8 mRNA. Science. 2004;304(5670):594–6.
5. Nelson CE, Hersh BM, Carroll SB. The regulatory content of intergenic DNA shapes genome architecture. Genome Biol. 2004;5(4):R25.
6. Mann RS. Why are Hox genes clustered? Bioessays. 1997;19(8):661–4.
7. Gould A, Morrison A, Sproat G, White RA, Krumlauf R. Positive cross-regulation and enhancer sharing: two mechanisms for specifying overlapping Hox expression patterns. Genes Dev. 1997;11(7):900–13.
8. Simeone A, Pannese M, Acampora D, D'Esposito M, Boncinelli E. At least three human homeoboxes on chromosome 12 belong to the same transcription unit. Nucleic Acids Res. 1988;16(12):5379–90.
9. Shiga Y, Sagawa K, Takai R, Sakaguchi H, Yamagata H, et al. Transcriptional readthrough of Hox genes Ubx and Antp and their divergent post-transcriptional control during crustacean evolution. Evol Dev. 2006;8(5):407–14.
10. Hsieh-Li HM, Davis AP, Witte DP, Potter SS, Capecchi MR, et al. Hoxa 11 structure, extensive antisense transcription, and function in male and female fertility. Development. 1995;121(5):1373–85.
11. Engstrom PG, Suzuki H, Ninomiya N, Akalin A, Sessa L, et al. Complex Loci in Human and Mouse Genomes. PLoS Genet. 2006;2(4):e47.
12. Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature. 2002;420(6915):563–73.
13. Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, et al. Using the transcriptome to annotate the genome. Nat Biotechnol. 2002;20(5):508–12.
14. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, et al. The transcriptional activity of human Chromosome 22. Genes Dev. 2003;17(4):529–40.
15. Yousef GM, Diamandis EP. An overview of the kallikrein gene families in humans and other species: emerging candidate tumour markers. Clin Biochem. 2003;36(6):443–52.
16. Cook PR. Nongenic transcription, gene regulation and action at a distance. J Cell Sci. 2003;116(Pt 22):4483–91.
17. Thanaraj TA, Clark F, Muilu J. Conservation of human alternative splice events in mouse. Nucleic Acids Res. 2003;31(10):2544–52.
18. Blumenthal T. Gene clusters and polycistronic transcription in eukaryotes. Bioessays. 1998;20(6):480–7.
19. Mansfield JH, Harfe BD, Nissen R, Obenauer J, Srineel J, et al. MicroRNA-responsive ‘sensor’ transgenes uncover Hox-like and other developmentally regulated patterns of vertebrate microRNA expression. Nat Genet. 2004;36(10):1079–83.
20. Isono K, Mizutani-Koseki Y, Komori T, Schmidt-Zachmann MS, Koseki H, et al. Mammalian polycomb-mediated repression of Hox genes requires the essential spliceosomal protein Sf3b1. Genes Dev. 2005;19(5):536–41.
21. Lehner B, Williams G, Campbell RD, Sanderson CM. Antisense transcripts in the human genome. Trends Genet. 2002;18(2):63–5.
22. Yelin R, Dahary D, Sorek R, Levanon EY, Goldstein O, et al. Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol. 2003;21(4):379–86.
23. Vanhee-Brossollet C, Vaquero C. Do natural antisense transcripts make sense in eukaryotes? Gene. 1998;211(1):1–9.
24. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420(6915):520–62.
25. Kent WJ. BLAT–the BLAST-like alignment tool. Genome Res. 2002;12(4):656–64.
26. Huang X. An improved sequence assembly program. Genomics. 1996;33(1):21–31.