Copyright © 2007, Cold Spring Harbor Laboratory Press

Functional persistence of exonized mammalian-wide interspersed repeat elements (MIRs)

1 Institute of Experimental Pathology (ZMBE), University of Münster, Münster, Germany; 2 Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, Pennsylvania 16802, USA; 3 Institute of Bioinformatics, University of Münster, Münster, Germany

4 Corresponding authors. E-mail jueschm/at/uni-muenster.de; fax 49-251-8352134. E-mail RNA.world/at/uni-muenster.de; fax 49-251-8358512.

Received January 24, 2007; Accepted May 8, 2007.

Abstract

Exonization of retroposed mobile elements, a process whereby new exons are generated following changes in non-protein-coding regions of a gene, is thought to have great potential for generating proteins with novel domains. Our previous analysis of primate-specific Alu-short interspersed elements (SINEs) showed, however, that during their 60 million years of evolution, SINE exonizations occurred in some primates, only to be lost again in some of the descendent lineages. This dynamic gain and loss makes it difficult to ascertain the contribution of exonization to genomic novelty. It was speculated that Alu-SINEs are too young to reveal persistent protein exaptation. In the present study we examined older mobile elements, mammalian-wide interspersed repeats (MIRs) that underwent active retroposition prior to the placental mammalian radiation ~130 million years ago, to determine their contribution to protein-coding sequences. Of 107 potential cases of MIR exonizations in human, an analysis of splice sites substantiates a mechanism that benefits from 3′ splice site selection in MIR sequences. We retraced in detail the evolution of five MIR elements that exonized at different times during mammalian evolution.
Four of these are expressed as alternatively spliced transcripts; three in species throughout the mammalian phylogenetic tree and one solely in primates. The fifth is the first experimentally verified, constitutively expressed retroposed SINE element in mammals. This pattern of highly conserved, alternatively and constitutively spliced MIR sequences evinces the potential of exonized transposed elements to evolve beyond the transient state found in Alu-SINEs and persist as important parts of functional proteins.

Genomic plasticity has contributed significantly to the dynamic generation of novel features in evolution. In this context, retroposed genetic elements, which are sequences of DNA that amplify via RNA to different positions within the genome, play a decisive role as inducers or substrates of novel evolutionary building blocks (Brosius and Gould 1992). Discernible retroposed elements compose up to 42% of the human genome (Lander et al. 2001) and, in an appropriate genomic context, have the potential to provide alternative splice sites and/or polyadenylation signals or may modify gene expression as parts of promoters or enhancers (Brosius 2005). Retroposed sequences can even be exapted (i.e., assume a new role) as protein-coding modules in a process that requires exonization, and when expressed via alternative splicing, they increase the complexity of the proteome (Xing and Lee 2006). In rare cases, when their sequences or parts thereof are under strong negative selection, retroposed elements can be identified as very ancient exaptations dating as far back as half a billion years (Bejerano et al. 2006). In primates, Alu-short interspersed elements (SINEs) are the prevailing retroposed elements found in open reading frames (ORFs) of mRNAs and expressed sequence tags (ESTs) (Sorek et al. 2002).
Following the path of Alu exonizations along a phylogenetic tree of primates indicates a dynamic gain and loss of exonizations, processes that may span >60 million years (Myr) of evolution (Krull et al. 2005). Thus, it appears that the time frame of primate evolution might not yet be sufficiently long to ascertain whether exonized elements have been exapted as persistent modules of the proteome (Gotea and Makalowski 2006). Although transcribed, exonized sequences must negotiate a grueling course on the way to becoming part of functional proteins, one that only very few survive. Various mRNA surveillance mechanisms might degrade alternatively exonized transcripts before they become essential parts of functioning proteins (Wagner and Lykke-Andersen 2002; Hillman et al. 2004). Survivors of these degradation mechanisms are further exposed to natural selection in maintaining their status as alternative splice products or in replacing the ancestral splice variant. A number of Alu exonizations in primates might not have reached that status yet.

A retrospective view of much older retropositions, ones active during the Mesozoic era of mammalian evolution, promises a more definitive picture of the exonization process, especially regarding their contribution to protein plasticity. Good candidates for illuminating these older exonization processes are the retroposed mammalian-wide interspersed repeat (MIR) elements. MIR elements amplified ~130 million years ago (Mya), and the human genome, for example, contains ~368,000 discernible copies (Lander et al. 2001). Their age and their abundant distribution in mammals make MIRs an excellent source to elucidate the history of their exonizations. MIRs are usually truncated at either or both ends (Smit and Riggs 1995).
The 5′ tRNA part of the MIR is fused to a tRNA-unrelated sequence, and the 3′ end features a 50-nt fragment that is similar to the 3′ end of long interspersed elements (LINEs), a likely binding site of the LINE reverse transcriptase needed for retroposition (Kapitonov et al. 2004). The conserved central domain includes a 15-nt core sequence (Fig. 1).

In this paper, we focus on MIR exonizations in a phylogenetic context by characterizing novel gene modules in representatives of all mammalian clades (Kriegs et al. 2006). We examine the exonization patterns of MIR elements in species of all major branches of mammals, which evolved over a period of >100 Myr, and compare these to the relatively young processes involved in Alu exonizations in primates.

Results

Selection of the data set and examples

To identify MIR elements in protein-coding sequences of human, mouse, and rat, we screened a compilation of mammalian mRNAs presumably harboring transposable element-cassettes assembled by Makalowski and co-workers (Genomic ScrapYard; http://warta.bio.psu.edu/SYDB/database.html), performing separate searches for the listed species. From 372 ScrapYard records with potential MIR element-cassettes identified in this initial search, 126 MIR sequences were present in protein-coding sequences (CDS); the remaining records were either redundant entries or otherwise artifactual (Supplemental Table S1; Supplemental Fig. S1). Of these 126 loci, 107 were found in human (one of which was initially identified in rat) and were used to investigate the distribution and orientation of exonized MIR sequences (Fig. 1).

Five of the above 126 cases were (1) supported by ESTs or other indications of expression, (2) flanked by conserved sequence regions facilitating mammalian-wide PCR amplification of fragments not exceeding 2 kb, (3) expressed in available tissue, and (4) embedded in introns, and thereby suitable for intensive phylogenetic reconstruction (Fig. 2).

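The selection funnel described above can be retraced with a short Python sketch. The record counts are those reported in this paper; the `Candidate` record layout, its field names, and the three example entries are hypothetical and serve only to illustrate how the four criteria combine. It is an illustration under these assumptions, not the procedure actually used to query the ScrapYard database.

```python
from dataclasses import dataclass

# Counts reported in this paper for the Genomic ScrapYard screen.
MIR_RECORDS = {"human": 314, "mouse": 38, "rat": 20}
DUPLICATES_AND_ARTIFACTS = 246


def funnel_counts() -> dict[str, int]:
    """Retrace the record counts from the initial screen to CDS loci."""
    initial = sum(MIR_RECORDS.values())
    in_cds = initial - DUPLICATES_AND_ARTIFACTS
    return {"initial": initial, "in_cds": in_cds}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one potential MIR exonization."""
    label: str
    expression_evidence: bool    # (1) ESTs or other indications of expression
    conserved_flanks_2kb: bool   # (2) conserved flanks, PCR fragment <= 2 kb
    tissue_available: bool       # (3) expressed in available tissue
    intron_embedded: bool        # (4) embedded in introns


def passes_all_criteria(c: Candidate) -> bool:
    """A locus is kept only if all four criteria hold."""
    return all((c.expression_evidence, c.conserved_flanks_2kb,
                c.tissue_available, c.intron_embedded))


# Made-up entries for illustration; these are not loci from the study.
EXAMPLES = [
    Candidate("locus_A", True, True, True, True),
    Candidate("locus_B", True, False, True, True),
    Candidate("locus_C", False, True, True, True),
]

if __name__ == "__main__":
    print(funnel_counts())
    print([c.label for c in EXAMPLES if passes_all_criteria(c)])
```

With the reported numbers, the 314, 38, and 20 records sum to the 372 initial cases, and removing 246 duplicates and artifacts leaves the 126 loci in protein-coding sequences.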
A natural splice site in MIR elements

Because the presence or acquisition of alternative splice sites recognized by the splicing machinery is crucial for intronic elements to be exonized, we examined the nature of the splice sites flanking the 107 MIR exonizations identified in human by compiling and comparing their potential protein-coding sequences (Fig. 1).

Twenty of the 64 antisense exonized MIR elements feature a MIR-contributed AG splice site that is preceded by a MIR-contributed oligopyrimidine tract (Fig. 1).

Alternative expression of MIR sequences

Three of the five experimentally analyzed MIR elements (NTRK3, LAS1L, Zfp384) exhibit stable alternative splice forms in representative species of the major mammalian branches, as evidenced both in DNA (conservation of splice sites and maintenance of ORFs) and in mRNA, demonstrating that the theoretical splice variants actually exist (Fig. 3).

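The 3′ splice-site configuration described above, an AG dinucleotide preceded by an oligopyrimidine tract, can be illustrated with a minimal Python sketch. The window length of 10 nt and the pyrimidine fraction of 0.8 are assumptions chosen for illustration, not parameters from this study, and the function names are hypothetical. The input is expected in the orientation of the gene, so for an antisense MIR insertion the MIR strand would first be reverse-complemented.

```python
PYRIMIDINES = frozenset("CT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def candidate_acceptor_sites(seq: str,
                             tract_len: int = 10,
                             min_pyrimidine_fraction: float = 0.8) -> list[int]:
    """Return 0-based positions of AG dinucleotides whose immediately
    preceding window of `tract_len` bases is pyrimidine-rich.

    This is a crude pattern scan, not a splice-site predictor.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for i in range(tract_len, len(seq) - 1):
        if seq[i:i + 2] != "AG":
            continue
        tract = seq[i - tract_len:i]
        fraction = sum(base in PYRIMIDINES for base in tract) / tract_len
        if fraction >= min_pyrimidine_fraction:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Synthetic sequence: a 10-nt pyrimidine run followed by AG.
    toy = "GGA" + "TTCTCTTCCT" + "AG" + "GTACG"
    print(candidate_acceptor_sites(toy))
```

A scan like this only flags candidate positions; whether a site is actually used has to be shown at the mRNA level, as done for the loci analyzed here.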
Constitutive expression of a MIR sequence

Zhang and Chasin (2006) proposed that advantageous new exons are eventually expressed constitutively. For SINEs, we provide the first evidence supporting this prediction. Of the five genes containing internal MIR exonizations, ZNF639 is the only one in which the MIR sequence is constitutively exonized in protein-coding sequences. All analyzed mammalian representatives of Eutheria (Supraprimates: human and mouse; Laurasiatheria: mole, dog, and cow; Afrotheria: manatee; and Xenarthra: sloth), Metatheria (Didelphimorphia: opossum), and Monotremata (Ornithorhynchidae: platypus) express ZNF639 mRNAs only with, and none without, the MIR contribution. The Ka/Ks value of 0.19 indicates purifying selection. The exonized MIR component encodes 45 amino acids located within the 205 amino acid N-terminal part of the ZNF639 protein. This region (amino acids 58–102; Imoto et al. 2003) contains no recognizable protein motifs, while the 280 C-terminal amino acids contain nine zinc finger motifs (Fig. 2).

For ZNF639, all necessary events from insertion of the MIR element to recruitment of parts of the MIR as a novel protein-coding exon occurred on the phylogenetic branch leading from the common ancestor of amniotes (mammals, reptiles, dinosaurs, and birds) to the mammalian ancestor. However, there are no living animals that diverged in this time period that would enable us to reconstruct, step by step, the successive evolution of molecular changes that were necessary to facilitate the 5′ functional splice site and an intact ORF, or to show possible intermediate alternative splice variants.

Discussion

Mammalian-wide detection of MIR sequence exonizations

From a total of 372 potential exonized MIR elements, we investigated 126 elements with strong indications of exonization in human, mouse, or rat (Supplemental Fig. S1). On five of these, we performed an extensive retrospective analysis reconstructing >100 Myr of mammalian history.
To establish a complete mammalian exonization pattern, we limited our analyses to those loci that could be amplified in the greatest number of species, sampling for both DNA and RNA analyses (introns ≤ 2 kb). From our extensive experience analyzing 160,000 genomic loci in the evolutionary history of mammalian species (Kriegs et al. 2006), we concluded that larger introns in some compared species render PCR amplification and comparative analyses in the rest of the species difficult. However, in addition to the five experimentally accessible loci presented in this study, we attempted to amplify additional exonizations with larger intronic regions (up to 5.8 kb; Supplemental Table S1, e.g., NM_145065, NM_015424, NM_010058). None of these cases was amplifiable in all essential key mammalian species. To gain full species sampling for those examples and to add some additional loci with lower conservation of flanking regions, we must await the upcoming data from genome sequencing projects.

Expression of MIR sequences

Relative to gene orientation, significantly more MIR exonized sequences are located in the antisense orientation (60%); this preference is even more pronounced for Alu exonizations (85%; Sorek et al. 2002). The difference might be explained by the significant predominance of antisense intronic Alu insertions compared to MIR insertions that are found preferentially in the sense orientation. Furthermore, there are two prevalent 3′ splice sites in the right arm of exonized Alu sequences (proximal and distal; Lev-Maor et al. 2003), whereas we could detect only one such prevalent 3′ splice site for the potential MIR exonizations (Fig. 1).

An intriguing observation is that the natural, MIR-specific splice site is just 3′ adjacent to the MIR conserved core region. However, there are hundreds of thousands of MIR elements that are not associated with splice sites but still bear the conserved core region.
Given that they originated at least 130 Mya, it is still unclear why, in contrast to the rest of the MIR sequence, the core sequences remain so highly conserved. In coding sequences the 5′ domain of MIR elements appears to be preferentially exonized (Fig. 1).

Time point of MIR exonization

MIR elements were actively mobile prior to the mammalian radiation ~130 Mya (Smit and Riggs 1995). In four of the five phylogenetically analyzed cases, exonization of MIR elements also took place very early in the evolution of mammals (Fig. 3).

Nearly 30 years ago, Walter Gilbert recognized in alternative splicing a process that allows evolution to try out new solutions without destroying the old (Gilbert 1978), a variation of a “strategy” that had been recognized in gene and genome duplication (Bridges 1936; Ohno 1970). Particularly in mammalian genomes, exonized transposed elements contribute significantly to alternative splicing. Evolutionary time seems to be a critical factor in establishing essential key mutations required for exonization. Three examples endorse this assumption (Fig. 4).

The low Ka/Ks value emphasizes the moderate selection pressure acting on the exonized MIR sequence of ZNF639. Interestingly, although we did not detect a MIR element in the corresponding locus in chicken, we found an exonized intronic sequence of the same size in this locus. The same sequence region (one additional triplet with respect to chicken) is also exonized in the ostrich, which represents another major branch of the bird phylogenetic tree. A DNA sequence alignment shows no apparent relationship between the two independently exonized sequences in mammals and birds and only ~50% random similarity (Supplemental Fig. S4A). However, although the additional sequences of the proteins do display some similarities in charges and hydrophobicity (Supplemental Fig. S4B,C), protein structural information is necessary to understand if the exonized sequences might play any beneficial role at all in separating neighboring protein domains. The orthologous gene in Xenopus lacks this additional exon, and consequently the protein lacks the extra 45-amino acid segment. Although Ka/Ks values indicate that the exonized part of the MIR element itself is under moderate selection pressure, the 291-nt adjacent protein coding flanks (parts of exons 5 and 7) show even lower Ka/Ks values (0.17 vs. 0.19 for the exonized MIR). This, at most, suggests a possibly lower selection pressure on the MIR exonized sequence than on the flanks. This difference is even greater when compared to the nine functional zinc finger domains of ZNF639 in exon 7 (582 nt; Ka/Ks = 0.03; data not shown). However, more information about functional domains of the N-terminal region of the protein is necessary to present more conclusive information about a potential spacer function of the exonized MIR sequence. 
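The comparison of Ka/Ks values made above can be laid out in a few lines of Python. The three values are those reported in this paper; the helper names and the layout of the pairwise input (a dictionary keyed by species pair) are hypothetical, and the sketch assumes the pairwise Ka/Ks estimates have already been produced by an external tool. It does not estimate Ka or Ks itself.

```python
from statistics import fmean

# Ka/Ks values reported in this paper for ZNF639.
REPORTED_KA_KS = {
    "exonized MIR": 0.19,
    "adjacent coding flanks (291 nt)": 0.17,
    "zinc finger domains (582 nt)": 0.03,
}


def mean_pairwise_ka_ks(pairwise: dict[tuple[str, str], float]) -> float:
    """Average Ka/Ks over species pairs (hypothetical input layout)."""
    if not pairwise:
        raise ValueError("at least one species pair is required")
    return fmean(pairwise.values())


def rank_by_constraint(ka_ks_by_region: dict[str, float]) -> list[str]:
    """Order regions from lowest to highest Ka/Ks, i.e., from the
    strongest to the weakest apparent purifying selection."""
    return sorted(ka_ks_by_region, key=ka_ks_by_region.get)


if __name__ == "__main__":
    for region in rank_by_constraint(REPORTED_KA_KS):
        print(f"{region}: Ka/Ks = {REPORTED_KA_KS[region]:.2f}")
```

All three values lie well below 1, so the ordering reflects differences in the strength of purifying selection rather than a change in its direction. For regions as short as the exonized MIR sequence, small differences such as 0.17 versus 0.19 should be read with caution.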
There is also another report of two independent exonizations, although of different lengths and origin, in the same intron of the ADARB1 gene in different taxonomic groups (human and mouse; Slavov and Gardiner 2002). In theory, the first transition from random insertion to alternative splicing is reversible and is either not at all under purifying selection or under relaxed negative selection (Xing and Lee 2006). Novel exons might be generated and shaped for “testing” under such relaxed selection (this period might include a phase of positive selection as well). Through a gradient of purifying selection, the second transition leads to stable alternative splice forms and in rare cases to constitutive splicing that includes the exonized form. ZNF639 is the first gene experimentally verified to contain such a constitutively expressed MIR exonization and exaptation. Of course, at this time we cannot exclude the possibility that the ZNF639 exonization was constitutive from the start and did not pass through this transitional state.

Conclusion

The contribution of transposed elements to gene structures is more or less coincidental. Their persistence is usually transient. If not deleted, they fade beyond recognition over longer evolutionary periods. However, a notable fraction of transposed elements escapes transience, for example, by integrating into protein-coding parts of genes, facilitated by internal components providing splice sites and oligopyrimidine tracts and “reprogramming” the splicing system of a targeted gene. Once proven worthy in the struggle of survival, they endure recognizably over hundreds of millions of years and contribute to significant tasks. We have identified and analyzed some of these candidates, thus shedding light on their >100 million-year-old evolutionary histories that show they have clearly stood the test of evolutionary time and persisted in mammalian lineages.
We showed that 3′ splice site selection in exonized transposed elements is not restricted to Alu elements but seems to be an older and significant mechanism for MIR exonization as well. Alternative splicing of exonized MIRs is shown by example to be a stable process retained for >100 Myr in all major groups of mammals. Functional persistence shown by constitutive splicing of an exonized MIR sequence was evidenced for the first time and demonstrates that this evolutionary pathway is not necessarily correlated with genetic disorder, as has been suggested for Alu exonizations by Lev-Maor et al. (2003). The present data are ample evidence of the value of exonization, given enough evolutionary time, in generating genomic novelties.

Methods

Data selection

To identify MIR elements in protein-coding sequences, we screened a compilation of mammalian mRNAs with transposable element cassettes (Genomic ScrapYard; http://warta.bio.psu.edu/SYDB/database.html), performing independent searches for human, mouse, and rat. Out of 1091, 472, and 201 matching records, we selected 314, 38, and 20 cases that included MIR elements in human, mouse, and rat, respectively. These 372 cases were then scrutinized to filter out duplications and other artifacts (246 cases; Supplemental Fig. S1). To search for cases that were suitable for experimental evaluation of their phylogeny, we screened the remaining 126 potential exonizations by applying the following criteria: (1) The MIR-derived sequence should not be located in the first or last protein-coding exon as they usually lack highly conserved flanking regions and hence are difficult to amplify by polymerase chain reaction (PCR). (2) To enable manageable PCRs of up to 2 kb in highly diverged mammalian species, the MIR-derived sequences should be flanked by conserved sequences. Conservation was determined by comparison of available genomic sequences (e.g., of human, mouse, and dog).
(3) Indication for exonization should be supplied by such available information as EST data or other published information. (4) Relevant tissues should be available for specific alternative transcripts. Five of the 126 potential exonizations fulfilled all four criteria and were subjected to further phylogenetic examination using PCR, RT-PCR, and sequence analyses. Note that the ScrapYard database consists of GenBank entries from January 15, 2002. It is expected that an updated database would facilitate the recovery of additional cases of MIR exonizations. Available sequence information was obtained from the UCSC Genome Browser (http://genome.ucsc.edu/cgi-bin/hgBlat) and the NCBI trace archive (http://www.ncbi.nlm.nih.gov/blast/tracemb.shtml). Experimental procedures concerning DNA and RNA extraction, PCR amplification, and reverse transcription are given in the Supplemental Protocol S1. PCR primers are listed in Supplemental Table S2 and are illustrated in Supplemental Fig. S5.

Sequence analyses

For detection and classification of the inserted and partially exonized MIR sequences, we used the RepeatMasker server (A.F.A. Smit, R. Hubley, and P. Green, unpubl.; RepeatMasker at http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker). The available mammalian sequences of the five relevant loci were derived from the NCBI database, and all additionally amplified sequences were manually aligned. GenBank entries are compiled in Supplemental Table S3.

Ka/Ks values

We used the Yang–Nielsen maximum likelihood method (Yang and Nielsen 2000), implemented in the YN00 program of the PAML package (Yang 1997), to calculate the Ka/Ks values for aligned exonized parts of the MIR elements. Ka/Ks values were averaged over all investigated species pairs. It should be noted that in most cases the significance of the derived values was limited by the short length and low variation of the exonized regions.

Acknowledgments

We thank Frank Grützner, Rodney L. Honeycutt, Uwe Joite, Jan Ole Kriegs, Jörg Molten, Bernhard Neurohr, Christian Roos, Gertrud Scheele, Heike Weber, and Anja Zemann for providing us with tissue samples and Marsha Bundman for editorial assistance. We thank Valer Gotea for his help in selecting data from the Genomic ScrapYard database and Michael Haberl for his comments. We thank Django Sussman for introducing us to methods for analyzing structural features of the ZNF639 protein. J.S. thanks Matthias Schmitz for all his personal support. This work was supported by the Nationales Genomforschungsnetz (NGFN) (0313358A to J.B. and J.S.), the European Union (EU) (LSHG-CT-2003-503022 to J.B.), and the Deutsche Forschungsgemeinschaft (DFG) (SCHM1469 to J.S. and J.B.).

Footnotes

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6320607

References
Proc Natl Acad Sci U S A. 1992 Nov 15; 89(22):10706-10.
Nature. 2001 Feb 15; 409(6822):860-921.
Cytogenet Genome Res. 2005; 110(1-4):8-24.
Nat Rev Genet. 2006 Jul; 7(7):499-509.
Nature. 2006 May 4; 441(7089):87-90.
Mol Biol Evol. 2005 Aug; 22(8):1702-11.
Trends Genet. 2006 May; 22(5):260-7.
J Cell Sci. 2002 Aug 1; 115(Pt 15):3033-8.
Genome Biol. 2004; 5(2):R8.
Nucleic Acids Res. 1995 Jan 11; 23(1):98-102.
Proc Natl Acad Sci U S A. 1999 Mar 16; 96(6):2869-74.
Genome Res. 2006 Jul; 16(7):864-74.
PLoS Biol. 2006 Apr; 4(4):e91.
Trends Genet. 1994 Jun; 10(6):188-93.
Science. 2003 May 23; 300(5623):1288-91.
Proc Natl Acad Sci U S A. 2005 Sep 20; 102(38):13526-31.
Proc Natl Acad Sci U S A. 2006 Sep 5; 103(36):13427-32.
Cancer Res. 2003 Sep 15; 63(18):5691-6.
Exp Cell Res. 2005 Nov 15; 311(1):1-13.
Genome Res. 2002 Jul; 12(7):1060-7.
Mol Biol Evol. 2006 Feb; 23(2):401-10.
J Neuroimmunol. 2005 Dec; 169(1-2):177-9.
Nature. 1978 Feb 9; 271(5645):501.
J Mol Biol. 2007 Mar 2; 366(4):1055-63.
J Mol Biol. 2004 Aug 20; 341(4):883-6.
Gene. 2002 Oct 16; 299(1-2):83-94.
Mol Biol Evol. 2000 Jan; 17(1):32-43.
Comput Appl Biosci. 1997 Oct; 13(5):555-6.