Copyright © 2005, Cold Spring Harbor Laboratory Press
An extraordinary retrotransposon family encoding dual endonucleases
Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan
1Corresponding author. E-mail haruh/at/k.u-tokyo.ac.jp; fax 81-4-7136-3659.
Received September 21, 2004; Accepted May 10, 2005.
Abstract
Retrotransposons commonly encode a reverse transcriptase (RT), but other functional domains are variable. The acquisition of new domains is the dominant evolutionary force that brings structural variety to retrotransposons. Non-long-terminal-repeat (non-LTR) retrotransposons are classified into two groups by their structure. Early branched non-LTR retrotransposons encode a restriction-like endonuclease (RLE), and recently branched non-LTR retrotransposons encode an apurinic/apyrimidinic endonuclease-like endonuclease (APE). In this study, we report a novel non-LTR retrotransposon family Dualen, identified from the Chlamydomonas reinhardtii genome. Dualen encodes two endonucleases, RLE and APE, with RT, ribonuclease H, and cysteine protease. Phylogenetic analyses of the RT domains revealed that Dualen is positioned at the midpoint between the early-branched and the recently branched groups. In the APE tree, Dualen was branched earlier than the I group and the Jockey group. The ribonuclease H domains among the Dualen family and other non-LTR retrotransposons are monophyletic. Phylogenies of three domains revealed the monophyly of the Dualen family members. The domain structure and the phylogeny of each domain imply that Dualen is a retrotransposon conserving the domain structure just after the acquisition of APE. From these observations, we discuss the evolution of domain structure of non-LTR retrotransposons.
Retrotransposons are mobile genetic elements found in a wide range of eukaryotes (Arkhipova and Meselson 2000).
Retrotransposons have a reverse transcriptase (RT) in common, but other functional domains are quite variable. Retroviruses are considered to be retrotransposons that have acquired a domain for extracellular function (Malik et al. 2000). The acquisition of new domains has provided variable life styles to retroelements. Non-long-terminal-repeat (non-LTR) retrotransposons are one major group of retrotransposons and are considered to be the ancestors of long-terminal-repeat (LTR) retrotransposons and retroviruses (Malik and Eickbush 2001). Non-LTR retrotransposons are classified into two groups by their structure (Malik et al. 1999; Yang et al. 1999). The early branched non-LTR retrotransposons, such as the insect R2, include only one open-reading-frame (ORF) encoding an RT and a restriction-like endonuclease (RLE). In contrast, the recently branched non-LTR retrotransposons, such as the human L1 (long interspersed nuclear element-1, LINE-1), include two ORFs, and the second ORF encodes an RT and an apurinic/apyrimidinic endonuclease-like endonuclease (APE). Several families of the recently branched non-LTR retrotransposons have only the second ORF. Several recently branched non-LTR retrotransposons, such as the I and the TRAS families, have a ribonuclease H (RNH) domain immediately after the RT domain, similar to LTR retrotransposons. The RT domain is the only common structure among all non-LTR retrotransposons (see Fig. 7B).
The existence of the endonuclease domain is the most remarkable feature of non-LTR retrotransposons when compared with LTR retrotransposons, which use integrases, but not endonucleases for genome integration. Endonuclease defines a transposition mechanism peculiar to non-LTR retrotransposons that is called target-primed reverse transcription (TPRT). In the nucleus, endonuclease nicks the target DNA, and the free 3′-hydroxyl end of DNA is used as primer for reverse transcription. This contrasts with LTR retrotransposons and retroviruses that use tRNA as primer and are reverse transcribed in the cytoplasm. LTR retrotransposons and retroviruses import their cDNA into the nucleus, and integrate it into the genomic DNA by integrases. Both non-LTR retrotransposons with RLE, and those with APE, were shown to transpose by TPRT (Yang et al. 1999; Cost et al. 2002). Inactivation of endonuclease dramatically reduces the transposition efficiency of non-LTR retrotransposons (Feng et al. 1996; Takahashi and Fujiwara 2002). Because all early branched non-LTR retrotransposons have an RLE and all recently branched retrotransposons encode an APE, it is certain that non-LTR retrotransposons once exchanged their endonuclease type from RLE to APE. Until now, however, we did not have any evidence for this evolutionary event. In this study, we report a novel non-LTR retrotransposon family that encodes both RLE and APE. These elements, which we named Dualen, are positioned phylogenetically at the midpoint between the early branched group and the recently branched group. In addition, Dualen also encodes an RNH domain after the RT domain. We discuss the origin and the evolutionary implication of the extraordinary domain structure of Dualen. 
Results and Discussion
Dualen, a new family of non-LTR retrotransposons, has dual endonuclease domains and ribonuclease H
While screening genomic databases for early branched non-LTR retrotransposons (Kojima and Fujiwara 2004), we identified a novel non-LTR retrotransposon, apparently distinct from other non-LTR elements, in the Chlamydomonas reinhardtii genomic database at the US Department of Energy Joint Genome Institute (JGI, http://aluminum.jgi-psf.org/prod/bin/runBlast.pl?db=chlre1/). This novel retrotransposon (DualenCr1) (Fig. 1A)
Dualen constitutes a family of several elements that share the same protein domain composition, although their nucleotide sequences are less conserved. From the C. reinhardtii genomic database, we identified three complete Dualen elements (DualenCr1, DualenCr3, DualenCr4) and one related element (DualenCr2), which is 5′-truncated because of partial sequencing and/or incomplete retrotransposition (Fig. 1A).
To investigate whether the Dualen elements truly exist in the genomes of C. reinhardtii and A. thaliana, and to exclude the possibility of assembly error, we tried to detect genomic copies of the Dualen elements by polymerase chain reaction (PCR) (Fig. 1B).
We could not amplify the PCR product corresponding to the full-length DualenCr4 (Fig. 1B).
Nor could we detect PCR products for the two proposed Dualen elements (DualenU1 and DualenU2) in the A. thaliana genome (Fig. 1B).
Genomic structures of Dualen indicate the recent activity for retrotransposition
We performed Southern hybridization for further characterization of Dualen in the C. reinhardtii genome (Fig. 1C).
To obtain further genomic information on Dualen, such as copy number, target sequence preference, and the length of target-site duplication (TSD), we searched the genomic database for Dualen copies. We identified 28 copies of DualenCr1, 26 of DualenCr2, 27 of DualenCr3, and 25 of DualenCr4 in the C. reinhardtii genomic database (Fig. 1A).
Antisense RNA of Dualen is transcribed and spliced
We next investigated the transcription of Dualen using the EST (expressed sequence tag) database. A BLAST search against the EST databases at NCBI revealed the transcription of all Dualen elements in C. reinhardtii. We identified 22 (DualenCr1), two (Cr2), six (Cr3), and 12 (Cr4) EST clones that showed >90% nucleotide identity throughout the sequences (Table 2). To our surprise, we found spliced antisense transcripts of DualenCr1 and DualenCr4 among the EST sequences (Table 2; Fig. 1D).
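The >90% nucleotide identity criterion applied to the EST clones can be illustrated with a minimal sketch. It assumes that each EST has already been aligned to the element (gaps written as "-") and that gap columns are excluded from the calculation; the text does not state how gaps were treated, and the function names are hypothetical rather than part of the original analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored, not counted as mismatches
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_identity_cutoff(aligned_a: str, aligned_b: str, cutoff: float = 90.0) -> bool:
    """True when identity is strictly greater than the cutoff (">90%" in the text)."""
    return percent_identity(aligned_a, aligned_b) > cutoff
```

Under these assumptions, a pair that is exactly 90% identical is rejected, because the criterion is a strict inequality.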
Dualen is an intermediate retrotransposon between RLE-encoding and APE-encoding retrotransposon groups
Because Dualen has both an RLE domain that is considered to be specific for the early branched non-LTR retrotransposons and an APE domain that is considered to be specific for the recently branched retrotransposons, we analyzed the phylogenetic position of Dualen. In the Bayesian phylogenetic inference based on the RT domains, the Dualen family is a monophyletic group positioned at the midpoint between the early branched and the recently branched non-LTR retrotransposons (Fig. 2A).
Figure 2B
Malik et al. (1999) showed that the “thumb” region of the RT domain appears to be divided into two subtypes, CRE/R2/R4/L1/RTE (corresponding to the R2, the L1, and the RTE groups) and Tad/R1/LOA/Jockey/CR1/I (the I and the Jockey groups). However, the monophyly among the RTE and the latter group was highly supported in their study and in our analysis (98% in the Bayesian tree and 84% in the NJ tree; Fig. 2B).
Conservation of the catalytic residues in dual endonucleases (RLE and APE) suggests that both endonucleases are functional in retrotransposition
The most extraordinary feature of Dualen is its dual endonuclease domains. We identified the complete RLE in five elements, that is, all except DualenU2 (Fig. 3).
With regard to APE, we identified the complete APE sequences in four elements (Fig. 4).
Although some substitutions at conserved residues are observed in both RLE and APE, the conservation throughout the domains indicates that both endonuclease activities are required for efficient retrotransposition. Both RLE and APE were reported to cleave the bottom (first, primer) strand (Feng et al. 1996; Yang et al. 1999), and the top (second, nonprimer) strand (Yang et al. 1999; Anzai et al. 2001). Sequence alignments of both endonucleases indicated that both endonuclease activities could be weakened. It is possible that the dual endonucleases in Dualen compensate for each other's weakened activities. Another possibility is that the mutations in both endonucleases do not affect the endonuclease activity at all, and the persistence of two endonucleases is simply a result of selection for the ability to transpose into a wide range of sequences. Even if both endonucleases of Dualen now have only weak activity, it is likely that the common ancestor of Dualen was, owing to its dual endonucleases, more active than retrotransposons that had a single endonuclease. The Bayesian phylogenetic inference of the APE domains is shown in Figure 5A.
Ribonuclease H (RNH) of Dualen has the same origin as other non-LTR retrotransposons
The other functional domain conserved among several non-LTR retrotransposons and the Dualen family is RNH (Fig. 6A).
The Bayesian phylogenetic inference of the RNH domains is shown in Figure 5C.
The N-terminal josephin-like domain is a cysteine protease
The upstream region of the APE domain shows weak similarity to josephin domains, which were recently characterized in ataxin-3, also called Machado-Joseph disease gene, and its related proteins (Albrecht et al. 2003). Since the E-value of the BLAST search against the Conserved Domain Database (CDD) was marginal, we performed a PSI-BLAST search of the nonredundant database at NCBI seeded with the putative josephin domain of DualenU1. A total of 53 sequences belonging to the ataxin-3 family and the josephin family were found with statistical significance (E-values <E-5) within two iterations (data not shown). We aligned the presumed josephin domains of Dualen with these two protein family members (Fig. 6B).
The phylogeny of the josephin and the JCP domains supports the monophyly of Dualen, as well as that of the josephin family and the ataxin-3 family (Fig. 5E).
Evolution of non-LTR retrotransposons implicated by the extraordinary domain structure of Dualen
Figure 7A
There are two possible origins of each domain of Dualen: cellular genes or other non-LTR retrotransposons. The simplest event that could have given rise to the domain structure of Dualen is that an I group element, which belongs to the only non-LTR retrotransposon group other than Dualen that has RNH, transposed into an R2 group retrotransposon; however, this is unlikely, because the APE and the RT domains of Dualen are clearly phylogenetically distant from those of the I group elements. RT and endonuclease activities are essential for non-LTR retrotransposons because of their retrotransposition mechanism (Feng et al. 1996; Yang et al. 1999; Cost et al. 2002; Takahashi and Fujiwara 2002). Retrotransposons cannot survive without endonucleases. The dual endonuclease structure of Dualen is the only way to change endonuclease domains without loss of retrotransposition ability.
Since early branched retrotransposons have an RLE, the ancestor of Dualen could have newly acquired an APE. Dualen could have acquired an APE from a cellular gene, not from other non-LTR retrotransposons, because Dualen is the most ancient non-LTR retrotransposon having an APE. One possible mechanism for acquiring a cellular APE is that an early branched retrotransposon transposed to just downstream of a cellular APE gene and was then cotranscribed and comobilized with it. Co-mobilization of 5′-flanking sequences was experimentally demonstrated in the human L1 and the silkworm SART1 (Symer et al. 2002; Takahashi and Fujiwara 2002). RLE could have been lost after the branch between Dualen and recently branched non-LTR retrotransposons. The RNH domain is considered to have been acquired independently of the APE domain, because the RNH domain of Dualen is positioned between the RT and the RLE domains, both of which are related to those of the early branched non-LTR retrotransposons. The RNH domains of non-LTR retrotransposons are monophyletic; thus, either the Dualen group or the I group could have acquired a cellular RNH gene. If the I group had acquired a cellular RNH, Dualen would be a chimeric retrotransposon whose RNH was acquired from the I group retrotransposons. However, there have been no clear reports of chimeras or domain swapping between two non-LTR retrotransposons. If Dualen had acquired a cellular RNH, the L1, the RTE, and the Jockey groups would have lost their RNH domains secondarily (Fig. 7B).
In addition, since LTR retrotransposons and retroviruses are considered to have evolved from non-LTR retrotransposons having both Gag and RNH (Malik and Eickbush 2001), LTR retrotransposons could have branched from the common ancestor of the recently branched non-LTR retrotransposons. The most ancient LTR retrotransposon group is the Ty1/copia group, which retains protease and integrase domains, instead of the APE domain of non-LTR retrotransposons (Fig. 7B).
Methods
Computer-based nucleotide and protein searches were performed using different BLAST search programs (Altschul et al. 1997) at NCBI (http://www.ncbi.nlm.nih.gov/BLAST) and JGI (http://aluminum.jgi-psf.org/prod/bin/runBlast.pl?db=chlre1). Protein sequences of non-LTR retrotransposons previously described (Kojima and Fujiwara 2004) were used as queries for database searches. As the Chlamydomonas reinhardtii genomic sequences were in draft format and there were no single sequences containing complete retrotransposons, we constructed representative retrotransposon sequences from several sequences derived from different genomic positions. Sequences more than 90% identical to each other were connected in order to include longer ORFs. The reconstructed sequences of the Dualen elements from C. reinhardtii are available from the authors' Web site (http://www.biol.s.u-tokyo.ac.jp/users/animal/kojima/sequence.html). DualenU1 corresponds to bases 20173–30250 of AC109923 and DualenU2 corresponds to bases 133667–134928 joined to 89898–95300 of AC109923.
Amino acid sequences of elements were aligned using CLUSTAL X (Thompson et al. 1997) on the basis of previous reports (Malik et al. 1999; Burke et al. 2002; Kojima and Fujiwara 2004; Weichenrieder et al. 2004). Bayesian phylogenetic trees were constructed using MrBayes 3 (Ronquist and Huelsenbeck 2003). Neighbor-joining (NJ) trees were constructed using CLUSTAL X. Nonparametric bootstrap analyses were performed with 1000 replicates.
Primers used for PCR are listed in Supplemental Table 1. Probes used in Southern hybridization were amplified by PCR with pairs of primers as follows: Cr1-5′, DualenCr1_F2 and DualenCr1_R2; Cr1-3′, DualenCr1_F1 and DualenCr1_R1; Cr2, DualenCr2_F1 and DualenCr2_R1; Cr3, DualenCr3_F1 and DualenCr3_R1; Cr4, DualenCr4_F1 and DualenCr4_R1. PCR conditions were as follows: 35 cycles of 96°C for 30 sec, 60°C for 30 sec, and 72°C for 1 min.
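The base ranges given in the Methods for DualenU1 and DualenU2 within AC109923 can be made concrete with a short sketch. It assumes the coordinates are 1-based and inclusive, as is conventional for GenBank records, and that the two DualenU2 segments are concatenated in the order listed on the strand as deposited; the text does not specify strand or orientation. The helper names are hypothetical, and the accession sequence itself is not included here.

```python
def extract_joined(sequence: str, segments) -> str:
    """Concatenate 1-based, inclusive (start, end) segments in the order given."""
    parts = []
    for start, end in segments:
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(
                f"segment {start}-{end} lies outside a sequence of length {len(sequence)}"
            )
        # Convert the 1-based inclusive range to a 0-based half-open Python slice.
        parts.append(sequence[start - 1:end])
    return "".join(parts)


def joined_length(segments) -> int:
    """Total number of bases covered by 1-based, inclusive segments."""
    return sum(end - start + 1 for start, end in segments)


# Coordinates as stated in the Methods for AC109923.
DUALEN_U1 = [(20173, 30250)]
DUALEN_U2 = [(133667, 134928), (89898, 95300)]

if __name__ == "__main__":
    print(joined_length(DUALEN_U1))  # 10078 bases
    print(joined_length(DUALEN_U2))  # 1262 + 5403 = 6665 bases
```

With the AC109923 sequence loaded as a string, `extract_joined(sequence, DUALEN_U2)` would return the joined region under the stated assumptions.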
Approximately 5 μg of genomic DNA was digested with the respective restriction enzymes (EcoRI, BglII, and HindIII), separated on a 1.0% agarose gel, and blotted onto Hybond-N+ nylon membrane (Amersham) in 0.4 N NaOH. Radioactive probes were obtained by using the BcaBEST Labeling Kit (TaKaRa) with [α-32P]dCTP (ICN). Hybridization was performed at 45°C in 50% formamide, 10× Denhardt's solution (1× Denhardt's solution is 0.2% each of BSA, Ficoll, and polyvinylpyrrolidone), 50 mM sodium phosphate (pH 7.0), and 25 μg/mL sonicated salmon sperm DNA in 5× SSC, with the 32P-labeled DNA probe. Post-hybridization washes were carried out in 2× SSC with 0.1% SDS for 15 min at 65°C and 0.2× SSC with 0.1% SDS for 15 min at 65°C.
Acknowledgments
Chlamydomonas reinhardtii sequence data were produced by the US Department of Energy Joint Genome Institute, http://www.jgi.doe.gov/ and are provided for use in this publication/correspondence only. Genomic DNA of Chlamydomonas reinhardtii was kindly provided by Masafumi Hirono, and genomic DNA of Arabidopsis thaliana by Mari Kurosawa and Kintake Sonoike. We thank Hiroyuki Toh, Mizuko Osanai, and Hideyuki Aoyagi for discussions and critical reading of the manuscript. This work was supported by grants from the Ministry of Education, Science and Culture of Japan (MESCJ), by a Grant-in-Aid from the Research for the Future Program of the Japan Society for the Promotion of Science (JSPS), and by the JSPS Research Fellowships for Young Scientists.
Notes
[Supplemental material is available online at www.genome.org and http://www.biol.s.u-tokyo.ac.jp/users/animal/kojima/sequence.html. The following individuals and institute kindly provided reagents, samples, or unpublished information as indicated in the paper: M. Hirono, M. Kurosawa, K. Sonoike, and the US Department of Energy Joint Genome Institute.]
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3271405.