Copyright © 2006 by the Genetics Society of America

Genomewide Comparative Analysis of the Highly Abundant Transposable Element DINE-1 Suggests a Recent Transpositional Burst in Drosophila yakuba

Faculty of Life Sciences & Institute of Genome Sciences, National Yang-Ming University, Peitou, Taipei 112, Taiwan, Republic of China

1Corresponding author: Institute of Genetics, National Yang-Ming University, 155 Li-Nong St., Sec. 2, Peitou, Taipei 112, Taiwan, Republic of China. E-mail: hpyang/at/ym.edu.tw

Communicating editor: D. Begun

Received September 29, 2005; Accepted December 18, 2005.

Abstract

DINE-1 (Drosophila interspersed element) is the most abundant repetitive sequence in the Drosophila genome derived from transposable elements. It comprises >1% of the Drosophila melanogaster genome (DMG) and is believed to be a relic from an ancient transpositional burst that occurred ~5–10 MYA. We performed a genomewide comparison of the abundance, sequence variation, and chromosomal distribution of DINE-1 in D. melanogaster and D. yakuba. Unlike the highly diverged copies in the DMG (pairwise distance ~15%), DINE-1's in the Drosophila yakuba genome (DYG) have diverged by only 3.4%. Moreover, the chromosomal distribution of DINE-1 in the two species is very different, with a significant number of euchromatic insertions found only in D. yakuba. We propose that these different patterns are caused by a second transpositional burst of DINE-1's in the D. yakuba genome ~1.5 MYA. On the basis of the sequence of these recently transposed copies, we conclude that DINE-1 is likely to be a family of nonautonomous DNA transposons. Analysis of the chromosomal distribution of two age groups of DINE-1's in D.
yakuba indicates that (1) there is a negative correlation between recombination rates and the density of DINE-1's and (2) younger copies are more evenly distributed in the chromosome arms, while older copies are mostly located near the centromere regions. Our results fit the predictions of a selection–transposition balance model. Our data on whole-genome comparison of a highly abundant TE among Drosophila sibling species demonstrate the unexpectedly dynamic nature of TE activity in different host genomes.

INTERSPERSED repetitive sequences compose a significant portion of the genomes of almost all organisms (Finnegan 1989; Kidwell and Lisch 1997; Bartolome et al. 2002; Deininger and Roy-Engel 2002). Up to 10% of the genome is composed of dispersed repetitive DNA in Arabidopsis (Arabidopsis Initiative 2000), ~40% in rice (Goff et al. 2002; Yu et al. 2002), ~10–15% in nematodes (C. elegans Sequencing Consortium 1998) and flies (Quesneville et al. 2005), and ~37–46% in mice (Mouse Genome Sequencing Consortium 2002) and humans (International Human Genome Sequencing Consortium 2001). These sequences are mostly transposable elements (TEs), which can move to novel genomic positions. According to the mechanism of transposition, TEs are divided into transposons (DNA mediated) and retrotransposons (RNA mediated) (Berg and Howe 1989; McDonald 1993).

TEs can also be divided by their ability to direct their own transposition (see review in Kazazian 2004). Autonomous TEs code for the proteins required for transposition and are mobilized in cis. Nonautonomous TEs are mobilized in trans by enzymes produced from autonomous elements. The best-known example of this dichotomy is that of the long interspersed elements (LINEs) and the short interspersed elements (SINEs). The well-known human SINE, the nonautonomous element Alu, can be mobilized in trans by autonomous LINE-1 elements (Deininger et al. 2003; Dewannieux et al. 2003).
Another example of a nonautonomous TE is the family of miniature inverted-repeat transposable elements (MITEs) in plants and animals. Interestingly, MITEs are the predominant TEs that are associated with the noncoding regions of genes of flowering plants. They have also been found in several animal genomes, including Caenorhabditis elegans, mosquitoes, fish, and humans (reviewed in Feschotte et al. 2002). These repetitive sequences are mostly parasitic to their host (Dawkins 1976; Doolittle and Sapienza 1980); i.e., TE insertions are, on average, deleterious to host fitness (Houle and Nuzhdin 2004). They disrupt host genes (Charlesworth and Langley 1989; Biemont et al. 1997) and increase the likelihood of ectopic rearrangement (Charlesworth and Langley 1989; Montgomery et al. 1991; Virgin and Bailey 1998). When the host is unable to fully inactivate TEs and they do not impair host fitness too severely, TEs and their host will coexist. If TEs persist, they can have an important impact on chromosome dynamics and genome evolution (Kazazian 2004). They can cause nonhomologous recombination and thus generate genetic diversity of the host genome, and their activity and numbers have an important impact on genome-size evolution (Petrov 2001). They also play important roles in gene regulation and chromatin assembly (reviewed in Csink and Henikoff 1998; Kazazian 2004). With the advance of genomic tools and genome databases, we now are able to study the “paleontology” of TEs in closely related species to understand the evolutionary dynamics of TEs and the mechanisms regulating their activities. Unlike plant and mammalian genomes, where TEs seem to be under less control [e.g., MITEs in grass (Feschotte et al. 2002) and Alu in primates (Batzer and Deininger 2002)], it is generally believed that TEs in the compact Drosophila melanogaster genome (DMG) are well controlled. 
This genome contains many families of active TEs, with relatively few copies (<100) in each family (Kaminker et al. 2002). The only exception is DINE-1 (Drosophila interspersed element, also named DNAREP1_DM, INE). These elements are very abundant in the DMG. There are >1000 highly fragmented and diverged copies located in the heterochromatic regions, such as the pericentric regions and the fourth chromosome (Locke et al. 1999; Kapitonov and Jurka 2003; Singh and Petrov 2004; Quesneville et al. 2005; Singh et al. 2005). No open reading frames (ORFs) or target-site duplications can be found in the existing copies in the DMG. Although the mechanisms responsible for their propagation and distribution in the genome remain a puzzle, DINE-1's are believed to be the relics of a family of ancient retroelements that experienced a transpositional burst ~5–10 MYA (Kapitonov and Jurka 2003; Singh and Petrov 2004). DINE-1's are highly diverged (~15%) from each other in D. melanogaster. We first became interested in DINE-1 in D. yakuba after discovering that, unlike in D. melanogaster, D. yakuba contains a large number of DINE-1's that are highly similar to each other. We present evidence below that this difference can be explained by the occurrence of a recent transpositional burst of DINE-1 in D. yakuba, which generated many copies of DINE-1 with low sequence divergence. Furthermore, we found that there is a dramatic difference in the chromosomal distribution and insertion-site frequency of different age groups of DINE-1's: Young copies (recently transposed) are evenly distributed along the chromosomes; old copies are restricted to heterochromatic regions. Our results support the hypothesis of the deleterious effects of repetitive sequences on host fitness. Their random insertions into the host genome are followed by efficient removal in regions of high recombination and their accumulation in regions of low recombination. 
MATERIALS AND METHODS

Sequence extraction of DINE-1 elements from D. yakuba: Using the reported consensus sequence for DINE-1 in D. melanogaster (Kapitonov and Jurka 2003) as a query, we searched for similar sequences in the D. yakuba genome database (Release 2.0 D. yakuba whole-genome shotgun assembly, http://rana.lbl.gov/drosophila/yakuba.html) using BLASTN with the default parameter settings. We retrieved the 50 copies of DINE-1 with the lowest E-values, aligned them, and derived the consensus sequence. Using this DINE-1 consensus sequence as the query, we used WU BLAST to search for DINE-1's in the Drosophila yakuba genome (DYG) with the default parameter settings. In total, we obtained >50,000 hits from the BLAST search. We eliminated hits with E-values >0.00001 and with alignable length <300 bp. We then selected the top 1000 hits with the longest aligned lengths. After eliminating redundant copies with the same 5′ and 3′ flanking sequences, we obtained 933 copies of DINE-1. We further partitioned these copies into three categories, G1, G2, and G3, on the basis of their sequence similarity to the consensus sequence, with <4%, 4–6%, and >6% pairwise distances to the consensus, respectively (see below).

Sequence alignment and pairwise distance computation: Sequences were aligned using ClustalW from the web server of the European Bioinformatics Institute (http://www.ebi.ac.uk/clustalw/) with the default parameter settings. Alignments were then further improved by manual checks and adjustments. On the basis of the multiple alignment output, pairwise distances were estimated with the maximum-likelihood method using MEGA 3.0 (Kumar et al. 2004) with the Jukes–Cantor one-parameter model (Jukes and Cantor 1969), assuming equal rates of substitution among the four nucleotides.
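The filtering and grouping steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the `Hit` record and all function names are hypothetical, the closed-form Jukes–Cantor correction stands in for the MEGA 3.0 computation, and the assignment of a distance of exactly 4% or 6% to G2 is an assumption, since the text does not specify the boundary cases.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit (not a real BLAST output format)."""
    name: str
    evalue: float
    aligned_len: int


def filter_hits(hits, max_evalue=1e-5, min_len=300, top_n=1000):
    """Drop hits with E-value > max_evalue or aligned length < min_len,
    then keep the top_n hits with the longest aligned lengths."""
    kept = [h for h in hits if h.evalue <= max_evalue and h.aligned_len >= min_len]
    kept.sort(key=lambda h: h.aligned_len, reverse=True)
    return kept[:top_n]


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    counting only columns where both sequences have A, C, G, or T."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def age_group(distance_to_consensus):
    """Assign G1 (<4%), G2 (4-6%), or G3 (>6%) from the distance to the consensus."""
    if distance_to_consensus < 0.04:
        return "G1"
    if distance_to_consensus <= 0.06:
        return "G2"
    return "G3"
```

For example, `age_group(jukes_cantor(p_distance(copy, consensus)))` would label one aligned copy, where `copy` and `consensus` are rows taken from the same multiple alignment.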
Location mapping: We mapped the locations of each DINE-1 sequence found by BLAST search onto the assembled chromosome sequences of the DMG (Release 4, http://flybase.net/blast/) and the DYG (Release 2.0 whole-genome shotgun assembly). The density of DINE-1's along each of the chromosome arms was calculated by counting the number found in each 1-Mb interval (or 0.1-Mb interval for the fourth chromosome).

RESULTS

Abundance and structure of DINE-1: From the BLAST search results, we estimated that the total number of DINE-1's in the DYG is at least 3000 copies, with an average length of ~600 bp. Under the same stringency of BLAST search, there are ~1000 copies of DINE-1's in the DMG, constituting ~1% of the total DMG. These are likely to be underestimates since the BLAST search result is biased toward young copies. Ancient copies are likely to be missed because of high sequence divergence and fragmentation from mutation accumulation. We aligned and compared the consensus sequences of DINE-1's from the DMG and the DYG. The length of the D. yakuba DINE-1 consensus is 823 bp. Compared with the D. melanogaster DINE-1 consensus (Locke et al. 1999), the sequence structure of DINE-1 is similar in these two species (Figure 1).
To understand the potential transpositional mechanism that results in such high copy numbers of DINE-1, we looked for various sequence motifs and features that are found in other highly abundant elements, such as SINEs or MITEs. We did not find any DNA polymerase III promoter motif or tRNA-derived elements, which are typical features of SINEs, using the program Pol3scan (Pavesi et al. 1994). We also did not find long-terminal repeats or a poly(A) tail, which are characteristics of the two types of retroelements. We therefore suggest that DINE-1 is unlikely to transpose through an RNA-mediated mechanism, as previously suggested (Kapitonov and Jurka 2003). In contrast, we found that in both species DINE-1 sequences contain two 12-bp perfect inverted repeats at or close to the termini (at positions 1–12 bp and 761–773 bp), which is a characteristic of transposons. However, we did not find any reliable open reading frames in the consensus sequence of D. yakuba DINE-1, suggesting that this element is nonautonomously transposed. Discovery of insertion target site and target-site duplication of DINE-1 elements: Many transposons have some insertion-site specificity and also can cause duplications at the insertion site. These features would be difficult to detect in D. melanogaster since DINE-1 and flanking sequences are likely to have accumulated many mutations since their last period of activity. Our analysis of divergence among DINE-1's from D. yakuba suggested that DINE-1 may have been transpositionally active much more recently in this species than in D. melanogaster (see below). This provided us the opportunity to search for potential target sites and target-site duplications in recently transposed DINE-1's. We sampled 10 strains of D. yakuba and looked for the presence of DINE-1 insertions by PCR at three euchromatic sites, which we had found containing “young” DINE-1 insertions in the genome reference strain (see materials and methods). 
We found that none of the sampled strains contain DINE-1 inserts at any of the three sites of insertion found in the genome reference strain. We then compared the homologous sequences of sites with and without DINE-1 inserts. We discovered that all three copies had a 2-bp target-site duplication of TT. We then looked at our entire set of young copies and found TT flanking DINE-1 on both sides in ~50% of them. We conclude that DINE-1 preferentially inserts at sites of TT and causes a 2-bp target-site duplication. On the basis of the existence of inverted repeats close to the termini, target-site duplication, and the conservation of sequences in both ends among copies [all common features of transposons (Kaminker et al. 2002)], we propose that DINE-1 is likely to transpose through a DNA-mediated mechanism.

Recent transpositional burst of DINE-1 in D. yakuba: The frequency distribution of pairwise distances between DINE-1 copies in D. yakuba has a single peak at ~7%, with a long tail toward high divergence. This distribution suggested to us that D. yakuba may contain a large number of both ancient and relatively young copies of DINE-1. This hypothesis is illustrated in Figure 2.
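The two structural features used above to argue for a DNA-mediated mechanism, perfect inverted repeats near the termini and a 2-bp target-site duplication, can each be checked with a simple string comparison. The sketch below is illustrative only: the function names are hypothetical, the sequences in the usage note are toy data, and we assume that the caller supplies the two repeat segments and the flanking sequences on the same strand.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_perfect_inverted_repeat(left_segment, right_segment):
    """True if the left segment equals the reverse complement of the right
    segment, i.e., the two segments form a perfect inverted repeat."""
    return left_segment.upper() == reverse_complement(right_segment).upper()


def target_site_duplication(left_flank, right_flank, k=2):
    """Return the k-mer if the k bases immediately 5' of the element are
    identical to the k bases immediately 3' of it; otherwise return None."""
    if len(left_flank) < k or len(right_flank) < k:
        return None
    left = left_flank[-k:].upper()
    right = right_flank[:k].upper()
    return left if left == right else None
```

With toy data, `is_perfect_inverted_repeat("CAGTGGTT", "AACCACTG")` is true, and `target_site_duplication("GGATT", "TTCCA")` returns `"TT"`, the pattern reported for the three euchromatic insertions.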
In contrast, the distribution of DINE-1 in D. yakuba contains a sharp peak of relatively low divergence (~3.4%), which we suggest reflects a second burst of transposition that occurred in the D. yakuba lineage. We calculate that this burst occurred ~1.5 MYA [assuming a per nucleotide mutation rate per generation to be 1.11 × 10⁻⁸ (Tamura et al. 2004)]. The highly diverged copies found in D. yakuba might represent the remnants of the first transpositional burst that occurred in the common ancestor of D. melanogaster and D. yakuba and then became inactive.

Chromosomal distribution of DINE-1's in D. melanogaster and D. yakuba: Because D. melanogaster and D. yakuba appear to have very different age distributions of DINE-1, we speculated that they may also have different chromosomal distributions. Given the availability of assembled chromosome arms in D. melanogaster and D. yakuba, we were able to map and compare the chromosomal distribution of DINE-1 in both species. We retrieved the 2-kb flanking sequences in the 5′-end of each DINE-1 copy found in the BLAST search against both genome databases. We mapped the chromosomal locations of each copy onto the chromosomal assembly of the DMG, assuming the chromosomal location is conserved at a fine-scale level between D. melanogaster and D. yakuba. This assumption is clearly not correct, as D. yakuba contains multiple inversions relative to D. melanogaster (Lemeunier and Ashburner 1976). However, we note that most of these inversions have the net effect of “relocating” genes between euchromatic regions, rather than between euchromatic and heterochromatic regions (Lemeunier and Ashburner 1976), and thus cannot account for the striking patterns that we describe below. Due to the presence of repetitive elements in the flanking sequences, the locations for some copies were not determined. In total, we retrieved the 5′ flanking sequences of 1684 DINE-1 copies from the BLAST search in the D.
yakuba genome and 563 flanking sequences of DINE-1 from the D. melanogaster genome (Table 1).
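The ~1.5 MYA date given earlier in this section for the second burst can be reproduced with a simple molecular-clock calculation. The sketch below assumes that copies produced in a single burst diverge neutrally and independently, so that the expected pairwise distance is d = 2μT and the age is T = d / (2μ). Both this relationship and the function name are assumptions made for illustration; the text does not spell out the formula used. The result is in the same time units as the rate, and treating the quoted rate of 1.11 × 10⁻⁸ as a per-year rate is a further assumption, made here only because it recovers the stated figure.

```python
def burst_age(pairwise_distance, rate_per_site):
    """Age of a transpositional burst under a strict molecular clock.

    Two copies that arose at the same time accumulate substitutions
    independently, so their pairwise distance grows as d = 2 * rate * T.
    The returned age is in the same time units as rate_per_site.
    """
    if rate_per_site <= 0:
        raise ValueError("rate_per_site must be positive")
    return pairwise_distance / (2.0 * rate_per_site)


# Peak pairwise divergence among young D. yakuba copies and the quoted rate.
age = burst_age(0.034, 1.11e-8)
print(f"{age / 1e6:.2f} million time units")
```

The same function applied to the ~15% divergence among D. melanogaster copies gives an age roughly 4.4 times greater, since the estimate scales linearly with the pairwise distance.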
From the analysis of the chromosomal location of DINE-1's, we found that in both species DINE-1's are more abundant in the heterochromatic regions than in the euchromatic regions, being especially dense around the centromeric regions in all chromosomes (Figure 3).

Chromosomal distribution of young and old copies of DINE-1 elements in the D. yakuba genome: To test the above hypothesis, we partitioned the 933 DINE-1's into three groups on the basis of their sequence divergence. We compared the chromosomal distributions of young (G1) and old (G3) groups and found that the distributions are very different between these two groups. Younger copies are much more evenly spread along the chromosomal arms and older copies are mostly located close to, or in, the centromere regions (Figure 4).
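The density measure used for these comparisons, the number of copies in each 1-Mb interval of a chromosome arm (0.1 Mb for the fourth chromosome), amounts to a fixed-width histogram of insertion coordinates. The following is a minimal sketch, not the code used in the study; the function name is hypothetical and we assume 0-based coordinates measured from one end of the arm.

```python
def density_per_bin(positions, arm_length, bin_size=1_000_000):
    """Count insertions in consecutive bins of bin_size bp along one arm.

    positions  -- 0-based insertion coordinates on the arm
    arm_length -- length of the arm in bp
    Returns a list with one count per bin; the last bin may be shorter.
    """
    n_bins = -(-arm_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if not 0 <= pos < arm_length:
            raise ValueError(f"position {pos} lies outside the arm")
        counts[pos // bin_size] += 1
    return counts
```

Running the function separately on the G1 and G3 coordinates of the same arm gives two count vectors that can be compared bin by bin, and passing `bin_size=100_000` gives the finer resolution used for the fourth chromosome.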
DISCUSSION

DINE-1 is the most abundant TE found in the DMG (Kapitonov and Jurka 2003). More than a thousand copies of DINE-1 are present in the sequenced reference strain. Unlike most TEs found in the DMG, sequences of DINE-1's are highly divergent. They are believed to be relics from an ancient transpositional burst and most copies, if not all, lost their mobility >3 MYA (Kapitonov and Jurka 2003). DINE-1 was originally reported to share a weak similarity with the retroelements SINE (Locke et al. 1999) and Penelope (Kapitonov and Jurka 2003). However, because of the antiquity of these copies and their accumulation of mutations, the mechanism of transposition inferred from sequence data remains ambiguous.

Our study of the recently completed D. yakuba whole-genome sequence has revealed that DINE-1's are even more abundant than in D. melanogaster and that D. yakuba copies appear to be much younger. We were able to retrieve sequences of DINE-1 under the same stringency of conservation in the two species and to infer the age of the elements on the basis of the sequence divergence analysis. This approach allowed us to derive the consensus for the most recently active copies of DINE-1 in the D. yakuba genome and, further, to infer the mechanism of transposition of DINE-1. On the basis of the discovery of terminal inverted repeats, target-site duplications, and conservation of both termini sequences, we propose that DINE-1 is a family of DNA-mediated transposons. However, we did not detect any significant ORF in DINE-1, suggesting that this element may be nonautonomously transposed, similar to MITEs in plants (Feschotte et al. 2002).

The dramatic differences in the levels of divergence among DINE-1 copies in D. melanogaster and D. yakuba (speciated >5 MYA) suggest very different evolutionary dynamics of TEs in these two host genomes. Assuming neutral evolution of DINE-1 sequences, the >15% divergence among copies in D.
melanogaster indicates that the first transpositional burst of DINE-1 likely occurred in the common ancestor of D. melanogaster and D. yakuba. In contrast to D. melanogaster, we discovered that the peak of sequence divergence among DINE-1's in D. yakuba is only ~3%. We propose that the activity of DINE-1 was suppressed in the common ancestor of both species and remained inactive in the D. melanogaster lineage, but was reactivated in D. yakuba in the last 1 MY (see Figure 5).
It is well known that transpositional rates can vary greatly among lines (Pasyukova and Nuzhdin 1993) and that transpositional bursts have occasionally been found in laboratory stocks (Biemont et al. 1987). Our suggestion that D. yakuba experienced a recent transpositional burst is unlikely to be an artifact of laboratory culture, because very few of the “young” copies of DINE-1 that we identified in D. yakuba are identical to each other. We also note that our preliminary analysis suggests that at least some “young” copies of DINE-1 are found in multiple, recently collected strains of D. yakuba. Therefore, the pattern of DINE-1 that we have described in the sequenced genome strain is likely to reflect a general property of D. yakuba. Our comparative genomic analyses of the chromosomal locations of DINE-1 in two closely related species allow us to further investigate the evolutionary mechanisms controlling the number and distribution of TEs in the host genome. Previous studies on chromosomal distribution of TEs in D. melanogaster and other species have concluded that TEs are more densely distributed in the heterochromatic regions, such as near the centromeres, and are less abundant in the euchromatic regions (Charlesworth et al. 1992, 1994; Bartolome et al. 2002; Blumenstiel et al. 2002; Kaminker et al. 2002; Rizzon et al. 2002). However, the mechanism causing the observed TE distribution pattern is not yet known conclusively. Two alternative hypotheses can explain this pattern: (1) biased insertion of TEs into heterochromatic regions (Dimitri and Junakovic 1999) or (2) lack of recombination in the heterochromatic regions slowing down the process of eliminating deleterious TEs. Previous studies (Charlesworth et al. 1994; Biemont et al. 1997; Bartolome et al. 2002; Lerat et al. 
2003) based on multiple families of TEs that have a relatively low copy number (~10–100 copies/genome) have limited resolution in distinguishing these two hypotheses because different TEs may have invaded the host genome at different times and have experienced different evolutionary dynamics. Reconstructing the evolutionary history of TEs in the host genome requires the comparison of whole-genome sequence data in multiple species. Here we have taken advantage of the recently completed whole-genome sequences of D. melanogaster and D. yakuba to compare the sequence variation and distribution of DINE-1. Our analysis based on the genomewide comparison of this highly abundant TE family in these two closely related species has provided a higher-resolution view of the evolutionary history and dynamics between TEs and their hosts. Our data set of DINE-1 in the two species allows us to detect insertions at two different stages in the transposition cycle of this TE. We are able to demonstrate that the D. yakuba genome contains at least two groups of DINE-1, which we propose are derived from two separate transpositional bursts, one in the common ancestor of D. melanogaster and D. yakuba and one only in the lineage leading to D. yakuba (Figure 5).

Our results are also consistent, however, with the possibility that DINE-1's may have functional roles in heterochromatic regions. Recent evidence has shown that repetitive sequences in heterochromatic regions play important roles in chromatin assembly and gene regulation (Kazazian 2004). The abundance of old copies in the heterochromatic regions may therefore be due to the combined effect of reduced negative selection and positive selection. Further studies of the population distribution of insertion frequency and sequence variation both within and between species may help to understand the roles of TEs in both euchromatin and heterochromatin.
Acknowledgments

We thank Daniel Barbash, Sergey Nuzhdin, Casey Bergman, Chuck Langley, and David Waxman for helpful discussions and comments on the manuscript. We thank Chuan-Hsiung Chang and Yi-Fong Wu for help with the 5′ flanking sequence database construction. We also acknowledge support from the National Science Foundation, where this work was initiated at the University of California at Davis by H.P.Y. under a grant to Sergey Nuzhdin. This work was supported by a grant from the National Science Council of the Republic of China (NSC 93-2311-B-010-007) to H.P.Y.