Genome Res. Sep 2001; 11(9): 1527–1540.
PMCID: PMC311128

Drosophila Euchromatic LTR Retrotransposons are Much Younger Than the Host Species in Which They Reside

Abstract

The recent release of the complete euchromatic genome sequence of Drosophila melanogaster offers a unique opportunity to explore the evolutionary history of transposable elements (TEs) within the genome of a higher eukaryote. In this report, we describe the annotation and phylogenetic comparison of 178 full-length long terminal repeat (LTR) retrotransposons from the sequenced component of the D. melanogaster genome. We report the characterization of 17 LTR retrotransposon families described previously and five newly discovered element families. Phylogenetically, these families can be divided into three distinct lineages that consist of members from the canonical Copia and Gypsy groups as well as a newly discovered third group containing BEL, mazi, and roo elements. Each family consists of members with average pairwise identities ≥99% at the nucleotide level, indicating they may be the products of recent transposition events. Consistent with the recent transposition hypothesis, we found that 70% (125/178) of the elements (across all families) have identical intra-element LTRs. Using the synonymous substitution rate that has been calculated previously for Drosophila (0.016 substitutions per site per million years) and the intra-element LTR divergence calculated here, the average age of the remaining 30% (53/178) of the elements was found to be 137,000 ± 89,000 yr. Collectively, these results indicate that many full-length LTR retrotransposons present in the D. melanogaster genome have transposed well after this species diverged from its closest relative Drosophila simulans, 2.3 ± 0.3 million years ago.

Retrotransposons are the most abundant and widespread class of eukaryotic transposable elements. For example, >50% of the maize genome (SanMiguel et al. 1996) and >40% of the human genome (Smit 1999) are comprised of retrotransposons. The biological importance of retrotransposons ranges from their contribution to mutation (Green 1988) and disease (Deininger and Batzer 1999) to their postulated role in evolution (McDonald 1990, 1993; Kidwell and Lisch 1997). The genome sequencing of humans and selected experimental and agriculturally important species is providing an unprecedented opportunity to view the patterns of variation existing among the entire complement of retrotransposons in complete genomes.

Retrotransposons are made up of short interspersed nuclear elements (SINES), long interspersed nuclear elements [LINES, also known as non-long terminal repeat (LTR) retrotransposons], LTR retrotransposons, and retroviruses. LTR retrotransposons are named for their long terminal repeats, which contain transcriptional regulatory sites and flank the internal coding regions of the elements (Boeke and Stoye 1997). LTR retrotransposons are classically divided into two groups, the Copia/Ty1 group and the Gypsy/Ty3 group. The distinguishing characteristic between these groups is the order of the three protein domains — protease (PR), reverse transcriptase (RT), and integrase (IN) — encoded within the polymerase (pol) gene of the elements. The pol region of Copia/Ty1 elements has the order (PR, IN, RT) whereas the Gypsy/Ty3 group has the more familiar arrangement (PR, RT, IN), which is also the order found in retroviruses. Recently, a third major group of LTR retrotransposons has been described containing the BEL element from Drosophila melanogaster as well as the Cer7-12 elements of Caenorhabditis elegans (Bowen and McDonald 1999; Malik et al. 2000). The IN domain is also found downstream of the RT domain in this third group of LTR retrotransposons.

LTR retrotransposons and retroviruses are nearly identical in structure and are clearly related phylogenetically (Xiong and Eickbush 1988). The main distinguishing characteristic is that some LTR retrotransposons, such as Ty1 in yeast, do not contain an envelope gene, which renders retroviruses infectious. Many LTR retrotransposons, such as gypsy from D. melanogaster, however, do encode Envelope proteins and are infectious (Song et al. 1994). Therefore, LTR retrotransposons also serve as excellent models for the study of the evolution of infectious retroviruses. Previous large-scale analyses of the LTR retrotransposons of Saccharomyces cerevisiae (Jordan and McDonald 1998, 1999a,b; Kim et al. 1998), C. elegans (Bowen and McDonald 1999), Zea mays, and Hordeum vulgare (Shirasu et al. 2000) have provided novel insights into the molecular evolution and phylogenetic distribution of these retrotransposons.

Because the long terminal repeats of LTR retrotransposons are synthesized from a single template during reverse transcription, they are identical at the DNA sequence level on integration. Therefore, if the nucleotide substitution rate for the host DNA polymerase is known, the relative integration time or age of the element can be estimated from the level of sequence divergence existing between an element's LTRs. Previously, LTR nucleotide identity has been used to estimate the time of insertion of LTR retrotransposons from S. cerevisiae, Zea mays, and humans. For example, the age of the Ty1 and Ty2 elements from S. cerevisiae has been estimated to be <100,000 years old (Jordan and McDonald 1999b; Promislow et al. 1999). In contrast, it has been reported that the LTR retrotransposons within the ADH-region of the maize genome are much older, having transposed in the past 2 to 6 million years (SanMiguel et al. 1998). In a similar study, it has been reported that most human endogenous retroviruses (HERVs) inserted into the human genome long before humans diverged from the Old World monkeys, more than 25 million years ago (Tristem 2000).

In an initial effort to characterize all of the LTR retrotransposons within the genome of D. melanogaster, we report the annotation, phylogenetic analysis, and estimated ages of 178 full-length elements (i.e., those containing two intact LTRs and intervening coding regions) from the nonredundant sequence found in GenBank (Benson et al. 2000). Our results indicate that there are three major groups of LTR retrotransposons found within the D. melanogaster genome. We find that these three groups consist of over 20 individual families of elements and that each family of elements is composed of a group of highly homologous individual elements (~99% identity at the nucleotide level). We conclude that many LTR retrotransposons from each family have resulted from evolutionarily recent episodes of transpositional activity.

RESULTS

Isolation and Characterization of D. melanogaster LTR Retrotransposon Families from the Genome Sequence

The majority of the D. melanogaster genome sequence now available in GenBank is from the euchromatic regions of the genome (Adams et al. 2000). In contrast, only 2.5% of the genome sequence is derived from heterochromatic clones (Myers et al. 2000). Constitutive heterochromatin, which comprises roughly one-third of the D. melanogaster genome, is poorly represented in the genome sequence because these regions are not easily cloned into large inserts (Myers et al. 2000). Likewise, the assembly of DNA sequence from genomic regions that contain many tandemly arranged repetitive elements can result in the omission of internal sequences (E. Myers, pers. comm.). These issues are important to our study because D. melanogaster heterochromatin is thought to contain a substantial number of transposable elements (TEs) (Pimpinelli et al. 1995). Also LTR retrotransposons have been shown to exist in nested arrays in other species (SanMiguel et al. 1996a). Consequently, any LTR retrotransposons located in these regions of the genome are precluded from our analysis. Further sequencing and gap-filling efforts being conducted by Celera and the Berkeley Drosophila Genome Project (Myers et al. 2000) will likely identify additional elements within both the euchromatic and heterochromatic portions of the genome. Therefore, our results represent a large sampling of LTR retrotransposons from the euchromatin of D. melanogaster.

Following the initial characterization of each LTR retrotransposon (see Methods section), ClustalW (Thompson et al. 1997) was used to align the nucleotides of each element to known full-length LTR retrotransposons of D. melanogaster and other related organisms listed in Table 1. Information concerning all elements identified previously can be obtained through Flybase (http://flybase.harvard.edu). This initial alignment was done to group elements into known and unknown families. The phylogram generated from this preliminary alignment is shown in Figure 1. For clarity, each family is labeled once followed by the number of elements in each family. Because of the low level of interfamily nucleotide sequence identity, this initial phylogram may not accurately represent all interfamily relationships, but it does allow us to classify elements into distinguishable groups. The long interfamily branches and the large cluster of nearly identical elements at the termini of the family lineages apparent in this initial phylogram indicate that most families of D. melanogaster LTR retrotransposons consist of a group of highly homologous elements. Subsequently, computed pairwise nucleotide identities confirmed this finding in that each family was found to consist of elements with average pairwise nucleotide identities of ≥99% (Table 2).
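The average pairwise identity reported for each family can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length (for example by ClustalW) and that columns containing a gap in either sequence are skipped. The gap treatment and the function names are assumptions made for illustration and are not taken from the text.

```python
from itertools import combinations
from typing import Dict


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored
    (an assumption; the text does not say how gaps were handled).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


def average_pairwise_identity(aligned: Dict[str, str]) -> float:
    """Mean identity over all unordered pairs of aligned family members."""
    pairs = list(combinations(aligned.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

Applied to the aligned members of one family, a value at or above 0.99 would correspond to the ≥99% figure quoted above.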

Table 1
LTR Retrotransposons Used to Categorize Elements from the Drosophila melanogaster Genome
Figure 1
Neighbor-joining phylogenetic tree of the ClustalW alignment of LTR retrotransposon nucleotide sequences. This tree indicates the high level of sequence homology within a family of elements. For clarity, the element families are listed once with the number ...
Table 2
Element Family Average Pairwise Nucleotide Identities

The nucleotide sequences of the LTR retrotransposons that did not group with known elements were translated and their RT motif was aligned to the RT of the known elements (Fig. 2A). Only those RT motifs that were uninterrupted by frame shifts or stop codons were used in the characterization of novel families. This alignment was then used to generate pairwise amino acid identities. Consistent with criteria established previously (Bowen and McDonald 1999), if an element had a pairwise identity of <90% to a known RT, it was classified as a new family. These novel elements are shown in boldface type in the first column of Table 3. Additional RT sequences from other Drosophila and invertebrate species were included in this analysis to ensure that novel elements from D. melanogaster did not represent previously characterized elements from other related species. Elements with RT pairwise identities >90% to a previously characterized element were given the name of that element followed by a number.
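The 90% rule described above can be written as a small decision function. This is only a sketch: the function name and the input format (a mapping from known family name to RT amino acid identity, expressed as a fraction) are hypothetical. The text does not say how an identity of exactly 90% is handled, so the sketch arbitrarily assigns that case to the known family.

```python
from typing import Dict


def classify_by_rt_identity(identities: Dict[str, float],
                            threshold: float = 0.90) -> str:
    """Return the best-matching known family, or 'novel'.

    identities maps each known family name to the pairwise RT amino
    acid identity (0..1) between the query element and that family.
    An identity below the threshold to every known RT marks a new family.
    """
    if not identities:
        return "novel"
    best_family, best_identity = max(identities.items(), key=lambda kv: kv[1])
    if best_identity < threshold:
        return "novel"
    return best_family
```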

Figure 2
(A) Amino-acid alignment of the RT motif of Drosophila melanogaster LTR retrotransposons found in this study. The sequences were aligned using ClustalX as described in Methods. The seven conserved domains of RT (Xiong and Eickbush 1988) are indicated ...
Table 3
LTR Retrotransposons Characterized in This Study

All elements characterized in this study are listed individually in Table 3. Included in this table are other distinguishing characteristics of the LTR retrotransposons, including their accession numbers, chromosomal locations, inverted terminal repeats (ITRs), direct terminal repeats (DTRs), LTR length, complete element length, and estimated age of each element (see below). The DTRs result from a duplication of the unoccupied insertion site following proviral or element insertion (Coffin et al. 1997). In our study, the DTRs served as internal controls for the assembly process following the whole genome shotgun sequencing of D. melanogaster (Myers et al. 2000). If proviral elements located at different loci were incorrectly assembled, they would contain a mixed set of DTRs. For the elements that contained unique DTR sequences, 93% were identical. The other 7% are either incorrectly assembled or are possibly the result of ectopic recombination between proviral elements at different loci. This hypothesis is currently under further investigation. In summary we identified 23 copia, six 17.6, 10 297, 18 412, four antonia, six blastopia, 21 blood, one burdock, two hamilton, eight HMS Beagle, four mazi, five mdg1, eight mdg3, one micropia, one nik, four nomad, 40 roo, six tirant, four transpac, two wolfman, and five unclassified elements (Table 4). The elements' combined lengths totaled 1,279,046 nucleotides or nearly 1% of the sequenced component of the genome.
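The use of DTRs as an assembly control comes down to checking that the short duplication immediately upstream of the 5′ LTR is the same as the one immediately downstream of the 3′ LTR. A minimal sketch of that check follows. It assumes the flanking sequences have already been extracted and that the duplication length for the family is known. The function name and arguments are hypothetical.

```python
def dtrs_match(upstream_flank: str, downstream_flank: str, dtr_len: int) -> bool:
    """True if the target-site duplications flanking an element are identical.

    upstream_flank   : genomic sequence ending at the start of the 5' LTR
    downstream_flank : genomic sequence starting right after the 3' LTR
    dtr_len          : expected duplication length in bp (family specific)
    """
    if dtr_len <= 0:
        raise ValueError("dtr_len must be positive")
    if len(upstream_flank) < dtr_len or len(downstream_flank) < dtr_len:
        raise ValueError("flanks are shorter than the duplication length")
    left = upstream_flank[-dtr_len:].upper()
    right = downstream_flank[:dtr_len].upper()
    return left == right
```

An element whose flanks fail this check would fall into the minority that the text attributes to misassembly or ectopic recombination.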

Table 4
The Content of LTR-Retrotransposons Analyzed from the Drosophila melanogaster Genome in This Study

In general, the number of individual elements that we have characterized for each LTR retrotransposon family is consistent with average copy numbers estimated previously by in situ hybridization (Table 4). In situ hybridization also detects only those elements located in the polytenized, euchromatic component of the genome. For example, the most abundant element found is the roo element, which occupies, on average, 68 ± 14 sites within the polytene chromosomes of natural D. melanogaster populations (Vieira et al. 1999). We also found that roo was the most abundant element with at least 40 full-length copies in the genome of the sequenced D. melanogaster lab strain. In contrast, the least abundant elements in natural populations are 1731, gypsy, and zam. Each of these elements has, on average, less than two copies in natural populations (Vieira et al. 1999). We did not find any copies of these elements in our analysis of the D. melanogaster genome sequence. This is not surprising in that both gypsy and zam are known to be most abundant in constitutive heterochromatin and are in low abundance or absent from the euchromatic regions of some D. melanogaster strains (Pimpinelli et al. 1995; Baldrich et al. 1997).

Phylogenetic Characterization of D. melanogaster LTR Retrotransposons

The aligned RTs of all D. melanogaster LTR retrotransposons shown in Figure 2A were used to generate the phylogenetic trees presented in Figure 2, B and C. Additional RT sequences from other Drosophila and invertebrate species as well as D. melanogaster elements that were not identified in our analysis are also included in this phylogeny. The RT phylogeny indicates that there are three major groups of LTR retrotransposons within the D. melanogaster genome.

Copia Group

To date, members of the Copia group found in D. melanogaster include only copia and 1731. In our study, we did not find any representatives of the 1731 family. In one instance we found a copia element, copia-8, inserted into another element (mdg3-2). Similar composite insertions have been observed previously in maize (SanMiguel et al. 1996) and barley (Shirasu et al. 2000).

Gypsy Group

Previously characterized Gypsy group members that we identified in this study include 17.6, 297, 412, blastopia, blood, burdock, HMS Beagle, mdg1, mdg3, micropia, nomad, tirant, and transpac. Novel Gypsy group members first identified and named here are antonia, hamilton, nik, and wolfman. Five additional elements were identified that are closely related to Gypsy group elements and not characterized previously at the level of RT amino-acid identity. These elements are listed by accession number only in Figure 1. The RTs of these elements contain frame shifts or stop codons and are difficult to characterize (see above discussion). These five elements will require further analysis before they can be confidently placed phylogenetically with respect to their RT identity.

The Gypsy group found in D. melanogaster is composed of at least 20 different families that form three divergent clades seen in Figure 2, B and C. These clades all emerge from a central unsupported region deep within the Gypsy group. This is best illustrated in the unrooted phylogram shown in Figure 2B. One clade is composed of the elements 412, blood, mdg1, and the novel element we named wolfman. This clade is well supported with a bootstrap value of 100. These four element families are closely related and form a very tight cluster at the end of a long branch that separates them from the rest of the Gypsy group.

A second clade that is less well supported (bootstrap value = 63) is composed of the elements micropia, mdg3, and blastopia. In contrast to the previously described group, these three elements are very distantly related to each other as indicated by very long branch lengths leading to each element.

The third clade that is found within the Gypsy group is better supported with a bootstrap value of 71. This is the most abundant clade within the D. melanogaster genome and contains, to date, 13 different families of elements. This clade can be divided further into two well-supported lineages containing five and eight families each. In addition to gypsy, burdock, HMS Beagle, and nomad, the group of five contains one novel element we have named hamilton. Previously, only LTR nucleotide sequences were available for the HMS Beagle element. Here we describe the first full-length copies of this element family. HMS Beagle is most closely related to the yoyo element first characterized in the Mediterranean fruit fly, Ceratitis capitata.

As mentioned earlier, each LTR retrotransposon family that we have characterized consists of a group of nearly identical elements (≥99% identity at the nucleotide level). One exception to this is the HMS Beagle family, which contains elements that are highly related yet show some level of phylogenetic structure (Figure 1). HMS Beagle elements consist of two well supported phylogenetic groups that share 97% RT identity at the amino acid level (Figure 2C). An additional phylogenetic comparison based on the entire DNA sequence of the HMS Beagle elements supports the conclusion that HMS Beagle elements consist of two well-defined subgroups (Fig. 3).

Figure 3
Unrooted neighbor joining phylogram of the nucleotide alignments of the HMS Beagle elements. Bootstrap values are shown on branches. The age of the individual elements as calculated by LTR sequence divergence is shown in parentheses following the name ...

The novel element found within the aforementioned group of five, hamilton, is most closely related to the D. melanogaster endogenous retrovirus gypsy. Interestingly, we found that the two members of the hamilton family of elements are tandemly duplicated and separated by a single LTR. LTR retrotransposons having this same duplicate structure have been identified previously in yeast (Roeder and Fink 1983) and flies (Csink and McDonald 1995) and are postulated to be the products of homologous recombination between two full-length elements within the LTR region.

The group of eight families within the third Gypsy group clade contains tirant, zam, idefix, 17.6, 297, transpac, and two novel elements we have named antonia and nik. antonia is most closely related to idefix and Tv1 from D. virilis. nik is most closely related to tirant, zam and TED. TED was first found integrated into the genome of a baculovirus in the cells of the cabbage looper, Trichoplusia ni (Friesen et al. 1986).

BEL Group

In addition to the Gypsy and Copia clades, there is a third well-supported clade (bootstrap value = 100) that contains the BEL, mazi, and roo families.

The complete sequence of BEL has been published previously (Bell et al. 1985) and partial characterization of roo (first described as B104) has been reported (Scherer et al. 1982; Lerat and Capy 1999). mazi, however, is a novel element. This is the first phylogenetic characterization of both roo and mazi. Previously, BEL was the only described member from D. melanogaster belonging to this third group of LTR retrotransposons now referred to as the BEL group (Malik et al. 2000). Interestingly, the roo element seems to encode a single, long open reading frame (ORF) of 2360 amino acids that contains homology to all of the previously described motifs found in LTR retrotransposons and retroviruses (McClure 1991) (Fig. 4). In most instances a frame shift is present after the gag (group specific antigen) gene to regulate differential expression of the gag and pol regions of LTR retrotransposon and retrovirus genomes.

Figure 4
Translation of roo reveals one long ORF. The characteristic amino acid motifs of Gag, Protease, RT, Ribonuclease H (Rnase H), and Integrase of LTR retrotransposons and retroviruses (McClure 1991) are shaded and labeled in the margin. The region containing ...

Aging the LTR-Retrotransposons of D. melanogaster

As described previously, LTR nucleotide identity can be used to estimate the time of integration (SanMiguel et al. 1998) of LTR retrotransposons and retroviruses. We have found that 125 of the LTR retrotransposons described here have identical LTRs, whereas the remaining 53 have low levels of nucleotide divergence. Identical LTRs indicate that the elements have inserted recently and have not had time to accumulate mutations between the LTRs. Using the synonymous substitution rate for Drosophila (Li 1997) of 0.016 substitutions per site per million years and the intra-element LTR divergence calculated here, we have calculated the integration time of the 53 elements with LTR nucleotide divergence. The average age of the remaining 30% (53/178) of the elements was found to be 137,000 ± 89,000 yr. These results are shown in Table 3 and Figure 5A. Our data indicate that all of the D. melanogaster LTR retrotransposons analyzed in this study have integrated within the last 500,000 years. Moreover, the level of divergence for most elements indicates integration times of <200,000 years.
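The arithmetic behind an LTR-based age estimate can be sketched as follows. The rate of 0.016 substitutions per site per million years is taken from the text. Everything else is an assumption made for illustration: the divergence is an uncorrected proportion of mismatched sites, and the divisor of two reflects the common convention that both LTRs accumulate substitutions independently after insertion. This section does not state whether that factor or a multiple-hit correction was applied, so the divisor is exposed as a parameter.

```python
RATE_PER_SITE_PER_MYR = 0.016  # synonymous rate for Drosophila quoted in the text


def ltr_divergence(mismatches: int, aligned_sites: int) -> float:
    """Uncorrected proportion of differing sites between the two LTRs."""
    if aligned_sites <= 0:
        raise ValueError("aligned_sites must be positive")
    if not 0 <= mismatches <= aligned_sites:
        raise ValueError("mismatches must lie between 0 and aligned_sites")
    return mismatches / aligned_sites


def insertion_age_years(divergence: float,
                        rate: float = RATE_PER_SITE_PER_MYR,
                        diverging_copies: int = 2) -> float:
    """Estimated time since insertion, in years.

    diverging_copies=2 assumes both LTRs mutate independently
    (age = K / 2r); pass 1 to use age = K / r instead. Which form the
    authors used is not stated in this section.
    """
    if rate <= 0 or diverging_copies <= 0:
        raise ValueError("rate and diverging_copies must be positive")
    return divergence / (diverging_copies * rate) * 1_000_000
```

Under either convention, an element with identical LTRs gets an estimated age of zero, which is how the text reads the 125 elements with no intra-element LTR divergence.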

Figure 5
(A) Graph of LTR retrotransposon element ages of those elements that contain LTR nucleotide divergence values other than zero. (B) Graph of LTR retrotransposon family ages based on average pairwise identities of elements contained within a single family. ...

A second method for dating TEs is to calculate the average pairwise nucleotide identity across the complete sequences of the elements that are very closely related at the phylogenetic level (Kapitonov and Jurka 1996; Costas and Naveira 2000). The assumption underlying this method is that phylogenetically related elements are identical at the time of integration and have subsequently accumulated differences attributable to host DNA polymerase substitutions. This method also assumes that no homogenization of the element sequences by molecular mechanisms related to gene conversion has occurred subsequent to their integration. Most elements we characterized were found to contain unique flanking sequences in the DTRs (see above). This indicates that gene conversion has not affected any of the sequence directly adjacent to the elements since their insertion. Although it is a formal possibility that gene conversion may have some role in homogenizing repetitive sequences, available data indicate that the magnitude of its influence is not sufficient to account for the degree of similarity we observe (Nevo-Caspi and Kupiec 1996). We analyzed each independent family of elements using this second method. The results of this independent method of aging elements also indicate that the full-length D. melanogaster LTR retrotransposons have integrated within the last 500,000 years (Table 2; Fig. 5B).

Therefore, both available methods of computing the age of LTR retrotransposon integration are consistent and indicate that many full-length LTR retrotransposons in D. melanogaster are much younger than the age of the genome in which they reside. The estimated divergence time of D. melanogaster from its closest relative D. simulans is 2.3 ± 0.3 million years ago (Li et al. 1999).

DISCUSSION

We have identified 178 full-length LTR retrotransposons from the sequenced, euchromatic component of the D. melanogaster genome. We have characterized the D. melanogaster LTR retrotransposons phylogenetically with respect to other known LTR retrotransposon families. In doing so, we have identified five novel families of LTR retrotransposons within the genome of D. melanogaster that we have named antonia, hamilton, mazi, nik, and wolfman. Four of these elements fall into the canonical Gypsy group of LTR retrotransposons. mazi groups with a third well-defined group of LTR retrotransposons present within the genome of D. melanogaster. Also found within this third group is the abundant element roo, which we found encodes a single polyprotein that contains all of the enzymes necessary for LTR retrotransposon replication. We have previously characterized six families of elements from C. elegans (Cer7-12) belonging to this newly defined third clade (Bowen and McDonald 1999), which also contains Pao from Bombyx mori and Tas from Ascaris lumbricoides (Xiong et al. 1993). This group is most closely related in structure to the Gypsy group of elements in that its integrase gene is found downstream or 3′ of reverse transcriptase. In the Copia group, integrase is found upstream or 5′ of reverse transcriptase. Judging from its almost equal phylogenetic distance from both Copia and Gypsy groups, however, this third clade likely diverged at or near the time of divergence of the Copia and Gypsy groups and represents an ancient group of LTR retrotransposons. Additional elements belonging to this third clade have since been characterized from the genomes of Anopheles mosquitoes (Cook et al. 2000). Even more recently, it has been claimed that elements belonging to this clade have been identified in the pufferfish Fugu rubripes, the ascidian urochordate Ciona intestinalis, and the blood fluke Schistosoma mansoni (Malik et al. 2000). 
Therefore, this third major group of LTR retrotransposons is likely to be widespread within the metazoan lineage.

Most LTR retrotransposons and retroviruses contain at least one translational frame shift following the gag gene to regulate the necessary overproduction of Gag relative to the other element proteins (Coffin et al. 1997). In addition to roo, other elements with single ORFs include copia from D. melanogaster as well as the Gypsy group members Cer1 from C. elegans (Britten 1995) and Tf1 from Schizosaccharomyces pombe (Levin et al. 1990). In the case of Tf1, a differential protein degradation process regulates the overproduction of Gag (Atwood et al. 1996). The presence of a long, single ORF in the roo element (Fig. 4) indicates that this characteristic is present within all three major groups of LTR retrotransposons.

Perhaps the most intriguing result to appear from our study is the fact that the D. melanogaster genome contains many families of full-length LTR retrotransposons, all of which have been transpositionally active in the very recent evolutionary past. Interestingly, this finding is similar to what has been observed previously for the LTR retrotransposons in S. cerevisiae (Jordan and McDonald 1998, 1999b) and C. elegans (Bowen and McDonald 1999). As shown in our results, the age of the full-length LTR retrotransposons in the D. melanogaster genome is substantially younger than the melanogaster species itself. Interestingly, the average ages of all full-length LTR retrotransposons in yeast (<100,000 yr) (Promislow et al. 1999) and nematode (<500,000 yr) (N. Bowen, unpubl.) are also much younger than the age of the species in which they are contained. In contrast to these findings, it has been reported using the same criteria we have used here that several full-length LTR retrotransposons within the ADH-region of the maize genome are much older, having transposed in the past two to six million years (SanMiguel et al. 1998). Likewise, the average age of full-length HERVs (>25 million years) (Tristem 2000) is significantly older than the age of the human species (4–6 million years) (Yang 1996; Goodman et al. 1998).

One possible explanation for these contrasting comparisons may be differential genome size constraints placed on these species. In this regard, Adrian Bird (Bird 1995) has postulated that large increases in genome size are necessarily associated with increases in informational noise. Bird believes that the evolution of global epigenetic control mechanisms, such as methylation, were prerequisite to the significant expansions in genome size observed over the evolutionary history of higher eukaryotes. Although methylation is known to have a key role in the silencing of LTR retrotransposons in plants and vertebrates (Yoder et al. 1997), it appears to be lacking this function in many invertebrate species, including yeast, nematodes, and Drosophila (Russo et al. 1996). We believe that it may be for reasons such as this that full-length LTR retrotransposons have not accumulated over evolutionary time within these invertebrate genomes. As a consequence of the lack of methylation-mediated silencing in invertebrates, there would be strong selective pressure to eliminate LTR retrotransposons from these genomes.

Evidence has been presented that supports the existence of an active mechanism for the deletion of TEs in S. cerevisiae (Jordan and McDonald 1999b) and Drosophila (Petrov et al. 1996). Numerous solo LTRs exist in the S. cerevisiae genome as the result of intra-element LTR recombination, which serves to eliminate Ty elements from the host's genome (Jordan and McDonald 1999b). In D. melanogaster, as well as in other Drosophila species, DNA deletions of <400 bp are thought to occur at an astonishingly high rate within the genome, leading to a very high incidence of DNA loss (Petrov and Hartl 1997). The level of DNA loss attributable to deletions in Drosophila is estimated to be 75 times higher than that produced by deletions in mammals (Petrov and Hartl 1997). Consistent with this hypothesis, many of the elements that we have characterized contain sequence deletions when compared to the length of the canonical elements found in the public database (Tables 1 and 3). For example, every 17.6 element that we characterized from the D. melanogaster genome is shorter than the 7439 bp reported for the canonical 17.6 element (Saigo et al. 1984). These active processes that eliminate elements from genomes supply selective pressure for these elements to continually replicate or risk elimination (Jordan and McDonald 1999b). In turn, this results in only young, full-length elements within these genomes. Our results indicate that the full-length elements from the melanogaster genome are very young. Further support that genome size constraints can limit the accumulation of older retrotransposons comes from the recent characterization of BARE-1 insertion patterns in Hordeum spontaneum (barley) (Kalendar et al. 2000). These authors have shown that there is a positive correlation between full-length BARE-1 elements and increased genome size in barley. They further suggest that, if needed, selection for increased genome size can be regulated by limiting the amount of intra-element LTR recombination as described above for S. cerevisiae.

A final question concerns the immediate source of the full-length LTR retrotransposons present within the D. melanogaster genome. One possibility is that the full-length LTR retrotransposons are descendants from older elements that have been actively eliminated from the D. melanogaster genome or from older elements sequestered within the yet to be sequenced heterochromatin (see above discussions). An additional possibility is that the LTR retrotransposons currently present in the melanogaster genome have derived from elements recently introduced from other species via horizontal transfer. Recent analyses of specific families of Drosophila LTR retrotransposons indicate that horizontal transfer of LTR retrotransposons can occur (Jordan et al. 1999; Terzian et al. 2000). The extent of horizontal transfer and the degree to which it may have contributed to the overall composition of LTR retrotransposons that are present within the D. melanogaster genome remains to be determined.

Subsequent to the submission of this manuscript for publication, others (Frame et al. 2001) have reported a similar phylogenetic characterization for the members of the BEL clade. In their report, the element Tinker is identical to the element family that we call mazi. Similarly, a database of repetitive elements including a section for Drosophila has been made available by Genetic Information Research Institute (Jurka 2000) in which individual members of the families that we identify as antonia, hamilton, mazi, nik, and wolfman have been given the names Quasimodo, Gtwin, Diver, Gypsy5, and Tabor, respectively.

METHODS

Genome Query

Searches of the entire sequenced component of the D. melanogaster genome (using Advanced BLAST, http://www.ncbi.nlm.nih.gov/blast/blast.cgi) were initiated by performing TBLASTN (Altschul et al. 1997) searches using the RT amino-acid sequence of the Drosophila LTR retrotransposons BEL (U23420), copia (M11240), and mdg3 (X95908). Based on preliminary phylogenies we have constructed using the RT amino acid sequences of D. melanogaster LTR retrotransposons characterized previously, these three elements were chosen to represent the most divergent lineages. Nucleotide sequences with homology to the RTs were then subjected to a dot matrix (see below) analysis to reveal the presence of LTR sequences. Accession numbers that did not contain LTRs (as revealed by dot matrix analysis) were not included for further characterization. The characteristic ITRs as well as the DTRs that flank the LTRs were identified (Coffin et al. 1997). The region between LTRs was then translated to reveal coding sequences. Subsequently, the RT of each identified element was used to query the genome until all queries produced TBLASTN hits that overlapped into other element families.
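
The iterative search strategy described above (query with seed RTs, then re-query with the RT of every newly identified element until no new families appear) can be sketched as a simple closure over search hits. This is only an illustration and not the procedure's actual implementation: the function name `collect_families` and the `search` callable are hypothetical stand-ins for a manual TBLASTN round followed by dot matrix screening.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def collect_families(seeds: Iterable[str],
                     search: Callable[[str], List[str]]) -> Set[str]:
    """Repeat searches until no query returns an unseen element family.

    `search` is a hypothetical callable: given the name of a query element,
    it returns the names of element families hit by that query (for example,
    the outcome of one TBLASTN search plus LTR screening).
    """
    found = set(seeds)      # families known so far, including the seeds
    pending = deque(found)  # families whose RT has not yet been used as a query
    while pending:
        query = pending.popleft()
        for family in search(query):
            if family not in found:
                found.add(family)       # new family: record it
                pending.append(family)  # and use its RT as a later query
    return found
```

The loop stops exactly when every query only returns families that are already known, which corresponds to the stopping rule stated in the text.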

Element Characterization

Each accession number containing a match to RT was retrieved from NCBI and ~10,000 bp on each side of the TBLASTN hit were subjected to further analysis. Sequences were characterized using SeqLab: The Graphical User Interface to the Wisconsin Package (GCG 1999), maintained and made accessible by the Research Computing Resource (RCR) at the University of Georgia (UGA) (http://www.rcr.uga.edu/biosci/home.html). The dot matrix program COMPARE was used to identify regions of identity within each sequence. DOTPLOT was used to visualize the dot matrices generated with COMPARE. LTRs appeared as a line offset from and parallel to the identity diagonal. The terminal direct repeats were characterized from the flanking sequences of the LTRs. The terminal dinucleotides of each element LTR were also identified. RT motif amino-acid sequences of each element and the polyprotein of roo were predicted using TRANSLATE.
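
To illustrate why LTRs show up as a line parallel to the identity diagonal, the following minimal sketch compares a sequence against itself using exact word matches and reports the most frequent offset between matching words. It is not the COMPARE/DOTPLOT algorithm; the function name `best_repeat_offset` and the word size of 8 are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Optional, Tuple


def best_repeat_offset(seq: str, word: int = 8) -> Optional[Tuple[int, int]]:
    """Return (offset, count) for the most common distance between identical words.

    In a self-comparison dot plot, every pair of identical words at positions
    i < j is a dot at distance j - i from the main diagonal. A pair of LTRs
    produces many dots at one shared distance, i.e. a parallel off-diagonal line.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - word + 1):
        positions[seq[i:i + word]].append(i)

    offsets = Counter()
    for hits in positions.values():
        for x in range(len(hits)):
            for y in range(x + 1, len(hits)):
                offsets[hits[y] - hits[x]] += 1

    if not offsets:
        return None  # no word occurs twice, so no repeat was detected
    return offsets.most_common(1)[0]
```

For an element with two identical LTRs, the returned offset approximates the distance from the start of the 5' LTR to the start of the 3' LTR. Exact word matching is a simplification; diverged LTRs would need a scoring window that tolerates mismatches.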

Multiple Sequence Alignments and Phylogenetic Analyses

Se-Al (courtesy of Andrew Rambaut, andrew.rambaut@zoo.ox.ac.uk) was used for multiple sequence file format manipulation and labeling. The ClustalW (Thompson et al. 1997) extension to SeqLab (GCG 1999) and ClustalX (Thompson et al. 1997) were used to generate nucleotide and amino acid alignments as described previously (Bowen and McDonald 1999). The seven conserved domains of the RT motif (Xiong and Eickbush 1988), also known as the RT ordered series of motifs (OSM) (Hudak and McClure 1999), are shown boxed in Figure 1A. Amino-acid and nucleotide alignment files may be obtained from the authors by request. PHYLIP (Felsenstein 1993) was used for distance calculation, tree production, and bootstrap analysis. Phylogenetic analyses were performed on the multiple sequence alignments using distance methods employed by PHYLIP (Felsenstein 1993). The PROTDIST program of PHYLIP, employing the Categories model, was used to generate distance matrices that were analyzed with the NEIGHBOR program to generate neighbor-joining tree files. SEQBOOT was also used to generate 100 data replicates that were subsequently analyzed with PROTDIST (Categories model), followed by NEIGHBOR, and finally with CONSENSE to generate an unrooted bootstrapped tree as presented in Figure 2B. The phylogram presented in Figure 2C was rooted with the 1731 and copia elements. All trees generated were visualized with TreeViewPPC version 1.5.3 (Page 1996).
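
The bootstrap step resamples alignment columns with replacement to produce replicate data sets, each of which is then run through distance calculation and tree building. The sketch below shows only that resampling idea; it is not PHYLIP's code, and the function name `bootstrap_alignments` and the dictionary-based alignment format are assumptions made for illustration.

```python
import random
from typing import Dict, List


def bootstrap_alignments(alignment: Dict[str, str],
                         replicates: int = 100,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Build bootstrap replicates by sampling alignment columns with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    Each replicate keeps the same names and length, but its columns are drawn
    at random (with replacement) from the original alignment.
    """
    names = list(alignment)
    if not names:
        raise ValueError("alignment is empty")
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")

    rng = random.Random(seed)  # fixed seed so the replicates are reproducible
    result = []
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        result.append({name: "".join(alignment[name][c] for c in columns)
                       for name in names})
    return result
```

Each replicate would then be passed to a distance method and a neighbor-joining step, and the resulting trees summarized as a consensus; those later steps are outside the scope of this sketch.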

LTR Retrotransposon Age Calculation

PAUP (Swofford 1999) was used to calculate intra-element LTR identities and entire element family pairwise identities using the Kimura 2-parameter method. Ages were calculated using the formula T = K/(2r), where T = time of divergence, K = divergence, and r = substitution rate (Li 1997). The average synonymous or silent site substitution rate used was .016 substitutions per site per million years as calculated by E.N. Moriyama from 39 genes between the melanogaster and obscura groups where the time of divergence was set at 30 million years ago (Li 1997).
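
As a worked illustration of the age calculation, the sketch below computes a Kimura 2-parameter distance between two aligned LTRs and converts it to an age with T = K/(2r). It is a minimal re-implementation for clarity, not PAUP's code; the function names are hypothetical, and skipping any site that is not A, C, G, or T in both sequences is an assumption of this sketch.

```python
import math


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption of this sketch)
        sites += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term)


def age_in_years(divergence: float, rate_per_site_per_myr: float = 0.016) -> float:
    """Apply T = K / (2r); r is in substitutions per site per million years."""
    return divergence / (2 * rate_per_site_per_myr) * 1e6
```

Under these assumptions, an intra-element LTR divergence of 0.0032 corresponds to 0.0032 / (2 × 0.016) = 0.1 million years, or about 100,000 years. Two identical LTRs give K = 0 and therefore an age of zero, which only means that the insertion is too recent to have accumulated a substitution at this rate.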

Acknowledgments

We are grateful to Drs. John Avise, Susan Wessler, Kelly Dawe, Michael Bender, and members of our laboratory for comments on earlier drafts of this manuscript. We thank Maney Mazloom for assistance in searching Genbank for Drosophila elements. This work was supported by a National Institutes of Health grant to J.F.M.

The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.

Footnotes

E-MAIL mcgene@arches.uga.edu; FAX (706) 542-3910.

Article published on-line before print: Genome Res., 10.1101/gr.164201.

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.164201.

REFERENCES

  • Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al. The genome sequence of Drosophila melanogaster. Science. 2000;287:2185–2195. [PubMed]
  • Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
  • Atwood A, Lin JH, Levin HL. The retrotransposon Tf1 assembles virus-like particles that contain excess Gag relative to integrase because of a regulated degradation process. Mol Cell Biol. 1996;16:338–346. [PMC free article] [PubMed]
  • Baldrich E, Dimitri P, Desset S, Leblanc P, Codipietro D, Vaury C. Genomic distribution of the retrovirus-like element ZAM in Drosophila. Genetica. 1997;100:131–140. [PubMed]
  • Bell JR, Bogardus AM, Schmidt T, Pellegrini M. A new copia-like transposable element found in a Drosophila rDNA gene unit. Nucleic Acids Res. 1985;13:3861–3871. [PMC free article] [PubMed]
  • Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Rapp BA, Wheeler DL. GenBank. Nucleic Acids Res. 2000;28:15–18. [PMC free article] [PubMed]
  • Bird AP. Gene number, noise reduction and biological complexity. Trends Genet. 1995;11:94–100. [PubMed]
  • Boeke JD, Stoye JP. Retrotransposons, endogenous retroviruses, and the evolution of retroelements. In: Coffin JM, Hughes SH, Varmus H, editors. Retroviruses. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 1997.
  • Bowen NJ, McDonald JF. Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements. Genome Res. 1999;9:924–935. [PubMed]
  • Britten RJ. Active gypsy/Ty3 retrotransposons or retroviruses in Caenorhabditis elegans. Proc Natl Acad Sci. 1995;92:599–601. [PMC free article] [PubMed]
  • Coffin JM, Hughes SH, Varmus HE. Retroviruses. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 1997.
  • Cook JM, Martin J, Lewin A, Sinden RE, Tristem M. Systematic screening of Anopheles mosquito genomes yields evidence for a major clade of Pao-like retrotransposons. Insect Mol Biol. 2000;9:109–117. [PubMed]
  • Costas J, Naveira H. Evolutionary history of the human endogenous retrovirus family ERV9. Mol Biol Evol. 2000;17:320–330. [PubMed]
  • Csink AK, McDonald JF. Analysis of copia sequence variation within and between Drosophila species. Mol Biol Evol. 1995;12:83–93. [PubMed]
  • Deininger PL, Batzer MA. Alu repeats and human disease. Mol Genet Metab. 1999;67:183–193. [PubMed]
  • Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.5c. Seattle, WA: Dept. of Genetics, University of Washington; 1993.
  • Frame IG, Cutfield JF, Poulter RT. New BEL-like LTR-retrotransposons in Fugu rubripes, Caenorhabditis elegans, and Drosophila melanogaster. Gene. 2001;263:219–230. [PubMed]
  • Friesen PD, Rice WC, Miller DW, Miller LK. Bidirectional transcription from a solo long terminal repeat of the retrotransposon TED: Symmetrical RNA start sites. Mol Cell Biol. 1986;6:1599–1607. [PMC free article] [PubMed]
  • Genetics Computer Group (GCG). Wisconsin Package Version 10.0. 1999.
  • Goodman M, Porter CA, Czelusniak J, Page SL, Schneider H, Shoshani J, Gunnell G, Groves CP. Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol Phylogenet Evol. 1998;9:585–598. [PubMed]
  • Green MM. Mobile DNA elements and spontaneous gene mutation. In: Lambert ME, McDonald JF, Weinstein IB, editors. Eukaryotic transposable elements as mutagenic agents. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory; 1988. pp. 41–50.
  • Hudak J, McClure MA. A comparative analysis of computational motif-detection methods. Pac Symp Biocomput. 1999:138–149. [PubMed]
  • Jordan IK, Matyunina LV, McDonald JF. Evidence for the recent horizontal transfer of long terminal repeat retrotransposon. Proc Natl Acad Sci. 1999;96:12621–12625. [PMC free article] [PubMed]
  • Jordan IK, McDonald JF. Evidence for the role of recombination in the regulatory evolution of Saccharomyces cerevisiae Ty elements. J Mol Evol. 1998;47:14–20. [PubMed]
  • ————— Phylogenetic perspective reveals abundant Ty1/Ty2 hybrid elements in the Saccharomyces cerevisiae genome. Mol Biol Evol. 1999a;16:419–422. [PubMed]
  • ————— Tempo and mode of Ty element evolution in Saccharomyces cerevisiae. Genetics. 1999b;151:1341–1351. [PMC free article] [PubMed]
  • Jurka J. Repbase update: A database and an electronic journal of repetitive elements. Trends Genet. 2000;16:418–420. [PubMed]
  • Kalendar R, Tanskanen J, Immonen S, Nevo E, Schulman AH. From the cover: Genome evolution of wild barley (Hordeum spontaneum) by BARE-1 retrotransposon dynamics in response to sharp microclimatic divergence. Proc Natl Acad Sci. 2000;97:6603–6607. [PMC free article] [PubMed]
  • Kapitonov V, Jurka J. The age of Alu subfamilies. J Mol Evol. 1996;42:59–65. [PubMed]
  • Kidwell MG, Lisch D. Transposable elements as sources of variation in animals and plants. Proc Natl Acad Sci. 1997;94:7704–7711. [PMC free article] [PubMed]
  • Kim JM, Vanguri S, Boeke JD, Gabriel A, Voytas DF. Transposable elements and genome organization: A comprehensive survey of retrotransposons revealed by the complete Saccharomyces cerevisiae genome sequence. Genome Res. 1998;8:464–478. [PubMed]
  • Lerat E, Capy P. Retrotransposons and retroviruses: Analysis of the envelope gene. Mol Biol Evol. 1999;16:1198–1207. [PubMed]
  • Levin HL, Weaver DC, Boeke JD. Two related families of retrotransposons from Schizosaccharomyces pombe [published Erratum appears in Mol. Cell. Biol. April, 1991. 11(4): 2334] Mol Cell Biol. 1990;10:6791–6798. [PMC free article] [PubMed]
  • Li W. Molecular Evolution. Sunderland, MA: Sinauer; 1997.
  • Li YJ, Satta Y, Takahata N. Paleo-demography of the Drosophila melanogaster subgroup: Application of the maximum likelihood method. Genes Genet Syst. 1999;74:117–127. [PubMed]
  • Malik HS, Henikoff S, Eickbush TH. Poised for contagion: Evolutionary origins of the infectious abilities of invertebrate retroviruses. Genome Res. 2000;10:1307–1318. [PubMed]
  • McClure MA. Evolution of retroposons by acquisition or deletion of retrovirus-like genes. Mol Biol Evol. 1991;8:835–856. [PubMed]
  • McDonald JF. Macroevolution and retroviral elements. Bioscience. 1990;40:183–191.
  • ————— Evolution and consequences of transposable elements. Curr Opin Genet Dev. 1993;3:855–864. [PubMed]
  • Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, et al. A whole-genome assembly of Drosophila. Science. 2000;287:2196–2204. [PubMed]
  • Nevo-Caspi Y, Kupiec M. Induction of Ty recombination in yeast by cDNA and transcription: Role of the RAD1 and RAD52 genes. Genetics. 1996;144:947–955. [PMC free article] [PubMed]
  • Page RD. TreeView: An application to display phylogenetic trees on personal computers. Comput Appl Biosci. 1996;12:357–358. [PubMed]
  • Petrov DA, Hartl DL. Trash DNA is what gets thrown away: High rate of DNA loss in Drosophila. Gene. 1997;205:279–289. [PubMed]
  • Petrov DA, Lozovskaya ER, Hartl DL. High intrinsic rate of DNA loss in Drosophila. Nature. 1996;384:346–349. [PubMed]
  • Pimpinelli S, Berloco M, Fanti L, Dimitri P, Bonaccorsi S, Marchetti E, Caizzi R, Caggese C, Gatti M. Transposable elements are stable structural components of Drosophila melanogaster heterochromatin. Proc Natl Acad Sci. 1995;92:3804–3808. [PMC free article] [PubMed]
  • Promislow DE, Jordan IK, McDonald JF. Genomic demography: A life-history analysis of transposable element evolution. Proc R Soc Lond B Biol Sci. 1999;266:1555–1560. [PMC free article] [PubMed]
  • Roeder GS, Fink GR. Transposable elements in yeast. In: Shapiro JA, editor. Mobile genetic elements. New York: Academic Press; 1983. pp. 299–328.
  • Russo VEA, Martienssen RA, Riggs AD. Epigenetic mechanisms of gene regulation. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 1996.
  • Saigo K, Kugimiya W, Matsuo Y, Inouye S, Yoshioka K, Yuki S. Identification of the coding sequence for a reverse transcriptase-like enzyme in a transposable genetic element in Drosophila melanogaster. Nature. 1984;312:659–661. [PubMed]
  • SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, Bennetzen JL. Nested retrotransposons in the intergenic regions of the maize genome. Science. 1996;274:765–768. [PubMed]
  • SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nature Genet. 1998;20:43–45. [PubMed]
  • Scherer G, Tschudi C, Perera J, Delius H, Pirrotta V. B104, a new dispersed repeated gene family in Drosophila melanogaster and its analogies with retroviruses. J Mol Biol. 1982;157:435–451. [PubMed]
  • Shirasu K, Schulman AH, Lahaye T, Schulze-Lefert P. A contiguous 66-kb barley DNA sequence provides evidence for reversible genome expansion. Genome Res. 2000;10:908–915. [PMC free article] [PubMed]
  • Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999;9:657–663. [PubMed]
  • Song SU, Gerasimova T, Kurkulos M, Boeke JD, Corces VG. An env-like protein encoded by a Drosophila retroelement: Evidence that gypsy is an infectious retrovirus. Genes & Dev. 1994;8:2046–2057. [PubMed]
  • Swofford DL. PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods). Sunderland, MA: Sinauer Associates; 1999.
  • Terzian C, Ferraz C, Demaille J, Bucheton A. Evolution of the Gypsy endogenous retrovirus in the Drosophila melanogaster subgroup. Mol Biol Evol. 2000;17:908–914. [PubMed]
  • Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25:4876–4882. [PMC free article] [PubMed]
  • Tristem M. Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J Virol. 2000;74:3715–3730. [PMC free article] [PubMed]
  • Vieira C, Lepetit D, Dumont S, Biemont C. Wake up of transposable elements following Drosophila simulans worldwide colonization. Mol Biol Evol. 1999;16:1251–1255. [PubMed]
  • Xiong Y, Eickbush TH. Similarity of reverse transcriptase-like sequences of viruses, transposable elements, and mitochondrial introns. Mol Biol Evol. 1988;5:675–690. [PubMed]
  • Xiong Y, Burke WD, Eickbush TH. Pao, a highly divergent retrotransposable element from Bombyx mori containing long terminal repeats with tandem copies of the putative R region. Nucleic Acids Res. 1993;21:2117–2123. [PMC free article] [PubMed]
  • Yang Z. Maximum-likelihood models for combined analyses of multiple sequence data. J Mol Evol. 1996;42:587–596. [PubMed]
  • Yoder JA, Walsh CP, Bestor TH. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997;13:335–340. [PubMed]

Articles from Genome Research are provided here courtesy of Cold Spring Harbor Laboratory Press