Proc Natl Acad Sci U S A. Nov 24, 2009; 106(47): 19922–19927.
Published online Nov 19, 2009. doi:  10.1073/pnas.0908008106
PMCID: PMC2785268

Distribution, diversity, evolution, and survival of Helitrons in the maize genome


Homology and structure-based approaches were used to identify Helitrons in the genome of maize inbred B73. A total of 1,930 intact Helitrons from eight families (62 subfamilies) and >20,000 Helitron fragments were identified, accounting for ≈2.2% of the B73 genome. Transposition of at least one of these families is ongoing, but the most prominent burst of amplification activity was ≈250,000 years ago. Sixty percent of maize Helitrons were found to have captured fragments of nuclear genes (≈840 different fragment acquisitions, with tens of thousands of predicted gene fragments inside Helitrons within the B73 assembly). Most acquired gene fragments are undergoing random drift, but 4% were calculated to be under purifying selection, whereas another 4% exhibit apparent adaptive selection, suggesting beneficial effects for the host or Helitron transposition/retention. Gene fragment capture is frequent in some Helitron subfamilies, with as many as 10 unlinked genes providing DNA inserts within a single element. Gene fragment acquisition appears to positively influence element survival and/or ability of the Helitron to acquire additional gene fragments. Helitrons with gene fragment captures in the antisense orientation have a lesser chance of survival. Helitron distribution in maize exhibits severe biases, including preferential accumulation in relatively gene-rich regions. Insertions, however, are not usually found inside genes. Rather, Helitrons preferentially insert near (but not into) other Helitrons. This biased accumulation is not caused by a preference for cis or nearby transposition, suggesting a specific association between Helitron integration functions and unknown chromatin characteristics that specifically mark Helitrons.

Keywords: exon shuffling, gene fragment acquisition, genome evolution, insertion specificity, transposable elements

The Helitrons are a class of transposable elements (TEs) that were initially discovered by computational analysis of repetitive nuclear DNA in the model plant Arabidopsis thaliana (1). Subsequent studies have shown that Helitrons are broadly distributed in eukaryotes, including in all studied plants (2). They are characterized by a 5′ TC terminus and a 3′ CTRR terminus accompanied by a predicted small hairpin structure near the 3′ end. Helitrons preferentially insert into the dinucleotide AT, but they do not generate target site duplications. Some elements encode Rep/helicase-like and/or RPA-like proteins that are believed to be involved in the transposition process. The bacterial IS91 element also encodes a Rep/helicase protein and is known to transpose via a rolling circle process (3, 4), so it is expected that Helitrons use the same mechanism for amplification and insertion within eukaryotic genomes.

Plant Helitrons often capture gene fragments. Sometimes, fragments from multiple genes that normally reside in unlinked chromosomal locations are found inside individual elements (5–13). This gene fragment acquisition is known to occur at the DNA level because contiguous introns and exons are found within the acquired DNA. The captured fragments are usually small, but more than one case has been observed where a full gene has been captured by a Helitron in maize (6, 11). Although the mechanism of gene fragment acquisition is not known, it usually [≈88% of the time in maize (14), based on a small number of Helitrons] involves capture of fragments in the same transcriptional orientation as the Helitron gene encoding Rep/helicase. Some of these Helitrons produce chimeric transcripts where acquired introns are spliced, sometimes alternatively, and the junctions between fragments are occasionally processed as crude de novo introns (6, 13). Hence, the fusion and expression of different gene fragments that are catalyzed by Helitrons have many of the properties of exon shuffling, a mechanism proposed for the creation of the first multidomain genes (15). In maize, >4,000 gene fragment acquisitions were predicted to be within Helitrons in a single maize inbred (5), and dozens more have been seen in other plant species (2), suggesting that the creation of novel gene candidates might be a very active process in plants.

In all of the plant and animal species investigated, Helitrons have been found to show significant biases in structure and distribution. The elements in nematodes, rice, and Arabidopsis accumulate primarily in gene-poor regions, although this bias is less dramatic in the rice genome (2). The predicted hairpin loops near the 3′ end of all Helitrons exhibit a very high predicted melting temperature (Tm), but less so in the grasses rice and sorghum than in the eudicots Arabidopsis and Medicago truncatula (2).

Genomic copy numbers and diversity in Helitrons are quite variable. For instance, the moss Physcomitrella patens has only one identified family consisting of eight very similar members (suggesting very recent transposition) (16). In dramatic contrast, vesper bats contain highly abundant Helitrons (for instance, >100,000 per haploid genome in Myotis lucifugus), but they appear to be missing from all other orders of placental mammals that have been investigated (17). In limited analyses in flowering plants (angiosperms), four relatively small genomes have been found to each contain several hundred elements from at least 10 families per species, with some of these families (defined by a unique 3′ end structure) found in most of these taxa (2).

The recent full genome sequence analysis of the nuclear genome in maize inbred B73 has provided the opportunity for comprehensive discovery of Helitrons in this important plant species (18). Helitrons in maize have been very recently active, as suggested by their association with mutations in sh2 and ba1 genes (6, 7) and their numerous presence/absence polymorphisms across maize haplotypes (5, 8–10). Hence, maize should be an ideal organism for the further study of Helitron evolution and function. We used a structure-based approach (2) to find Helitrons in maize, uncovering ≈2,000 intact elements and many thousands of Helitron fragments that together comprise >2% of the genome (18). This article describes the discovery and analysis process for these Helitrons in maize and the properties of the identified elements. The results indicate unexpected specificities in both action and evolution and emphasize the exceptional role that Helitrons play in the rearrangement and enrichment of eukaryotic genomes.

Results and Discussion

Helitron Identification and Classification.

The BAC-by-BAC sequence data for the maize nuclear genome in inbred B73, covering ≈90% of that genome, was analyzed one BAC at a time, encompassing ≈2.05 Gb of analyzed DNA (18). Initially, a structure-based approach was used to search for Helitrons in this sequence. The program used, HelSearch (2), initially identifies only intact elements (with at least two intact copies, so ends can be verified). A total of 1,923 intact elements were found in this manner (see Dataset S1 for details). Each element was inspected manually, and all were confirmed Helitrons, so the false positive rate in this analysis was zero. Seven intact Helitrons identified by earlier studies (5, 7, 8, 13) were not found in this structure-based search because only one intact copy was present in the assembled sequences. When combined, the resulting 1,930 intact Helitrons account for 6.2 Mb of nuclear DNA in the B73 reference sequence (18).

Intact Helitrons were classified into eight families based on the presence of a unique most-terminal 3′ end, with two elements requiring at least 80% identity to be considered members of the same family (2). Members of a family that had very different 5′ ends were classified into subfamilies, with sequences sharing the most 5′-terminal 30 bp (at least 80% identity) classified as members of the same subfamily. This definition of families and subfamilies is based on two primary criteria: namely, the >80% homology rule for family designation that a consortium of plant TE researchers has concluded is the appropriate threshold for distinguishing families (19) and the fact that the ends of TEs of all types are the primary sites where activating factors (e.g., transposases) determine the specificity of the unique family each unique activating protein will mobilize and/or integrate. Of these eight families (62 subfamilies), five were previously unknown in maize (Table 1). The family we named Hip is the most abundant, with 1,897 elements that make up 98% of the total intact elements, including the previously identified Helitrons inserted into sh2-7527 and ba1-ref, and those initially named HelA, McC_bz1_GHI, B73_9002_NOPQ, B73_14578 and Hel-BSSS53-z1C1 (5–9, 13).
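The two-level rule described above (families by 3′ end, subfamilies by the 5′-terminal 30 bp, both at 80% identity) can be illustrated with a minimal sketch. The ungapped identity measure, the greedy first-match clustering, and the use of a fixed 30-bp window for the 3′ end are assumptions made for illustration only; the function names are hypothetical and this is not the procedure implemented in HelSearch.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter of two ungapped sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a[:n], b[:n])) / n


def greedy_cluster(seqs, threshold=0.80):
    """Assign each sequence to the first cluster whose representative it matches."""
    reps, labels = [], []
    for s in seqs:
        for i, rep in enumerate(reps):
            if identity(s, rep) >= threshold:
                labels.append(i)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            reps.append(s)
            labels.append(len(reps) - 1)
    return labels


def classify(elements, end_len=30, threshold=0.80):
    """Return a (family, subfamily) label for each full-length element sequence.

    Families are clustered on the last `end_len` bases (3' end); within each
    family, subfamilies are clustered on the first `end_len` bases (5' end).
    """
    families = greedy_cluster([e[-end_len:] for e in elements], threshold)
    labels = [None] * len(elements)
    for fam in set(families):
        idx = [i for i, f in enumerate(families) if f == fam]
        subs = greedy_cluster([elements[i][:end_len] for i in idx], threshold)
        for i, s in zip(idx, subs):
            labels[i] = (fam, s)
    return labels
```

With toy sequences and a 10-bp window, two elements that share both ends receive the same label, an element with a divergent 5′ end becomes a new subfamily of the same family, and an element with a divergent 3′ end becomes a new family.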

Table 1.
Maize Helitron families

We compared our subfamilies with the results of Du et al. (41) (Dataset S2). Eight of our subfamilies were not found in the Du et al. database, whereas 35 of our identified subfamilies were unnamed in their analysis and only found in their medium- or high-quality output of Helitronfinder. Four of the subfamilies in Du et al. (41) were not found in our analysis.

Like previously investigated genomes (2), maize has some unique Helitron families and some shared families. Hip, the largest family in maize, is also the most numerous family in sorghum and the second most abundant in rice. Hip is the largest family in Medicago, whereas it is a low-copy-number family in Arabidopsis. However, the dominance of Hip in maize far exceeds the relative dominance of any Helitron family in any other investigated angiosperm genome, where the most abundant intact Helitrons of any single family make up only 42–70% of the total intact Helitrons. All of the other maize Helitron families that were identified have not yet been found in other genomes. Sorghum and rice share some families, such as Hair, Hole, Hoke, and Hoy (2), but no intact elements of these families were found in maize. Hole, Hoke, and Hoy are low-copy-number families, whereas Hair is a high-copy-number family, having >100 intact elements in both sorghum and rice. The 5′ and/or 3′ ends of some of these families from rice and sorghum do exist in the maize genome, but intact elements were not identified, so we could not guarantee that the maize end homologies were evidence of true Helitrons. Although some elements from different genomes belong to the same family, this does not necessarily mean that they are the most related elements, because families are defined by a shared 3′ end sequence. Because of dramatic changes in the 3′ end (for instance, new end sequence acquisition) (2), some Helitrons have evolved very recently into what we now call different families. These shared Helitron families in rice, sorghum, and maize are not per se proof of a common ancestry in an ancient grass progenitor, and they do not contain highly similar internal regions suggestive of very recent horizontal transfer. The results suggest a highly conserved set of 3′ end sequences that are needed for Helitron function, but repeated convergent evolution to these sequences is also possible. Further analysis across a broader range of grass species will help resolve this issue.

Counting the total number of 5′ and 3′ ends for these eight families in the B73 maize dataset yielded a respective ≈22,000 and ≈21,000 apparent ends. This count provides an estimate of the total number of Helitrons, intact plus fragmented, of >22,000. These results suggest a very high ratio of fragmented to intact elements, which is at least partly an artifact of the short length of contiguous sequences (contigs) in the current version of the B73 reference sequence (18). Any of the numerous errors in order or orientation of contigs within a BAC sequence scaffold could lead to the misinterpretation of an intact Helitron as two or more fragments of Helitrons.

Because TEs of different types can insert into each other, it was often not clear whether the DNA inside an identified Helitron was Helitron-specific or representative of an unrelated TE. Hence, maize TE databases for LTR retrotransposons and cut-and-paste DNA transposons were obtained (18). These databases were used to remove all of these types of TEs from the maize Helitron database. The resultant Helitron database was used in a BLAST-based search of the entire maize genomic sequence. Homologies with at least 100 contiguous base pairs of at least 80% identity were identified. In addition, a BLASTX search for Helitron-specific helicases was performed with a minimum expect value of e-10. The results from both BLAST searches were combined, and redundancy was removed. From these analyses, Helitrons were found to contribute at least 2.2% of the maize nuclear genome in line B73, a total of 45.5 Mb in the 2045 Mb of the genome that has been sequenced (18). Because the HelSearch analysis requires two intact elements in a single subfamily before the existence of a Helitron can be confirmed, many Helitron subfamilies or families with only a single intact member or no intact members might be missed. Hence, linear regression analysis was performed to estimate the number of intact single-copy elements that are likely to have been missed (Fig. S1) (2). By this approach, it can be estimated that there are ≈20 intact single-copy subfamilies of Helitrons in the maize genome that we did not identify, and it provides an interesting comparison to the 62 Helitron subfamilies discovered with more than one intact member. Hence, as in previous angiosperm investigations (2), it is clear that a significant amount of Helitron diversity was missed by this and all previous studies, but that the great majority of Helitron genomic contributions have been identified in maize by discovery of the multicopy element families and subfamilies.
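The filtering and redundancy-removal steps described above can be sketched as follows. The hit tuple layout (start, end, aligned length, percent identity), the half-open coordinates, and the function names are assumptions for illustration; the paper does not describe how its BLAST output was parsed or merged.

```python
def filter_hits(hits, min_len=100, min_identity=80.0):
    """Keep hits with at least `min_len` aligned bp at `min_identity` percent or more.

    Each hit is a hypothetical tuple: (start, end, aligned_length, percent_identity).
    """
    return [(s, e) for s, e, length, pid in hits
            if length >= min_len and pid >= min_identity]


def merge_intervals(intervals):
    """Merge overlapping half-open intervals so no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def covered_bp(intervals):
    """Total number of bases covered after merging."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Genome fraction from the totals reported in the text: 45.5 Mb of 2,045 Mb.
helitron_fraction = 45.5 / 2045
```

The last line reproduces the reported figure: 45.5 Mb out of 2,045 Mb is a little over 2.2% of the sequenced genome.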

As observed in other species (2), the 3′ hairpins of maize Helitrons average a much higher predicted Tm than that predicted for the full complement of similar hairpins in the maize genome sequence (Fig. S2). Unlike the full set of predicted hairpins, the maize Helitron hairpins do not exhibit a normal distribution, suggesting a strong selection for a high complementarity and GC content in these components of Helitron structure.

Helitron Divergence.

LTR retrotransposon insertion dates can be estimated by the degree of divergence of their two LTRs (20–22) because they should be 100% identical at the time of insertion (20). A similar approach was used to estimate the amplification time of intact Helitrons, relying not on conserved end homology but on the relatedness of different copies. By an all-by-all BLAST search of all intact Helitrons, the most related elements were identified. These are most likely to be derived from each other or from a shared common ancestor at a more recent time than from Helitrons that are less similar in sequence (23, 24). An intact Helitron was aligned with its second best hit (i.e., not with itself) by CLUSTALW. The degree of divergence was determined by using the baseml module of PAML (version 4.2) and used to calculate amplification dates. Fig. 1 shows the predicted amplification dates for these intact Helitrons. The vast majority (99%) of the intact Helitrons are predicted to have amplified within the last 6 million years, with an amplification peak at 0.25 million years ago (Mya). Approximately 3% of intact maize Helitrons are 100% identical, suggesting very recent duplication and insertion. For comparison, the predicted amplification times of intact Helitrons in rice, sorghum, and Arabidopsis were calculated in the same manner (Fig. 1). Rice and sorghum both gave ≈3% intact Helitrons with 100% identity and an amplification peak ≈0.25 Mya, nearly identical to what was observed for maize despite their divergence from common ancestors >11 Mya (25). In contrast, <1% of intact Helitrons in Arabidopsis are 100% identical, and the overall Helitron population exhibited a major amplification between 1 Mya and 2 Mya (Fig. 1).
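The conversion from pairwise divergence to an amplification date can be sketched under two loudly stated assumptions. First, a simple proportion of differing sites (p-distance) stands in for the model-based distance that baseml computes. Second, the substitution rate of 1.3 × 10⁻⁸ per site per year is a value often used for grass transposable elements; this section does not state the rate that was applied, so the default below is a placeholder. The factor of 2 reflects that both copies accumulate changes independently after duplication.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def amplification_date_mya(divergence: float, rate: float = 1.3e-8) -> float:
    """Time since two copies diverged, in millions of years: T = K / (2 * rate).

    `rate` is an assumed substitution rate per site per year, not a value
    taken from this section.
    """
    return divergence / (2 * rate) / 1e6
```

Under the assumed rate, a pairwise divergence of 0.65% corresponds to about 0.25 Mya, the position of the amplification peak reported above.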

Fig. 1.
Helitron amplification dates for maize, rice, sorghum, and Arabidopsis. Solid circles denote the mean amplification date, and solid triangles denote the median amplification date.

Because transition mutation frequencies are quite variable across genomic regions, and especially frequent in TEs compared with coding exons (20), it seemed appropriate to further pursue this analysis by ignoring transitions and only counting transversions. When divergence and amplification dates were recalculated in this manner, similar results were obtained (Fig. S3).

To carefully characterize the nature of Helitron divergence in recent times, 100 intact Helitrons were randomly selected for further analysis. Each was aligned to its second best hit by CLUSTALW. Nucleotide changes and indels were counted (Table S1). More than 6,000 single nucleotide changes were found. Small indels (1–5 bp) were found to be more frequent than larger indels, a result previously observed as an outcome of DNA removal processes in angiosperms (21). The ratio of transition to transversion mutations was observed to be 3.05 to 1, suggesting that most Helitrons are heavily cytosine-methylated in the maize genome (20).
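Counting the changes described above from a pairwise alignment can be sketched as follows. The input is assumed to be two aligned strings of equal length with "-" marking gaps, as could be obtained from a CLUSTALW alignment; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitutions(a: str, b: str):
    """Return (transitions, transversions) between two aligned sequences."""
    ts = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical, gapped, or ambiguous column
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts, tv


def indel_lengths(a: str, b: str):
    """Lengths of consecutive gap runs found in either aligned sequence."""
    lengths = []
    run = 0
    prev = None  # which sequence carried the gap in the previous column
    for x, y in zip(a, b):
        cur = "a" if x == "-" else "b" if y == "-" else None
        if cur is not None and cur == prev:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 1 if cur else 0
        prev = cur
    if run:
        lengths.append(run)
    return lengths
```

Dividing the transition count by the transversion count over all sampled pairs gives the kind of ratio reported above (3.05 to 1), and the list of gap-run lengths can be binned into small (1–5 bp) and larger indels.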

Gene Fragment Acquisition.

A BLASTX search against the National Center for Biotechnology Information nonredundant protein database was performed for each intact element. More than 60% of the intact elements (1,194 elements) were found to have acquired one or more gene fragments. See Fig. 2 for examples and Dataset S3 for additional details. The Helitron with the most captured gene fragments (a total of 10) is a truncated element (the 5′ boundary could not be precisely identified, so this element is not mentioned in Fig. 3 and Dataset S3).

Fig. 2.
Examples of maize Helitrons (Hip1_120 and Hide1_2) that have captured gene fragments. Each colored box denotes an exon of an acquired gene fragment. Exons that come from the same gene are shown in the same color.
Fig. 3.
Distribution of the number of gene fragments captured per Helitron exemplar. x axis indicates the number of gene fragments captured per exemplar. y axis indicates the number of exemplar types that have acquired this many gene fragments. The dark boxes ...

The number of gene fragments acquired per element is shown in Fig. 3. Intact maize Helitrons were further classified into 498 “exemplars,” where each exemplar had a unique internal sequence that was >20% different at the nucleotide level from any other Helitron internal regions. There are 104 Helitron exemplars that have not captured any gene fragments and 152 exemplars that have acquired one gene fragment. Two exemplars represent unrelated element groups that have each taken up as many as seven different gene fragments. If these gene fragment acquisitions were independent events, one expects that the frequencies of exemplars with one, two, three, or more captured fragments would follow a Poisson distribution. The parameter λ was estimated to be 1.81 (overall mean). Given that there have been 840 different acquisition events, the predicted frequencies of exemplars in each category are shown in Fig. 3. Note that the observed exemplars that have acquired two or three gene fragments are fewer than predicted whereas those that have acquired five or six gene fragments are more frequent than predicted by a random acquisition model. A goodness of fit test yielded P < 0.0001, suggesting that the capture events are not independent. These data suggest that acquisition of one gene fragment facilitates the capture of additional gene fragments or that there is superior survival for fragment-containing Helitrons. Alternatively, or in addition, some maize Helitrons may have an unusual propensity (of unknown molecular basis) for gene fragment acquisition that leads to an overrepresentation of Helitrons that contain multiple gene fragment captures.
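The expected frequencies under the random acquisition model can be sketched with a Poisson distribution. The value λ = 1.81 is taken from the text; scaling by the 498 exemplars and pooling everything at or above seven captures into a final bin are assumptions made for illustration. The full set of observed counts appears only in Fig. 3, so the goodness-of-fit helper is shown without being applied to real data.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_counts(n_exemplars: int, lam: float, max_k: int):
    """Expected exemplar counts for 0..max_k-1 captures; the final bin pools >= max_k."""
    counts = [n_exemplars * poisson_pmf(k, lam) for k in range(max_k)]
    counts.append(n_exemplars - sum(counts))
    return counts


def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic for matched lists of bin counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Assumption: all 498 exemplars are modeled with the reported mean of 1.81.
expected = expected_counts(498, 1.81, 7)
```

The first two entries of `expected` can be set against the 104 exemplars with no captures and the 152 exemplars with one capture reported above; the remaining bins would be compared with the counts in Fig. 3.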

There are >700 intact elements that have acquired a phosphatase 2C-like gene fragment, mostly in the Hip1 subfamily and its derivative subfamilies, Hip26 and Hip31. Hip1 is the most numerous (>1,000 intact elements) and most active subfamily in B73 maize. It is likely that the maize phosphatase 2C-like gene fragment capture was a single event that has been amplified by Helitron transposition. Subfamily Hip26 was created by acquisition of a new 5′ end by a Hip1 subfamily member, and the 5′ ends of Hip1 and Hip31 have experienced a number of small sequence changes that have led to their classification as different subfamilies. Sorghum Helitron Hip_SB5_1 also has acquired a phosphatase 2C-like gene fragment (2), but a different portion of the gene from a different member of the phosphatase 2C-like gene family. This coincidence may be pure chance or may reflect either some exceptional lability in this class of gene that promotes its acquisition/retention or, more likely, that capture and/or retention of a phosphatase 2C-like gene fragment has some selective advantage for a Helitron or its host genome. As expected for a predicted independent origin, Hip_SB5_1 in sorghum does not evidence any similarity to maize Helitrons at the nucleotide level, except at the 3′ end.

It is thought that the gene fragments captured by Helitrons have a strong bias in the orientation of their acquisition and/or retention (14). When investigated in the available dataset from the B73 genome of maize, 282 exemplars were found to have gene fragments that are exclusively in the same orientation as the Helitron Rep/helicase gene. There are 20 exemplars that have gene fragments in the opposite orientation and another 92 exemplars with multiple gene fragments that are in opposing orientations in individual elements. Hence, by this method of analysis, acquisition and/or retention of gene fragments within Helitrons is strongly biased (282/20 = 14.1:1 ratio) toward a conserved orientation that is compatible with Helitron promoter-driven expression.

To test whether Helitrons preferentially acquire gene fragments in the same orientation or gene fragments are acquired randomly, gene fragment orientation was plotted against element amplification time (Fig. S4). The Helitrons that had acquired gene fragments in the sense orientation had a broad amplification time distribution, but the majority of elements containing a gene fragment in the opposite orientation were of very recent origin. A Mann–Whitney test for significance in the difference in the respective median and mean ages of sense (0.71 Mya and 1.38 Mya) and antisense (0.38 Mya and 1.14 Mya) gene fragments gave P = 0.089. These results suggest that gene fragments acquired in the antisense orientation are more rapidly removed by selection. However, the effect is not overpowering, suggesting that Helitrons may also preferentially acquire gene fragments in the same orientation, so these combined biases account for the current ≈14:1 ratio of sense to antisense inserts.

Helitrons in the maize B73 genome were estimated to have acquired ≈4,000 gene fragments in a previous study (5). However, because that study was based on a presence/absence comparison between inbreds B73 and Mo17, many shared Helitrons and gene fragment captures could have been missed. Hence, the total number of gene fragments acquired by Helitrons was estimated from (i) the number of observed gene fragments inside intact Helitrons (2,152) and (ii) the ratio of intact Helitrons to total Helitrons (>22,000:1,930 or ≈11.4:1) detected in the assembled B73 reference genome (18). If there is no bias for or against the truncation and sequence decay of Helitrons with respect to acquired gene fragments, then >24,000 gene fragments inside Helitrons are estimated in the B73 genome. Of course, enumeration of these captured fragments is likely to yield an overestimation because of the fragmented status of the B73 reference sequence assembly (18).
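The extrapolation above is simple proportional scaling, shown here with the figures given in the text. The variable names are illustrative, and the lower bound of 22,000 total Helitrons is used where the text gives ">22,000".

```python
intact_helitrons = 1930          # intact elements identified
total_helitrons = 22000          # lower bound on intact plus fragmented elements
fragments_in_intact = 2152       # gene fragments observed inside intact elements

# Ratio of total to intact Helitrons (about 11.4 to 1).
ratio = total_helitrons / intact_helitrons

# Assumes gene fragments are retained equally in intact and fragmented elements.
estimated_fragments = fragments_in_intact * ratio
```

The result is slightly above 24,500, consistent with the ">24,000" estimate, and it inherits the overestimation caveat noted for the fragmented assembly.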

Evolution of Gene Fragments.

A simple way to determine whether the fragments inside Helitrons perform an important function is to look for biases either for or against synonymous mutations. Previous studies have reported such selection on acquired DNA fragments inside other TEs (26–28). However, those investigations can be misleading if the researchers compared the acquired gene fragment to its presumed ortholog in the host genome. Any perceived evidence of purifying selection, for instance, might actually be an outcome of sequence divergence between the allele of the host gene chosen as the ortholog and the actual allele that was the origin of the acquired gene fragment. To search for evidence of natural selection on gene fragments acquired by Helitrons, 44 Helitrons with independent gene fragment acquisitions were randomly chosen. Because some elements had acquired more than one gene fragment, this study included a total of 85 different gene fragments (Dataset S4). Seven gene fragments were too short for informative analysis. Nucleotide sequences of gene fragments from the same acquisition event in different Helitrons were aligned and used to calculate the ratio of nonsynonymous to synonymous substitutions, dN/dS (ω), between acquired gene fragments. P of 0.05 with Bonferroni corrections was chosen as the significance level. As a result, three gene fragments (4% of the total) exhibited significant evidence of purifying selection. For the abundant phosphatase 2C-like gene fragment acquired by Helitrons, 50 elements were randomly chosen and analyzed. The results indicated that these gene fragments are under strong negative selection (P < 0.0001).

To investigate how inaccurate this analysis would have been if we compared gene fragments inside Helitrons with the host gene from which they were derived, a BLASTN search against the B73 maize gene set (18) was first performed to identify the genes that contributed gene fragments to Helitrons. Seven gene fragments were not attributable to any corresponding host gene. For the 71 other inserts in Helitrons, captured gene fragments within the exemplar were always most homologous to only one maize gene. This finding suggests that none of these acquisitions within an exemplar group were independent captures from different genes in the same gene family. With the best candidate donor gene now identified, it was simple to calculate ω for the combined dataset of Helitron gene fragments and their corresponding host genes. From this analysis, nine gene fragments appeared to be under significant purifying selection (P < 0.05 with the Bonferroni correction). Hence, this type of comparison to host genes is likely to provide a major overestimate of purifying selection on gene fragments inside TEs.

To determine whether any of the gene fragments were under adaptive selection, M1 (neutral) and M2 (positive selection) models were built by PAML, and χ² tests were performed on those two models. Three gene fragments (4% of the total) yielded P values for adaptive selection that were <0.05 with a Bonferroni correction (Dataset S4).
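A comparison of two nested models of this kind is commonly made with a likelihood ratio test, sketched below. The use of 2 degrees of freedom is the conventional choice for the M1 versus M2 comparison and is an assumption here, as the text does not state it; the log-likelihood values in the usage line are made up for illustration. For 2 degrees of freedom the χ² tail probability has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def lrt_p_value(lnl_null: float, lnl_alt: float):
    """Likelihood ratio test for nested models, assuming 2 degrees of freedom.

    Returns (statistic, p_value), where statistic = 2 * (lnL_alt - lnL_null).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.exp(-stat / 2.0)


def bonferroni_significant(p_values, alpha: float = 0.05):
    """Flag each test as significant at alpha divided by the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


# Hypothetical log-likelihoods for one gene fragment under the two models.
stat, p = lrt_p_value(lnl_null=-1000.0, lnl_alt=-990.0)
```

Applying `bonferroni_significant` to the P values from all tested fragments mirrors the correction described in the text.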

Transcriptional Activity.

Screening maize EST databases (>99.5% identical to a full-length EST sequence) indicated that at least 9% of the identified maize Helitron sequences are expressed in at least one tissue. Comparison to an smRNA database (29) indicated that ≈90% of the identified Helitrons had at least one small RNA match, suggesting that maize Helitrons are subject to significant levels of epigenetic suppression.

Insertion Preferences.

Sequences flanking intact Helitron insertion sites were obtained to assess any possible insertion preference. Fig. S5 A and B shows the analysis suggesting that Helitrons insert preferentially into AT-rich DNA in maize, as reported for Helitrons in other plant and animal genomes (2). As seen previously, the last 3 bp upstream and 8–10 bp downstream of the insertion exhibit an extreme AT richness, reflecting an insertion orientation bias (2).
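A position-by-position AT profile of the kind summarized in Fig. S5 can be sketched as follows. The input is assumed to be a list of equal-length flanking sequences, all oriented the same way and aligned on the insertion site; the function name is hypothetical.

```python
def at_fraction_by_position(flanks):
    """Fraction of sequences carrying A or T at each position of aligned flanks."""
    if not flanks:
        return []
    width = len(flanks[0])
    if any(len(f) != width for f in flanks):
        raise ValueError("all flanking sequences must have the same length")
    return [
        sum(f[i].upper() in "AT" for f in flanks) / len(flanks)
        for i in range(width)
    ]
```

Positions whose fraction is well above the genome-wide AT content would correspond to the AT-rich stretches immediately upstream and downstream of the insertion described above.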

Helitron distributions on each chromosome of the B73 reference maize genome (18) were determined for both intact and fragmented elements. Fragmented elements were considered valid if they contained at least 100 bp of contiguous >80% identity to a known intact element. In contrast to Helitrons in other plant genomes (2), maize Helitrons were found to be most abundant in gene-rich regions of the chromosomes (Fig. S5C). The reasons for this dramatic difference are not known, but there are several possibilities.

It should be remembered that the Helitron distribution in a genome is the result of the balance between TE insertion and DNA removal (30–32). The great haplotype variability in maize (10, 33) indicates that TEs are completely removed from a region in <2 million years (21, 22). Moreover, DNA removal (at least by unequal homologous recombination) appears to be more rapid in gene-rich regions (which have a higher level of meiotic recombination) than in pericentromeric regions (21, 22, 34). Hence, if maize Helitrons are mostly younger and more active than those in other studied species (6, 7), one might see a greater level of accumulation in gene-rich regions because there has been less time for their removal. To test this hypothesis, nonparametric Mann–Whitney tests were performed on Helitron amplification times (Fig. 1) to see whether they differ significantly between maize and other genomes. The intact maize Helitrons were compared with those in Arabidopsis, rice, and sorghum. On average, maize Helitrons were found to be significantly younger than Arabidopsis Helitrons (P = 0.017), but were determined to be significantly older than Helitrons in rice and sorghum (P < 0.0001). However, if transversions alone are used in the calculations, maize Helitron amplification dates were not significantly different from those in Arabidopsis, but were still significantly older than those in rice and sorghum (P < 0.0001). T test approximations yielded the same conclusions. This analysis indicates that maize Helitrons are not enriched in gene-rich regions because of a shorter average duration in regions of the genome that show a high rate of DNA removal.

A second model proposes that the gene-rich regions of maize are structurally much more like the gene-poor regions of the sorghum, rice, Arabidopsis, and nematode genomes than they are like the gene-rich regions of these small genomes. Because the gene-poor regions of these four small genomes are composed of large blocks of repetitive DNA intermixed with rare genes, they may have the same chromatin conformations as the gene-rich regions of the maize genome, which are composed of large blocks of repetitive DNA intermixed with genes. If chromatin composition determines the targeting of Helitrons, as it does for Ty elements in yeast (35), then it is likely that Helitrons in both maize and the four smaller genomes are all inserting in the same preferred chromatin types. In maize, we know that some families of LTR retrotransposons exhibit a bias toward insertion into heterochromatin near genes, whereas others exhibit a bias for insertion into heterochromatin that is not near genes (18, 36, 37). Helitrons in maize appear to share the behavior of those elements with the gene-associated heterochromatin bias.

Sequences encompassing 500 bp upstream and 500 bp downstream of each intact Helitron were retrieved and annotated to search for additional insertion site characteristics. Using the annotations provided for the B73 draft sequence (18), it was observed that 6% (109) of Helitrons are inserted into or near genes, 34% (658) are inserted into or near LTR retrotransposons, 23% (436) are inserted into or near non-Helitron DNA transposons, 26% (497) are inserted into (117 elements) or near (380 elements) Helitrons, and the other 11% were on small fragments that did not allow definitive determination of the nature of the target site. Given that the Helitrons in B73 make up only ≈2% of the genome, a random Helitron insertion model predicts only 64 intact Helitrons inserted into or within 500-bp range of another Helitron, not the 497 that were observed. This ≈7.8-fold bias for accumulation near Helitrons is more dramatic than the ≈2-fold bias for accumulation near other DNA transposons [that contribute ≈10% of the B73 genome (18)], but both are significant (P < 0.0001).
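The proportions and fold biases quoted above follow from simple ratios, sketched here with the counts given in the text. The expected value of 64 is taken directly from the text rather than derived. For the other DNA transposons, comparing the observed share with their ≈10% genome contribution is a simplifying assumption that ignores the 500-bp flanking window.

```python
intact_helitrons = 1930

# Counts of intact Helitrons inserted into or near each target class.
observed = {
    "genes": 109,
    "LTR retrotransposons": 658,
    "other DNA transposons": 436,
    "Helitrons": 497,
}

# Share of all intact Helitrons found in each target class.
fractions = {name: count / intact_helitrons for name, count in observed.items()}

# Observed versus expected (64, from the text) insertions near other Helitrons.
helitron_fold = observed["Helitrons"] / 64

# Assumption: expected share equals the ~10% genome contribution of DNA transposons.
dna_transposon_fold = fractions["other DNA transposons"] / 0.10
```

The Helitron ratio comes out at about 7.8, and the DNA transposon ratio at a little over 2 under the stated simplification.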

One possible explanation for the clustering of Helitrons might be that they primarily transpose to nearby sites. However, only 31 Helitrons were observed to be inserted into or near their most closely related Helitron in the B73 genome, whereas a random model predicts 46 such cases. Hence, Helitrons in maize significantly (P = 0.01) avoid inserting into or near their parental element, which argues against the short-transposition-distance model. Of the 117 Helitrons inserted into another Helitron, 84 were in the same orientation as the target Helitron, whereas 33 were in the opposite orientation. Of the 380 Helitrons inserted near another Helitron, 235 were in the same orientation, whereas 145 were in the opposite orientation (73 head to head and 72 tail to tail). If insertion orientation had no bias, the ratio of same-direction to opposite-direction insertions would be 1:1. χ² tests gave P < 0.0001 for nonrandomness of both Helitron-internal and nearby insertions. None of these pairs of inserted Helitrons was present as more than one paired copy, so they are not cointegrants, and none is present as the same element in tandem, as has been observed in vesper bats (17).
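The orientation test described above can be reproduced with a short calculation. The following is a minimal sketch, assuming a standard χ² goodness-of-fit test against a 1:1 expectation with 1 df; the function name is hypothetical, and the P value uses the closed-form survival function of the χ² distribution with 1 df rather than any particular statistics package.

```python
import math


def chi2_equal_split(same: int, opposite: int) -> tuple:
    """Chi-square goodness-of-fit test against a 1:1 expectation (1 df).

    Returns (statistic, p_value).
    """
    total = same + opposite
    expected = total / 2
    stat = ((same - expected) ** 2 + (opposite - expected) ** 2) / expected
    # For 1 df, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts taken from the text: same vs. opposite orientation.
internal_stat, internal_p = chi2_equal_split(84, 33)    # inserted into a Helitron
nearby_stat, nearby_p = chi2_equal_split(235, 145)      # inserted near a Helitron

print(f"internal: chi2 = {internal_stat:.2f}, P = {internal_p:.2e}")
print(f"nearby:   chi2 = {nearby_stat:.2f}, P = {nearby_p:.2e}")
```

Under these assumptions both comparisons give P values below 0.0001, in agreement with the values reported.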

A total of 226 Helitrons were observed to be inserted into or near a Helitron of the same subfamily, whereas 271 are inserted into or near a Helitron of a different subfamily. If there were no bias for accumulation relative to the specific Helitron subfamily properties, one would expect 115 Helitrons inserted into or near a Helitron of the same subfamily. Hence, there is a significant (P < 0.0001) bias for Helitrons to accumulate near Helitrons with similar terminal sequences, although they are usually distinguishable members of that subfamily.

Taken together, all of these characteristics of Helitron distribution indicate a strong bias for insertion into regions of the genome that contain Helitrons of the same family and subfamily. It is more difficult to explain these results as an outcome of biases in DNA removal, because there is no precedent for a DNA removal process that is somehow more active at removing DNA that is more weakly related (i.e., a more distant Helitron family) to a nearby Helitron than it is at removing DNA that is more strongly related (i.e., the same Helitron subfamily). A simpler model proposes that Helitrons in the genome exist in a unique chromatin state (perhaps with associated proteins involved in rolling circle amplification) that attracts other Helitrons, and attracts those most aggressively that have the same coevolved association between their structure and the particular rolling circle amplification enzymes encoded by that Helitron subfamily.

Materials and Methods

Helitron Identification and Abundance.

The structure-based approach to Helitron discovery that was used has been described (2). Maize genomic sequences were downloaded from www2.genome.arizona.edu/genomes/maize (18). After the structural search, a second search was used on the residual sequence data to find any previously identified Helitrons that may have been missed. A BLAST search of all intact maize Helitrons was performed against a comprehensive maize TE database containing LTR retrotransposons and cut-and-paste DNA transposons (18). All identified homologies were manually inspected. Non-Helitron TE fragments were removed from the Helitron database (replaced by N). To find the genome contribution of all Helitron elements, both intact and fragmented, a BLAST search of the entire genome was performed against all intact elements with non-Helitron TE fragments removed. Homologies with at least 80% identity and at least 100 bp in contiguous length were counted. A BLASTX search for Helitron-specific Rep/helicase genes was also performed against the B73 draft sequence (18) with a maximum expect value of e-10. Helitron-specific Rep/helicases were retrieved from Repbase (38). The results from both BLAST searches were combined to calculate genome contribution. The total number of 5′ and 3′ ends (30 bp with at least 80% identity to intact elements) was used to estimate the minimum total number of elements in the genome.
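The hit-filtering step can be illustrated as follows. This is a minimal sketch, assuming the BLAST results are available in the standard 12-column tabular format (query, subject, percent identity, alignment length, and so on); the function and class names, and the example rows, are hypothetical. Alignment length is used here as a stand-in for contiguous length, which is a simplification.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bp


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-delimited BLAST rows, skipping blank and comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]), int(fields[3]))


def keep_hit(hit: Hit, min_identity: float = 80.0, min_length: int = 100) -> bool:
    """Apply the thresholds from the text: >=80% identity and >=100 bp."""
    return hit.identity >= min_identity and hit.length >= min_length


# Hypothetical example rows (not real data).
example = [
    "\t".join(["contigA", "Hel_x", "92.5", "250", "18", "1", "1", "250", "10", "259", "1e-90", "350"]),
    "\t".join(["contigB", "Hel_x", "78.0", "400", "88", "0", "1", "400", "5", "404", "1e-60", "240"]),
    "\t".join(["contigC", "Hel_y", "95.0", "60", "3", "0", "1", "60", "7", "66", "1e-20", "100"]),
]
kept = [hit for hit in parse_tabular(example) if keep_hit(hit)]
print([hit.query for hit in kept])
```

Only the first row passes both thresholds; the second fails on identity and the third on length.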

Helitron Family, Subfamily, and Exemplar Assignment.

Sequences with the most similar 3′ ends (30 bp with at least 80% identity) were classified as members of the same family, and sequences with the most similar 5′ ends (30 bp with at least 80% identity) were classified as members of the same subfamily. A short word starting with H was assigned as the name for each Helitron family, and a number follows the family name to denote the subfamily. Exemplars were defined as those Helitrons with unique internal sequences (<80% identity to any other exemplar). These classifications yielded 498 exemplars in 62 subfamilies within eight families.
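The grouping rule can be sketched in code. The text does not specify how the terminal sequences were compared or clustered, so the following is only one possible reading: it assumes ungapped, position-by-position identity over equal-length 30-bp ends and a greedy assignment to the first group whose representative meets the 80% threshold. All names and the toy sequences are hypothetical.

```python
from __future__ import annotations


def end_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length ends (ungapped)."""
    if len(a) != len(b):
        raise ValueError("terminal sequences must have the same length")
    return sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


def group_by_end(ends: dict[str, str], min_identity: float = 0.8) -> list[list[str]]:
    """Greedily group elements whose end matches a group representative."""
    groups: list[tuple[str, list[str]]] = []  # (representative end, member names)
    for name, seq in ends.items():
        for representative, members in groups:
            if end_identity(seq, representative) >= min_identity:
                members.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((seq, [name]))
    return [members for _, members in groups]


# Toy 30-bp ends (not real Helitron termini).
base = "ACGT" * 7 + "AC"
toy_ends = {
    "h1": base,
    "h2": "TTT" + base[3:],  # 27/30 positions identical to h1
    "h3": "T" * 30,          # 7/30 positions identical to h1
}
groups = group_by_end(toy_ends)
print(groups)
```

Applying the same function to 3′ ends would give candidate families, and applying it to 5′ ends within each family would give candidate subfamilies. A greedy pass is order dependent, so a real analysis would likely need a more careful clustering step.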

Helitron Divergence and Estimation of Helitron Amplification Dates.

An all-by-all BLAST search was performed with all intact B73 maize Helitrons. Each intact Helitron was aligned with its second best hit by CLUSTALW, and the corresponding divergence was calculated by the baseml module of PAML (version 4.2) (39). The amplification time was calculated from the formula T = k/(2r) (k = divergence), using the substitution rate, r, of 1.3 × 10⁻⁸ per site per year for rice, sorghum, and maize (21) and 1.05 × 10⁻⁸ per site per year for Arabidopsis (40). Nonparametric Mann–Whitney tests were performed to see whether maize Helitrons are significantly older than Helitrons in other species. The NPAR1WAY procedure of SAS (version 9.1) was used to calculate P values.
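The dating formula is simple enough to show as a worked calculation. The sketch below applies T = k/(2r) with the maize substitution rate given in the text; the function name and the example divergence of 0.0065 substitutions per site are hypothetical and chosen only for illustration.

```python
import math

# Substitution rate for rice, sorghum, and maize, as given in the text
# (substitutions per site per year).
GRASS_RATE = 1.3e-8


def amplification_time(divergence: float, rate: float = GRASS_RATE) -> float:
    """Return T = k / (2r) in years for pairwise divergence k and rate r."""
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2 * rate)


# Hypothetical divergence between an element and its closest relative.
years = amplification_time(0.0065)
print(f"{years:,.0f} years")
```

The factor of 2 reflects that both copies accumulate substitutions independently after the amplification event.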

Helitron Gene Fragment Acquisition.

A BLASTX search of the National Center for Biotechnology Information nonredundant protein database (as of October 3, 2008) was performed for each intact Helitron. Gene fragments were identified if a homology was detected with a maximum expect value of e-10, or of e-5 if the homology was from a species other than maize. TE-related proteins were excluded.

Gene Fragment Evolution.

A total of 44 Helitron exemplars that had acquired gene fragments were randomly chosen. PERL programs were used to facilitate the alignment of nucleotide sequences of gene fragments within the same exemplar type, with the initial alignment based on the predicted amino acid sequences. Alignments were manually inspected. Sequences after a stop codon were removed. Sequences <50 bp also were not evaluated. The codeml module of PAML (version 4.2) (39) was used to calculate dN/dS (ω) ratios to infer selection on gene fragments. Model M0 with ω fixed at 1 and model M0 with ω estimated from the data were built. Twice the difference in log likelihood between these two models was calculated, and χ² tests with 1 df were performed to assess whether ω was significantly different from 1. BLASTN searches against the maize gene set (18) were performed to identify the host genes from which the gene fragments in Helitrons had been acquired. Model M1 (neutral) and model M2 (positive selection) were built. Twice the difference in log likelihood between these two models was calculated, and χ² tests with 2 df were performed to assess whether significant adaptive selection was underway. All PAML analyses were run multiple times to verify convergence.
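The likelihood ratio tests described above reduce to a short calculation once the log likelihoods are in hand. The following is a minimal sketch, assuming the log likelihoods have already been read from the codeml output; the function name and the example log-likelihood values are hypothetical. The P values use the closed-form χ² survival functions for 1 df and 2 df, the two cases used in the text.

```python
import math


def lrt_p_value(lnl_null: float, lnl_alt: float, df: int) -> tuple:
    """Likelihood ratio test: statistic = 2 * (lnL_alt - lnL_null).

    Returns (statistic, p_value) using the chi-square survival function.
    Only df = 1 and df = 2 are supported, as used in the text.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2))
    elif df == 2:
        p_value = math.exp(-stat / 2)
    else:
        raise ValueError("this sketch supports only df = 1 or df = 2")
    return stat, p_value


# Hypothetical log likelihoods (not values from the study).
# M0 with omega fixed at 1 vs. M0 with omega estimated: 1 df.
stat_m0, p_m0 = lrt_p_value(lnl_null=-1234.56, lnl_alt=-1230.12, df=1)
# M1 (neutral) vs. M2 (positive selection): 2 df.
stat_m12, p_m12 = lrt_p_value(lnl_null=-1234.56, lnl_alt=-1230.12, df=2)

print(f"M0 test:    2dlnL = {stat_m0:.2f}, P = {p_m0:.4f}")
print(f"M1/M2 test: 2dlnL = {stat_m12:.2f}, P = {p_m12:.4f}")
```

The same difference in log likelihood yields a larger P value with 2 df than with 1 df, which is why the degrees of freedom must match the number of extra free parameters in the alternative model.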

Helitron Insertion Preferences.

Flanking sequences (50 bp both upstream and downstream) of all intact Helitron insertion sites were used to calculate base composition. Base composition was calculated by Pictogram. χ² tests of GC content were performed at each position within the 50 bp upstream and downstream of Helitron insertion sites, compared with random AT flanking sites. Flanking sequences (500 bp both upstream and downstream) of all intact Helitron insertion sites were searched by BLAST against a comprehensive maize TE database (18) with an expect value cutoff of e-10. The insertion sites for intact Helitrons were compared with the maize gene annotation as well. The number of intact Helitrons expected to be inserted into or within 500 bp of another Helitron under a random insertion process was calculated as the number of intact Helitrons [1,930] × (total length of Helitrons in the maize genome [45.5 Mb] + (500 + 500) × total number of Helitrons [22,000])/genome size [2,045 Mb]. The predicted number of insertions into or near the same subfamily under a random insertion process was calculated as the total number of insertions into or near Helitrons multiplied by the frequency of each subfamily. The predicted number of insertions into or near the most closely related Helitron, if there were no bias for insertion, was calculated as the total number of insertions into the same subfamily multiplied by the frequency of the most related elements. χ² tests with 1 df were performed by comparing the observed and expected numbers to assess whether the insertions were significantly biased.
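The random-insertion expectation can be checked numerically from the bracketed values in the formula. The sketch below is a direct transcription of that formula; the function and parameter names are hypothetical, and the observed count of 497 is the number of Helitrons reported as inserted into or near another Helitron.

```python
def expected_near_helitron(
    n_intact: int,
    helitron_bp: float,
    n_elements: int,
    flank_bp: int,
    genome_bp: float,
) -> float:
    """Expected number of intact Helitrons landing in or near a Helitron.

    The target space is the total Helitron length plus a flank of
    `flank_bp` on each side of every element, as a fraction of the genome.
    """
    target_bp = helitron_bp + 2 * flank_bp * n_elements
    return n_intact * target_bp / genome_bp


expected = expected_near_helitron(
    n_intact=1930,
    helitron_bp=45.5e6,   # total Helitron length, 45.5 Mb
    n_elements=22000,     # total number of Helitrons
    flank_bp=500,         # 500 bp upstream and 500 bp downstream
    genome_bp=2045e6,     # genome size, 2,045 Mb
)
observed = 497
fold = observed / expected

print(f"expected = {expected:.1f}, fold enrichment = {fold:.1f}")
```

With these inputs the expectation rounds to 64 insertions, and the observed count corresponds to an enrichment of about 7.8-fold.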


We thank R. Baucom, J. Estill, J. Leebens-Mack, H. Wang, and Q. Zhu for assistance with the APOLLO and PAML programs; three anonymous reviewers for their suggestions to improve this manuscript; D. Promislow for his advice on data analysis; and C. Du and H. Dooner for assistance with comparisons of our databases. This work was supported by National Science Foundation Grant DBI-0607123.


The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

See Commentary on page 19747.

This article contains supporting information online at www.pnas.org/cgi/content/full/0908008106/DCSupplemental.


1. Kapitonov VV, Jurka J. Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA. 2001;98:8714–8719.
2. Yang L, Bennetzen JL. Structure-based discovery and description of plant and animal Helitrons. Proc Natl Acad Sci USA. 2009;106:12832–12837.
3. Mendiola MV, de la Cruz F. IS91 transposase is related to the rolling-circle-type replication proteins of the pUB110 family of plasmids. Nucleic Acids Res. 1992;20:3521.
4. Mendiola MV, Bernales I, de la Cruz F. Differential roles of the transposon termini in IS91 transposition. Proc Natl Acad Sci USA. 1994;91:1922–1926.
5. Morgante M, et al. Gene duplication and exon shuffling by Helitron-like transposons generate intraspecies diversity in maize. Nat Genet. 2005;37:997–1002.
6. Lal SK, Giroux MJ, Brendel V, Vallejos CE, Hannah LC. The maize genome contains a Helitron insertion. Plant Cell. 2003;15:381–391.
7. Gupta S, Gallavotti A, Stryker GA, Schmidt RJ, Lal SK. A novel class of Helitron-related transposable elements in maize contain portions of multiple pseudogenes. Plant Mol Biol. 2005;57:115–127.
8. Lai J, Li Y, Messing J, Dooner HK. Gene movement by Helitron transposons contributes to the haplotype variability of maize. Proc Natl Acad Sci USA. 2005;102:9068–9073.
9. Xu J, Messing J. Maize haplotype with a Helitron-amplified cytidine deaminase gene copy. BMC Genet. 2006;7:52–64.
10. Wang Q, Dooner HK. Remarkable variation in maize genome structure inferred from haplotype diversity at the bz locus. Proc Natl Acad Sci USA. 2006;103:17644–17649.
11. Jameson N, et al. Helitron mediated amplification of cytochrome P450 monooxygenase gene in maize. Plant Mol Biol. 2008;67:295–304.
12. Sweredoski M, DeRose-Wilson L, Gaut BS. A comparative computational analysis of nonautonomous Helitron elements between maize and rice. BMC Genomics. 2008;9:467–479.
13. Brunner S, Pea G, Rafalski A. Origins, genetic organization, and transcription of a family of nonautonomous helitron elements in maize. Plant J. 2005;43:799–810.
14. Kapitonov VV, Jurka J. Helitrons on a roll: Eukaryotic rolling-circle transposons. Trends Genet. 2007;23:521–529.
15. Gilbert W. The exon theory of genes. Cold Spring Harbor Symp Quant Biol. 1987;52:901–905.
16. Rensing SA, et al. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008;319:64–69.
17. Pritham EJ, Feschotte C. Massive amplification of rolling-circle transposons in the lineage of the bat Myotis lucifugus. Proc Natl Acad Sci USA. 2007;104:1895–1900.
18. Schnable PS, et al. The B73 maize genome: Complexity, diversity and dynamics. Science. 2009 in press.
19. Wicker T, et al. A unified classification system for eukaryotic transposable elements. Nat Rev Genet. 2007;8:973–982.
20. SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20:43–45.
21. Ma J, Bennetzen JL. Rapid recent growth and divergence of rice nuclear genomes. Proc Natl Acad Sci USA. 2004;101:12404–12410.
22. Ma J, Devos KM, Bennetzen JL. Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 2004;14:860–869.
23. Lerat E, Rizzon C, Biemont C. Sequence divergence within transposable element families in the Drosophila melanogaster genome. Genome Res. 2003;13:1889–1896.
24. Hollister JD, Gaut BS. Population and evolutionary dynamics of Helitron transposable elements in Arabidopsis thaliana. Mol Biol Evol. 2007;24:2515–2524.
25. Swigonova Z, et al. Close split of sorghum and maize genome progenitors. Genome Res. 2004;14:1916–1923.
26. Jiang N, Bao Z, Zhang X, Eddy SR, Wessler SR. Pack-MULE transposable elements mediate gene evolution in plants. Nature. 2004;431:569–573.
27. Hanada K, et al. The functional role of Pack-MULEs in rice inferred from purifying selection and expression profile. Plant Cell. 2009;21:25–38.
28. Juretic N, Hoen DR, Huynh ML, Harrison PM, Bureau TE. The evolutionary fate of MULE-mediated duplications of host gene fragments in rice. Genome Res. 2005;15:1292–1297.
29. Johnson C, Bowman L, Adai A, Vance V, Sundaresan V. CSRDB: A small RNA integrated database and browser resource for cereals. Nucleic Acids Res. 2007;35:D829–D833.
30. Bennetzen JL. Patterns in grass genome evolution. Curr Opin Plant Biol. 2007;10:176–181.
31. Bennetzen JL. Transposable elements, gene creation, and genome rearrangement in flowering plants. Curr Opin Genet Dev. 2005;15:621–627.
32. Bennetzen JL, Ma J, Devos KM. Mechanisms of recent genome size variation in flowering plants. Ann Bot. 2005;95:127–132.
33. Fu H, Dooner HK. Intraspecific violation of genetic colinearity and its implications in maize. Proc Natl Acad Sci USA. 2002;99:9573–9578.
34. Ma J, Bennetzen JL. Recombination, rearrangement, reshuffling, and divergence in a centromeric region of rice. Proc Natl Acad Sci USA. 2006;103:383–388.
35. Brady TL, Schmidt CL, Voytas DF. Targeting integration of the Saccharomyces Ty5 retrotransposon. Methods Mol Biol. 2008;435:153–163.
36. Baucom RS, et al. Retroelement diversity, distribution and evolution in the B73 maize genome. PLoS Genet. 2009 in press.
37. Liu R, et al. A GeneTrek analysis of the maize genome. Proc Natl Acad Sci USA. 2007;104:11844–11849.
38. Jurka J, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–467.
39. Yang Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24:1586–1591.
40. DeRose-Wilson LJ, Gaut BS. Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata. BMC Evol Biol. 2007;7:66–79.
41. Du C, Fefelova N, Caronna J, He L, Dooner HK. The polychromatic Helitron landscape of the maize genome. Proc Natl Acad Sci USA. 2009 in press.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences