NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
Genomics. Author manuscript; available in PMC Jul 1, 2010.
Published in final edited form as:
PMCID: PMC2703446
NIHMSID: NIHMS111559

Estimating the age of retrotransposon subfamilies using maximum likelihood

Abstract

We present a maximum likelihood model to estimate the age of retrotransposon subfamilies. This method is designed around a master gene model, which assumes a constant retrotransposition rate. The statistical properties of this model and of an ad hoc estimation procedure are compared using two simulated data sets. We also test whether each estimation procedure is robust to violation of the master gene model. According to our results, both estimation procedures are accurate under the master gene model. Although both methods tend to overestimate ages under an intermediate model, in which some daughter elements are themselves active, the maximum likelihood estimate is significantly less inflated than the ad hoc estimate. We estimate the ages of two subfamilies of human-specific LINE-1 insertions using both estimation procedures. By calculating confidence intervals around the maximum likelihood estimate, our model can both provide an estimate of retrotransposon subfamily age and describe the range of subfamily ages consistent with the data.

Keywords: LINE-1, maximum likelihood, subfamily age, retrotransposon

Introduction

Retrotransposons are genomic sequences capable of producing duplicates that insert into new positions within the host genome [1]. As they duplicate themselves, retrotransposons disrupt host genetic structure by inducing transductions, duplications, and deletions [2–4]. They can promote genetic instability, influence gene expression, generate DNA double-strand breaks, and take part in DNA repair [3,5–7]. This genome shuffling has even been proposed to create the fertility barrier necessary for speciation to occur [8]. Retrotransposons thus act as powerful mutagens in the genomes of their hosts.

These mobile elements contain information not only about themselves but also about the history of their hosts. Retrotransposons accumulate mutations over time while their frequencies within the host population change through genetic drift. The known mechanics of retrotransposition make these elements especially well suited as genetic markers: the ancestral state of a retrotransposon insertion is always known to be the empty (no insertion) allele, and nearly all insertion sites are free of homoplasy [4,9–11]. Polymorphic subfamilies of retrotransposons are thought to have arisen within the last few million years, so their distribution and diversity reflect relatively recent history [12]. Polymorphic mobile elements thus provide researchers with mutational events that document a known period of evolutionary history.

Although retrotransposons are powerful generators of genomic variation, the number of active elements and the rate of retrotransposition are not well understood. Under the strict master gene model, a single element, the "master gene," generates all daughter elements within a subfamily [13]. Only one master gene is active at a time, and it is eventually replaced by another. Under the strict transposon model, all elements are capable of retrotransposition. Intermediate models assert that a few elements descended from a master gene are themselves capable of retrotransposition [14,15]; in this case, multiple master genes may coexist within a single subfamily. Identifying which model best describes the available data has been difficult. Brookfield and Johnson [15] have shown that intermediate models can produce phylogenies that mimic those created by the master gene model as long as the number of retrotranspositionally active elements is small and the rate at which elements are removed from the host genome is low. However, Cordaux et al. [16] have shown that phylogenetic networks, rather than strictly bifurcating trees, can be used to identify the number of active elements within a subfamily of Alu insertions. A reliable estimate of subfamily age is necessary to estimate insertion rates reliably and may help describe the underlying biological process of retrotransposition.

Established methods for estimating the age of subfamilies either provide only relative measures [17], require estimates of insertion rates [18], were developed for multispecies comparisons [19,20], or are restricted to recent subfamilies with polymorphic insertion frequencies [21]. These restrictions leave many researchers either to place subfamily age slightly before the age of the most divergent element [22] or simply to estimate the ages of individual elements instead of the subfamily itself [23,24].

Here we evaluate two approaches that use sequence data to estimate the age of retrotransposon subfamilies. We introduce a maximum likelihood estimation procedure that incorporates individual retrotransposon sequences as well as the process by which those retrotransposons were ascertained. The second approach, which we call ad hoc estimation, uses the average sequence diversity among retrotransposons within a subfamily, as described elsewhere [25]. We describe the statistical properties of these methods by comparing their performance under the strict master gene model and an intermediate model using computer simulation. We then estimate the age of the Pre-Ta and Ta-1 subfamilies of human LINE-1 (L1) retrotransposons by applying both methods to published data. The Pre-Ta and Ta-1 subfamilies are included in our applied analysis because they contain several hundred members, have 3′ UTR sequences available, and the Pre-Ta subfamily is believed to be older than the Ta-1 subfamily based on sequence analysis and differing insertion frequency distributions [26,27]. This will allow a second test of accuracy, namely whether the estimates will show that the Pre-Ta subfamily is older than the Ta-1 subfamily.

Results

The age estimation procedures

For our likelihood estimates, we assume that the master gene(s) generate daughter elements at a constant rate. Each daughter element begins as an exact replicate of the master gene and accumulates mutations at a neutral rate. Each new mutation is a novel event that occurred after insertion, as described by the infinite sites model [28]. The age $T$ of a subfamily is measured backwards in mutational time, in units of $1/u$, where $u$ is the mutation rate per base pair (bp) per year. In these units, an element $i$ inserted $t_i$ units of time ago is expected to differ from the master gene by an average of $t_i$ mutations per nucleotide of sequence. Because elements within the subfamily are inserted at a uniform rate across the interval $[0, T]$, the expected value of $t_i$ for a randomly chosen element is $T/2$. This value is both the expected number of mutations per base pair on a random element and the expected age of that element in mutational time. Doubling the average number of mutations per base pair observed among the elements therefore approximates the age of the subfamily. This defines the ad hoc estimation method previously described [25].
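In code, the ad hoc estimator is simply twice the pooled divergence. The following is a minimal Python sketch, assuming per-element substitution counts and analysed lengths have already been tallied; `adhoc_age` is a hypothetical name, not part of the authors' Matlab scripts.

```python
def adhoc_age(substitutions, lengths):
    """Ad hoc subfamily age in mutational time (units of 1/u).

    The pooled divergence sum(x_i) / sum(k_i) estimates the mean element
    age T/2, so doubling it estimates the subfamily age T.
    """
    total_bp = sum(lengths)
    if total_bp <= 0:
        raise ValueError("need at least one analysed base pair")
    return 2.0 * sum(substitutions) / total_bp


# Usage with the Ta-1 totals reported in the Application section
# (402 substitutions in 154,384 bp), pooled as a single entry.
print(round(adhoc_age([402], [154_384]), 4))  # 0.0052
```

Pooling counts and lengths, rather than averaging per-element divergences, weights each element by its analysed length, which matches how the Methods describe computing sequence divergence from totals.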

The new model begins with a data set of sequences for all $n$ elements belonging to the subfamily identified within a single haploid genome. Each element $i$ has a length of $k_i$ base pairs with $x_i$ substitutions relative to the consensus or ancestral sequence. The number of substitutions observed within element $i$ is Poisson distributed with mean $k_i t_i$, where $t_i$ is the insertion date of element $i$ in mutational time.

The likelihood of $T$ is a function of both the ascertainment process and the mutational process that generates sequence diversity. Because we assume that new elements are equally likely to hit the haploid genome at any point in time between $T$ and the present, the probability density of an element's insertion date is $1/T$ on $[0, T]$. The number of mutations hitting the sequence of the $i$th element is a Poisson random variable with mean $k_i t_i$. Conditional on $t_i$, the likelihood of the $i$th element is

$$L_i = \frac{1}{T}\, e^{-k_i t_i}\, \frac{(k_i t_i)^{x_i}}{x_i!} \tag{1}$$

The probability of element $i$ appearing in the data set with $x_i$ substitutions is found by integrating $L_i$ over the unknown insertion date $t_i$ from 0 to $T$. Treating elements as independent, the likelihood of $T$ given the data is therefore

$$L(T) = \prod_{i=1}^{n} \int_0^T L_i \, dt_i \tag{2}$$
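The integral in equation (2) has a closed form that makes the likelihood cheap to evaluate. The step below is a standard identity added here for illustration (substitute $u = k_i t_i$ and use the relationship between the Erlang and Poisson distributions); it is not part of the original derivation:

```latex
\int_0^T L_i\,dt_i
  = \frac{1}{T}\int_0^T e^{-k_i t_i}\,\frac{(k_i t_i)^{x_i}}{x_i!}\,dt_i
  = \frac{1}{k_i T}\int_0^{k_i T} e^{-u}\,\frac{u^{x_i}}{x_i!}\,du
  = \frac{1}{k_i T}\left(1 - e^{-k_i T}\sum_{j=0}^{x_i}\frac{(k_i T)^j}{j!}\right)
  = \frac{\Pr\left[\mathrm{Poisson}(k_i T) > x_i\right]}{k_i T}
```

Hence $\ln L(T) = \sum_{i=1}^{n}\left(\ln \Pr[\mathrm{Poisson}(k_i T) > x_i] - \ln(k_i T)\right)$, which requires only Poisson tail probabilities.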

The maximum likelihood estimate $\hat{T}$ is found by setting the derivative of the log-likelihood equal to zero and solving for $T$. The derivative of the log-likelihood of $T$ is

$$\frac{d}{dT}\ln L(T) = -\frac{n}{T} + \sum_{i=1}^{n} \frac{e^{-k_i T}(k_i T)^{x_i}}{\int_0^T e^{-k_i t}(k_i t)^{x_i}\, dt} \tag{3}$$

The sampling variance of $\hat{T}$ is estimated by the negative reciprocal of the second derivative of the log-likelihood evaluated at $T = \hat{T}$. The estimate $\hat{T}$ can be expressed in years by dividing it by a sequence mutation rate per bp per year. We used two estimates of this rate to interpret our results. The first (0.105% per million years) is derived from estimates of pseudogene sequence divergence between human and chimpanzee populations, assuming that the two populations split 6 million years ago (MYA) from a shared ancestral population with an effective population size of $10^4$ [29]. The second (0.25% per million years) is derived from sequence divergence between human-specific and orangutan-specific L1 subfamilies [22].
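A minimal Python sketch of the estimation step follows. It assumes the closed form $\int_0^T L_i\,dt_i = \Pr[\mathrm{Poisson}(k_i T) > x_i]/(k_i T)$, obtained by substituting $u = k_i t_i$ in equation (2). It finds $\hat{T}$ by bisecting the score of equation (3) on a log scale, approximates the second derivative of $\ln L$ by a central difference to get a standard error, and converts to millions of years with the 0.105% per million years rate. All function names, the search bracket, and the toy data are hypothetical; this is not the authors' Matlab code.

```python
import math


def log_pois_pmf(x, lam):
    """log Pr[X = x] for X ~ Poisson(lam), lam > 0."""
    return -lam + x * math.log(lam) - math.lgamma(x + 1)


def log_pois_sf(x, lam):
    """log Pr[X > x] for X ~ Poisson(lam), avoiding catastrophic cancellation."""
    if lam > x + 1:
        # The lower tail P(X <= x) is small here, so 1 - CDF stays accurate.
        cdf = sum(math.exp(log_pois_pmf(j, lam)) for j in range(x + 1))
        return math.log1p(-cdf)
    # Otherwise sum the upper tail directly; successive ratios lam / j are < 1.
    total, term, j = 1.0, 1.0, x + 1
    while term > 1e-17 * total:
        j += 1
        term *= lam / j
        total += term
    return log_pois_pmf(x + 1, lam) + math.log(total)


def log_lik(T, k, x):
    """ln L(T) = sum_i [ln Pr(Poisson(k_i T) > x_i) - ln(k_i T)]."""
    return sum(log_pois_sf(xi, ki * T) - math.log(ki * T) for ki, xi in zip(k, x))


def score(T, k, x):
    """Equation (3) with the integral in closed form: -n/T + sum_i k_i * pmf / sf."""
    s = -len(k) / T
    for ki, xi in zip(k, x):
        lam = ki * T
        s += ki * math.exp(log_pois_pmf(xi, lam) - log_pois_sf(xi, lam))
    return s


def mle_age(k, x, lo=1e-8, hi=1.0, iters=100):
    """Root of the score by bisection on log T (assumes one sign change in [lo, hi])."""
    if not score(lo, k, x) > 0 > score(hi, k, x):
        raise ValueError("score does not change sign on [lo, hi]")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid, k, x) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def std_error(T_hat, k, x, rel_step=1e-4):
    """sqrt(-1 / d2), where d2 = d^2 ln L / dT^2 at T_hat via a central difference of the score."""
    h = rel_step * T_hat
    d2 = (score(T_hat + h, k, x) - score(T_hat - h, k, x)) / (2 * h)
    return math.sqrt(-1.0 / d2)


# Hypothetical toy data: 200 elements of 1,000 bp whose substitution counts
# equal k * t_i for insertion dates spread evenly over a true age T = 0.01.
ages = [0.01 * (i + 0.5) / 200 for i in range(200)]
k = [1000] * 200
x = [round(1000 * t) for t in ages]
T_hat = mle_age(k, x)
se = std_error(T_hat, k, x)
print(f"T_hat = {T_hat:.5f} +/- {1.96 * se:.5f} (approximate 95% interval)")
print(f"about {T_hat / 0.00105:.2f} MY at 0.105% per MY")
```

Bisection is chosen over Newton's method because it cannot overshoot: the score is positive for very small $T$ (whenever any $x_i > 0$) and negative for large $T$, so a sign change is guaranteed, although uniqueness of the root is assumed rather than proven here.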

Simulation

Although the master gene model is a fair approximation for the amplification dynamics of L1 retrotransposons, there are notable exceptions: some mutations hit the L1 master gene(s) and eventually lead to subfamily-specific mutations, while multiple L1 elements within polymorphic subfamilies are full-length and capable of retrotransposition [14]. The performance of the new model and the ad hoc estimate are evaluated using computer simulation under the master gene model and under an intermediate model that allows for multiple active elements.

The first simulated data set tests the performance of the maximum likelihood estimation procedure and ad hoc estimation under the master gene model. A master gene is inserted at time $T$ corresponding to 6 MYA (approximating the human-chimpanzee divergence) and spawns $n$ daughter elements in a haploid genome. Each daughter element $i$ accumulates a Poisson-distributed number of mutations with mean $k_i t_i$, conditional on its insertion date $t_i$. Both $n$ and the distribution of $k_i$ are set equal to those observed in the Ta-1 data set described elsewhere [2]. We generated $10^4$ data sets by sampling from the Poisson distribution and the distribution of $k_i$ just described. Each of these simulated data sets was used to estimate $T$.
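The master gene simulation can be sketched in a few lines of Python. This sketch assumes a constant element length of 886 bp for all 191 elements (the real Ta-1 length distribution is not reproduced), uses the 0.105% per million years rate to place $T$ at 6 MY in mutational time, runs 500 rather than $10^4$ replicates, and applies only the ad hoc estimator to stay short. The function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, prod, n = math.exp(-lam), rng.random(), 0
    while prod > limit:
        prod *= rng.random()
        n += 1
    return n


def simulate_master_gene(T, k, rng):
    """One data set: daughter i is inserted at t_i ~ U(0, T) and carries x_i ~ Poisson(k_i * t_i)."""
    return [poisson(ki * rng.uniform(0.0, T), rng) for ki in k]


def adhoc_age(x, k):
    """Twice the pooled divergence sum(x) / sum(k)."""
    return 2.0 * sum(x) / sum(k)


rng = random.Random(2009)
RATE = 0.00105            # assumed rate: 0.105% per bp per million years
T_true = 6.0 * RATE       # 6 MY expressed in mutational time
k = [886] * 191           # stand-in for the Ta-1 lengths (constant length assumed)
reps = [adhoc_age(simulate_master_gene(T_true, k, rng), k) for _ in range(500)]
mean_est = sum(reps) / len(reps)
print(f"mean ad hoc estimate: {mean_est / RATE:.3f} MY (true value 6 MY)")
```

Because insertion dates are uniform on $[0, T]$, the expected pooled divergence is $T/2$ and the average ad hoc estimate should sit close to the true 6 MY, consistent with the unbiasedness reported in the Results.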

The second simulated data set represents an intermediate model of retrotransposition. Data sets are generated as described under the master gene model, except that 20% of generated elements are allowed to have spawned not from the master gene but from an older daughter element. These "granddaughter" elements represent the products of retrotranspositionally active daughters of the original master gene. Each granddaughter inherits the mutations already accumulated by its parental element and then accumulates additional mutations of its own as it matures.
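The granddaughter mechanism can be sketched by adding a second branch to the master gene simulation. This Python sketch assumes that a granddaughter inserted at $t$ copies a parent whose own insertion date is drawn uniformly from $(t, T]$, and that the inherited mutations are a fresh Poisson draw over the interval $t_p - t$ (a simplification that preserves the counts but not the shared identity of mutations). Names, lengths, and replicate counts are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, prod, n = math.exp(-lam), rng.random(), 0
    while prod > limit:
        prod *= rng.random()
        n += 1
    return n


def simulate_intermediate(T, k, rng, p_grand=0.2):
    """Master gene data set in which a fraction p_grand of elements are granddaughters."""
    x = []
    for ki in k:
        t = rng.uniform(0.0, T)                  # insertion date of this element
        if rng.random() < p_grand:
            # Assumed parent: an older daughter inserted at t_p ~ U(t, T). The copy inherits
            # the mutations the parent gathered over (t_p - t), then gathers its own over t.
            t_p = rng.uniform(t, T)
            x.append(poisson(ki * (t_p - t), rng) + poisson(ki * t, rng))
        else:
            x.append(poisson(ki * t, rng))       # ordinary daughter of the master gene
    return x


def adhoc_age(x, k):
    """Twice the pooled divergence sum(x) / sum(k)."""
    return 2.0 * sum(x) / sum(k)


rng = random.Random(2009)
T_true = 6.0 * 0.00105    # 6 MY at an assumed 0.105% per MY
k = [886] * 191
reps = [adhoc_age(simulate_intermediate(T_true, k, rng), k) for _ in range(500)]
inflation = sum(reps) / len(reps) / T_true
print(f"mean ad hoc estimate / true age = {inflation:.3f}")
```

Under these assumptions a granddaughter's expected mutational age is $3T/4$ instead of $T/2$, so the expected ad hoc estimate is $2(0.8 \cdot T/2 + 0.2 \cdot 3T/4) = 1.1\,T$, an upward bias in the same direction as the one reported for the intermediate model.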

Table 1 summarizes the simulation results, reporting characteristics of the distribution of subfamily age estimates under each simulation model. These distributions are illustrated in Figure 1. The null hypotheses that the new maximum likelihood model and the ad hoc estimate produce estimates of $T$ equal to the true value cannot be rejected by the simulated data sets (P > 0.05). Under the strict master gene model, the average maximum likelihood estimate ($\hat{T}$ = 5.9967 MYA) and average ad hoc estimate ($\hat{T}$ = 6.0032 MYA) were accurate and did not significantly differ (P > 0.05). Neither method is noticeably biased; both have relative biases below 0.1% of their estimates of $T$.

Figure 1. Comparison of estimation methods across models of retrotransposition (T = 6 million years ago).

Table 1. Summary statistics of simulation results, in millions of years.

Under our intermediate model simulation, the average maximum likelihood estimate ($\hat{T}$ = 6.4689 MYA) and average ad hoc estimate ($\hat{T}$ = 6.5908 MYA) are clearly inflated, but neither rejects the true value of 6 MYA (P > 0.05). This upward bias is visible in Figure 1: the distributions of age estimates from the intermediate model simulations are shifted to the right of those from the master gene model simulations. The distribution of maximum likelihood estimates is shifted less than the ad hoc distribution, so the average maximum likelihood estimate of $T$ is significantly less than the average ad hoc estimate (P < 0.025). Although their relative biases are comparable, the new maximum likelihood estimate is slightly less biased than the ad hoc estimate (relative bias = 7.25% and 8.96%, respectively).

Application

L1s are non-LTR (long terminal repeat) retrotransposons that have been actively inserting into mammalian genomes for 150 million years and number ~0.5 million copies in the human genome [3,30,31]. Nearly all L1 elements have been silenced by 5′ truncation, inversions, and point mutations [3,26,32]. Although several full-length L1s are capable of generating daughter elements, the majority of new insertions are believed to be generated by a small subset of "hot" L1s [14], so L1 amplification follows an intermediate model of retrotransposition. Pre-Ta is the oldest of the polymorphic human L1 subfamilies. The 362 unique Pre-Ta elements identified in the haploid human genome have an average age of 2.34 million years [26]. Analysis of ~208 bp of sequence from each of these 362 Pre-Ta elements produces a maximum likelihood estimate $\hat{T}$ = 0.0108 units of mutational time (95% CI: $1.0765 \times 10^{-2}$, $1.0835 \times 10^{-2}$). A total of 404 substitutions are observed in 72,872 bp of sequence, a sequence divergence among Pre-Ta elements of 0.55%. This divergence yields an ad hoc estimate $\hat{T}$ = 0.0110 units of mutational time.

Ta-1 is the youngest subfamily of polymorphic human L1s, with elements averaging 1.71 million years in age [2]. Analysis of ~886 bp of sequence from each of the 191 Ta-1 elements ascertained in the haploid human genome database [2] leads to a maximum likelihood estimate $\hat{T}$ = 0.0050 units of mutational time (95% CI: $4.9715 \times 10^{-3}$, $5.0285 \times 10^{-3}$). A total of 402 substitutions are observed in 154,384 bp of sequence, a sequence divergence of 0.26% within the Ta-1 subfamily, leading to an ad hoc estimate $\hat{T}$ = 0.0052 units of mutational time.

To interpret $\hat{T}$, it can be converted from mutational time to years by dividing it by an appropriate DNA sequence mutation rate. Applying a mutation rate of 0.105% per million years, as estimated from the sequence divergence between human and chimpanzee pseudogenes [29], the age of the Pre-Ta subfamily is estimated to be 10.29 MYA (95% CI: 10.25, 10.32 MYA) using the maximum likelihood estimate, or 10.48 MYA using the ad hoc estimate. The same mutation rate estimates the age of the Ta-1 subfamily to be 4.79 MYA (95% CI: 4.76, 4.82 MYA) using the maximum likelihood estimate, or 4.95 MYA using the ad hoc estimate. If instead we apply an L1-specific DNA sequence mutation rate of 0.25% per million years, as estimated from the sequence divergence between human-specific and orangutan-specific L1 subfamilies [22], the age of the Pre-Ta subfamily is estimated to be 4.32 MYA (95% CI: 4.31, 4.33 MYA) using the new maximum likelihood estimate, or 4.40 MYA using the ad hoc estimate. This L1-specific mutation rate estimates the age of the Ta-1 subfamily to be 2.01 MYA (95% CI: 2.00, 2.02 MYA) using the maximum likelihood estimate, compared with an ad hoc estimate of 2.08 MYA. In every case, the maximum likelihood estimate of $T$ is significantly less than the ad hoc estimate (P < 0.025).
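The conversion is a single division. This minimal Python sketch applies it to the Pre-Ta values in mutational time given above; the dictionary layout and function name are illustrative only.

```python
def to_million_years(t_mut, rate_per_my):
    """Convert an age in mutational time (substitutions per bp) to millions of years."""
    return t_mut / rate_per_my


RATES = {"pseudogene, 0.105%/MY": 0.00105, "L1-specific, 0.25%/MY": 0.0025}
PRE_TA = {"ML": 0.0108, "CI low": 1.0765e-2, "CI high": 1.0835e-2, "ad hoc": 0.0110}

for rate_name, rate in RATES.items():
    ages = {label: round(to_million_years(t, rate), 2) for label, t in PRE_TA.items()}
    print(rate_name, ages)
```

These reproduce the Pre-Ta figures in the text: 10.29 (10.25 to 10.32) and 10.48 MYA at the pseudogene rate, and 4.32 (4.31 to 4.33) and 4.40 MYA at the L1-specific rate.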

Discussion

Our simulation study indicates that the maximum likelihood model and the ad hoc procedure both closely predict the true value of $T$, though both are biased upward as the number of retrotranspositionally active elements in the subfamily increases. Despite this slight bias, neither estimate rejected the true value of $T$ under either the master gene model or the intermediate model. This suggests that both methods are robust to moderate violation of the master gene model. However, under the intermediate model the average maximum likelihood estimate is significantly less than the average ad hoc estimate.

When applied to real data collected for the Pre-Ta and Ta-1 subfamilies of human L1s, both methods consistently estimated the Pre-Ta subfamily to be approximately twice the age of the Ta-1 subfamily. Although the maximum likelihood and ad hoc estimates of $T$ were quite similar, the maximum likelihood estimate is significantly less than the ad hoc estimate (P < 0.025). This is consistent with what we observed in our simulation study under the intermediate model, although the differences in the applied case are smaller than in the simulation study.

As demonstrated in the simulation study, both our maximum likelihood and ad hoc methods inflate estimates of $T$ as the proportion of active elements increases. Because L1 subfamilies are known not to follow the master gene model strictly, our estimates of $T$ are likely inflated. The exact amount of bias in this applied example is difficult to determine. However, ~26% of Pre-Ta L1s are approximately full length, while ~31% of Ta elements (including the Ta-0 and Ta-1 subfamilies) are approximately full length [2,26]. Using Brouha et al.'s [14] observation that ~7% of full-length L1 elements are "hot" L1s, we estimate that approximately 2% of all Pre-Ta and Ta-1 elements account for the majority of retrotransposition within their subfamilies. The results of our simulation study therefore suggest that our estimates of L1 subfamily ages are minimally biased.
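The ~2% figure is the product, for each subfamily, of the approximately full-length fraction and Brouha et al.'s ~7% fraction of hot elements:

```latex
\underbrace{0.26}_{\text{full-length Pre-Ta}} \times \underbrace{0.07}_{\text{hot}} \approx 0.018,
\qquad
\underbrace{0.31}_{\text{full-length Ta}} \times \underbrace{0.07}_{\text{hot}} \approx 0.022
```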

The conversion of the estimated $T$ values into millions of years highlighted an important difference between the mutation rate estimated from pseudogenes and the rate estimated from L1 sequence data. Using the lower pseudogene mutation rate, the Pre-Ta subfamily is estimated to have emerged 10.29 MYA with our maximum likelihood approach. Because both subfamilies are known to be human-specific polymorphisms, a date older than 6 MYA does not seem reasonable. If the L1-specific mutation rate is used instead, the maximum likelihood estimates of $T$ suggest that the Pre-Ta subfamily emerged with the Australopithecines, while the Ta-1 subfamily arose at the dawn of the genus Homo [33]. Violations of the master gene model's assumptions, such as a variable L1 insertion rate or multiple "hot" L1s, could alter sequence diversity within L1 subfamilies and therefore cause the L1-specific DNA sequence mutation rate to differ from the pseudogene mutation rate.

In the search for the age of retrotransposon subfamilies, this paper has introduced a maximum likelihood estimation procedure and compared its statistical properties to those of the ad hoc procedure. The two methods produce similar estimates and accurately estimate the age of a subfamily when that subfamily follows the master gene model of retrotransposition. This work suggests that the ad hoc method can be used to obtain the age of a retrotransposon subfamily easily, while the new maximum likelihood method can also provide confidence intervals around that age. A significant difference between the ad hoc and maximum likelihood estimates of $T$ suggests violation of the master gene model and may implicate an intermediate model of retrotransposition.

Although the new maximum likelihood estimation procedure may perform well for polymorphic subfamilies of human L1s, not all families of mobile elements follow the master gene model as closely. Our simulation results suggest that as deviation from the master gene model increases, so do both the maximum likelihood and ad hoc estimates of $T$. Results should therefore be interpreted with care when our method is applied to subfamilies of retrotransposons known to deviate strongly from the master gene model. One approach may be to use our model to analyze, independently, subfamilies or clusters of elements within a subfamily that appear to descend from very few master genes, as we did for the two active subfamilies of human L1s.

Future development of this model could relax some of its underlying assumptions. The model has been designed to analyze active subfamilies, though it could be extended to allow for the study of inactive subfamilies. However, this extension would require knowing the time at which the subfamily became inactive, which may not be estimable with certainty. The assumptions of the maximum likelihood model could be modified to incorporate a variable number of active master genes within a given subfamily, or to allow fluctuation in retrotransposition rate over time. The results of our simulation study suggest that both our maximum likelihood estimate and the ad hoc estimate will be inflated in the presence of multiple active master genes, while variation in retrotransposition rate over time will likely bias estimates of subfamily age toward time periods with high retrotransposition rates. Until further development, the model in its current form provides insights into retrotransposon biology and may be applied to active retrotransposon subfamilies believed to approximately follow the master gene model of retrotransposition.

Materials and Methods

Simulation

We assess the accuracy of the new model and of ad hoc estimation using 95% confidence intervals and estimates of bias. Our empirical 95% confidence intervals contain the central 95% of the estimates $\hat{T}$ obtained from simulated data under each condition. If an empirical 95% confidence interval excludes T = 6 MYA, we can reject the null hypothesis that the estimate is equal to the true value of $T$. We calculate a 95% confidence interval about the mean maximum likelihood estimate of $T$ as

$$\hat{T} \pm 1.96\,\frac{\sigma}{\sqrt{n}}, \tag{4}$$

where σ is the sample standard deviation and n is the number of sequences analyzed. This confidence interval is used to test the null hypothesis that the mean maximum likelihood estimate $\hat{T}$ is greater than or equal to the ad hoc estimate. If the ad hoc estimate of $T$ lies above the upper limit of the maximum likelihood 95% confidence interval, we reject the null hypothesis at the 0.025 level. The bias of an estimate is calculated as the square root of the difference between its mean squared error and its variance. Bias therefore describes how far the estimate is shifted away from the true value.
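The accuracy checks above can be sketched with Python's standard `statistics` module. In this sketch the empirical interval is taken as the 2.5th and 97.5th percentiles from `statistics.quantiles` (its default interpolation is one reasonable choice, not necessarily the authors'), equation (4) takes σ and n as explicit inputs, and relative bias is expressed as a fraction of the mean estimate, which reproduces the percentages reported in the Results. Function names are hypothetical.

```python
import math
import statistics


def empirical_ci(estimates):
    """Central 95% of simulated estimates: the 2.5th and 97.5th percentiles."""
    cuts = statistics.quantiles(estimates, n=40)   # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def eq4_ci(t_hat, sigma, n, z=1.96):
    """Equation (4): t_hat +/- z * sigma / sqrt(n)."""
    half = z * sigma / math.sqrt(n)
    return t_hat - half, t_hat + half


def rejects_ml_ge_adhoc(ml_ci, adhoc):
    """One-sided test at 0.025: reject 'mean ML >= ad hoc' when ad hoc exceeds the upper limit."""
    return adhoc > ml_ci[1]


def bias(estimates, true_value):
    """sqrt(MSE - variance); with the population variance this equals |mean - true_value|."""
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    var = statistics.pvariance(estimates)
    return math.sqrt(max(mse - var, 0.0))


def relative_bias(mean_estimate, true_value):
    """Bias expressed as a fraction of the mean estimate."""
    return abs(mean_estimate - true_value) / mean_estimate


# Relative biases from the intermediate-model means reported in the Results.
print(f"{relative_bias(6.4689, 6.0):.2%}, {relative_bias(6.5908, 6.0):.2%}")  # 7.25%, 8.96%
```

Using the population variance in `bias` makes MSE minus variance equal to the squared offset of the mean exactly, so the square root recovers the shift described in the text; the `max(..., 0.0)` guard only absorbs floating-point round-off.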

Application

Sequences of the 3′ UTR belonging to human-specific Pre-Ta and Ta L1 elements were collected from the literature [2,26]. Ta-1 elements were identified from this data set using subfamily-defining mutations [32]. Clustered substitutions, inversions, and other mutations not resulting from single-base misincorporation were eliminated from the analysis [34]. We analyzed ~208 bp of sequence from each of 362 Pre-Ta L1s ascertained in the haploid human genome [26]. For comparison, we also analyzed ~886 bp of sequence from each of 191 Ta-1 L1s ascertained in the haploid human genome [2]. Each sequence was compared to its consensus, and the number of substitutions was recorded. These values were then evaluated with the new maximum likelihood estimator to obtain an estimate of $T$. The total numbers of substitutions observed and base pairs analyzed were used to calculate sequence divergence within each subfamily, from which the ad hoc estimate was calculated [25]. 95% confidence intervals were constructed as given in equation 4.
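Tallying $x_i$ and $k_i$ against the consensus can be sketched as follows. This Python version assumes each sequence is already aligned to the consensus at equal length and skips gap (`-`) and ambiguous (`N`) positions in either sequence; those handling rules are assumptions, and the removal of clustered substitutions and inversions described above is not implemented here. The function name is hypothetical.

```python
def count_substitutions(seq, consensus, skip="-N"):
    """Return (x_i, k_i): mismatches against the consensus and the number of comparable sites.

    Assumes seq is pre-aligned to consensus (equal length); positions where either
    sequence carries a character in `skip` are ignored (assumption, not the authors' rule).
    """
    if len(seq) != len(consensus):
        raise ValueError("sequences must be aligned to equal length")
    x = k = 0
    for a, b in zip(seq.upper(), consensus.upper()):
        if a in skip or b in skip:
            continue
        k += 1
        x += a != b
    return x, k


# Example: one substitution (G->T) among four comparable sites.
print(count_substitutions("ACTT-N", "ACGTAA"))  # (1, 4)
```

The resulting per-element `(x_i, k_i)` pairs are exactly the inputs the likelihood in equation (2) and the pooled ad hoc divergence require.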

Scripts to perform these analyses were written using Matlab and are available from the authors upon request.

Acknowledgments

We thank Henry Harpending for his helpful comments during the preparation of this manuscript. This research was supported by NIH grant GM-59290 and NSF grant BCS-0218370.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

1. Charlesworth B, Sniegowski P, Stephan W. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature. 1994;371:215–220.
2. Myers JS, Vincent BJ, Udall H, Watkins WS, Morrish TA, Kilroy GE, Swergold GD, Henke J, Henke L, Moran JV, Jorde LB, Batzer MA. A comprehensive analysis of recently integrated human Ta L1 elements. Am J Hum Genet. 2002;71:312–326.
3. Ostertag EM, Kazazian HH Jr. Biology of mammalian L1 retrotransposons. Annu Rev Genet. 2001;35:501–538.
4. Vincent BJ, Myers JS, Ho HJ, Kilroy GE, Walker JA, Watkins WS, Jorde LB, Batzer MA. Following the LINEs: an analysis of primate genomic variation at human-specific LINE-1 insertion sites. Mol Biol Evol. 2003;20:1338–1348.
5. Morrish TA, Gilbert N, Myers JS, Vincent BJ, Stamato TD, Taccioli GE, Batzer MA, Moran JV. DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat Genet. 2002;31:159–165.
6. Symer DE, Connelly C, Szak ST, Caputo EM, Cost GJ, Parmigiani G, Boeke JD. Human L1 retrotransposition is associated with genetic instability in vivo. Cell. 2002;110:327–338.
7. Gasior SL, Wakeman TP, Xu B, Deininger PL. The human LINE-1 retrotransposon creates DNA double-strand breaks. J Mol Biol. 2006;357:1383–1393.
8. Dimitri P, Junakovic N. Revising the selfish DNA hypothesis: new evidence on accumulation of transposable elements in heterochromatin. Trends Genet. 1999;15:123–124.
9. Shedlock AM, Okada N. SINE insertions: powerful tools for molecular systematics. Bioessays. 2000;22:148–160.
10. Salem AH, Ray DA, Batzer M. Identity by descent and DNA sequence variation of human SINE and LINE elements. Cytogenet Genome Res. 2005;108.
11. Xing J, Witherspoon DJ, Ray DA, Batzer MA, Jorde LB. Mobile DNA elements in primate and human evolution. Am J Phys Anthropol. 2007;134:2–19.
12. Tajima F. Relationship between DNA polymorphism and fixation time. Genetics. 1990;125.
13. Deininger PL, Batzer MA, Hutchison CA 3rd, Edgell MH. Master genes in mammalian repetitive DNA amplification. Trends Genet. 1992;8:307–311.
14. Brouha B, Schustak J, Badge RM, Lutz-Prigge S, Farley AH, Moran JV, Kazazian HH Jr. Hot L1s account for the bulk of retrotransposition in the human population. Proc Natl Acad Sci U S A. 2003;100:5280–5285.
15. Brookfield JFY, Johnson LJ. The evolution of mobile DNAs: when will transposons create phylogenies that look as if there is a master gene? Genetics. 2006;173.
16. Cordaux R, Hedges DJ, Batzer MA. Retrotransposition of Alu elements: how many sources? Trends Genet. 2004;20:464–467.
17. Kass DH, Batzer MA, Deininger PL. Gene conversion as a secondary mechanism of short interspersed element (SINE) evolution. Mol Cell Biol. 1995;15:19–25.
18. Promislow DEL, Jordan IK, McDonald JF. Genomic demography: a life-history analysis of transposable element evolution. Proc R Soc Lond B Biol Sci. 1999;266:1555–1560.
19. Kumar S, Hedges SB. A molecular timescale for vertebrate evolution. Nature. 1998;392:917–920.
20. Yang Z, Yoder AD. Comparison of likelihood and Bayesian methods for estimating divergence times using multiple gene loci and calibration points, with application to a radiation of cute-looking mouse lemur species. Systematic Biology. 2003;52:705–716.
21. Hedges DJ, Cordaux R, Xing J, Witherspoon DJ, Rogers AR, Jorde LB, Batzer M. Modeling the amplification dynamics of human Alu retrotransposons. PLoS Comput Biol. 2005;1:e44.
22. Boissinot S, Chevret P, Furano AV. L1 (LINE-1) retrotransposon evolution and amplification in recent human history. Mol Biol Evol. 2000;17:915–928.
23. Jordan IK, McDonald JF. Tempo and mode of Ty element evolution in Saccharomyces cerevisiae. Genetics. 1999;151:1341–1351.
24. Xing J, Hedges DJ, Han K, Wang H, Cordaux R, Batzer M. Alu element mutation spectra: molecular clocks and the effect of DNA methylation. J Mol Biol. 2004;344:675–682.
25. Carroll ML, Roy-Engel AM, Nguyen SV, Salem AH, Vogel E, Vincent B, Myers J, Ahmad Z, Nguyen L, Sammarco M, Watkins WS, Henke J, Makalowski W, Jorde LB, Deininger PL, Batzer MA. Large-scale analysis of the Alu Ya5 and Yb8 subfamilies and their contribution to human genomic diversity. J Mol Biol. 2001;311:17–40.
26. Salem AH, Myers JS, Otieno AC, Scott Watkins W, Jorde LB, Batzer MA. LINE-1 preTa elements in the human genome. J Mol Biol. 2003;326:1127–1146.
27. Boissinot S, Furano AV. The recent evolution of human L1 retrotransposons. Cytogenet Genome Res. 2005;110:402–406.
28. Brookfield JFY. Evolutionary forces generating sequence homogeneity within retrotransposon families. Cytogenet Genome Res. 2005;110:383–391.
29. Nachman MW, Crowell SL. Estimate of the mutation rate per nucleotide in humans. Genetics. 2000;156:297–304.
30. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
31. Khan H, Smit A, Boissinot S. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res. 2006;16:78–87.
32. Boissinot S, Entezam A, Young L, Munson PJ, Furano AV. The insertional history of an active family of L1 retrotransposons. Genome Res. 2004;14:1221–1231.
33. Wood B. Hominid revelations from Chad. Nature. 2002;418:133–135.
34. Gilbert N, Lutz S, Morrish TA, Moran JV. Multiple fates of L1 retrotransposition intermediates in cultured human cells. Mol Cell Biol. 2005;25:7780–7795.