Genomics. Author manuscript; available in PMC Jul 1, 2008.
Published in final edited form as:
PMCID: PMC2045505

In search of polymorphic Alu insertions with restricted geographic distributions

Abstract


Alu elements are transposable elements that have reached over one million copies in the human genome. Some Alu elements inserted in the genome so recently that they are still polymorphic for insertion presence or absence in human populations. Recently, there has been an increasing interest in using Alu variation for studies of human population genetic structure and inference of individual geographic origin. Currently, this requires a high number of Alu loci. Here, we used a linker-mediated PCR method to preferentially identify low frequency Alu elements in various human DNA samples with different geographic origins. The candidate Alu loci were subsequently genotyped in 18 worldwide human populations (~370 individuals), resulting in the identification of two new Alu insertions restricted to populations of African ancestry. Our results suggest that it may ultimately become possible to correctly infer the geographic affiliation of unknown samples with high levels of confidence without having to genotype as many as 100 Alu loci. This is desirable if Alu insertion polymorphisms are to be used for human evolution studies or forensic applications.

Keywords: Alu insertions, humans, polymorphism, genetic structure, geographic inference

Introduction


Alu elements are ~300-bp-long transposable elements that have expanded in primate genomes within the last ~65 million years (My) [1]. Alu elements mobilize (i.e. produce new copies) via a “copy and paste” mechanism, in which the RNA transcript of an active element is reverse transcribed as cDNA and the duplicate element is inserted at a new genomic location [1]. Although only a subset of all Alu elements are capable of producing new copies [2; 3], they continue to expand in the human genome at a substantial rate [4]. As a result, Alu elements have reached over one million copies in the human genome, making them the most successful mobile elements in the human genome [1; 5]. Concomitantly, Alu elements have had a considerable impact on their host genome, e.g. by inducing genetic disease [6] or promoting genomic plasticity [7; 8; 9]. Therefore, Alu elements represent an important source of human genomic variation [1].

Some Alu elements have inserted in the genome so recently that they are still polymorphic for insertion presence or absence at the individual or population levels [1; 10]. Because recently integrated Alu elements follow a neutral model of evolution [11], they represent an important source of genetic markers for human population studies [1]. Hence, Alu elements have proven useful for addressing questions related to human evolution [12; 13; 14; 15]. More recently, there has been an increasing interest in using Alu variation for studies of human population genetic structure and inference of individual geographic origin [16; 17]. For example, by genotyping 100 polymorphic Alu loci, Ray et al. [17] were able to correctly infer the geographic affiliation of 18 unknown human individuals with high levels of confidence. However, as noted by the authors, the discovery and characterization of novel Alu insertions with restricted geographic distributions may allow reducing the number of elements required without loss of confidence in the results, which could be desirable for applications such as in forensics [17].

Previously, it has been technically challenging to identify recently integrated Alu elements due to the difficulty of detecting one new insertion among one million pre-existing elements in the genome. More recently, computationally-based approaches have facilitated the identification of a number of elements differentially inserted among individuals or populations [10; 18; 19; 20; 21]. Obviously, such approaches are limited by the availability of genomic sequence data. This may result for example in the preferential recovery of high frequency elements since the elements have to be present in the sequence to be identified. Moreover, the geographic origin or ethnicity of the samples used to generate genomic data may sometimes be unknown or vague, thus rendering the identification and characterization of Alu insertions with restricted geographic distributions more difficult. To circumvent these potential disadvantages, we used a linker-mediated PCR-like method previously designed to recover newly inserted Alu elements [22], modified to target low frequency Alu elements in various samples with different geographic origins. The candidate Alu insertion loci were subsequently genotyped in a panel of worldwide human populations, resulting in the identification of several Alu insertions with putatively restricted geographic distributions. These novel genetic markers may prove useful for human evolution studies or forensics applications.

Results and Discussion

We searched for new Alu insertion loci in two separate individuals and five pools of three individuals with different geographic origins (Table 1). For each set of experiments, we sequenced from ~200 to ~500 different clones. We identified a total of nine candidate Alu loci that were amenable to PCR (Table 1). However, one locus (RC3) was recovered twice independently, in the African sample L945 and the African pooled sample. Therefore, there were eight potential new Alu elements for further analysis.

Table 1
Summary of display cloning and sequencing results.

Sequence features

The eight loci were sequenced from the individual sample in which they were identified, along with the human sample HeLa and three non-human primates, to test whether the recovered elements were specific to humans, and, if so, to obtain the sequence of the ancestral pre-insertion site of the Alu elements. In all cases, the Alu element was absent at the orthologous site in non-human primates, confirming the recent integration of the elements specifically in the human lineage. As shown in Figure 1, all eight Alu elements displayed the hallmarks of recent integration, including conserved target site duplications ranging in size from 10 to 15 nucleotides, and a long poly-A tail at the 3’ end of the element ranging in size from 19 to 47 nucleotides [22; 23].
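As an illustration of how a target site duplication could be checked computationally, the sketch below looks for the longest direct repeat shared by the sequence immediately upstream of the Alu element and the sequence immediately downstream of its poly-A tail. The function name and the example sequences are hypothetical, and the sketch assumes perfectly conserved duplications with known element boundaries, which real loci do not always have.

```python
def find_tsd(upstream: str, downstream: str, max_len: int = 20) -> str:
    """Return the longest string that both ends `upstream` and starts `downstream`.

    `upstream` is the flank ending right before the Alu element and
    `downstream` is the flank starting right after the poly-A tail.
    An empty string means no perfect direct repeat was found.
    """
    limit = min(max_len, len(upstream), len(downstream))
    # Try the longest candidate first so the first hit is the longest repeat.
    for k in range(limit, 0, -1):
        if upstream[-k:].upper() == downstream[:k].upper():
            return upstream[-k:]
    return ""


# Hypothetical flanking sequences built around a made-up 12-nt duplication.
example_tsd = "AAGAATTCTGCA"
example_upstream = "GGCT" + example_tsd
example_downstream = example_tsd + "TTTG"
print(find_tsd(example_upstream, example_downstream))
```

A repeat of 10 to 15 nucleotides returned by such a check would be in line with the sizes reported above, whereas a very short hit could easily arise by chance.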

Figure 1
Sequence features of eight Alu insertion polymorphisms and their flanking sequence

Inspection of the nucleotide sequences of the eight Alu elements showed that two belong to the Ya8 subfamily (RC2 and RC3) while the remaining six elements belong to the Ya5 subfamily (Figure 1). Although the protocol we used is designed to preferentially recover Ya8 elements [22], the above results are not surprising for two reasons: (i) the Ya8 and Ya5 subfamilies are two closely related, human-specific Alu subfamilies [21], and (ii) the Ya8 subfamily comprises fewer than 50 copies in the human genome [11] while there are ~2,000 copies of the Ya5 subfamily in the human genome [21]. Since all new loci we identified are Ya8 or closely related Ya5 Alu elements, we believe that our approach is reasonably selective, especially when taking into consideration the fact that thousands of human-specific Alu insertions exist in the human genome [24]. This conclusion is further supported by the fact that the vast majority of the clones we sequenced for each sample (e.g. ~78% of the 507 clones sequenced for sample L945) yielded either Ya8 or Ya5 Alu loci.

The eight Alu sequences we identified in this study diverged from their respective subfamily consensus sequence by 0 to 2 nucleotide substitutions, further suggesting that they were integrated in the human genome very recently. Overall, all the sequence features associated with the eight Alu loci suggest they may be recent enough to be highly polymorphic in human populations.

Population diversity

To test the degree of variation of the eight new Alu insertion loci in humans, we genotyped them in worldwide samples encompassing ~460 chromosomes (Table 2). All loci were found to be polymorphic, with the frequency of the allele with the Alu insert ranging from 1.4% to 45.0%, and an average frequency across the eight loci of 23.0%. The frequencies recorded for the eight new loci were significantly lower (Mann-Whitney U-test, P = 0.003) than those recorded for the eight polymorphic AluYa8 loci (67.7% on average) previously identified in the human genome reference sequence [11]. Furthermore, the frequency distribution of the eight new loci was clearly skewed towards low frequencies (Figure 2). By contrast, the frequency distribution of the eight polymorphic AluYa8 loci previously identified in the human genome reference sequence [11] was skewed towards high insertion frequencies (Figure 2). These results indicate that our approach preferentially detects low frequency Alu elements in human populations, as compared to polymorphic Alu elements identified from genome database searches.
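The comparison of the two sets of insertion frequencies can be sketched as follows. The code is an exact permutation version of the rank-sum (Mann-Whitney) comparison written in plain Python. It is not the software used in this study, and the frequency values are made-up placeholders rather than the values in Table 2, so the sketch is not expected to reproduce the reported P value.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_exact_p(group_a, group_b):
    """Two-sided exact P value for the rank sum of group_a.

    Every way of choosing len(group_a) ranks from the pooled ranks is
    enumerated, which is feasible for small samples such as 8 versus 8.
    """
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    expected = n_a * (len(ranks) + 1) / 2.0
    observed = abs(sum(ranks[:n_a]) - expected)
    total = 0
    extreme = 0
    for combo in combinations(ranks, n_a):
        total += 1
        if abs(sum(combo) - expected) >= observed:
            extreme += 1
    return extreme / total


# Placeholder frequencies (hypothetical, for illustration only).
new_loci = [0.01, 0.05, 0.10, 0.20, 0.25, 0.30, 0.40, 0.45]
reference_loci = [0.35, 0.55, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90]
print(rank_sum_exact_p(new_loci, reference_loci))
```

With eight loci per group there are 12,870 possible assignments of ranks, so full enumeration is fast and avoids relying on a normal approximation.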

Figure 2
Frequency distributions of polymorphic Alu elements
Table 2
Alu insertion frequencies for twelve human populations (for each locus, the frequency of the allele with the Alu insert is given) and average heterozygosity (Het) per locus. Average n: average number of individuals genotyped for each locus.

The average heterozygosity per locus was quite variable, with two extremely low values (~0.03 for RC5 and A1) and others that ranged from 0.30 to 0.46 (Table 2). While most loci were found at appreciable frequencies in all three major continental areas, two loci (RC5 and A1) displayed a remarkable distribution pattern since they were both found uniquely in populations with African ancestry. This pattern explains the extremely low heterozygosity values recorded for these loci on a worldwide scale, because all non-African individuals possessed the homozygous absent genotype at these two loci, thus leading to reduced genetic diversity on a global scale.
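The effect described above can be illustrated with a short calculation. The sketch assumes that heterozygosity at a biallelic locus is computed per population as 2p(1-p), where p is the insert frequency, and then averaged over populations. The exact estimator used for Table 2 is not restated here, and the frequencies below are hypothetical.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity at a biallelic locus with insert frequency p."""
    return 2.0 * p * (1.0 - p)


def average_heterozygosity(frequencies) -> float:
    """Mean of the per-population expected heterozygosities."""
    values = [expected_heterozygosity(p) for p in frequencies]
    return sum(values) / len(values)


# Hypothetical insert frequencies in 12 populations.
# Restricted locus: low frequency in 4 populations, absent in the other 8.
restricted_locus = [0.05, 0.10, 0.02, 0.08] + [0.0] * 8
# Widespread locus: intermediate frequency everywhere.
widespread_locus = [0.30, 0.45, 0.25, 0.40] * 3

print(average_heterozygosity(restricted_locus))
print(average_heterozygosity(widespread_locus))
```

Populations in which the insert is absent contribute zero to the average, so a locus confined to one continental group at low frequency ends up with a very small worldwide value.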

To further assess the restricted geographic distribution of the RC5 and A1 loci, we genotyped them in 6 additional populations from Africa, Asia and Europe encompassing 280 different chromosomes (Table 3). The extended results were consistent with the results reported for the smaller dataset. In sum, the RC5 and A1 loci were genotyped in a worldwide panel of ~370 individuals (corresponding to ~740 different chromosomes) and they were both found exclusively in populations of African ancestry and completely lacking in non-African samples.

Table 3
Alu insertion frequencies for six human populations (for each locus, the frequency of the allele with the Alu insert is given). Average n: average number of individuals genotyped for each locus.

It is surprising that the two African-specific Alu loci are actually present in every African group (except RC5 in Egyptians, Table 2). This is because one would expect that Alu elements widespread in Africa would also be found outside Africa, as this would imply that the elements inserted in the human genome prior to the expansions of modern humans within and then out of Africa ~50,000 years ago [25; 26]. By contrast, Alu loci restricted to Africa would be expected to have inserted in the human genome so recently that they would be found only in some, but not all African populations. The fact that we found the two Alu loci RC5 and A1 in diverse African groups such as Pygmies, San, South African Bantus, African Americans (likely of West African ancestry) and Egyptians suggests that some migration that influenced all African groups may have taken place after modern humans left Africa ~50,000 years ago. However, further studies involving more loci are needed to test whether such a migration indeed occurred or the observed pattern based on two Alu loci is the result of chance alone.

Concluding remarks

In sum, our approach was able to recover two new Alu elements that exhibit restricted geographic distributions. The two African-specific Alu loci we identified may prove useful for human evolution studies or forensics applications. It is noteworthy that these two loci were found within African populations at low frequencies (<25%). Therefore, the absence of the element in an individual would not be informative with respect to geographic affinities. However, the presence of the element in an individual would suggest an African ancestry with a high probability. Ideally, several Alu markers with relevant geographic distributions could be used in conjunction to increase the resolution and confidence in the results.

Using the available data, we can estimate how many Alu loci would be needed to infer geographic affiliation with a high degree of confidence. Assuming the loci are in Hardy-Weinberg equilibrium and in linkage equilibrium, and assuming they are truly African-specific, then the presence of at least one Alu insert at any locus indicates that the individual is of African ancestry. So, individuals from Africa are incorrectly classified only if they are homozygous for the absence of the insert at both loci. The two African-specific loci RC5 and A1 have an average insert frequency of 0.066 in African populations, so the frequency of homozygotes for the absence of the insert at both loci is $(0.934)^2 \times (0.934)^2 = 0.761$, and the probability of correctly classifying an African individual as African is $1 - 0.761 = 0.239$ based on two loci. In general, for n loci with an average insert frequency of p, the probability of correctly classifying an individual is $1 - ((1-p)^2)^n$. If the average insert frequency is 0.066, then 22 loci would be needed to have a 95% chance of correctly classifying an individual as African. In other words, with 22 African-specific loci, each with average insert frequency of 0.066, there is less than a 5% chance that an African individual would be homozygous for the absence of the insert at all 22 loci. Thus, it may ultimately become possible to correctly infer the geographic affiliation of unknown samples with high levels of confidence without having to genotype as many as 100 Alu loci [17]. To this end, the identification of additional informative Alu loci is desirable, and we demonstrated here that our approach is capable of recovering new Alu loci with restricted geographic distribution.
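The estimate above can be reproduced with a few lines of code. The sketch below makes the same assumptions as the text (Hardy-Weinberg equilibrium, linkage equilibrium, truly African-specific loci, and the same insert frequency p at every locus). The function names are illustrative only.

```python
import math


def prob_correct(p: float, n_loci: int) -> float:
    """Probability of carrying at least one insert across n_loci loci.

    Each locus is homozygous absent with probability (1 - p)^2, so all
    n_loci loci are homozygous absent with probability (1 - p)^(2 * n_loci).
    """
    return 1.0 - (1.0 - p) ** (2 * n_loci)


def loci_needed(p: float, target: float) -> int:
    """Smallest number of loci for which prob_correct(p, n) >= target."""
    return math.ceil(math.log(1.0 - target) / (2.0 * math.log(1.0 - p)))


average_insert_frequency = 0.066
print(round(prob_correct(average_insert_frequency, 2), 3))  # 0.239
print(loci_needed(average_insert_frequency, 0.95))  # 22
```

Because the required number of loci scales roughly with 1/p, loci with higher insert frequencies within Africa would reduce the number needed considerably under the same assumptions.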

Materials and Methods

DNA samples

A total of 17 human DNA samples were used to ascertain new Alu insertions, either separately or as pools of three individuals (Table 1). DNA samples were obtained from the Coriell Institute for Medical Research, except L945, 10237, 10408 and 1052, which were available from previous studies in our laboratory. The human-specific nature of the candidate Alu loci was evaluated by their presence or absence in four primate species, including human HeLa (cell line ATCC-CCL2), common chimpanzee [Clint] (NS06006B), gorilla (AG05251) and orangutan (ATCC-CR6301). Each locus was also genotyped in a panel of 228 individuals (456 chromosomes) from 12 human populations originating from the three major continental groups: Africa, Asia and Europe [12; 13]. Details on the populations and sample sizes are shown in Table 2. We assigned our South American samples to the Asian continental region because of the genetic roots of Amerindians in Asia [27]. Alu loci RC5 and A1 were further genotyped in 140 additional individuals (280 chromosomes) from 6 diverse human populations (Table 3) [12; 13].

Identification of candidate Alu insertion loci

We used a modification of the ASAP PCR previously described [22]. Genomic DNA was digested with NdeI. Double-stranded linkers MSET and MSEB [22] were subsequently ligated to the digested DNA. Three successive PCR reactions were then performed using the linker primer LNP and nested Alu primers ASII, HS18R and HS16R [22], to obtain collections of PCR products enriched in Alu elements belonging to the Ya8 subfamily. The AluYa8 subfamily is one of the youngest and most polymorphic Alu subfamilies currently known in humans [11; 22]. Third round PCR products were cloned into vectors using the TOPO-TA cloning kit (Invitrogen), according to the manufacturer’s instructions. Colonies were randomly picked and DNA sequencing was performed using chain termination sequencing on an Applied Biosystems 3100 automated DNA sequencer.

The resulting sequences contained the 5’ region of an Alu element along with the Alu element 5’ flanking sequence. The flanking sequences were used as queries in BLAT searches against the May 2004 freeze of the human genome reference sequence, as implemented in the University of California, Santa Cruz, genome browser (http://genome.cse.ucsc.edu) to determine whether an Alu insertion was already known to be located at each locus.

Genotyping of Alu loci

When the BLAT search predicted the absence of the Alu element in the human genome reference sequence, 1,000 bp of flanking sequence from each side of the predicted Alu insertion site were extracted and oligonucleotide primers were designed as previously described [11]. PCR reactions were conducted to amplify the candidate loci first in four primate species and then in various human samples, as previously described [11]. Specific information on each locus including chromosomal location, primer sequences, annealing temperature and PCR product sizes is shown in Table 4. Resulting PCR products were separated on 2% agarose gels, stained with ethidium bromide and visualized using UV fluorescence. PCR products from the four primate samples were sequenced as described above. Sequences generated in this study have been deposited in GenBank under accession numbers EF372292-EF372328.
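The logic of scoring such gels can be sketched as follows: a larger "filled" product indicates the allele carrying the Alu insert and a smaller "empty" product indicates the allele lacking it, so the bands observed for one individual give its genotype, and genotype counts give the insert allele frequency. The function names, product sizes and size tolerance below are hypothetical and are not taken from Table 4.

```python
def call_genotype(bands_bp, filled_bp, empty_bp, tolerance_bp=20):
    """Call an Alu insertion genotype from the band sizes seen on a gel.

    Returns "+/+" (insert on both chromosomes), "+/-" (heterozygote)
    or "-/-" (insert absent). Raises ValueError if no expected band is seen.
    """
    has_filled = any(abs(b - filled_bp) <= tolerance_bp for b in bands_bp)
    has_empty = any(abs(b - empty_bp) <= tolerance_bp for b in bands_bp)
    if has_filled and has_empty:
        return "+/-"
    if has_filled:
        return "+/+"
    if has_empty:
        return "-/-"
    raise ValueError("no band matches the expected product sizes")


def insert_allele_frequency(genotypes):
    """Frequency of the insert allele among the genotyped chromosomes."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    insert_copies = sum(g.count("+") for g in genotypes)
    return insert_copies / (2 * len(genotypes))


# Hypothetical locus: 450-bp filled product, 150-bp empty product.
observed = [[150], [150], [452, 149], [150], [448]]
calls = [call_genotype(bands, 450, 150) for bands in observed]
print(calls, insert_allele_frequency(calls))
```

Each individual contributes two chromosomes, which is why a panel of 228 individuals corresponds to 456 chromosomes.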

Table 4
Candidate Alu loci amenable to PCR and PCR amplification conditions.

Acknowledgments


We thank K. Han for technical assistance, and A.-H. Salem, S. Milligan, J.-P. Moisan, M. Hochmeister, L. Henke, J. Henke, M. Tahir, and P. Ioannou for providing samples. This research was supported by National Science Foundation BCS-0218338 and EPS-0346411 (MAB), National Institutes of Health RO1 GM59290 (MAB), and the State of Louisiana Board of Regents Support Fund (MAB).


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


References

1. Batzer MA, Deininger PL. Alu repeats and human genomic diversity. Nat Rev Genet. 2002;3:370–9.
2. Deininger PL, Batzer MA, Hutchison CA, 3rd, Edgell MH. Master genes in mammalian repetitive DNA amplification. Trends Genet. 1992;8:307–11.
3. Cordaux R, Hedges DJ, Batzer MA. Retrotransposition of Alu elements: how many sources? Trends Genet. 2004;20:464–7.
4. Cordaux R, Hedges DJ, Herke SW, Batzer MA. Estimating the retrotransposition rate of human Alu elements. Gene. 2006;373:134–7.
5. Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
6. Chen JM, Stenson PD, Cooper DN, Ferec C. A systematic analysis of LINE-1 endonuclease-dependent retrotranspositional events causing human genetic disease. Hum Genet. 2005;117:411–27.
7. Sen SK, et al. Human genomic deletions mediated by recombination between Alu elements. Am J Hum Genet. 2006;79:41–53.
8. Callinan PA, et al. Alu Retrotransposition-mediated Deletion. J Mol Biol. 2005;348:791–800.
9. Bailey JA, Liu G, Eichler EE. An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet. 2003;73:823–34.
10. Wang J, et al. dbRIP: a highly integrated database of retrotransposon insertion polymorphisms in humans. Hum Mutat. 2006;27:323–9.
11. Cordaux R, Lee J, Dinoso L, Batzer MA. Recently integrated Alu retrotransposons are essentially neutral residents of the human genome. Gene. 2006;373:138–44.
12. Batzer MA, et al. African origin of human-specific polymorphic Alu insertions. Proc Natl Acad Sci U S A. 1994;91:12288–92.
13. Stoneking M, et al. Alu insertion polymorphisms and human evolution: evidence for a larger population size in Africa. Genome Res. 1997;7:1061–71.
14. Batzer MA, et al. Genetic variation of recent Alu insertions in human populations. J Mol Evol. 1996;42:22–9.
15. Watkins WS, et al. Genetic variation among world populations: inferences from 100 Alu insertion polymorphisms. Genome Res. 2003;13:1607–18.
16. Bamshad MJ, et al. Human population genetic structure and inference of group membership. Am J Hum Genet. 2003;72:578–89.
17. Ray DA, et al. Inference of human geographic origins using Alu insertion polymorphisms. Forensic Sci Int. 2005;153:117–24.
18. Carter AB, et al. Genome-wide analysis of the human Alu Yb-lineage. Hum Genomics. 2004;1:167–78.
19. Wang J, et al. Whole genome computational comparative genomics: A fruitful approach for ascertaining Alu insertion polymorphisms. Gene. 2006;365:11–20.
20. Bennett EA, Coleman LE, Tsui C, Pittard WS, Devine SE. Natural genetic variation caused by transposable elements in humans. Genetics. 2004;168:933–51.
21. Otieno AC, et al. Analysis of the Human Alu Ya-lineage. J Mol Biol. 2004;342:109–18.
22. Roy AM, et al. Recently integrated human Alu repeats: finding needles in the haystack. Genetica. 1999;107:149–61.
23. Roy-Engel AM, et al. Active Alu element “A-tails”: size does matter. Genome Res. 2002;12:1333–44.
24. Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005;437:69–87.
25. Cann RL, Stoneking M, Wilson AC. Mitochondrial DNA and human evolution. Nature. 1987;325:31–6.
26. Underhill PA, et al. The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet. 2001;65:43–62.
27. Mulligan CJ, Hunley K, Cole S, Long JC. Population genetics, history, and health patterns in Native Americans. Annu Rev Genomics Hum Genet. 2004;5:295–315.