Am J Hum Genet. Apr 2001; 68(4): 990–1018.
Published online Mar 16, 2001. doi:  10.1086/319510
PMCID: PMC1275652

An Extensive Analysis of Y-Chromosomal Microsatellite Haplotypes in Globally Dispersed Human Populations


The genetic variance at seven Y-chromosomal microsatellite loci (or short tandem repeats [STRs]) was studied among 986 male individuals from 20 globally dispersed human populations. A total of 598 different haplotypes were observed, of which 437 (73.1%) were each found in a single male only. Population-specific haplotype-diversity values were .86–.99. Analyses of haplotype diversity and population-specific haplotypes revealed marked population-structure differences between more-isolated indigenous populations (e.g., Central African Pygmies or Greenland Inuit) and more-admixed populations (e.g., Europeans or Surinamese). Furthermore, male individuals from isolated indigenous populations shared haplotypes mainly with male individuals from their own population. By analysis of molecular variance, we found that 76.8% of the total genetic variance present among these male individuals could be attributed to genetic differences between male individuals who were members of the same population. Haplotype sharing between populations, ΦST statistics, and phylogenetic analysis identified close genetic affinities among European populations and among New Guinean populations. Our data illustrate that Y-chromosomal STR haplotypes are an ideal tool for the study of the genetic affinities between groups of male subjects and for detection of population structure.


Variability at microsatellite or short tandem repeat (STR) loci is being used, in various species, for linkage analysis (Weissenbach et al. 1992), individual identification (Hammond et al. 1994), and population-genetic analyses (for an overview, see Bruford and Wayne 1993). Autosomal STR loci have also been successfully applied to reconstruct human evolutionary history (Bowcock et al. 1994; Goldstein et al. 1995a, 1995b; Mountain and Cavalli-Sforza 1997). The resulting phylogenetic trees reveal evolutionary relationships similar to those based on mtDNA sequence variation, and it has been a long-lasting wish to add to these trees the one strictly based on Y chromosome–specific markers. Approximately 5 years ago, the number of published STR loci on the human Y chromosome was <15 (Roewer et al. 1992; Chen et al. 1994; Mathias et al. 1994; Jobling and Tyler-Smith 1995), and only one Y chromosome–specific minisatellite was known (Jobling 1994). The number of known Y-chromosomal single-nucleotide polymorphisms (SNPs) was <10 (Nakahori et al. 1989; Jobling 1994; Mathias et al. 1994; Seielstad et al. 1994; Hammer 1995; Whitfield et al. 1995), and only one Alu-insertion polymorphism had been discovered (Hammer 1994). This picture, however, has totally changed, owing to the recent introduction of many Y SNPs (Underhill et al. 1997, 2000) and STRs (White et al. 1999; Ayub et al. 2000), and additional new markers are to be expected.

Recently, Y-STR variability has been used both for the dating of SNP mutations, in order to draw conclusions about the origins and history of human populations (Underhill et al. 1996; Zerjal et al. 1997; Bianchi et al. 1998; Lahermo et al. 1999; Kayser et al. 2000a, 2001), and for human identification in forensic casework (Kayser et al. 1997; Prinz et al. 1997; Honda et al. 1999). Nevertheless, global studies of Y-STR variability are still rare (Deka et al. 1996; Seielstad et al. 1999; Jorde et al. 2000), and most of these analyses have been based on combined single-locus information. The major advantage of analyzing the nonrecombining part of the Y chromosome is that single-locus information can be used to construct compound haplotypes that allow male lineages to be characterized in a much more detailed fashion. It has been questioned whether the tracing of human migration history can be achieved solely on the basis of Y STRs (de Knijff et al. 1997). Intuitively, since STRs harbor a much higher degree of genetic variation than do SNPs, this appears to be possible. However, preliminary studies have revealed that, most likely because of the relatively high mutation frequency of Y STRs (Heyer et al. 1997; Kayser et al. 2000b), Y haplotypes can be shared identical by state (IBS) without being identical by descent (IBD) (de Knijff 2000b). To explore the use of Y-STR haplotypes in more detail, we analyzed the genetic variance, at seven loci, among 986 male individuals from 20 globally dispersed human populations, some of which are closely related (e.g., Dutch, Germans, Swiss, and Italians) and others of which are distantly related (e.g., native South Americans, central African Pygmies, and Papua New Guineans). This enabled us to investigate the utility that these Y STRs have for different evolutionary time scales.

Subjects, Material, and Methods

DNA Samples

DNA samples were obtained from 986 male individuals from 20 different populations: 470 either from Europe or of European ancestry (88 Germans [GER], 88 Dutch [DUT], 64 Swiss [SWI], 100 Italians [ITA], 100 Buenos Aires Metropolitans [BAM], and 30 Basques [BAS]); 200 from Asia (36 Han-Chinese [CHI], 25 pooled samples from Indians [IND] including 15 Dora and 10 Reddi, 69 pooled samples from Indonesians [INO] including 56 samples from a rural area near Jakarta and 13 from Banjarmasin on South Borneo, 40 Khalkh-Mongolians [MON], and 30 pooled samples from Taiwanese Aborigines [TAW] including 13 Ami, 6 Atayal, 5 Bunun, and 6 Paiwan); 113 from New Guinea (26 Papua New Guineans [PNG], 12 Roro [RPN] from the southern coast of Papua New Guinea, 59 Trobriand Islanders [TRO], and 16 Tolai from New Britain [TNB]); 46 from America (pooled samples from native South Americans [NSA] including 6 Wichi, 12 Tehuelche, and 16 Mapuche from Argentina, and 12 Yanomami from Brazil); 62 from the Arctic (Inuit [INU] from Nanortalik in southwestern Greenland); and 31 from Africa (central African Pygmies [CAP]). In addition, 54 Surinamese (SUR) and 10 male individuals from West Samoa (WSA) were included. Figure 1 illustrates the geographical location of all 20 populations. Paternal family history was studied, and care was taken to type unrelated male individuals only.

Figure  1
Approximate geographic locations of 20 study populations. The color codings differentiate the eight groups of populations used for AMOVA analyses.

Genetic Screening

All male individuals were genotyped for six tetranucleotide Y STRs (DYS19, DYS389I, DYS389II, DYS390, DYS391, and DYS393) and for one trinucleotide Y STR (DYS392), as described in detail elsewhere (Kayser et al. 1997). Note that DYS394 is a synonym for DYS19 and refers to the same Y-STR locus amplified with a different primer set. DNA sequencing of a number of alleles amplified with the DYS394, DYS19, and alternative primers revealed that all nucleotide differences between the DNA sequences of DYS394 and DYS19 reported in GENBANK are due to sequencing errors (M.K., unpublished data). According to the recommendations of the International Society of Forensic Genetics, the alleles are designated in terms of the number of variable repeats that they contain (see “DNA Recommendations—1994 Report Concerning Further Recommendations of the DNA Commission of the ISFH Regarding PCR-Based Polymorphisms in STR (Short Tandem Repeats) Systems” 1995). Simultaneous running of allelic ladders for all loci ensured consistent allele designation in the different laboratories. Further information on the markers is available, on request, from the authors and can be found at the Forensic Laboratory for DNA Research and the Y-STR Haplotype Reference Database. Depending on the type of analysis, haplotypes were defined, for each locus, by either combining the number of variable repeats (i.e., the allele name) or using a coded version (i.e., calling the shortest observed allele of each locus “1” and numbering upward for each additional repeat unit).
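The two haplotype representations described above (allele names versus the coded version) can be illustrated with a minimal Python sketch. The function names and the repeat counts are hypothetical and chosen only for illustration; the locus order is the one used in this study.

```python
# Locus order as used for the seven-locus haplotypes in this study.
LOCI = ["DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391", "DYS392", "DYS393"]

def allele_name_haplotype(genotype):
    """Join the repeat counts in the fixed locus order into one label."""
    return "-".join(str(genotype[locus]) for locus in LOCI)

def coded_haplotypes(genotypes):
    """Recode alleles so that the shortest observed allele at each locus is 1."""
    shortest = {locus: min(g[locus] for g in genotypes) for locus in LOCI}
    return [tuple(g[locus] - shortest[locus] + 1 for locus in LOCI) for g in genotypes]

# Hypothetical repeat counts for two males (illustrative values only).
males = [
    dict(zip(LOCI, [14, 13, 29, 24, 11, 13, 13])),
    dict(zip(LOCI, [15, 12, 28, 22, 10, 11, 13])),
]
names = [allele_name_haplotype(m) for m in males]
codes = coded_haplotypes(males)
print(names)
print(codes)
```

The coded version depends on which alleles happen to be observed in the data set, so the codes are only comparable within one analysis.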

Statistical and Phylogenetic Analysis

For each of the 20 populations, locus-specific allele frequencies were estimated by simple gene counting. The standard error (SE) of allele frequencies was calculated as $\mathrm{SE}(p_i)=\sqrt{p_i(1-p_i)/N}$, where $p_i$ denotes the frequency of the $i$th allele at any given locus and $N$ equals the total number of individuals screened at this locus.
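As a rough illustration of gene counting, the sketch below estimates allele frequencies and attaches a standard error to each. It assumes the usual binomial form sqrt(p(1 - p)/N) for the SE; the function name and the repeat counts are hypothetical.

```python
import math
from collections import Counter

def allele_frequencies(alleles):
    """Return {allele: (frequency, standard error)} by simple gene counting."""
    n = len(alleles)
    result = {}
    for allele, count in sorted(Counter(alleles).items()):
        p = count / n
        # Assumed binomial standard error of a proportion.
        result[allele] = (p, math.sqrt(p * (1 - p) / n))
    return result

# Hypothetical repeat counts at one locus in ten males (illustrative only).
freqs = allele_frequencies([14, 14, 14, 15, 15, 16, 14, 13, 14, 15])
for allele, (p, se) in freqs.items():
    print(allele, round(p, 3), round(se, 3))
```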

The intrapopulation locus-specific variance, $V_L$, was estimated by $V_L=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$, where $X_i$ is the size of the allele on the $i$th chromosome, $\bar{X}$ is estimated as $\frac{1}{n}\sum_{i=1}^{n}X_i$, and $n$ is the number of chromosomes sampled for the population. The intrapopulation genetic variance, $V_P$, was subsequently estimated by averaging across $m$ loci, as $V_P=(1/m)\sum_{j=1}^{m}V_{L_j}$. Subsequently, a Y-STR haplotype comprising seven loci was constructed for each male, for all analyses, with the loci in the order DYS19-DYS389I-DYS389II-DYS390-DYS391-DYS392-DYS393. An unbiased estimate of haplotype diversity, $h$, and its variance, $V(h)$, were calculated according to the method of Nei (1987, formulas 8.5 and 8.13 therein). The SE of $h$, $\mathrm{SE}(h)$, was calculated by taking the square root of $V(h)$. Single-locus gene-diversity values were calculated in the same way. Numbers of shared haplotypes were determined for each of the 190 possible pairs of populations, by a simple counting scheme. The probability of identity, $p$, between these 190 population pairs (which reflects the haplotype-sharing index) was estimated, according to the method of Melton et al. (1995), as $p=\sum_{i,j}^{n}x_i x_j$, where $x_i$ and $x_j$ are, respectively, the frequencies of a haplotype in populations $i$ and $j$, summed over the $n$ haplotypes in the two populations.
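The variance and diversity estimators in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the locus variance is taken to be the usual unbiased sample variance (denominator n - 1), haplotype diversity uses Nei's unbiased estimator h = n(1 - Σx²)/(n - 1), and the variance of h is left out. All names and data are hypothetical.

```python
from collections import Counter

def locus_variance(sizes):
    """Unbiased variance of allele sizes (in repeats) at one locus."""
    n = len(sizes)
    mean = sum(sizes) / n
    return sum((x - mean) ** 2 for x in sizes) / (n - 1)

def population_variance(haplotypes):
    """Average of the locus-specific variances across all loci."""
    per_locus = [locus_variance(column) for column in zip(*haplotypes)]
    return sum(per_locus) / len(per_locus)

def haplotype_diversity(haplotypes):
    """Nei's unbiased estimate: h = n * (1 - sum of squared frequencies) / (n - 1)."""
    n = len(haplotypes)
    sum_sq = sum((count / n) ** 2 for count in Counter(haplotypes).values())
    return n * (1 - sum_sq) / (n - 1)

# Hypothetical two-locus haplotypes for four males (illustrative only).
sample = [(14, 24), (14, 24), (15, 23), (16, 25)]
print(population_variance(sample))
print(haplotype_diversity(sample))
```

Passing single-locus alleles instead of full haplotypes to `haplotype_diversity` gives the single-locus gene diversity in the same way.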

In an initial pairwise analysis, allele frequencies at the seven loci were compared between all pairs of populations, by the Fisher exact test–based, genic-comparison option included in GENEPOP (Raymond and Rousset 1995).

Genetic relationships between the different populations, based on the seven-locus Y-STR haplotypes, were further explored by analysis of molecular variance (AMOVA), as implemented in Arlequin (see the ARLEQUIN: A Software For Population Genetic Data Analysis website). AMOVA allows a hierarchic analysis of three genetic-variance components: those due to genetic differences (i) between individuals within populations, (ii) between populations within groups, and (iii) between groups (Excoffier et al. 1992; Excoffier and Smouse 1994). For AMOVA, the following eight groups, containing all 20 populations, were defined: (1) all European populations (the GER, the DUT, the SWI, the ITA, the BAM, and the BAS); (2) all Asian populations (the CHI, the IND, the INO, the MON, and the TAW), excluding the New Guinean samples; (3) all mainland and island New Guinean samples (the PNG, the RPN, the TRO, and the TNB); (4) the NSA; (5) the INU; (6) the CAP; (7) the SUR; and (8) the WSA. The genetic structure among our population samples was analyzed with consideration for the molecular differences between individual haplotypes, in addition to differences in haplotype frequencies, resulting in estimates of ΦST (or RST), an FST analogue. Significance levels of the genetic-variance components as well as ΦST values were estimated by use of 10,000 permutations.
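To make the variance partition concrete, here is a deliberately simplified, single-level AMOVA sketch (populations only, without the group level and without the permutation test used in the study). It assumes the sum of squared repeat-count differences as the molecular distance between haplotypes, which gives an RST-like statistic. It is not Arlequin's implementation, and all names and data are hypothetical.

```python
def squared_distance(h1, h2):
    """Sum of squared repeat-count differences across loci (assumed distance)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))

def ssd(haplotypes):
    """Sum of squared deviations: all ordered pairwise distances divided by 2n."""
    total = sum(squared_distance(a, b) for a in haplotypes for b in haplotypes)
    return total / (2 * len(haplotypes))

def phi_st(populations):
    """Phi_ST for one level of structure; undefined if all haplotypes are identical."""
    everyone = [h for pop in populations for h in pop]
    n_total, n_pops = len(everyone), len(populations)

    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(everyone) - ssd_within

    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Effective sample size, correcting for unequal population sizes.
    n0 = (n_total - sum(len(pop) ** 2 for pop in populations) / n_total) / (n_pops - 1)

    var_within = msd_within
    var_among = (msd_among - msd_within) / n0
    return var_among / (var_among + var_within)

# Two hypothetical populations that are fixed for different haplotypes.
print(phi_st([[(1, 1)] * 3, [(3, 1)] * 3]))  # complete differentiation
```

With small samples the estimate can be negative when populations are no more different than expected by chance; such values are usually read as no detectable structure.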

Pairwise genetic distances between populations were computed as a linearization of ΦST values; that is, $D=\Phi_{ST}/(1-\Phi_{ST})$ (Slatkin 1995). On the basis of these adjusted ΦST values, a neighbor-joining (NJ) tree was constructed by PHYLIP, version 3.57c. The resulting tree was visualized by TREEVIEW 1.6.1 (see Rod Page's Home Page). The same adjusted ΦST values were used for a multidimensional-scaling analysis (Kruskal 1964).
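The effect of the linearization is easiest to see numerically. The short sketch below applies Slatkin's transformation, D = ΦST/(1 - ΦST), to two example values; the function name is hypothetical and the inputs are illustrative rather than taken from table 5.

```python
def linearized_distance(phi_st):
    """Slatkin's linearization: D = Phi_ST / (1 - Phi_ST)."""
    if phi_st >= 1:
        raise ValueError("Phi_ST must be below 1 for the distance to be defined")
    return phi_st / (1 - phi_st)

for value in (0.05, 0.5):
    print(value, round(linearized_distance(value), 4))
```

Small values are left almost unchanged, whereas large values are stretched, so strongly differentiated population pairs are pushed further apart in the tree and in the scaling plot.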

An NJ tree connecting all 598 different Y-STR haplotypes was constructed by means of PHYLIP. The resulting complex NJ tree was reduced to 17 clusters of related haplotypes, on the basis of their positions in this tree. For all 17 major clusters, the relative contribution of haplotypes from each of the eight population groups (as defined above for AMOVA analyses) was estimated. This allows a comparison of the distribution of region-specific haplotypes between the 17 clusters and, at the same time, shows the relative contribution of each of the 17 clusters to the total number of haplotypes. On the basis of all seven-locus haplotypes with a total frequency of 0.5% (i.e., observed in at least five male individuals) or higher (n=24), a modified reduced median network (Bandelt et al. 1995) was constructed. Comparisons of allele-frequency distributions between regions were performed by nonparametric exact-test procedures embedded in the program StatXact (Cytel-Software). Significance levels were estimated by the Monte-Carlo simulation mode with 10,000 randomizations.
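The 0.5% cut-off used for the network translates into a minimum count of five males in a sample of 986 (0.005 × 986 = 4.93, rounded up). A minimal sketch of such a frequency filter follows; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter

def common_haplotypes(haplotypes, min_fraction=0.005):
    """Keep haplotypes whose count reaches min_fraction of the whole sample."""
    min_count = math.ceil(min_fraction * len(haplotypes))
    counts = Counter(haplotypes)
    return {h: c for h, c in counts.items() if c >= min_count}

# Minimum count implied by the study's sample size.
threshold_for_study = math.ceil(0.005 * 986)
print(threshold_for_study)

# Toy sample of 986 labels: one haplotype seen 5 times, one 4 times, the rest once.
toy = ["a"] * 5 + ["b"] * 4 + [f"u{i}" for i in range(977)]
print(common_haplotypes(toy))
```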


Allele Frequencies and Haplotype Distributions

A global genetic survey was performed with respect to seven Y-STR loci. Human populations were chosen so as to encompass two subsets of closely related groups of male individuals and a number of genetically more distinct groups of male individuals. In total, we screened 986 male individuals from 20 different populations.

Across regions, the allele-frequency distribution differs for most loci (see Appendix A), although the results are not consistent between loci. Figure 2 illustrates the combined allele-frequency distribution for all seven loci. With the exception of DYS392, all loci have a unimodal distribution with one frequent allele and with the less-frequent alleles differing from the most-frequent allele by a single repeat unit. DYS392 is the only locus that clearly has a bimodal allele-frequency distribution.

Figure  2
Allele-frequency distributions of the seven Y-STR loci, among all 986 male individuals combined. For each locus, the allelic designation (in number of repeats) is indicated on the X-axis, and the observed frequency (in %) is shown on the Y-axis.

With regard to the possible 190 pairwise allele-frequency comparisons between populations, DYS19, DYS392, and DYS390 have highly significant differences (P<.01) in ~90% of all comparisons, whereas for DYS391 ~40% of all comparisons (n=75) were not significant (table 1).

Table 1
Analysis of Differentiation, by Fisher's Exact Test, between 20 Populations, Based on Single-Locus Allele Frequencies of Seven Y-STR Loci

Locus-specific genetic differences between populations are also reflected by the intrapopulation (locus-specific) genetic variances (fig. 3). Among the CAP, for example, only two of eight DYS392 alleles were observed, resulting in a markedly reduced genetic variance for this locus. In contrast, the same population harbors seven of the eight alleles at DYS389II—hence the high genetic variance for this locus. Similar intrapopulation locus-specific variance differences can be observed among the IND, the RPN, and the TNB. Among the European populations (except the BAM), the genetic variance is rather evenly distributed across all loci.

Figure  3
Estimated genetic variances for each of the seven Y-STR loci, and the average across the seven loci, within each of the 20 different population samples.

A total of 598 different compound Y-STR haplotypes were observed among 986 individuals (see Appendix B). Haplotype-diversity values varied from .99, in the GER, the SWI, the ITA, the CHI, and the SUR, to .86, in the RPN (table 2). A total of 437 haplotypes (73.1%) were observed in just a single male in a single population (these haplotypes are designated “single unique”). A further 68 (11.4%) haplotypes were shared only by male individuals within a single population (these haplotypes are designated “multiple unique”). Single- and multiple-unique haplotypes combined (i.e., the “total unique”) are the number of haplotypes that are specific to a single population (table 2). The remaining haplotypes (n=93 [15.5%]) were observed in multiple male individuals in multiple populations (i.e., are nonunique) and thus are shared by populations. Not surprisingly, the number of single-unique haplotypes was not distributed evenly across the populations. Among the CHI and the TNB, a large number (79%) of male individuals displayed a single unique Y haplotype, whereas the INU (23%) and the BAS (29%) displayed the lowest level of single-unique haplotypes. The INU and the CAP displayed the highest rates of multiple-unique haplotypes (32% and 33%, respectively). High (≥90%) frequencies of population-specific haplotypes were observed among the MON, the NSA, and the TNB. Low (≤50%) frequencies were noted only in a number of European groups (the GER, the DUT, and the SWI). The BAS, with only 35% population-specific haplotypes, displayed the lowest number of unique haplotypes. For all population samples except the GER, the DUT, the SWI, and the BAS, the proportion of nonunique haplotypes was either higher than or nearly equal to that of unique haplotypes.
Of the 190 pairwise population comparisons, 121 (63.7%) had no shared haplotype, 35 (18.4%) had one, 10 (5.3%) had two, 4 (2.1%) had three, 4 (2.1%) had four, and 16 (8.4%) had five or more (table 3).
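The pairwise counting behind these figures can be sketched as follows. The example counts distinct haplotypes shared by each pair of populations and computes a probability of identity as the sum, over haplotypes, of the product of their frequencies in the two populations. Population labels, haplotype labels, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def shared_haplotypes(pop_a, pop_b):
    """Number of distinct haplotypes observed in both populations."""
    return len(set(pop_a) & set(pop_b))

def probability_of_identity(pop_a, pop_b):
    """Sum over shared haplotypes of the product of their two frequencies."""
    count_a, count_b = Counter(pop_a), Counter(pop_b)
    return sum(
        (count_a[h] / len(pop_a)) * (count_b[h] / len(pop_b))
        for h in count_a.keys() & count_b.keys()
    )

# Hypothetical populations with labelled haplotypes (illustrative only).
populations = {
    "A": ["h1", "h1", "h2", "h3"],
    "B": ["h1", "h4", "h4", "h2"],
    "C": ["h5", "h6"],
}
table = {
    (a, b): (shared_haplotypes(populations[a], populations[b]),
             probability_of_identity(populations[a], populations[b]))
    for a, b in combinations(populations, 2)
}
print(table)

# Twenty populations give 20 * 19 / 2 = 190 unordered pairs.
n_pairs_for_20 = len(list(combinations(range(20), 2)))
print(n_pairs_for_20)
```

Haplotypes absent from either population contribute zero to the sum, so restricting the loop to shared haplotypes gives the same result as summing over all haplotypes.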

Table 2
Y-STR Haplotype–Sharing Statistics
Table 3
Number of Shared Haplotypes (below the Diagonal) and Probability of Identity (above the Diagonal), for All 190 Possible Population Pairings

AMOVA and Genetic Distances

For AMOVA, populations from the same geographic region were pooled together to form eight groups: (1) all European populations (the GER, the DUT, the SWI, the ITA, the BAM, and the BAS); (2) all Asian populations (the CHI, the IND, the INO, the MON, and the TAW) except the New Guinean samples; (3) all mainland and island New Guinean samples (the PNG, the RPN, the TRO, and the TNB); (4) the NSA; (5) the INU; (6) the CAP; (7) the SUR; and (8) the WSA. Using AMOVA, we estimated the relative contribution to the total observed genetic variance of (i) the genetic variance between individuals within populations, (ii) the genetic variance between populations within groups, and (iii) the genetic variance between groups (table 4). The contribution of genetic variance between populations, in light of molecular differences between haplotypes and differences in haplotype frequencies combined (i.e., ΦST), was 23.2% (P<.00001). Thus, by far, most (76.8%) of the genetic variance present among our male individuals could be explained by intrapopulation differences, whereas only 16.8% of the genetic variance was due to genetic differences between groups.
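The three percentages quoted above fit together as shown below, assuming that the three AMOVA components add up to 100% of the total variance. The 6.4% share for populations within groups is obtained here by subtraction and is not quoted in the text.

```latex
\begin{align*}
\sigma^2_{\text{within populations}} &= 100\% - 23.2\% = 76.8\% \\
\sigma^2_{\text{between populations within groups}} &= 23.2\% - 16.8\% = 6.4\% \\
\Phi_{ST} &= \frac{\sigma^2_{\text{between groups}}
  + \sigma^2_{\text{between populations within groups}}}{\sigma^2_{\text{total}}}
  = 0.168 + 0.064 = 0.232
\end{align*}
```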

Table 4
AMOVA in 20 Populations, Based on Seven Y-STR Loci

The matrix of pairwise ΦST values is shown in table 5. The lowest interpopulation variances (ΦST<.05) were observed between some European populations (i.e., between the SWI and the DUT, between the SWI and the ITA, and between the BAM and the BAS) and between a number of Asian and Papua New Guinean populations—that is, between pairs of closely related populations; in contrast, high values (ΦST>.5) were observed between pairs of distinct populations, such as INU/WSA, INU/CAP, or IND/INU. For most of the pairwise population comparisons, the interpopulation differences were significant. Only nine population pairs have nonsignificant (P>.05) ΦST values; three of those pairs include only European populations, and the remaining six include only populations from Southeast Asia, mainly New Guinea.

Table 5
ΦST Values (below the Diagonal) and Their Significance Levels (above the Diagonal)

Phylogenetic Analysis

On the basis of the linearized ΦST distances, an NJ tree was drawn (fig. 4). When the tree topology and the ΦST significances (table 5) are compared, the NJ tree provides a reasonable “fit,” with most of the nonsignificant population differences corresponding to tight clusters, one containing the European population samples and the other containing the Asian and Papua New Guinean populations. Only the close clustering of the BAS with both the INU and the NSA and the close clustering of the MON with the GER seem counterintuitive. An almost identical picture was obtained by multidimensional-scaling analysis (fig. 5).

Figure  4
Unrooted NJ tree, based on linearized ΦST distances derived from AMOVA connecting all 20 population samples.
Figure  5
Genetic map based on multidimensional-scaling analysis and a matrix of the pairwise linearized ΦST distances derived from AMOVA.

An alternative way of presenting our results, instead of focusing on differences between populations, is to concentrate on the genetic relationships between haplotypes. An NJ tree (fig. 6) was constructed including all 598 different seven-locus Y-STR haplotypes observed among the 986 male individuals from all 20 populations. We defined 17 major clusters of related haplotypes, on the basis of their positions within the tree. Every major cluster was analyzed with respect to the geographic origin of the haplotypes that it contains. As was done for the AMOVA analyses, all 20 populations were grouped into eight classes. We find that the haplotypes observed among New Guineans, for example, are primarily restricted to a number of closely related haplotype clusters (1, 11, and 13–15). Also, the African haplotypes are preferentially present in just three clusters (4, 6, and 7), and cluster 4 also contains a large proportion of SUR haplotypes. Cluster 12 contains a very large proportion of Arctic haplotypes, together with a considerable number of Amerindian haplotypes. On the other hand, haplotypes observed among Europeans are present, in all 17 clusters, at low frequencies. A modified reduced median network connecting only those haplotypes that were observed at a frequency ≥0.5% is shown in figure 7. On the basis of this network, it appears that the majority of the INU-specific haplotypes and European haplotypes form two distinct clusters. It is also noteworthy that these haplotypes are connected by single mutation steps. In contrast, Asian and New Guinean haplotypes are separated by multiple mutation steps and are located in different parts of the network, possibly reflecting a more ancient origin of these groups.

Figure  6
Unrooted NJ tree connecting all 598 distinct seven-locus Y-STR haplotypes. The complex topology of the tree was reduced by delineation of 17 major clusters of related haplotypes. The relative contribution of haplotypes of each of the eight population ...
Figure  7
Modified reduced median network based only on haplotypes observed in at least five male individuals (0.5%). The diameter of each circle corresponds to a categorical absolute frequency (n=5–9, n=10–14, and n>14). Multicolored pies ...


The major advantage of analysis of polymorphic loci from the nonrecombining part of the Y chromosome is that they facilitate a simple construction of highly informative compound haplotypes that will characterize each distinct male lineage in detail. In addition, because of its (almost) strictly paternal mode of inheritance and lack of recombination, the Y chromosome is extremely sensitive to genetic drift. These two characteristics render the Y chromosome potentially very informative for the study of human evolution. Nevertheless, the tracing of human migration events solely on the basis of Y STRs has been questioned (de Knijff et al. 1997); in particular, because of the relatively high mutation frequency of Y STRs (Heyer et al. 1997; Kayser et al. 2000b), Y haplotypes can be shared IBS without being IBD (de Knijff 2000b). To explore the utility of Y-STR haplotypes in more detail, we analyzed the genetic variance of seven (six tetrameric and one trimeric) Y-STR loci among 986 male individuals from 20 globally dispersed human populations.

All loci used in this study are located in the nonrecombining part of the human Y chromosome and thus are completely linked, lacking any recombination. Nevertheless, we observed locus-specific differences in both the intra- and interpopulation genetic variance (fig. 3 and Appendix A). These observations can easily be explained by mutation-rate differences between the loci and by the different population histories and genetic drifts; for example, DYS390 has the highest variance in 10 of 20 populations (present study) and was found to have the highest mutation rate among 15 Y STRs, including the loci studied here (Kayser et al. 2000b). Locus-specific differences were also demonstrated by a population-differentiation test, in which all but 7 of the 190 population pairs could be significantly distinguished on the basis of DYS390 alone. In contrast, 75 of 190 pairwise comparisons were not significant for DYS391 (table 1). This supports the observations, by others (Jorde et al. 2000), that, especially among European populations, Y STRs are very powerful in the detection of genetic differences between populations, compared with autosomal STRs. This can be attributed to the greater sensitivity of nonrecombining Y-chromosomal markers to founder effects and genetic drift.

Haplotype Analyses

The marked genetic variation of Y-STR haplotypes across global populations is reflected in the haplotype-diversity values, which are >.98 in 13 of 20 populations (table 2); values >.99 in 3 of the 5 European populations indicate a heterogeneous European gene pool. Haplotype-diversity values were also >.99 in the CHI and in the SUR; in the CHI this might be explained by the large population size, in the SUR by the well-known admixture of four distinct populations (i.e., Dutch, African, Asian, and native South American) within the gene pool. Lower haplotype-diversity values were observed in samples from small and isolated populations such as the INU, the BAS, and the CAP. Elsewhere (Perez-Lezaun et al. 1997), a reduced haplotype diversity for the BAS (compared with that in Catalans) has also been described, on the basis of a smaller number of Y microsatellites. The lowest haplotype diversity, .86 in the RPN, might be due to small sample size and inbreeding. The relatively high (.93) haplotype diversity in the very small (n=10) sample of the WSA seems puzzling, but it is not without precedent (Flint et al. 1999); however, one could also argue that it is simply due to small sample size (n=10). Recently, for the same set of Y STRs that have been used in the present study, it has been shown that, compared with that in 17 eastern Asian/Southeast Asian, Melanesian, and Australian populations, the haplotype diversity in a sample (n=28) from the Cook Islands of Polynesia was low (.89) (Kayser et al. 2001). This was explained by means of a scenario postulating a recent bottleneck during the colonization of the Pacific. Our results on haplotype diversity are in contrast with those of studies, using different types of markers, that reported the highest genetic diversity in Africa. Recently, this has also been reported for Y STRs, but such differences in non-Africans are not significant (Seielstad et al. 1999; Jorde et al. 2000).
In both data sets, the number of African populations pooled to obtain the high diversity values was greater than that for other continents. Especially in the study by Seielstad et al. (1999), in which 25 African populations with small sample sizes were pooled and compared with three to seven populations from other continents, this might have strongly influenced the diversity values obtained. In both studies, diversity was calculated on the basis of combined single-locus data rather than on the basis of haplotype data. If we calculate single-locus gene-diversity values on the basis of our data, we obtain the highest average diversity, .63, for Africa (one population), compared with .60 for Asia (five populations pooled), .56 for Europe (five populations pooled), and .54 for New Guinea (four populations pooled).

Investigation of haplotype sharing (or identity) within populations (multiple-unique haplotypes) and of population-specific haplotypes (single- and multiple-unique haplotypes) allows some insight into population structure (tables 2 and 3). High amounts of haplotype sharing within populations and/or of population-specific haplotypes, such as those observed among the CAP, the INU, the IND, the TRO, and the NSA, indicate small and/or isolated populations; on the other hand, high numbers of nonunique haplotypes and consequent haplotype sharing between populations indicate a close relationship between populations. This was found to be true for all European populations studied here. With our current knowledge, it is difficult to say what proportion of STR-haplotype sharing is due to recurrent mutation (i.e., IBS sharing) and what proportion is due to a genuine shared (recent) common ancestry (i.e., IBD sharing); however, it seems plausible to assume that the sharing of one or two haplotypes (as observed here, for example, between the CAP and the GER or between the TRO and the INU) reflects IBS sharing rather than IBD sharing. The analysis of additional Y STRs might reveal nonidentity; in contrast, the sharing of ≤20 haplotypes between European groups (including the BAS) is more likely to reflect IBD sharing. Interestingly, the BAM share five to eight haplotypes with European groups but share only three with the NSA. This probably reflects their recent European ancestry, combined with little or no admixture with the NSA, which has also been revealed by autosomal markers (Sala et al. 1997). Also, a more recent study of Brazilian populations reports no evidence for male Amerindian admixture, on the basis of Y-SNP analysis (Carvalho-Silva et al. 2001). Interestingly, the INU share either four or five haplotypes with each of the ITA, the SWI, the DUT, and the GER, which might indicate recent European gene flow into Greenland.
Within our second group of closely related populations (those from mainland and island Papua New Guinea), we only sporadically observed haplotype sharing. This can be explained by the much lower sample sizes for the New Guinean groups compared with the European groups.

To study the interpopulation genetic affinities in more detail, we performed an AMOVA, based on Y STR–defined haplotypes (table 4). All but 9 of the 190 population pairs can be differentiated significantly on the basis of the seven-locus Y-STR haplotypes employed (table 5); most of these 9 population pairs for which a nonsignificant P value for ΦST was obtained include populations in which close relationship can be assumed, that is, among European populations or among New Guinean populations. The results of the haplotype sharing and ΦST analyses were not always correlated (tables 3 and 5). Of the 15 strictly European population pairs with high numbers of shared haplotypes, 12 have significant ΦST values, whereas population pairs with either no or one haplotype shared (for example, the New Guinean populations) have nonsignificant ΦST values. This is explainable on the basis of the number of mutational differences between nonidentical haplotypes. Interestingly, two of the three European pairs with nonsignificant ΦST values include the SWI, suggesting a strong Y-chromosome affinity between the SWI and other Europeans (probably because of their central geographic position and their multicultural composition), and the same observation was made for mtDNA (Pult et al. 1994); on the other hand, despite 20 haplotypes shared by the GER and the DUT, their ΦST value is statistically significant, which confirms earlier findings, which were based on four-locus Y-STR haplotypes (Roewer et al. 1996). Overall, we find that ~23% of the total genetic variance observed among the male individuals of this study was due to interpopulation differences. This is exactly the same value as was found by Poloni et al. (1997) in 58 globally distributed populations that were analyzed by use of the Y chromosome–based p49a,f/TaqI polymorphism.

Phylogenetic Analyses

The unrooted NJ tree, based on linearized pairwise ΦST values, shows a topology with most populations belonging to either of two groups of related populations (Europeans and New Guineans) forming distinct tight clusters (fig. 4). The New Guineans are located close to the CHI and the TAW. The grouping of the RPN outside the New Guinea cluster may be due to its small sample size. The BAS and the BAM are distant from the European cluster but group together, which may reflect their Iberian origin. Furthermore, the clustering of the BAM on the same branch with the NSA and the INU may indicate some, albeit little, Amerindian admixture within the Argentinean sample. The clustering of the BAS next to the NSA is hard to evaluate and needs further investigation; however, it has been suggested elsewhere (Ruhlen 1994) that the BAS and the Na-Dene are remnant members of the Dene-European language group, but this view is not shared by many linguists. The grouping of the CAP together with the SUR most likely reflects the relatively high African admixture in the Surinamese gene pool. It is also noteworthy that the first branch of the obtained trees does not separate Africans from non-Africans, as has been clearly observed in many phylogenetic trees using different genetic systems, including mtDNA (Cann et al. 1987; Vigilant et al. 1991), autosomal STRs (Bowcock et al. 1994), autosomal minisatellites (Armour et al. 1996), autosomal Alu-insertion polymorphisms (Stoneking et al. 1997), and, recently, Y STRs (Seielstad et al. 1999). This has often been interpreted as the indication for a recent common African origin of anatomically modern humans. The difference between the results of our study and those of all others could be the result of our study's inclusion of only a single African population, whereas 25 pooled African populations were used by Seielstad et al. (1999).
A multidimensional–scaling analysis clearly corroborates these conclusions, in that most population samples from the two groups of closely related populations are clustered (fig. 5).
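
For readers unfamiliar with the tree-building step, the following sketch shows the two ingredients named above under stated assumptions: the linearization of ΦST, taken here to be Slatkin's (1995) transform ΦST/(1 − ΦST), and the pair-selection criterion of neighbor joining. It does not build a full tree and is not the procedure used to produce figure 4; the function names, the taxon labels, and the five-taxon distance matrix are hypothetical, and setting negative ΦST estimates to zero is an assumption made for this illustration.

```python
def linearize(phi_st):
    """Slatkin-style linearization PhiST / (1 - PhiST).

    Negative estimates are set to 0 (an assumption of this sketch);
    PhiST == 1 is not handled and raises ZeroDivisionError.
    """
    phi_st = max(phi_st, 0.0)
    return phi_st / (1.0 - phi_st)

def nj_first_join(labels, dist):
    """Return the pair that neighbor joining would join first, with its Q value.

    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k);
    the pair with the smallest Q is joined.
    """
    n = len(labels)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (labels[i], labels[j]), q
    return best_pair, best_q

# Hypothetical symmetric distance matrix for five made-up taxa.
labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(labels, dist))
```

Note that the pair joined first ("a" and "b") is not the pair with the smallest raw distance ("d" and "e"): the Q criterion corrects each distance for how far both taxa are from all others.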

Phylogenetic analysis of the 598 distinct haplotypes revealed a complex picture reflecting the high amount of diversity (fig. 6). However, we were able to identify a number of major clusters that preferentially contain certain populations. Cluster 12 contained most of the INU. Also, most of the CAP haplotypes were found in only 3 of 17 major clusters (clusters 4, 6, and 7), one of which (cluster 4) also contains the majority of SUR haplotypes. The INU and the CAP contain a large number of population-specific haplotypes, most probably because they are small and isolated populations, and this is also reflected in the Y-STR haplotype tree. On the other hand, Europeans, with small numbers of specific haplotypes, appear in all of the 17 major clusters, which could be due to the overrepresentation (48% [470/986]) of European-origin male individuals in our study.

It could be argued that haplotypes observed only once are not phylogenetically informative, since they represent recent migration or recurrent mutation events. Therefore, we performed a restricted analysis using only those haplotypes that were observed at a frequency ≥0.5% (fig. 7). The common clustering of most of the European haplotypes, as well as their connection by single mutation steps, reflects their close relationship. A similarly tight clustering was observed for common haplotypes observed among the INU, among Asians, and among New Guineans.
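
The restriction to common haplotypes and the notion of a single mutation step can be sketched as follows. This is an illustration only: the sample is invented three-locus toy data, the function names are hypothetical, and a "single step" is assumed to mean a difference of exactly one repeat unit at exactly one locus. If the 0.5% threshold is applied to all 986 males, it would correspond to at least 5 observations (0.005 × 986 = 4.93).

```python
from collections import Counter
from itertools import combinations

def common_haplotypes(haplotypes, min_freq=0.005):
    """Keep haplotypes whose relative frequency is at least min_freq."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {h: c for h, c in counts.items() if c / n >= min_freq}

def is_single_step(h1, h2):
    """True if the haplotypes differ by one repeat unit at exactly one locus."""
    diffs = [abs(a - b) for a, b in zip(h1, h2) if a != b]
    return diffs == [1]

def single_step_links(haps):
    """All pairs of haplotypes connected by a single mutation step."""
    return [(a, b) for a, b in combinations(sorted(haps), 2) if is_single_step(a, b)]

# Invented sample of 400 males: three common haplotypes and two singletons.
sample = (
    [(14, 13, 29)] * 240
    + [(15, 13, 29)] * 140
    + [(14, 12, 29)] * 18
    + [(17, 13, 30)]
    + [(12, 11, 31)]
)
common = common_haplotypes(sample)
links = single_step_links(common)
print(sorted(common), links)
```

In this toy sample the two singletons fall below the threshold and are dropped, and the three remaining haplotypes form a chain connected by single steps.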

The present study demonstrates that, on their own, Y STRs are a powerful tool for the study of human evolutionary processes. A similar conclusion was reached recently by Forster et al. (2000), on the basis of a phylogenetic approach only. The use of Y STRs allows the simple construction of highly variable haplotypes. With these haplotypes, it is possible to analyze differences in population structure by a comparison of haplotype diversity and of the number of population-specific haplotypes. The use of Y STRs also allows the simultaneous analysis of closely related and distantly related populations, by haplotype-sharing analyses, AMOVA, and phylogenetic analysis. Because of their relatively high mutation rate, Y STRs are polymorphic in potentially all human populations and allow human migration processes to be traced on a historical timescale. Consequently, the number of Y-STR loci needed to obtain a reasonable amount of information can be substantially smaller than the number of Y SNPs that would be required. On the other hand, because Y SNPs have an ~100,000×-lower mutation rate, they are ideal for the study of human migration at an evolutionary, rather than a historical, timescale (Rosser et al. 2000; Semino et al. 2000; Underhill et al. 2000). As has been suggested elsewhere (de Knijff et al. 1997, 2000a), it will be the dual approach, using Y STRs as well as Y SNPs, that renders the maximum amount of information. This was recently demonstrated in reports on the colonization of the Pacific (Hurles et al. 1998; Kayser et al. 2000a, 2001) and of Iceland (Helgason et al. 2000). In the end, which type of Y-chromosomal polymorphism one chooses will depend on the timescale that one wishes to cover, a luxury that we could only dream about 5 years ago.


We would like to thank the original donors of the DNA samples, as well as Chris Tyler-Smith, Wulf Schiefenhövel, Erika Hagelberg, Gabriele Herzog-Schröder, and the late P. Mheera Khan for providing samples. L. L. Cavalli-Sforza kindly allowed one of us (L.F.B.) to use the CAP samples. Heike Zimdahl and Arpita Pandya are acknowledged for the DNA extractions, and Peter Henneman and Rene Mieremet are acknowledged for their expert technical assistance. L.E. was supported by Swiss National Science Foundation grant 31-56755.99. Finally, we want to acknowledge the constructive remarks of two anonymous reviewers of a previous version of the manuscript.

Appendix A

See Table A1

Table A1
Allele Frequencies of Seven Y-STR Loci, among Male Individuals from 20 Globally Dispersed Populations

Appendix B

See Table B1

Table B1
Absolute Frequencies of All Seven-Locus Y-STR Haplotypes, among Male Individuals from 20 Globally Dispersed Populations

Electronic-Database Information

The URLs for data in this article are as follows:

ARLEQUIN: A Software For Population Genetic Data Analysis, http://anthropologie.unige.ch/arlequin
Rod Page's Home Page, http://taxonomy.zoology.gla.ac.uk/rod/rod.html (for TREEVIEW 1.6.1)
Forensic Laboratory for DNA Research, http://www.medfac.leidenuniv.nl/fldo (for marker information)
Y-STR Haplotype Reference Database, http://ystr.charite.de/index_gr.html (for marker information)


Armour JA, Anttinen T, May CA, Vega EE, Sajantila A, Kidd JR, Kidd KK, Bertranpetit J, Pääbo S, Jeffreys AJ (1996) Minisatellite diversity supports a recent African origin for modern humans. Nat Genet 13:154–160 [PubMed]
Ayub Q, Mohyuddin A, Qamar R, Mazhar K, Zerjal T, Mehdi SQ, Tyler-Smith C (2000) Identification and characterisation of novel human Y-chromosomal microsatellites from sequence database information. Nucleic Acids Res 28:e8 [PMC free article] [PubMed]
Bandelt H-J, Forster P, Sykes BC, Richards MB (1995) Mitochondrial portraits of human populations using median networks. Genetics 141:743–753 [PMC free article] [PubMed]
Bianchi NO, Catanesi CI, Bailliet G, Martinez-Marignac VL, Bravi CM, Vidal-Rioja LB, Herrera RJ, López-Camelo JS (1998) Characterization of ancestral and derived Y-chromosome haplotypes of New World native populations. Am J Hum Genet 63:1862–1871 [PMC free article] [PubMed]
Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, Cavalli-Sforza LL (1994) High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368:455–457 [PubMed]
Bruford MW, Wayne RK (1993) Microsatellites and their application to population genetic studies. Curr Opin Genet Dev 3:939–943 [PubMed]
Cann RL, Stoneking M, Wilson AC (1987) Mitochondrial DNA and human evolution. Nature 325:31–36 [PubMed]
Carvalho-Silva DR, Santos FR, Rocha J, Pena SDJ (2001) The phylogeography of Brazilian Y-chromosome lineages. Am J Hum Genet 68:281–286 [PMC free article] [PubMed]
Chen H, Lowther W, Avramopoulos D, Antonarakis SE (1994) Homologous loci DXYS156X and DXYS156Y contain a polymorphic pentanucleotide repeat (TAAAA)n and map to human X and Y chromosomes. Hum Mutat 4:208–211 [PubMed]
Deka R, Jin L, Shriver MD, Yu LM, Saha N, Barrantes R, Chakraborty R, Ferrell RE (1996) Dispersion of human Y chromosome haplotypes based on five microsatellites in global populations. Genome Res 6:1177–1184 [PubMed]
de Knijff P (2000a) Messages through bottlenecks: on the combined use of slow and fast evolving polymorphic markers on the human Y chromosome. Am J Hum Genet 67:1055–1061 [PMC free article] [PubMed]
——— (2000b) Y chromosomes shared by descent or by state. In: Renfrew C, Boyle K (eds) Archaeogenetics: DNA and the population prehistory of Europe. The McDonald Institute, Cambridge, pp 301–304
de Knijff P, Kayser M, Caglià A, Corach D, Fretwell N, Gehrig C, Graziosi G, et al (1997) Chromosome Y microsatellites: population genetic and evolutionary aspects. Int J Legal Med 110:134–140 [PubMed]
DNA Commission of the ISFH (1995) DNA recommendations—1994 report concerning further recommendations of the DNA Commission of the ISFH regarding PCR-based polymorphisms in STR (short tandem repeats) systems. Vox Sang 69:70–71 [PubMed]
Excoffier L, Smouse PE (1994) Using allele frequencies and geographic subdivision to reconstruct gene trees within a species: molecular variance parsimony. Genetics 136:343–359 [PMC free article] [PubMed]
Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred from metric distances among mtDNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479–491 [PMC free article] [PubMed]
Flint J, Bond J, Rees DC, Boyce AJ, Roberts-Thomson JM, Excoffier L, Clegg JB, Beaumont MA, Nichols RA, Harding RM (1999) Minisatellite mutational processes reduce Fst estimates. Hum Genet 105:567–576 [PubMed]
Forster P, Röhl A, Lünnemann P, Brinkmann C, Zerjal T, Tyler-Smith C, Brinkmann B (2000) A short tandem repeat-based phylogeny for the human Y chromosome. Am J Hum Genet 67:182–196 [PMC free article] [PubMed]
Goldstein DB, Ruiz-Linares AR, Cavalli-Sforza LL, Feldman MW (1995a) An evaluation of genetic distances for use with microsatellite loci. Genetics 139:463–471 [PMC free article] [PubMed]
——— (1995b) Genetic absolute dating based on microsatellites and the origin of modern humans. Proc Natl Acad Sci USA 92:6723–6727 [PMC free article] [PubMed]
Hammer MF (1994) A recent insertion of an ALU element on the Y chromosome is a useful marker for human population studies. Mol Biol Evol 11:749–761 [PubMed]
——— (1995) A recent common ancestry for human Y chromosome. Nature 378:376–378 [PubMed]
Hammond HA, Jin L, Zhong Y, Caskey CT, Chakraborty R (1994) Evaluation of 13 short tandem repeat loci for use in personal identification applications. Am J Hum Genet 55:175–189 [PMC free article] [PubMed]
Helgason A, Sigurðardóttir S, Nicholson J, Sykes B, Hill EW, Bradley DG, Bosnes V, Gulcher JR, Ward R, Stefánsson K (2000) Estimating Scandinavian and Gaelic ancestry in the male settlers of Iceland. Am J Hum Genet 67:697–717 [PMC free article] [PubMed]
Heyer E, Puymirat J, Dieltjes P, Bakker E, de Knijff P (1997) Estimating Y chromosome specific microsatellite mutation frequencies using deep rooting pedigrees. Hum Mol Genet 6:799–803 [PubMed]
Honda K, Roewer L, de Knijff P (1999) Male DNA typing from 25-year-old vaginal swabs using Y chromosomal STR polymorphisms in a retrial request case. J Forensic Sci 44:868–872 [PubMed]
Hurles ME, Irven C, Nicholson J, Taylor PG, Santos FR, Loughlin J, Jobling MA, Sykes BC (1998) European Y-chromosomal lineages in Polynesians: a contrast to the population structure revealed by mtDNA. Am J Hum Genet 63:1793–1806 [PMC free article] [PubMed]
Jobling MA (1994) A survey of long-range DNA polymorphisms on the human Y chromosome. Hum Mol Genet 3:107–114 [PubMed]
Jobling MA, Tyler-Smith C (1995) Fathers and sons: the Y chromosome and human evolution. Trends Genet 11:449–456 [PubMed]
Jorde LB, Watkins WS, Bamshad MJ, Dixon ME, Ricker CE, Seielstad MT, Batzer MA (2000) The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. Am J Hum Genet 66:979–988 [PMC free article] [PubMed]
Kayser M, Brauer S, Weiss G, Schiefenhövel W, Underhill PA, Stoneking M (2001) Independent histories of Y chromosomes from Melanesia and Australia. Am J Hum Genet 68:173–190 [PMC free article] [PubMed]
Kayser M, Brauer S, Weiss G, Underhill P, Roewer L, Schiefenhövel W, Stoneking M (2000a) Melanesian origin of Polynesian Y chromosomes. Curr Biol 10:1237–1246 [PubMed]
Kayser M, Caglià A, Corach D, Fretwell N, Gehrig C, Graziosi G, Heidorn F, et al (1997) Evaluation of Y chromosomal STRs: a multicenter study. Int J Legal Med 110:125–133 [PubMed]
Kayser M, Roewer L, Hedman M, Henke L, Henke J, Brauer S, Krüger C, Krawczak M, Nagy M, Dobosz T, Szibor R, de Knijff P, Stoneking M, Sajantila A (2000b) Characteristics and frequency of germline mutations at microsatellite loci from the human Y chromosome, as revealed by direct observation in father/son pairs. Am J Hum Genet 66:1580–1588 [PMC free article] [PubMed]
Kruskal JB (1964) Nonmetric multidimensional-scaling—a numerical-method. Psychometrika 29:115–129
Lahermo P, Savontaus M-L, Sistonen P, Béres J, de Knijff P, Aula P, Sajantila A (1999) Y chromosomal polymorphisms reveal founding lineages in the Finns and the Saami. Eur J Hum Genet 7:447–458 [PubMed]
Mathias N, Bayès M, Tyler-Smith C (1994) Highly informative compound haplotypes for the human Y chromosome. Hum Mol Genet 3:115–123 [PubMed]
Melton T, Peterson R, Redd AJ, Saha N, Sofro ASM, Martinson J, Stoneking M (1995) Polynesian genetic affinities with Southeast Asian populations as identified by mtDNA analysis. Am J Hum Genet 57:403–414 [PMC free article] [PubMed]
Mountain JL, Cavalli-Sforza LL (1997) Multilocus genotypes, a tree of individuals, and human evolutionary history. Am J Hum Genet 61:705–718 [PMC free article] [PubMed]
Nakahori Y, Tamura T, Yamada M, Nakagome Y (1989) Two 47z[DXYS5] RFLPs on the X and the Y chromosome. Nucleic Acids Res 17:2152 [PMC free article] [PubMed]
Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York
Perez-Lezaun A, Calafell F, Seielstad M, Mateu E, Comas D, Bosch E, Bertranpetit J (1997) Population genetics of Y-chromosome short tandem repeats in humans. J Mol Evol 45:265–270 [PubMed]
Poloni ES, Semino O, Passarino G, Santachiara-Benerecetti AS, Dupanloup I, Langaney A, Excoffier L (1997) Human genetic affinities for Y-chromosome p49a,f/TaqI haplotypes show strong correspondence with linguistics. Am J Hum Genet 61:1015–1035 [PMC free article] [PubMed]
Prinz M, Boll K, Baum H, Shaler B (1997) Multiplexing of Y chromosome specific STRs and performance for mixed samples. Forensic Sci Int 85:209–218 [PubMed]
Pult I, Sajantila A, Simanainen J, Georgiev O, Schaffner W, Pääbo S (1994) Mitochondrial DNA sequences from Switzerland reveal striking homogeneity of European populations. Biol Chem Hoppe Seyler 375:837–840 [PubMed]
Raymond M, Rousset F (1995) GENEPOP (version-1.2)—population-genetics software for exact tests and ecumenicism. J Hered 86:248–249
Roewer L, Arnemann J, Spurr NK, Grzeschik K-H, Epplen JT (1992) Simple repeat sequences on the human Y chromosome are equally polymorphic as their autosomal counterparts. Hum Genet 89:389–394 [PubMed]
Roewer L, Kayser M, Dieltjes P, Nagy M, Bakker E, Krawczak M, de Knijff P (1996) Analysis of molecular variance (AMOVA) of Y-chromosome specific microsatellites in two closely related human populations. Hum Mol Genet 5:1029–1033 [PubMed]
Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Amos W, et al (2000) Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. Am J Hum Genet 67:1526–1543 [PMC free article] [PubMed]
Ruhlen M (1994) The origin of language: tracing the evolution of the mother tongue. John Wiley & Sons, New York
Sala A, Penacino G, Corach D (1997) VNTR polymorphism in the Buenos Aires, Argentina, metropolitan population. Hum Biol 69:777–783 [PubMed]
Seielstad MT, Hebert JM, Lin AA, Underhill PA, Ibrahim M, Vollrath D, Cavalli-Sforza LL (1994) Construction of human Y-chromosomal haplotypes using a new polymorphic A to G transition. Hum Mol Genet 3:2159–2161 [PubMed]
Seielstad M, Bekele E, Ibrahim M, Toure A, Traore M (1999) A view of modern human origins from Y chromosome microsatellite variation. Genome Res 9:558–567 [PMC free article] [PubMed]
Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, Beckman LE, De Benedictis G, Francalacci P, Kouvatsi A, Limborska S, Marcikiae M, Mika A, Mika B, Primorac D, Santachiara-Benerecetti AS, Cavalli-Sforza LL, Underhill PA (2000) The genetic legacy of paleolithic homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science 290:1155–1159 [PubMed]
Slatkin M (1995) A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457–462 [PMC free article] [PubMed]
Stoneking M, Fontius JJ, Clifford SL, Soodyall H, Arcot SS, Saha N, Jenkins T, Tahir MA, Deininger PL, Batzer MA (1997) Alu insertion polymorphisms and human evolution: evidence for a larger population size in Africa. Genome Res 7:1061–1071 [PMC free article] [PubMed]
Underhill PA, Jin L, Lin AA, Mehdi SQ, Jenkins T, Vollrath D, Davis RW, Cavalli-Sforza LL, Oefner PJ (1997) Detection of numerous Y chromosome biallelic polymorphisms by denaturing high-performance liquid chromatography. Genome Res 7:996–1005 [PMC free article] [PubMed]
Underhill PA, Jin L, Zemans R, Oefner PJ, Cavalli-Sforza LL (1996) A pre-Columbian Y chromosome-specific transition and its implication for human evolutionary history. Proc Natl Acad Sci USA 93:196–200 [PMC free article] [PubMed]
Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E, Bonne-Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, Kidd JR, Mehdi SQ, Seielstad MT, Wells RS, Piazza A, Davis RW, Feldman MW, Cavalli-Sforza LL, Oefner PJ (2000) Y chromosome sequence variation and the history of human populations. Nat Genet 26:358–361 [PubMed]
Vigilant L, Stoneking M, Harpending H, Hawkes K, Wilson AC (1991) African populations and the evolution of human mitochondrial DNA. Science 253:1503–1507 [PubMed]
Weissenbach J, Gyapay G, Dib C, Vignal A, Morissette J, Millasseau P, Vaysseix G, Lathrop M (1992) A second-generation linkage map of the human genome. Nature 359:794–801 [PubMed]
White PS, Tatum OL, Deaven LL, Longmire JL (1999) New, male-specific microsatellite markers from the human Y chromosome. Genomics 57:433–437 [PubMed]
Whitfield LS, Sulston JE, Goodfellow PN (1995) Sequence variation of the human Y chromosome. Nature 378:379–380 [PubMed]
Zerjal T, Dashnyam B, Pandya A, Kayser M, Roewer L, Santos FR, Schiefenhövel W, Fretwell N, Jobling MA, Harihara S, Shimizu K, Semjidmaa D, Sajantila A, Salo P, Crawford MH, Ginter EK, Evgrafov OV, Tyler-Smith C (1997) Genetic relationships of Asian and northern Europeans, revealed by Y-chromosomal DNA analysis. Am J Hum Genet 60:1174–1183 [PMC free article] [PubMed]

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics