
Genetic Landscape of Eurasia and “Admixture” in Uyghurs
Hui Li,1 Kelly Cho,1 Judith R. Kidd,1 Kenneth K. Kidd1
1Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
This document may be redistributed and reused, subject to certain conditions.
Main Text
To the Editor: In the papers1,2 by Xu and Jin, the genetic structure of Uyghurs was described using 8150 ancestry-informative markers (AIMs). From these markers, the authors estimated that the Uyghur population has around 50% East Asian ancestry by comparing Uyghurs to East Asians and Europeans. However, we suspect that the estimate of Xu and Jin may be considerably biased by insufficient reference population coverage. In their study, Xu and Jin used the STRUCTURE program3,4 as the major method for estimating the admixture rate; however, results from the STRUCTURE program are strictly the probabilities of assignment to different estimated clusters and therefore are influenced by both the marker selection and the populations used to estimate allele frequencies in those clusters. The results of Xu and Jin are unlikely to be biased by marker selection because they used a large number of AIMs selected to distinguish Europeans from East Asians. However, the population coverage in their analysis was relatively sparse. They analyzed only Japanese, northern Chinese, and a very small sample (n = 10) of Mongols for the East Asian reference. For Europeans, they included some southern European populations, Russians, and Adygei. It is doubtful that this sparse population coverage can yield precise results. Although the cluster assignments can be interpreted as admixture under many circumstances, these reference populations may not serve as relevant ancestral populations for an admixture analysis of Uyghurs. We also note that, for populations in the middle of a clinal distribution, STRUCTURE cannot readily distinguish recent admixture from a historical cline established by an expanding population, as in the original spread of modern humans out of Africa and across Eurasia.5
In our study, we typed 68 AIMs on 1766 individuals from 34 populations representing all subdivisions of Eurasia (Figure 1 and Table S1 available online; allele frequency data also in ALFRED). All samples were collected with the informed consent of participants using protocols approved by all relevant institutional review boards. The SNPs were selected at random from a group of ∼300 markers having a high global FST and already tested on many populations, including some reported here; the selection paid no attention to the pattern of variation across Eurasia. This “randomness” is illustrated by the Eurasian FST values for these 68 markers, which range from 0.028 to 0.681 in a highly skewed distribution with a median value of ∼0.15 (Figure S1). Our set of populations contained several population samples from the regions around Uyghurs, such as Kazakhs, Mongols, Tibetan Khams, etc. to the east of Uyghurs and Hazaras, Khanty, etc. to the west. This comprehensive population coverage allowed more reliable estimates of relationships among the populations. We used the STRUCTURE 2.2 and SURFER 8.0 programs to illustrate the population similarities (Figure 1). The analyses showed a very clear west-east cline when K = 2. The median border estimated from these 68 SNPs divided Central Asia along the Ob River, the Kazakh highland, the western side of the Pamir Mountains, and the southwestern side of the Himalayas. Although the small number of markers examined cannot yield a precise median border, the Uyghur population fell east of the median border with 31.2% assignment to the western cluster. Even the Hazaras and Khanty were to the east of the median border. This result matched the results of anthropological studies of Eurasian population structure, which place the Hazaras around the median.6 Other anthropological studies also estimated Uyghurs to have ∼30% western proportions,7,8 closer to our estimate than to that of Xu and Jin.
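As a rough illustration of the per-marker FST values discussed above, the following minimal sketch computes Wright's FST for one biallelic SNP as the variance of an allele's frequency across populations divided by p̄(1 − p̄), and then takes the median over markers. The estimator actually used for the 68 markers is not stated in this letter, so the unweighted formula, the function names, and the toy frequencies are all assumptions made for illustration only.

```python
from statistics import fmean, median, pvariance


def wright_fst(allele_freqs):
    """Unweighted Wright's FST for one biallelic SNP.

    allele_freqs: frequency of the same allele in each population.
    Sample sizes are ignored, which is an assumption of this sketch.
    """
    p_bar = fmean(allele_freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return pvariance(allele_freqs, mu=p_bar) / (p_bar * (1.0 - p_bar))


def summarize_markers(freq_table):
    """Return (per-marker FST dict, median FST) for {marker: [frequencies]}."""
    fst_by_marker = {name: wright_fst(freqs) for name, freqs in freq_table.items()}
    return fst_by_marker, median(fst_by_marker.values())


# Hypothetical frequencies for three made-up markers in four populations;
# these are NOT values from Table S1.
toy_table = {
    "snp_a": [0.10, 0.15, 0.80, 0.90],
    "snp_b": [0.40, 0.45, 0.50, 0.55],
    "snp_c": [0.05, 0.10, 0.30, 0.35],
}
fst_by_marker, median_fst = summarize_markers(toy_table)
```

A marker whose allele frequency differs sharply between the two ends of the continent (like the made-up `snp_a`) scores high, while one that is nearly uniform (like `snp_b`) scores near zero, which is why a randomly drawn set can show a wide, skewed FST distribution.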

Figure 1. STRUCTURE Bar Views for K = 2–6 and Contour Plots for K = 2 and K = 6
The contour plots and genetic distance tree are color coded to correspond to the STRUCTURE plots. The significance levels for pairwise comparisons of clusters are given on the K = 6 contour plot (bottom) at the relevant borders.
The difference between the estimate of Xu and Jin (52%) and our estimate (31%) may stem from either the different population coverage or the sample size. We analyzed a different and larger sample of Uyghur individuals (n = 48) than that analyzed by Xu and Jin.2 Their small sample size may have contributed to their overestimation of the European component of admixture (i.e., of cluster assignment). However, the insufficient population coverage may be more responsible for the difference than the sample size or the number of markers. Concerning the number of markers, it is known that a relatively small but specifically selected number of AIMs can accurately predict ethnicity proportion.9 As the two papers of Xu and Jin have demonstrated, the estimated admixture rates did not change much regardless of whether they used chromosome 21 data only or the whole genome, and thus a large number of markers may not be necessary to estimate the “admixture” rate of Uyghurs. When we analyzed only the 12 markers with the highest FST values in our samples (Figure S1), the Uyghurs had a 30.2% assignment at K = 2 to the Europe and Western Asia cluster. This estimate was not significantly different from the 31.2% obtained with all 68 markers. Based on unpublished data on some of these same populations, we consider it unlikely that a different set of appropriately chosen SNPs would give a markedly different answer.
A fundamental problem with this estimate at K = 2 is that the Uyghur population is highly unlikely to be an admixture of two widely separated populations such as Europeans and eastern East Asians. Therefore, we also tested for additional subdivisions with STRUCTURE to see which populations were more closely related to Uyghurs. We observed that Central Asian populations including Uyghurs, Kazakhs, and Khanty did not form their own cluster until K = 6, indicating that the Central Asian cluster was not a completely distinct population group. From K = 3 to K = 5, the western part of East Asia (Kham, Baima, Qiang, Mongol) was distinguished from the eastern part of East Asia (Japanese, Korean, Hakka, Minnamese, Cantonese). The “admixture” pattern of Uyghurs makes clear that the proportion of western East Asia is much higher than that of eastern East Asia, especially when K = 5 (0.418 versus 0.128, respectively). Moreover, the South Asians and West Asians also contributed more than the Europeans to Uyghurs (0.180 versus 0.100, respectively). Including the western East Asians, South Asians, and West Asians when estimating the admixture rate of Uyghurs illustrates the difficulty with the concept of admixture for such intermediate populations: the estimate depends on the populations hypothesized to be admixed in the target population. Considering only the Europeans and the eastern East Asians may seem to indicate equal “contributions” to Uyghurs, which was the case in the study of Xu and Jin.
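The dependence on the chosen source populations can be made concrete with simple arithmetic on the K = 5 assignments quoted above. The sketch below only rescales the quoted fractions so that the selected clusters sum to 1; this is not what STRUCTURE computes when rerun with fewer reference populations, and the cluster labels, the function name, and the reading of 0.180 as a single combined South and West Asian figure are assumptions made here for illustration. The four quoted values sum to 0.826, and the remainder is not itemized in the text.

```python
def renormalize(assignments, chosen):
    """Rescale the assignment fractions of the chosen clusters to sum to 1."""
    total = sum(assignments[name] for name in chosen)
    if total <= 0:
        raise ValueError("chosen clusters carry no assignment")
    return {name: assignments[name] / total for name in chosen}


# Uyghur assignments at K = 5 as quoted in the text; labels are shorthand
# invented for this sketch.
uyghur_k5 = {
    "western_east_asia": 0.418,
    "eastern_east_asia": 0.128,
    "south_and_west_asia": 0.180,
    "europe": 0.100,
}

# Only the two most distant sources, as in a two-reference comparison.
two_source = renormalize(uyghur_k5, ["europe", "eastern_east_asia"])

# All four quoted sources.
four_source = renormalize(uyghur_k5, list(uyghur_k5))
```

With only Europe and eastern East Asia retained, the rescaled shares are 0.100/0.228 ≈ 0.44 and 0.128/0.228 ≈ 0.56, which looks like a roughly even split even though Europe accounts for only 0.100 of the full K = 5 assignment.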
In addition, our study showed that Central Asia clearly formed a cluster with significant borders when K = 6. The border of the Central Asian cluster went along Lake Baikal, the A-erh-chin Mountains, the Kunlun Mountains, the Hindu Kush, the Caspian Sea, and the Ural Mountains, matching the traditional anthropological definition of Central Asia. We performed pairwise t tests for the six clusters to estimate the significance of the borders (p values are marked on the map at bottom in Figure 1). The only nonsignificant gap around the Central Asian cluster was in northern Siberia, where people led a nomadic hunting lifestyle. Another nonsignificant border was the southern border of the western East Asian cluster, where people also led a nomadic hunting lifestyle. That area is believed to have been a migration pathway into East Asia for early modern humans.10,11 Such a nomadic hunting lifestyle might have been the easiest way to blur borders arising among populations through admixture during more recent human history. Though nomadic lifestyles may have blurred distinctions, those nonsignificant cluster comparisons might also simply be due to a lack of power for this specific set of SNPs to distinguish differences. Other borders among all pairs of the six clusters were significantly distinct. It is reasonable that the Caucasus, the Anatolian plateau, and the Himalayas became the borders of the clusters by minimizing gene flow and allowing allele frequency differences to accumulate by drift.
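The letter does not say which quantity was compared in the pairwise t tests or which variant of the test was used. As one possible reading, the sketch below compares per-individual membership coefficients for a given cluster between two neighboring groups using Welch's unequal-variance t statistic. The choice of Welch's test, the function name, and the toy coefficients are all assumptions; the sketch returns only the statistic and the Welch-Satterthwaite degrees of freedom, and a p value would still have to be read from a t distribution with a statistics package.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t_stat = (fmean(sample_a) - fmean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical membership coefficients in one cluster for individuals sampled
# on either side of a putative border (not data from this study).
west_side = [0.62, 0.58, 0.71, 0.66, 0.60]
east_side = [0.31, 0.40, 0.28, 0.35, 0.37]
t_stat, dof = welch_t(west_side, east_side)
```

A large absolute t statistic would correspond to a "significant border" in the sense used above, whereas a small one could reflect either genuinely blurred populations or simply too few informative SNPs.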
Our analyses significantly divided East Asia into eastern and western parts, agreeing with the hypothesis that early modern humans entered East Asia in the south along two routes, the western route from Myanmar to Yunnan and the eastern route from Vietnam to Guangdong.10,12,13 The descendants of the migrants along these two routes would have accumulated significant genetic differences and subsequently would have had different effects on the gene pool of Uyghurs. Here we have demonstrated that the western East Asians are more closely related to Uyghurs than the eastern East Asians.
To confirm the relative genetic affinity between Uyghurs and the other Eurasian populations, we performed principal component analysis (PCA) with SPSS 13.0. The results of principal components 1 versus 2 are plotted in Figure 2. It is obvious that two clusters exist: a European-West Asian and a far East Asian. The South Asian and Central Asian populations are scattered between the clusters of Europeans and East Asians. Notably, Uyghurs are much closer to the East Asians than to the Europeans. The clinal pattern seen in the PCA is also suggested by the sequential subdividing of the groups of populations in the series of increasing K for the STRUCTURE analyses. The first two PCA components account for just over 80% of the total variance, probably because some of the SNPs have very large allele frequency differences between Europe and far East Asia (Table S1).

Figure 2. Principal Component Plot of Eurasian Populations
Just over 80% of the variance among these populations based on the 68 SNPs in this study can be explained by the first two components.
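To show what "variance explained by the first two components" means computationally, here is a minimal pure-Python sketch of PCA on a populations-by-SNPs table of allele frequencies. The analysis reported here was run in SPSS 13.0, so this is not the authors' procedure: the covariance-based formulation, the power-iteration solver, the function names, and the toy frequencies are assumptions chosen to keep the example self-contained.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of the columns (SNPs) across the rows (populations)."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two populations")
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def top_eigenpairs(matrix, k, iterations=500):
    """Leading k eigenpairs of a symmetric matrix: power iteration plus deflation."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.01 * i for i in range(size)]  # arbitrary fixed start
        for _ in range(iterations):
            nxt = [sum(w * v for w, v in zip(row, vec)) for row in work]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break  # nothing left to extract
            vec = [x / norm for x in nxt]
        length = math.sqrt(sum(x * x for x in vec))
        vec = [x / length for x in vec]
        # Rayleigh quotient gives the eigenvalue for the unit vector.
        value = sum(v * sum(w * u for w, u in zip(row, vec))
                    for v, row in zip(vec, work))
        pairs.append((value, vec))
        # Deflate: remove the component just found.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(size)]
                for i in range(size)]
    return pairs


def explained_variance_ratios(rows, k=2):
    """Fraction of total variance carried by each of the first k components."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    if total == 0:
        raise ValueError("data have no variance")
    return [value / total for value, _ in top_eigenpairs(cov, k)]


# Hypothetical allele frequencies: five populations (rows) by three SNPs.
toy_freqs = [
    [0.9, 0.8, 0.2],
    [0.8, 0.9, 0.3],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.8],
    [0.5, 0.5, 0.5],
]
ratios = explained_variance_ratios(toy_freqs, k=2)
```

When many SNPs differ strongly along the same west-east axis, as in the toy table, the first component absorbs most of the variance; power iteration converges slowly if two eigenvalues are nearly equal, so a library eigensolver would be preferable for real data.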
STRUCTURE cannot distinguish recent admixture from a cline of other origin, and these analyses cannot prove admixture in the Uyghurs; however, historical records indicate that the present Uyghurs were formed by admixture between Tocharians from the west and Orkhon Uyghurs (Wugusi-Huihu, according to present Chinese pronunciation) from the east in the 8th century CE.14 The Uyghur Empire was originally located in Mongolia and conquered the Tocharian tribes in Xinjiang. Tocharians such as Kroran have been shown by archaeological findings to appear phenotypically similar to northern Europeans,15 whereas the Orkhon Uyghur people were clearly Mongolians. The two groups of people subsequently mixed in Xinjiang to become one population, the present Uyghurs. We do not know the genetic constitution of the Tocharians, but if they were similar to western Siberians, such as the Khanty, admixture would already be biased toward similarity with East Asian populations.
In conclusion, we argue that the Uyghurs' genetic structure is more similar to East Asians than to Europeans, in contrast to the reports by Xu and Jin, whose work may have been affected by their sparse population coverage. The median line of the Eurasian genetic landscape appears to lie to the west of the Xinjiang Uyghur Autonomous Region of China. When we have collected more data on these 34 populations, we should be able to refine these estimates.
Supplemental Data
Supplemental Data include one figure and one table and can be found with this article online at http://www.cell.com/AJHG.
Acknowledgments
This study was funded in part by National Institutes of Health grant P01 GM057672 (J.R.K. and K.K.K.) and NIJ grants 2004-DN-BX-K025 and 2007-DN-BX-K197 (K.K.K.) awarded by the National Institute of Justice, Office of Justice Programs, United States Department of Justice. Points of view in this document are those of the authors and do not necessarily represent the official position or policies of the Department of Justice. We thank all of the colleagues who helped us assemble the population samples, including the National Laboratory for the Genetics of Israeli Populations at Tel Aviv University. We give special thanks to the many hundreds of individuals from the relevant populations who volunteered to give blood samples for studies such as this.
References
Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics