Mitochondrial and Y-chromosomal profile of the Kazakh population from East Kazakhstan

Aim: To study the genetic relationship of Kazakhs from East Kazakhstan to other Eurasian populations by examining paternal and maternal DNA lineages.

Methods: Whole blood samples were collected in 2010 from 160 unrelated healthy Kazakhs residing in East Kazakhstan. Genomic DNA was extracted with the Wizard® Genomic DNA Purification Kit. The nucleotide sequence of hypervariable segment I of mitochondrial DNA (mtDNA) was determined and analyzed. Seventeen Y-short tandem repeat (STR) loci were studied in 67 samples with the AmpFlSTR Y-filer PCR Amplification Kit. In addition, mtDNA data for 2701 individuals and Y-STR data for 677 individuals were retrieved from the literature for comparison.

Results: There was a high degree of genetic differentiation on the level of mitochondrial DNA. The majority of maternal lineages belonged to haplogroups common in Central Asia. In contrast, Y-STR data showed very low genetic diversity, with a relative frequency of the predominant haplotype of 0.612.

Conclusion: The results revealed different migration patterns in the population sample, showing there had been more migration among women. mtDNA genetic diversity in this population was equivalent to that in other Central Asian populations. Genetic evidence suggests the existence of a single paternal founder lineage in the population of East Kazakhstan, which is consistent with verbal genealogical data of the local tribes.

In terms of population genetics, Central Asia is one of the least studied regions in the world. The studies conducted in the region, based on scarce genetic data, indicate that the Central Asian population is a mix of Eastern and Western populations (1,2). Kazakhstan is a vast country, which has throughout history been inhabited by different nomadic tribes such as the Argyn, Dughlat, Jalayir, Kerei, Kipchak, Madjar, Naiman, and others (3). The Kazakh ethnic group was formed in the 15th century under the strong influence of the Mongol Empire (4). We expected the genetic profile of Kazakhs to be heterogeneous because of the different tribes and ethnicities (5).
The current study focused on the Kazakh population of the East Kazakhstan Province, because recently there have been many reports on the neighboring populations of the Xinjiang Uyghur Autonomous Region and Altai regions. East Kazakhstan is populated by the Naiman tribe. Their genealogical narrative, "shezhire," states that the Naiman people living in the Tarbagatay region are descendants of one ancestor, named Toktar-kozha, who came from the territory of modern Uzbekistan and was a Sart by origin. Based on the data from "shezhire," we formed a hypothesis of uniform paternal descent of the Naiman tribe. The aim of this study was to better understand the origins and differentiation of the Kazakh ethnic group and to investigate the genetic relationship between this population and other Eurasian populations.

Participants and reference data
A total of 160 blood samples were collected from healthy adult individuals, 67 men and 93 women, during an expedition to the Tarbagatay region, East Kazakhstan, in 2010. Prior to the expedition, ethical approval was received from the Ethics Committee of the National Center for Biotechnology of the Republic of Kazakhstan (No. 10, 14.02.2010). The Ethics Committee approved the informed consent form and questionnaire form designed specifically for the study.
The ethnic origin of sampled individuals was ascertained up to three generations. Blood was taken with the informed consent signed by all donors. In addition, all participants completed the questionnaire that included information on the geographic origin, nationality, maternal and paternal pedigree, and health issues. Related individuals were excluded from sampling. In addition, mtDNA haplogroup data for 2701 individuals and Y-short tandem repeat (STR) haplotype data for 677 individuals of different ethnic backgrounds were retrieved from the literature to establish the genetic relationship of Kazakhs from East Kazakhstan with other populations (Supplementary Tables 1 and 2; Figure 1).

Mitochondrial DNA analysis
Blood collection was performed with EDTA-containing evacuated blood collection tubes (Terumo®, Leuven, Belgium) and blood collecting needles (Terumo®). Blood samples were stored in portable refrigerators until DNA was extracted. DNA extraction and purification were performed with the Wizard® Genomic DNA Purification Kit (Promega, Madison, WI, USA), according to the manufacturer's instructions.
Polymerase chain reaction (PCR) amplification. Mitochondrial DNA polymorphisms were typed for 160 individuals by using conventional PCR, with primers specific to hypervariable segment I (HVS-I) (20). PCR reactions were carried out in a 25-μL reaction volume containing 3.2 pmol of each primer, 10 ng of genomic DNA, 0.2 units of Taq polymerase (Lytech, Moscow, Russia), 200 μM of each dNTP, 1× PCR buffer, and 2.5 mM MgCl2 (Lytech). The PCR protocol was: 94°C for 10 minutes; 35 cycles of denaturation at 94°C for one minute, annealing at 55°C for one minute, and extension at 72°C for one minute; and a final extension step at 72°C for 7 minutes. PCR products were initially checked on a 1.5% agarose gel and sequenced on an ABI 3730xl DNA analyzer (Applied Biosystems/Hitachi, Tokyo, Japan).
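As a quick arithmetic check on the reaction setup described above, the following Python sketch converts the stated primer amount to a final concentration and sums the programmed hold times. The function and variable names are illustrative and do not come from the original protocol. Ramp times between temperatures are ignored, which is an assumption, so an actual run would take somewhat longer.

```python
def primer_concentration_um(amount_pmol: float, volume_ul: float) -> float:
    """Final primer concentration in micromolar (1 pmol/uL equals 1 uM)."""
    return amount_pmol / volume_ul


def program_minutes(initial_min, cycles, step_minutes, final_min):
    """Sum of programmed hold times; ramping between steps is ignored (assumption)."""
    return initial_min + cycles * sum(step_minutes) + final_min


# Values taken from the protocol text: 3.2 pmol primer in 25 uL;
# 10 min initial hold, 35 cycles of three 1-min steps, 7 min final extension.
primer_um = primer_concentration_um(3.2, 25.0)
total_min = program_minutes(10, 35, (1, 1, 1), 7)

print(f"primer: {primer_um:.3f} uM each")
print(f"programmed time: {total_min} min")
```

Under these assumptions each primer is present at 0.128 μM and the programmed holds add up to 122 minutes.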
Sequence alignment and haplogroup analysis. The obtained sequences were compared with the revised Cambridge Reference Sequence (rCRS, NC_012920) to find polymorphisms by using SeqScape v 2.6 software (Applied Biosystems, Foster City, CA, USA). Identified HVS-I polymorphisms were used for assignment of mtDNA haplogroups. We also included published data on other Eurasian populations for comparative analysis (Supplementary Table 1).
Data analysis. Mitochondrial DNA haplogroup frequencies and haplogroup diversity values were calculated as described elsewhere (21). The haplogroups were combined into 15 groups to compare East Kazakhstan data with the published data sets. In the case of American populations, only A, B, C, and D mtDNA haplogroup frequency data were used for comparison (Supplementary Table 3). Population pairwise genetic distances were calculated from haplogroup frequencies of the Tarbagatay population and 32 other Eurasian populations with Statistica v.10 software (StatSoft, Tulsa, OK, USA).
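To make the frequency and diversity calculations concrete, here is a minimal Python sketch. It assumes the unbiased gene diversity estimator h = n/(n - 1) × (1 - Σ p_i²), where n is the sample size and p_i is the frequency of haplogroup i. The text only cites reference 21 for the method, so the exact estimator used in the study is an assumption here. The haplogroup labels in the example are hypothetical toy data, not study results.

```python
from collections import Counter


def haplogroup_frequencies(assignments):
    """Relative frequency of each haplogroup label in a sample."""
    counts = Counter(assignments)
    n = len(assignments)
    return {hg: c / n for hg, c in counts.items()}


def gene_diversity(assignments):
    """Unbiased estimate: n/(n-1) * (1 - sum of squared frequencies).

    Assumed estimator; the source cites its reference 21 without
    spelling out the formula.
    """
    n = len(assignments)
    if n < 2:
        raise ValueError("need at least two individuals")
    freqs = haplogroup_frequencies(assignments)
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


# Hypothetical toy sample (NOT data from the study).
toy = ["D", "D", "C", "H", "A", "D", "C", "G"]
freqs = haplogroup_frequencies(toy)
h = gene_diversity(toy)
print(freqs["D"], round(h, 4))
```

A sample in which every individual carries the same haplogroup gives h = 0, and h approaches 1 as the lineages become more evenly spread.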
Data analyses. Fragment sizes were determined using the GeneScan 3.1.2 software (Applied Biosystems), and allele designations were performed using the Genotyper 2.5.2 software (Applied Biosystems). We also included published data on Altaian Kazakhs, Kalmyks, Kazakhs, Kyrgyz, Mongolians, Uighurs, and Uzbeks (Supplementary Table 2). Haplotype diversity was calculated as described in the literature (22). Discrimination capacity was calculated as D = N_diff/N, where N_diff is the number of different haplotypes in the population. Basic parameters of molecular diversity and population genetic structure, including Slatkin's Rst matrices for pairwise genetic distances, were calculated using the software package Arlequin 3.5.1.2 (University of Bern, Bern, Switzerland). The statistical significance (P-values) was estimated by permutation analysis, using 10100 permutations. The STATISTICA package (StatSoft, Tulsa, OK, USA) was used for multidimensional scaling (MDS) analysis. Y-STR haplogroups were predicted with Y predictor v.1.5.0 by Vadim Urasin (http://predictor.ydna.ru/).
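The discrimination capacity defined above, together with the relative frequency of the most common haplotype, can be sketched in Python as follows. Haplotypes are represented as tuples of allele calls. The three-locus toy profiles and the function names are hypothetical and serve only as illustration. N is taken here to be the total number of haplotypes in the sample, which the text implies but does not state.

```python
from collections import Counter


def discrimination_capacity(haplotypes):
    """D = N_diff / N: number of distinct haplotypes over sample size.

    N is assumed to be the total number of sampled haplotypes.
    """
    if not haplotypes:
        raise ValueError("empty sample")
    return len(set(haplotypes)) / len(haplotypes)


def predominant_frequency(haplotypes):
    """Relative frequency of the most common haplotype."""
    if not haplotypes:
        raise ValueError("empty sample")
    _, count = Counter(haplotypes).most_common(1)[0]
    return count / len(haplotypes)


# Hypothetical three-locus profiles (NOT data from the study).
toy = [
    (13, 24, 15),
    (13, 24, 15),
    (13, 24, 15),
    (14, 23, 15),
    (13, 25, 16),
]
d = discrimination_capacity(toy)
f = predominant_frequency(toy)
print(d, f)
```

A sample dominated by one haplotype yields a low D together with a high predominant frequency, which is the kind of pattern reported for the Y-STR data in this study.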

RESULTS
The study of mtDNA haplogroup variation in the population of the East Kazakhstan region revealed that the majority of maternal lineages were distributed among haplogroups that were common in the Central Asian region (Table 1).
Comparison of 33 Eurasian populations based on genetic distances was performed, and the results are presented as a multi-dimensional scaling plot (Figure 2). The populations genetically closest to the Tarbagatay population were Mongolians from Mongolia (MN), Altaian Kazakhs from South Siberia, Russia (AK), and Kazakhs from Xinjiang, China (KZ2).
In fact, all of them shared haplogroups A, B, C, D, F, G, H, and M, suggesting that these lineages were in the common maternal gene pool from which these different lineages had emerged. However, there were some notable differences between them. For example, Kazakhs from Xinjiang had higher frequencies of West Eurasian lineages H, T, U, and Z than Mongolians, Altaian Kazakhs, or Kazakhs from East Kazakhstan. Kazakhs from East Kazakhstan had greater diversity of haplogroups present in their mtDNA gene pool than the groups from South Siberia, Xinjiang, or Mongolia. Since recent data showed a common ancestry of indigenous Altaians with Native Americans (23), we compared the East Kazakhstan population with Native American populations. A genetic-distance analysis indicated divergence from the Native American populations (Supplementary Table 3).
As a result of this reduction, 26 Altaian Kazakh haplotypes collapsed into 22 unique haplotypes, and the number of shared haplotypes increased.
Rst values showed that the population of East Kazakhstan remained distinctive even with a reduced number of Y-STRs.

DISCUSSION
This study found a high degree of genetic differentiation on the level of mitochondrial DNA, but very low genetic diversity of Y-STR data.
Genetically close subpopulations of Kazakhs from Altai, Kazakhstan, and Xinjiang showed a similar mtDNA composition consisting of mainly East Eurasian haplogroups.
Affinities among these populations may result from their common origin or a recent admixture resulting from geographic proximity. Genetic distances between populations can be related to geographic distances, according to a model of isolation by distance (22). Comparably high frequencies of European lineages are consistent with the intermediate position of the East Kazakhstan region in Eurasia, but some inconsistent features were also present in the distribution of mtDNA lineage frequencies. […] 1880, 70% of them belonging to the genus Naiman and 20% to the genus Kerey.
Our results indicate that the studied subpopulation of Kazakhs has low paternal genetic diversity and shares a common paternal ancestry. The existence of a common ancestor is supported by the predominance of a single haplotype in the population. It is also supported by "shezhire," which states that the Naiman tribe has a single common ancestor from Uzbekistan. These verbal genealogical data are consistent with the results of the MDS plot. An alternative explanation for our results is provided by historian Sultanov (25), who suggested that 240 000-360 000 nomads, including Naimans, migrated to the territory of modern Uzbekistan at the beginning of the 16th century.
Further population studies are required to compare the genetic profiles of Naimans from Central and South-Eastern regions of Kazakhstan. In addition, the Naimans also inhabit the territory of Mongolia (Bayan-Ölgii province), Russia (Altai Republic), and China (Xinjiang Uygur Autonomous Region). According to unofficial estimates, the population of Naimans in 1917 was over 800 thousand people. Unfortunately, population studies of the Kazakh tribes are currently not being conducted.
In addition, we genotyped 99 male individuals from the South Kazakhstan region, which is populated by a different tribe (accession number YA003729, www.yhrd.com). Not a single 17-locus Y-STR haplotype was shared between the South and East Kazakhstan regions. This implies different paternal origins of the tribes and genetic substructuring among Kazakhs.
The comparison with the central "star cluster" profile, described by Zerjal et al (4), showed that only one haplotype from East Kazakhstan can possibly be assigned to the "genetic legacy of the Mongols." It may be concluded that the influence of Genghis Khan's Y-chromosomal lineage was insignificant despite the two-century-long rule of Genghis Khan and his descendants over the Naimans (mid-13th to mid-15th century) (4,26-28).
The studied subpopulation represents a genetically isolated group with a single paternal founder lineage different from the "star cluster" lineage. The presented results reveal different migration patterns in the East Kazakhstan region, showing more migration among women. This paternal genetic uniformity could be partially explained by local traditions, such as exogamy, that were strictly followed in the past and played a crucial role in the conservation of the unique genetic properties. The Madjar population from the Torgay area followed the same traditions in the past, aiming to avoid inbreeding (3). Nonetheless, mtDNA genetic diversity in this population is equivalent to that in other Central Asian populations.
Our finding on active migration of maternal DNA requires additional explanation and further support from complementary sources, especially since the studied nomadic tribe is scarcely described in historical sources. Apart from additional genotyping of the Kazakh population, further cultural and historical studies are needed to gain more knowledge on the cultural processes that have greatly influenced genetic variability of this and other populations.

Figure 3. Multi-dimensional scaling plot of Rst distances based on Y-short tandem repeat (STR) haplotypes in Eurasian populations.

Table 1. Mitochondrial DNA haplogroup frequencies in the selected populations of Eurasia*

Funding: Received from the Ministry of Education and Science of the Republic of Kazakhstan, grant No. 1.04.01. This study was also partially funded by the Russian Foundation for Basic Research, grant 12-04-90915.

Ethical approval: Received from the Ethics Committee of the National Center for Biotechnology of the Republic of Kazakhstan, Astana, Kazakhstan (No. 10, 14.02.2010).

Declaration of authorship: PVT and EVZ contributed to study design, sample collection, lab work, and manuscript preparation. ARA contributed to sample collection. ZMN contributed to sample collection and lab work. ZMS contributed to manuscript preparation. TKR contributed to sample collection and study design. EMR contributed to study design.

Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous 3 years; no other relationships or activities that could appear to have influenced the submitted work.