Y-chromosomal diversity of the Valachs from the Czech Republic: model for isolated population in Central Europe

Aim To evaluate the Y-chromosomal diversity of the Moravian Valachs of the Czech Republic, to compare them with a Czech population sample and other samples from Central and South-Eastern Europe, and to evaluate the effects of genetic isolation and sampling.
Methods The first sample set of the Valachs consisted of 94 unrelated male donors from the Valach region in the northeastern border area of the Czech Republic. The second sample set of the Valachs consisted of 79 men who originated from 7 paternal lineages defined by surname. No close relatives were sampled. The third sample set consisted of 273 unrelated men from the whole of the Czech Republic and was used for comparison, together with published data for 27 other populations. The total number of samples was 3244. Y-short tandem repeat (STR) markers were typed by standard methods using the PowerPlex® Y System (Promega) and Yfiler® Amplification Kit (Applied Biosystems). Y-chromosomal haplogroups were estimated from the haplotype information. Haplotype diversity and other intra- and inter-population statistics were computed.
Results The Moravian Valachs showed a lower genetic variability of Y-STR markers than other Central European populations and resembled the isolated Balkan populations (Aromuns, Csango, Bulgarian, and Macedonian Roma) more closely than the surrounding populations (Czechs, Slovaks, Poles, Saxons). We illustrated the effect of sampling on Valach paternal lineages, which includes a reduction of discrimination capacity and of variability inside Y-chromosomal haplogroups. The Valach modal haplotype belongs to the R1a haplogroup and was not detected in the Czech population.
Conclusion The Moravian Valachs display strong substructure and isolation in their Y-chromosomal markers. They represent a unique Central European population model for population genetics.

Y-chromosomal variation of Central European populations and the possible appearance of genetic isolates in these populations are of increasing interest to forensic and human population geneticists.
Y-chromosomal data for the population of the Czech Republic are still fragmentary. Kráčmarová et al published a short report on paleolithic and neolithic Y-chromosomal haplogroups in the Czech population (1) and Luca et al performed a refined study of the same data (2). Zastera et al published a major study on Czech Y-chromosomal data (3). Other authors have also reported on Czech Y-chromosomal variation, usually together with other population data from Europe (4-7). A recent study compared Czechs with other West Slavic populations (8). In this range of reports, data on the genetic variation of possible or confirmed genetic isolates within Central European populations are virtually absent. Here we present the intra-population diversity of such an isolated population, the Moravian Valachs.
So far, only a limited number of studies have been published that illustrate the variety of Y-chromosomal polymorphisms in the countries and populations supposedly connected or similar to the Moravian Valachs (the supposed isolate). Rebala et al (9) focused on the Slavic populations of Eastern and Central Europe. Historical sources suggest that immigration from Slavic populations was one of the major sources for the emergence of the Valach population of the Czech Republic; the study of Rebala et al (9) is therefore of great interest to us, as are other studies on southern European Slavic populations (10). Bosch et al (11) analyzed paternal (and maternal) lineages of the Aromuns and other surrounding Balkan populations, thus offering excellent material for comparison with the Valachs. They clearly documented the differences between the Aromuns (ie, isolated populations) and the major populations that surround them, not only in haplogroup and haplotype lineages, but also in intra-population genetic variability.
The Valachs (or Wallachs/Vlachs as they are sometimes called) are one of the most distinct ethnographic and cultural subpopulations of Central Europe. Today, they can be found not only in the Czech Republic, in its eastern border mountain ranges and highlands (Beskydy in Moravia), but also in south-southeast Poland and several parts of Slovakia (far western, northern, and central region). Originally, this group spread from the Maramures region of Romania, roughly following the Carpathian Mountain range. The arrival of the Valachs to the area of today's Czech Republic took place at the very end of the 15th or beginning of the 16th century (12). The migration was not spontaneous, but rather encouraged and subsidized by the local nobility, and it lasted at least until the end of the 18th century, with immigrants supposedly coming not only from Romania, but also from Ukraine, Poland, and Slovakia (13).
Until the beginning of the 20th century, the Moravian Valachs' way of life was similar to that of other Romanian ethnic groups in the Balkans, especially the Aromuns (seasonal mountain sheep herding, production of cheese, wool, and leather products). Admixture of the newly arrived Valachs with the autochthonous (Slavic and German) Moravian population began soon after the arrival of the first immigrants, so we can assume a steady genetic and cultural flow between these two populations. Nonetheless, the core of the Valach settlement was located in a previously uninhabited high-altitude region, neighboring the indigenous population of the lowlands. The result of the admixture process was a complete merging of both populations, the disappearance of any distinction between "new" Valachs and indigenous Moravians during the 18th century, and the creation of one ethnogeographic region with all its properties and people: the Moravian Valachs.
Demographic data (13,14) show only a small increase in the Valach population during the 17th and 18th centuries. In combination with the population depression during and after the Thirty Years' War (1618-1648), the conditions in the Valach population favored inbreeding, an effect reinforced by isolation-by-distance from the surrounding populations.
The main aim of our study was to investigate how severely this isolation affected Y-chromosomal polymorphisms in the Moravian Valachs and whether the effect is still detectable in the modern Valach population. Another topic of interest was how intra-population variability and sampling bias can affect forensic and population analyses performed on these data.

Material and Methods
One hundred and seventy-three DNA samples of male Valachs from the Czech Republic were analyzed. These samples were divided into two groups because of important differences in the sampling procedure, and the two groups are referred to separately throughout this article.
The first group consisted of 94 samples from unrelated donors (code: VALACH, Moravian Valachs). All donors identified themselves as belonging to the Valach ethnic group in a short interview held immediately before DNA sampling, which was done by mouth swab. Only donors whose paternal lineage had been present in the Valach region for at least 3 generations were included in the study. Informed consent was provided by the donors and no other data (including name, address, etc.) were gathered. The data were rendered fully anonymous.
The second Valach sample set consisted of 79 samples (code: VLIN, Moravian Valach lineages). The sampling process in this case differed significantly from that of the VALACH sample set. The VLIN sample set came from 7 Valach paternal lineages. These were defined primarily by surname, as well as by geographic localization in the Valach region and self-identification of the donors. Although the samples came from broad families, no first-, second-, third-, or fourth-degree relatives were included in the study, so this sample set was effectively composed of unrelated, non-randomly selected Valachs carrying 7 different surnames.
The control sample set consisted of 273 unrelated male donors from the whole Czech Republic. The donors did not identify themselves as being of Valach origin; however, no other information about their ethnicity or origin was gathered. Data are available on request and will be submitted to the Y-chromosome haplotype reference database (http://www.yhrd.org/).
We gathered published data for Y-short tandem repeat (STR) loci from other populations, concentrating on Eastern European and Balkan populations. Our total set, Moravian Valachs included, consisted of 30 populations encompassing 3244 individuals (Table 1). Because of the limitations of the published data, only minimal haplotype loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, and DYS385a/b) were used for computing intra-population statistics and for the comparison between populations. For detailed analysis, 12-locus haplotypes were used, which included all extended haplotype loci (minimal haplotype loci + DYS437, DYS438, DYS439). Using the Y-STR information, we also estimated the Y-chromosomal haplogroups in our samples with the free internet software tool 'Haplogroup Predictor' by Whit Athey (http://www.hprg.com/hapest5/) (15,16). We were aware of the issues involved in estimating Y-chromosomal haplogroups from Y-STR frequencies (17); thus, for the subsequent analyses (median networks) we used only the samples with a haplogroup (Hg) estimate probability higher than 90%.
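The two selection steps described above (restricting every profile to the minimal haplotype loci, and keeping only samples with a haplogroup estimate probability above 90% for the network analyses) can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the record layout, the field names `hg_probability` and `profile`, the encoding of DYS385a/b as a single string, and all allele values are assumptions made for the example.

```python
# Minimal haplotype loci as listed in the text. DYS385a/b is stored here as
# one entry holding both alleles (e.g. "11-14"); this encoding is an assumption.
MINIMAL_LOCI = ["DYS19", "DYS389I", "DYS389II", "DYS390",
                "DYS391", "DYS392", "DYS393", "DYS385"]


def minimal_haplotype(profile):
    """Return the minimal haplotype as a tuple, or None if any locus is missing."""
    try:
        return tuple(profile[locus] for locus in MINIMAL_LOCI)
    except KeyError:
        return None


def confident_haplogroups(samples, threshold=0.90):
    """Keep samples whose haplogroup estimate probability exceeds the threshold."""
    return [s for s in samples if s["hg_probability"] > threshold]


# Hypothetical records: identifiers, haplogroups, and alleles are invented.
samples = [
    {"id": "S1", "haplogroup": "R1a", "hg_probability": 0.99,
     "profile": {"DYS19": 16, "DYS389I": 13, "DYS389II": 30, "DYS390": 25,
                 "DYS391": 10, "DYS392": 11, "DYS393": 13,
                 "DYS385": "11-14", "DYS437": 14}},
    {"id": "S2", "haplogroup": "I2a", "hg_probability": 0.62,
     "profile": {"DYS19": 16, "DYS389I": 13, "DYS389II": 31, "DYS390": 24,
                 "DYS391": 11, "DYS393": 13, "DYS385": "14-15"}},
]

# Samples usable for cross-population comparison (all minimal loci typed).
typed = [s for s in samples if minimal_haplotype(s["profile"]) is not None]
# Samples retained for haplogroup-specific network analysis.
for_networks = confident_haplogroups(samples)
```

In this invented example, the second record is dropped from both lists: it lacks DYS392 and its haplogroup estimate falls below the threshold.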
The multidimensional scaling analysis was performed in Statistica 9.0 software (StatSoft Inc., Tulsa, OK, USA).

Results
The Moravian Valachs of the Czech Republic showed remarkably low values of intra-population genetic diversity. This low diversity, as compared with other populations in our study, is shown by the haplotype (gene) diversity value ± standard deviation (0.9792 ± 0.0075), the average gene diversity per locus (0.476607 ± 0.268081), and the mean number of pairwise differences (3.812857 ± 1.936064) (Table 2). This is especially true if we compare the Valachs with population samples from adjacent populations, ie, Czechs, Slovaks, Saxons, and both Polish samples. In this comparison, the Valachs' Y-chromosomal variability was lower, and their haplotypes were more similar to each other. Our second Valach data set of Valach lineages (VLIN) showed an even more extreme value of haplotype diversity, the 5th lowest value in our population set (0.9335 ± 0.0192), which can be expected given that the samples come from paternal lineages. The average diversity per locus of the VLIN data set (0.573799 ± 0.315593) and the mean number of pairwise differences (4.590393 ± 2.278308) were still lower than the cross-population average, but not as extreme as the haplotype diversity.
Distribution of Y-chromosomal haplogroups in the VALACH, VLIN, and CZE populations was not uniform (Figure 2). While our Czech population sample reflected the Central European Y haplogroup pool well, the Valach sample sets showed some deviation from the expected frequencies of the Y-haplogroups. This was especially noticeable in the VLIN sample set, with the overrepresentation of haplogroups I2a and N, each of them being the dominant haplogroup in one of the 7 paternal lineages sampled.
Variation within selected haplogroups is displayed in Figure 3. Haplogroups R1a, N, I2a, and E1b1b were chosen because of their major representation in different paternal lineages of the VLIN sample set. Marked isolation of Valach haplotypes within the haplogroups can be seen in the R1a, I2a, and E1b1b networks. This reflects the substructure of the examined populations. The effect of sampling can be seen when we compare the distribution of the VLIN and VALACH haplotypes (Figure 3). Closely related paternal lineages of the VLIN sample set appear as clustered, low-diversity branches within the networks. Unrelated VALACH samples, while they still form almost exclusively Valach branches of the network, are separated by more mutation steps and exhibit higher diversity. The VLIN samples in N and I2a each come from a different surname-defined paternal lineage. The VLIN samples belonging to E1b1b originated in two paternal lineages. The corresponding clusters are clearly seen in Figure 3D. The VLIN samples in R1a are from 3 paternal lineages with an intermingled substructure.
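For readers who want to reproduce statistics of this kind, the sketch below computes haplotype diversity with the commonly used unbiased estimator H = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of the i-th distinct haplotype, and the mean number of pairwise differences taken as the average number of loci at which two haplotypes differ. Both definitions are assumptions about how such values are typically obtained; this section does not state which software or distance measure produced the published figures, and the haplotypes in the example are invented.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Unbiased haplotype (gene) diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def mean_pairwise_differences(haplotypes):
    """Average number of loci that differ between all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("at least two haplotypes are required")
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


# Hypothetical three-locus haplotypes (allele repeat counts are invented).
example = [(16, 13, 30), (16, 13, 30), (15, 13, 29), (14, 12, 28)]
h = haplotype_diversity(example)
pi = mean_pairwise_differences(example)
```

A sample dominated by a few shared haplotypes, as expected for surname-defined lineages, lowers both values, because repeated haplotypes raise the sum of squared frequencies and contribute zero-difference pairs.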
R1a is the most prevalent and the most diverse haplogroup in the Valach and Czech populations. Therefore, we selected this haplogroup for further analysis and identification of its modal haplotype.
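One simple way to identify a modal haplotype is to take the most frequent complete haplotype among the samples assigned to a haplogroup, as sketched below. This is an assumed definition used for illustration only; the section does not describe the exact procedure, and an alternative is to take the most frequent allele at each locus separately. The haplotypes shown are invented.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return (haplotype, count) for the most frequent complete haplotype.

    If several haplotypes share the highest count, the one seen first is returned.
    """
    if not haplotypes:
        raise ValueError("no haplotypes supplied")
    return Counter(haplotypes).most_common(1)[0]


# Hypothetical four-locus haplotypes from samples assigned to one haplogroup.
r1a_samples = [
    (16, 13, 30, 25),
    (16, 13, 30, 25),
    (17, 13, 30, 25),
    (16, 13, 30, 25),
    (16, 14, 31, 25),
]
modal, count = modal_haplotype(r1a_samples)
```

Whether a modal haplotype found this way is also absent from a comparison sample can then be checked with a simple membership test against that sample's haplotype list.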

Discussion
We found traces of isolation and substructure in the Moravian Valachs' Y-chromosomal genetic variation. Studies with well-defined Y-chromosomal data for Central Europe are scarce. The previously mentioned studies of Czech population Y-STR variability reported no inner differentiation of the population (1,3). A substructure of Y-chromosomal lineages was reported in the Brabant region of Belgium and the Netherlands (25). Also, a strong Y-chromosomal breakpoint in the Romanian population, based on ethnic origin, was demonstrated (26). Petrejčíková et al (27) analyzed men from Eastern Slovakia and found a nonsignificant separation from the surrounding Slavic populations. In the Belarusian population, a limited population substructure was observed (28), although it was detectable only with 7-12 locus Y-STR haplotypes. With 17-locus Y-STR haplotypes (Yfiler loci), the substructure was no longer detectable. The population of Moravian Valachs analyzed in our study displayed signs of isolation and substructure that are noticeable in 9-, 12-, and 17-locus Y-STR haplotypes. The isolation of the Valach population, its low effective population size and, thus, the faster operation of genetic drift are expressed in the low haplotype diversity of the Valachs. These effects of isolation are also evident in the average diversity values over
Forensic analysis of Y-chromosome loci requires the highest possible haplotype diversity. Some studies have shown (34) that commercially available kits like PowerPlex Y or Yfiler cannot provide sufficient discrimination power to discern haplotypes inside broad families, and that a region-specific set of Y-STR loci needs to be defined. The haplotype diversity of the population samples tested within this study (VALACH, VLIN, and CZE) revealed that sampling within one small geographic region, or inside a group of people who recognize themselves as members of a certain "clan", yields lower diversity.
However, this phenomenon enables us to find a founding haplotype for that group. The modal haplotype within the VALACH population sample belonging to haplogroup R1a can be of high forensic importance, as it defines a relatively large group of individuals that can be identified through Y-chromosome STR analysis of modality. Defining such a modal haplotype is only possible with detailed knowledge of the genetic (sub)structure of the population in question.
Besides the genetic or forensic aspects, the ethical aspects of data gathering and presentation should be predominant, especially if we want to investigate not-so-distant paternal lineages. Moreover, genealogical, linguistic, and historical information is also of foremost interest to the researcher. The optimal number of polymorphic Y-chromosomal STR loci to be used in such a study differs vastly according to the objectives. We confirmed that the 17 loci of the Yfiler kit could be insufficient for forensic application if we were to analyze a cluster of related paternal lineages. On the other hand, for population genetic applications, a set of a few precisely defined core Y-STR loci could describe the population very well and could be used for inter-population comparison. Properties of the sample set under examination are also strongly influenced by the sampling procedure, which is not always known or properly administered. We demonstrated this sampling effect through the differences between our Moravian Valach sample sets (VALACH vs VLIN).
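The dependence of forensic resolution on the number of loci and on the sampling scheme can be illustrated with discrimination capacity, taken here in its usual sense as the number of distinct haplotypes divided by the number of samples. The sketch below is an assumption-laden illustration rather than the calculation performed in the study: the function name, the locus indexing, and the haplotypes are all hypothetical.

```python
def discrimination_capacity(haplotypes, locus_indices=None):
    """Distinct haplotypes divided by sample count, optionally on a subset of loci."""
    if not haplotypes:
        raise ValueError("no haplotypes supplied")
    if locus_indices is not None:
        haplotypes = [tuple(h[i] for i in locus_indices) for h in haplotypes]
    return len(set(haplotypes)) / len(haplotypes)


# Hypothetical three-locus haplotypes from one cluster of related lineages.
lineage_samples = [(16, 13, 30), (16, 13, 30), (16, 13, 31), (15, 13, 30)]

dc_all_loci = discrimination_capacity(lineage_samples)
# Dropping the third locus merges haplotypes that differed only there.
dc_two_loci = discrimination_capacity(lineage_samples, locus_indices=[0, 1])
```

Comparing the two values shows the general pattern discussed above: fewer loci, or more closely related donors, leave more samples sharing a haplotype and therefore lower the discrimination capacity.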
Our analyses confirmed that the Moravian Valachs represent a unique population data set from the Czech Republic and the whole region of Central Europe because of their ethnographic coherence and an isolation that is clearly detectable in their Y-chromosomal diversity. The Valach modal haplotype (VMH) can be used for further studies on Valach migration or in the evaluation of forensic analysis results.
Ethical approval No ethical approval was required by our institutions (Faculty of Science, Charles University in Prague; Institute of Criminalistics, Prague) or by the financing bodies (Charles University in Prague; Ministry of Education, Youth, and Sports of the Czech Republic; Ministry of the Interior of the Czech Republic). Nevertheless, informed consent was obtained from the donors and we otherwise followed our institutional guidelines for work with human samples.
Declaration of authorship EE provided a part of the data, performed the analysis, and wrote the majority of the manuscript. DV performed the research and wrote the manuscript. VS provided part of the data and participated in the analysis and proofreading of the manuscript. VV was a member of the research team (common grant project with the first author) and participated in the proofreading of the manuscript.