Proc Natl Acad Sci U S A. Aug 18, 2009; 106(33): 13903–13908.
Published online Aug 3, 2009. doi:  10.1073/pnas.0902129106
PMCID: PMC2728993

Complex population structure in African village dogs and its implications for inferring dog domestication history


High genetic diversity of East Asian village dogs has recently been used to argue for an East Asian origin of the domestic dog. However, global village dog genetic diversity and the extent to which semiferal village dogs represent distinct, indigenous populations instead of admixtures of various dog breeds have not been quantified. Understanding these issues is critical to properly reconstructing the timing, number, and locations of dog domestication. To address these questions, we sampled 318 village dogs from 7 regions in Egypt, Uganda, and Namibia, measuring genetic diversity across 680 bp of the mitochondrial D-loop, 300 SNPs, and 89 microsatellite markers. We also analyzed breed dogs, including putatively African breeds (Afghan hounds, Basenjis, Pharaoh hounds, Rhodesian ridgebacks, and Salukis), Puerto Rican street dogs, and mixed breed dogs from the United States. Village dogs from most African regions appear genetically distinct from non-native breed and mixed-breed dogs, although some individuals cluster genetically with Puerto Rican dogs or United States breed mixes instead of with neighboring village dogs. Thus, African village dogs are a mosaic of indigenous dogs descended from early migrants to Africa, and non-native, breed-admixed individuals. Among putatively African breeds, Pharaoh hounds and Rhodesian ridgebacks clustered with non-native rather than indigenous African dogs, suggesting they have predominantly non-African origins. Surprisingly, we find similar mtDNA haplotype diversity in African and East Asian village dogs, potentially calling into question the hypothesis of an East Asian origin for dog domestication.

Keywords: Canis familiaris, microsatellites, principal component analysis, single nucleotide polymorphisms

In many respects, dogs have a unique relationship to humans. They were the first domesticated species, serve as valuable companions and service animals, and have been bred to exhibit more phenotypic diversity than any other mammal (1–3). Dogs were probably domesticated from Eurasian wolves at least 15,000–40,000 years ago (4–6), although the process by which domestication took place, including the specific selected traits and the manner in which selection was performed, is very poorly understood (7, 8).

After domestication somewhere in Eurasia, dogs quickly spread throughout the continent and into Africa, Oceania and the Americas (9). These early dogs, like modern day “village dogs” (7), almost certainly lived as human commensals that were not subject to the same degree of intense artificial selection and closed breeding practices that characterize modern dog breeds. Like ancient human populations, these ancient dog populations developed genetic signatures characteristic of their geographic locale. These signatures would persist in both modern day village dog populations that descend from these ancient populations and in dog breeds that were founded from them. We refer to such dogs as “indigenous” in the sense that they carry characteristic genetic signatures appropriate for their geographic region.

Today, semiferal village dogs are nearly ubiquitous around human settlements in much of the world, and such animals comprise a large proportion of the global dog population (7). However, the popularity of modern breeds has led to the widespread transport of mostly European-derived breed dogs into many areas containing village dogs, so it is likely that many modern village dogs are not derived solely from indigenous ancestors. We refer to village dogs that descend from these foreign dogs as “non-native” and expect that genetic markers can differentiate these village dogs from indigenous dogs. We believe most of these dogs will be complex mixtures of several non-native breeds and/or mixtures of both non-native breeds and indigenous village dogs (“intermediate” ancestry).

The distinction between indigenous and non-native dogs is important because indigenous, but not non-native, village dogs are likely to contain genetic variants that are not found in any of today's >400 recognized dog breeds. Furthermore, they are expected to be more informative regarding dog population history and are likely to be more adapted to local environmental conditions and more genetically related to the first prebreed domestic dogs than breed or breed-admixed individuals. To our knowledge, the degree to which village dogs consist of indigenous versus non-native individuals has not been quantified.

In one of the most comprehensive surveys of village and breed dogs to date, Savolainen et al. (6) examined mtDNA diversity in a global panel of 654 dogs. Their results confirmed previous mtDNA evidence of dog domestication from Eurasian wolves (5) and showed that East Asian dogs had the highest mtDNA diversity of any region, suggesting an East Asian origin of domestication. However, subsequent work by Pires et al. (10) has shown that mtDNA does not show significant population structure in village dogs. Because Savolainen et al. included many East Asian village dogs but few village dogs from other regions, their conclusion of high levels of East Asian diversity is likely a consequence of high levels of mitochondrial diversity in village dogs and not necessarily an indication of East Asian domestication.

Other genetic markers have been shown to exhibit significant population structure in village dogs. Microsatellites and MHC types both separate Bali street dogs from New Guinea singing dogs, dingoes, and breed dogs (11, 12). Both studies demonstrated high diversity in the Bali dogs, consistent either with an indigenous, prebreed ancestry or with a complex admixture history from a large number of breeds. Therefore, given a large enough sample of village and breed dogs, microsatellite and single nucleotide polymorphism (SNP) markers seem well suited to studying population structure and the possibility of breed admixture in village dogs.

In this study, we analyzed mtDNA, microsatellite, and SNP markers in 318 African village dogs to characterize population structure and genetic diversity. In addition, we analyzed 16 Puerto Rican street dogs, 102 known mixed-breed dogs from the United States, and several hundred dogs from 126 breeds, including 129 dogs from five African and Middle Eastern breeds, to determine the degree of non-native admixture in African village dogs. Our sampling effort concentrated on seven regions from three geographically separated African countries (Fig. 1):

Egypt: We sampled three distinct locales: a Giza animal shelter, a Luxor animal shelter and surrounds, and a rural desert oasis (Kharga). Although the geographic distance between Giza and Luxor is greater than that between Kharga and Luxor, we hypothesized that the desert would be a strong barrier to gene flow, making the latter populations more genetically distinct.

Fig. 1.
Map of village dog sampling locations. Colors denote each distinct region and dots show approximate range of sampling within each region. See Table S1 for full description.

Uganda: We sampled >100 dogs from a cluster of villages east of Kampala and 30 dogs from three neighboring isles of the Kome Island group in Lake Victoria. Despite the islands being close to each other and the mainland (<20 km), we expected the lake might act as a dispersal barrier.

Namibia: We sampled from over a dozen villages and urban areas in the northern and central parts of the country. No natural dispersal barriers existed between sampling locations, although a cordon fence is maintained to keep livestock diseases out of the southern part of the country. Dogs are permitted to be taken across the cordon and likely have little difficulty getting through the fence themselves, but the cordon is significant in that it demarcates the extent of European colonization influence in the country [with southern and central Namibia colonization history being roughly similar to that of South Africa while northern Namibia resembles the rest of sub-Saharan Africa (13)]. We sampled dogs within 100 km of both sides of the cordon, including populations within 10–20 km of the barrier.

For comparison, we also sampled from two shelters in Puerto Rico, known mixed-breed dogs (see Methods) from the United States, and dogs from 126 breeds, including five African and near-African breeds (putative origin in parentheses): Afghan hounds (Sinai, Egypt), Basenjis (Congo), Pharaoh hounds (near Mediterranean), Rhodesian ridgebacks (Zimbabwe), and Salukis (Iraq).


Inference of Population Structure and Degree of Breed Admixture in African Village Dogs.

A subset of 223 unrelated African village dogs from seven African locales were typed on a panel of 89 microsatellite markers or 300 SNP markers (206 village dogs, 15 Puerto Rican dogs, and two United States mixed-breed dogs were typed on both panels). Using the Bayesian clustering program STRUCTURE (14), we found that Puerto Rican street dogs clustered with the mixed-breed dogs from the United States, indicating these dogs are all breed admixtures. STRUCTURE analysis at K = 5 consistently showed the same five groupings: Egyptian dogs, Ugandan mainland dogs, Kome Island dogs, Northern Namibian dogs, and admixed dogs (including all Puerto Rican and U.S. dogs, nearly all Central Namibian dogs, and a few other African village dogs; Fig. 2 and Fig. S1). At K = 4, STRUCTURE clustered Ugandan dogs together (mainland and Kome Islands), and at K >5, STRUCTURE subdivided Ugandan dogs further, although these clusters were inconsistent (Fig. S2).

Fig. 2.
STRUCTURE analysis across 389 SNP and microsatellite loci in African village and American mixed breed dogs.

We quantified admixture in each village dog as the mean proportion of the genome assigned to the American (United States + Puerto Rico) cluster by STRUCTURE across 10 runs at K = 5 (admixture estimates using K = 4 or K = 6 mean proportions were nearly identical; R² = 0.984 and 0.992, respectively). In total, 84% of African village dogs outside of central Namibia showed little or no evidence of non-native admixture (estimated admixture proportion <25% in 152 of 181 dogs), whereas all central Namibian dogs had >25% admixture, and most had >60% (24 of 25; Table 1). Principal component analysis showed a clear separation of Egyptian from sub-Saharan populations in PC1 and separation between Ugandan and Namibian populations in PC2 for indigenous African village dogs for both SNP and microsatellite markers (Fig. 3). When admixed African and American dogs were included, PCA, like STRUCTURE, always clustered them together, and the interpretation of the principal components became more complicated (Fig. S2).

Table 1.
Number of indigenous (<25% inferred admixture), uncertain (25%–60% inferred admixture) and breed admixed (>60% inferred admixture) village dogs by region from the 223 unrelated genotyped dogs

Fig. 3.
Principal component analysis of indigenous African village dogs. (A) PCA with the 89 microsatellite loci (n = 152). (B) PCA with the 300 SNP loci (n = 126).
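The binning used in Table 1 can be illustrated with a minimal sketch. It assumes that each dog has one estimated proportion of ancestry in the American cluster per STRUCTURE run; the function name and the example values are hypothetical, and only the 25% and 60% thresholds come from the text.

```python
from statistics import mean

def classify_admixture(run_proportions, low=0.25, high=0.60):
    """Average the non-native cluster proportion over runs, then bin it.

    run_proportions: one value per STRUCTURE run (hypothetical input layout).
    Thresholds follow Table 1: <25% indigenous, >60% breed admixed,
    anything in between uncertain.
    """
    q = mean(run_proportions)
    if q < low:
        label = "indigenous"
    elif q > high:
        label = "breed admixed"
    else:
        label = "uncertain"
    return q, label

# Made-up proportions for three dogs, three runs each
example = {
    "dog_A": [0.02, 0.04, 0.03],
    "dog_B": [0.41, 0.38, 0.44],
    "dog_C": [0.91, 0.88, 0.93],
}
for name, runs in example.items():
    print(name, classify_admixture(runs))
```

Averaging before binning keeps a single outlying run from changing a dog's class.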

To clarify the relationship between the Puerto Rican and African dogs that clustered with the two known mixed-breed dogs genotyped on the full 389 marker panel, we ran STRUCTURE on the 300-SNP dataset with an additional 100 known breed-admixed dogs from the United States that were genotyped on this SNP panel (Fig. S3). The groupings of African dogs and the inference of non-native admixed individuals are highly consistent with the earlier analyses until K = 5, when STRUCTURE starts to detect groupings within the admixed individuals. The substructure found within admixed individuals may be a consequence of different ancestral breeds in different individuals; STRUCTURE analysis of the village dogs and dogs from 126 breeds shows that the putatively indigenous village dogs cluster with ancient breeds (specifically Basenjis) while the putatively non-native dogs cluster with modern breed groups in various proportions (Fig. S4).

Fst calculations confirm that central Namibian dogs show virtually no genetic differentiation from American dogs (pairwise Fst based on SNP markers = 0.011; microsatellite Fst = 0.0025). The pairwise Fst between Egyptian dogs from Giza and Luxor was also low (SNP Fst = 0.0024; microsatellite Fst = 0.0057), whereas other village dog populations had pairwise Fst values of 0.025–0.133 (Table 2). Dogs from Kharga were the most distinct (Fst of 0.0735–0.133) whereas dogs from mainland Uganda and northern Namibia (≈2,900 km apart) show only moderate differentiation (Fst = 0.0237–0.0254). Heterozygosity was high across all genetic marker types in all village dog populations except those of the Kharga oasis and the Kome islands and low in all of the breed dogs (Table 3).

Table 2.
Pairwise Fst in village dogs between regions based on 300 SNPs

Table 3.
Gene diversity (expected heterozygosity) at 89 microsatellite markers, 300 SNP markers, and the mitochondrial D-loop in African village dogs and five breeds
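As an illustration of how a pairwise Fst can be obtained from SNP allele frequencies, the sketch below uses Hudson's estimator combined across loci as a ratio of sums. This is a swapped-in, commonly used estimator and is not claimed to be the Eq. 6 of ref. 26 that the authors implemented; the function name, arguments, and example frequencies are hypothetical.

```python
def hudson_fst(freqs1, n1, freqs2, n2):
    """Hudson-style Fst between two populations (ratio of sums over loci).

    freqs1, freqs2: per-SNP frequencies of the same allele in each population.
    n1, n2: number of sampled chromosomes per population (assumed constant
    across loci for simplicity).
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for sampling variance
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Between-population heterozygosity
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        return float("nan")  # every locus fixed for the same allele in both
    return num / den

# Made-up frequencies at three SNPs, 40 chromosomes sampled per population
print(hudson_fst([0.10, 0.55, 0.80], 40, [0.30, 0.50, 0.60], 40))
```

Because of the sampling correction, two samples with identical frequencies give a slightly negative value, which is usually read as no detectable differentiation.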

Origin of Putatively African Breeds.

We included individuals from five breeds with presumed African or Middle Eastern ancestry in our principal component analyses to see whether this approach could detect which sampled village dog populations are closest to the founding population for each breed. For the SNP loci, PC1 and PC2 differentiated three breed groups—Basenjis, Salukis/Afghan hounds, and Rhodesian ridgebacks/Pharaoh hounds—while village dogs were clustered closer to the origin (Fig. 4). Notably, the village dog cluster still exhibited geographical structuring with Egyptian village dogs lying closest to the Saluki/Afghan hound cluster, indigenous Namibian and Ugandan dogs lying closest to the Basenji cluster, and breed-admixed Namibian and American dogs lying closest to the Rhodesian ridgeback/Pharaoh hound cluster. PCA of the microsatellite loci revealed the same clustering affinities (Egyptian village dogs nearest to Salukis/Afghan hounds, etc.) as the SNP PCA although the breed clusters were less well defined (Fig. S5).

Fig. 4.
Principal component analysis of village dogs and dogs from 5 putatively African and Middle Eastern breeds across 300 SNP markers in 186 village dog and 105 breed dogs.

Analysis of Mitochondrial Diversity.

We sequenced 680 bp of the mitochondrial D-loop, including the 582-bp region described in ref. 6. We found 47 haplotypes in the African dogs as well as 9 haplotypes in the Puerto Rican dogs, two of which were also found in the sampled United States mixed breed dogs (see Table S1 and Table S3). All haplotypes were in the A (33 African haplotypes), B (6 African haplotypes), or C (8 African haplotypes) clades (Fig. S6), the clades that are believed to contain >95% of domestic dogs (6). Over the region sequenced in ref. 6 and ignoring indels, we found 18 African haplotypes that were not described in ref. 6: 14 in A clade (one of which was found in Africa in ref. 10), one in B clade, and three in C clade. The Puerto Rican and United States mixed-breed dogs had 8 A clade and one B clade haplotypes (only one haplotype, a Puerto Rican A clade haplotype, was not previously described in ref. 6).

Surprisingly, local mtDNA diversity did not differ systematically between African regions and similarly sized regions in East Asia, the purported origin of domestic dogs. Across the 582-bp region analyzed in refs. 6 and 10 and in this study, the number of haplotypes observed in a region closely matches the neutral expectation (Fig. 5). Differences in regional haplotype diversity appear to be driven by sampling artifacts rather than by distance from a hypothetical domestication origin, with the highly sampled and fractionated subpopulations of Japan exhibiting the most diversity, and nearby Sichuan (China) probably exhibiting the least (Fig. 5). Neither Africa nor East Asia appears to contain private haplogroups (haplotypes that are highly differentiated from those found on other continents; Fig. S6).
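The text does not state how the neutral expectation in Fig. 5 was computed. One standard choice, shown here purely as an assumption, is the expected number of distinct haplotypes in a sample of n sequences under the infinite-alleles model (Ewens' sampling formula), E[K] = Σ θ/(θ + i) for i = 0 … n − 1. The value of θ below is arbitrary and not an estimate from the paper.

```python
def expected_haplotypes(n, theta):
    """Expected number of distinct haplotypes in a sample of n sequences
    under the neutral infinite-alleles model with mutation parameter theta."""
    return sum(theta / (theta + i) for i in range(n))

# Arbitrary theta: the expectation grows roughly with log(n), which is why
# heavily sampled regions are expected to show more haplotypes.
for n in (10, 30, 100, 300):
    print(n, round(expected_haplotypes(n, theta=5.0), 2))
```

Under this model, comparing raw haplotype counts between regions is only meaningful after accounting for the number of dogs sampled, which is the point made by plotting haplotypes against sample size on a log scale.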

Fig. 5.
Number of haplotypes (excluding indels) versus number of dogs sampled within Africa and East Asian geographic regions. Note log scale of x axis. East Asian samples from (6); African samples from this study or by (10). See Table S4 for a list of the areas ...


This study analyzed a large number of genetic markers to characterize the level of non-native admixture in a geographically widespread set of semiferal village dog populations. African village dogs exhibit complex population structure because of the effects of geography, gene flow barriers, and the presence of non-indigenous dogs in some populations. Notably, the vast majority of the African village dogs could be classified as indigenous (<25% non-African ancestry) or non-native (>60% non-African ancestry), with only 7% showing intermediate levels of African ancestry (Table 1). Classification of individuals as indigenous versus non-native was consistent between runs, and remained consistent even when the number of mixed-breed dogs included in the analysis was substantially increased (Fig. S3).

With two exceptions, African village dogs did not exhibit a region-specific level of non-African admixture, but rather contained dogs with completely indigenous ancestry (or nearly so) that were often intermingling with a few highly admixed individuals. The lack of consistent levels of admixture within regions suggests that non-indigenous dog genes are quickly removed from village dog populations, or that admixture with non-indigenous dogs is a very recent phenomenon in these areas. The two exceptions were central Namibia, where every dog had significant levels of non-indigenous admixture (see below), and Giza, where all dogs showed some, usually low, level of admixture. This background level of admixture in Giza could reflect older mixing with breed dogs around this ancient city, or it could simply reflect the relative proximity of Giza to Eurasia, the ancestral home of most modern breed dogs. STRUCTURE analyses including dogs from 126 breeds suggest it is the latter—Egyptian dogs cluster partially with ancient (mostly Asian) breeds and the sub-Saharan (Basenji + village dog) cluster and do not appear to cluster significantly with any of the (mostly European) modern breed groups (Fig. S4).

Dispersal barriers significantly affected population structure. The 230 km of desert separating the Kharga oasis from Luxor led to much stronger population differentiation (Fst = 0.084) than the 500 km Nile corridor between Luxor and Giza (Fst = 0.0024). Likewise, the Kome islands which lie 10–20 km from the mainland in Lake Victoria were much more differentiated from mainland Uganda than were northern Namibian populations 2,900 km away (Fst = 0.051 vs. Fst = 0.033). Most surprising, the 20–100 km distance between northern and central Namibian populations that coincided with that country's Red Line veterinary cordon fence represented a stark population boundary—dogs north of the cordon averaged 87% indigenous African ancestry while those south of the cordon were only 9% African. The cordon has separated the indigenous human populations (to the north) from white settlement areas (to the south) for the last 100 years and is currently used to restrict livestock (but not humans or dogs) from crossing southward (13). During this time, indigenous dogs have apparently been extirpated from central Namibia, and the selective pressures on dogs in each region must be strong and disparate enough to maintain a sharp genetic boundary along this porous chain-link fence. That Puerto Rico also seems to contain few, if any, indigenous dogs highlights the degree to which colonization history affects dog populations.

STRUCTURE and principal component analysis revealed strikingly similar patterns of genetic variation: indigenous African dogs clearly clustered by country and away from non-indigenous dogs in each analysis (Figs. 2–4). PCA showed slight differences between the SNP and microsatellite results: SNP but not microsatellite markers led to PC1 separating out dogs based on admixture (Fig. S2), although PCA with only indigenous African dogs resulted in the same axes of variation in both sets (Fig. 3). Breeds were clustered more cleanly with the SNP dataset than the microsatellite dataset, although this result could be an effect of the larger number of breed dogs that were typed on the SNP panel rather than a consequence of using SNPs versus microsatellites per se (Fig. 4 and Fig. S5). Nevertheless, both marker sets clustered Salukis and Afghan hounds nearest to Egyptian village dogs and Basenjis nearest to indigenous Ugandan and Namibian dogs, as expected by each breed's history. In contrast, Rhodesian ridgebacks and Pharaoh hounds clustered nearest to admixed dogs, suggesting these breeds have been recreated from admixture with non-African dogs. These results are consistent with the STRUCTURE results reported in refs. 15 and 16, showing that Salukis, Afghan hounds, and Basenjis cluster with ancient, non-European breeds, while Pharaoh hounds and Rhodesian ridgebacks do not. Although this coarse sampling (3 countries) is suitable for detecting truly indigenous versus reconstituted ancestry in putatively African breeds, analysis including village dogs from more regions will be necessary to better localize the ancestral origins of these breeds.

Village dog populations had higher levels of diversity than purebred dogs across all markers (see (17) for purebred mtDNA diversity estimates), although for SNP markers, non-native/admixed dogs had even higher diversity estimates. The high heterozygosity found in breed-admixed dogs is likely because of SNP ascertainment; by preferentially genotyping SNPs that are highly polymorphic in breed dogs, inferences based on SNP diversity in village dogs may be biased. Microsatellite ascertainment bias is less likely to have this effect since even microsatellites that are highly polymorphic in breeds can exhibit new alleles when genotyped in other populations. This suggests that careful control of ascertainment, or a denser SNP marker set that enables haplotype-based inference, is desirable for SNP markers. However, the high degree of concordance of SNP and microsatellite markers in both PCA and STRUCTURE analyses shows that these methods are robust to these effects.

African village dogs exhibited a similar level of mitochondrial D-loop diversity to that of the dogs sampled by (6) in East Asia, the putative site of dog domestication. Although we do not suggest that Africa is actually the site of dog domestication, we do believe that an East Asian origin of dogs should be further scrutinized, especially as Africa also has numerous private haplotypes and East Asia has no private haplogroups, with the possible exception of clade E, which is poorly represented numerically (1 haplotype, 3 individuals) and is rather similar to clade C. The data appear consistent with a rapid spread of dogs after original domestication and high effective population sizes and gene flow between continents, as there is no clear signal of decreasing haplotype diversity away from any origin.

Interestingly, Ugandan and northern Namibian populations that appear relatively undifferentiated using nuclear markers also have large overlap in their mitochondrial sequences. Thus, long-distance gene flow may be occurring, leading to a lower total number of haplotypes in these areas, whereas areas in Egypt with less chance for gene flow between them may harbor more diversity in the aggregate. This underscores the need to design a sampling and interpretation scheme to compare populations as opposed to coarse geographic areas. These areas could have features such as islands and deserts that may increase the number of haplotypes found only because one is sampling multiple populations.

Besides the discovery of 18 haplotypes not described in ref. 6, we have also expanded the geographic range of some previously reported dog mtDNA haplotypes. For example, we found haplotype A29, the predominant mtDNA haplotype of Australian dingoes, in a Puerto Rican dog even though this haplotype has never been reported in a dog outside of East Asia or the American Arctic (18). Either Puerto Rican dogs descend from some non-European (probably Asian) dogs that still carry this haplotype, or this is an indigenous New World haplotype that has persisted in Puerto Rico despite widespread historical European admixture.

Our results clearly demonstrate the need for further research with indigenous village dogs. Indigenous dog populations can be largely eliminated, as in Puerto Rico and central Namibia, by European colonization, and the degree to which other populations will be able to maintain their genetic identity and persist in the face of modernity is unclear. The dog, although certainly a species uniquely suited as a model organism for genomics, can also serve as an invaluable organism for comparative studies of evolution and adaptation. Like other domesticated animals (e.g., cats, horses, and pigeons), dogs consist of breeds intensely selected for specific traits and feral populations that have been left to adapt to local conditions with “random” breeding. Dense genotyping and resequencing in these species should reveal genes underlying domestication in random-bred populations, instead of just those that have been under strong artificial selection in breed animals, and whether the relaxation of selective constraint observed in these species (19) is a product of recent breeding practices or domestication per se. Resequencing in indigenous village dogs will also be necessary to obtain markers free of ascertainment bias to estimate the amount of genetic variation in dogs that is absent in existing modern breeds, and the degree to which present-day indigenous village dogs represent populations that have been randomly breeding since dog domestication versus remnants of ancient, indigenous breeds.

Mitochondrial sequencing alone does not seem well-suited to determining the timing and location of domestication. Dog mitochondrial haplogroups seem more or less cosmopolitan, and inferences based on mtDNA diversity statistics can be easily skewed by sampling effort and misled by the inability to distinguish indigenous from non-native dogs. In the absence of finding multiple highly diverged and highly localized mitochondrial haplogroups, genome-wide autosomal markers will be needed to unravel the story of the first domesticated species.

Materials and Methods

Sampling Protocol.

Dogs were sampled from animal shelters or were brought to the researchers for sampling by owners and villagers. In accordance with Cornell IACUC protocol 2007–0076, 3–5 mL of blood was drawn from the cephalic or lateral saphenous vein into K2-EDTA blood collection tubes. At the field site, blood cells were lysed with an ammonium chloride solution and spun at 1,100 × g with a portable centrifuge. After discarding the supernatant, cell pellets were resuspended in an EDTA-Tris-SDS solution for transport to the DNA Bank at Cornell Baker Institute for Animal Health. DNA was isolated from the lysate using ammonium acetate and alcohol and was suspended in Tris-EDTA buffer. Concentrations were determined by A260 on a NanoDrop ND1000 spectrophotometer. Stock DNA was stored in −20 °C freezers by the Cornell Medical Genetics Archive. Dilutions were made from a 200 μg/mL working stock as needed for sequencing and genotyping. A similar protocol was followed for the 102 United States dogs, except that we also verified that they were mixtures of several different breeds by using the Wisdom MX breed test (Mars Inc.).

Microsatellite Genotyping.

Two hundred twenty-seven village dogs were typed on a 96-microsatellite panel described in refs. 15 and 16. Microsatellites were amplified individually in the presence of a fluorescently labeled universal primer and were combined post-PCR into sets of 1 to 4 markers for capillary electrophoresis on an ABI3730xl (ABI). Standard PCR conditions have been described in ref. 15, while adjustments made to individual markers are listed in Table S5. Each 96-well plate of samples included a previously genotyped control sample for size verification, and alleles were binned using GeneMapper 4.0. All genotype calls were checked manually and markers were scanned individually for the appearance of new alleles outside the existing bins. After genotyping, 7 markers were excluded on the basis of high missing rates (>20%) or heterozygote deficits (P < 0.01) in a majority of the 8 regional populations because this suggests the presence of null alleles at these loci. These data were combined with dogs from 126 breeds previously genotyped for breed structure studies (15, 16).

SNP Genotyping.

One hundred sixty-eight village dogs, 102 mixed-breed dogs, and dogs from 126 breeds were genotyped using the Sequenom iPLEX platform on a 321-SNP panel described in ref. 20. For each sample, 2 μL of dog genomic DNA was aliquoted into 13 separate microtiter wells for PCR amplification. Each genomic aliquot was amplified in a total volume of 10 μL over 45 cycles with up to 28 primer pairs. Each reaction was treated with shrimp alkaline phosphatase for 40 min before heat inactivation. Primer extension reactions were carried out in a standard thermocycler according to the Sequenom iPLEX Gold protocol. Each reaction was desalted before spotting and shooting a SpectroChip on the Compact MassARRAY system (Sequenom). Results were interpreted automatically using cluster plots with the Histogram tabular view active in SpectroTyper-TyperAnalyzer (Sequenom). SNP genotypes were loaded into Plink version 1.0.4 (21), and 15 SNPs with high missingness (>20%) and 1 SNP with an extreme heterozygote deficiency (P < 10⁻⁷ below Hardy-Weinberg equilibrium) were removed from further analysis.
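The missingness filter described above can be sketched in a few lines. This is a plain-Python illustration of the 20% threshold, not Plink's interface; the data layout (a dictionary of per-SNP genotype calls with None for a failed call) and the function names are assumptions made for the example.

```python
def missing_rate(calls):
    """Fraction of samples with no genotype call (None) at one SNP."""
    return sum(c is None for c in calls) / len(calls)

def drop_high_missingness(genotypes, max_missing=0.20):
    """Keep SNPs whose missing rate does not exceed max_missing.

    genotypes: dict mapping SNP name -> list of calls (0/1/2 or None).
    """
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if missing_rate(calls) <= max_missing
    }

# Made-up calls for two SNPs across five dogs
toy = {
    "snp_1": [0, 1, None, 2, 1],        # 20% missing, kept
    "snp_2": [None, None, 1, 0, 2],     # 40% missing, removed
}
print(sorted(drop_high_missingness(toy)))
```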

Mitochondrial Sequencing.

A 680-bp fragment of the mitochondrial D-loop was amplified in two overlapping reactions. Region-1 was amplified using forward primer H15422: 5′-CTCTTGCTCCACCATCAGC-3′, and reverse primer L15781: 5′-GTAAGAACCAGATGCCAGG-3′. Region-2 was amplified using forward primer H15693: 5′-AATAAGGGCTTAATCACCATGC-3′ and reverse primer L16106: 5′-AAACTATATGTCCTGAAACC-3′ (primer names correspond to the 3′-most position of the primer, relative to the published dog mitochondrial genome, as in ref. 6). PCR was carried out under the following protocol using 10 ng genomic DNA: Denaturation: 94 °C (40 s); annealing: 54 °C (1 min); amplification: 72 °C (1 min) for 35 total cycles followed by a 5 min final annealing step at 72 °C. Sequencing reactions were carried out on an ABI 3730 sequencer with BigDye Terminator chemistry using the Region-1 reverse primer and Region-2 forward primer. Any reads with ambiguous bases were rerun in the opposite direction. Sequences were edited, assembled, and aligned with Sequencher 4.8 (Gene Codes Corporation) and submitted to GenBank with Sequin (http://www.ncbi.nlm.nih.gov/Sequin/).

Statistical Analyses.

We used two approaches, principal component analysis with Eigensoft v2.0 (22) and clustering analysis with STRUCTURE v2.2 (14), to classify individuals as indigenous or non-native and to describe the genetic structure of indigenous African village dogs and their relationship to dogs from putatively African breeds. We relied primarily on STRUCTURE to determine the proportion of non-African admixture present in each village dog because STRUCTURE allows for probabilistic assignment of individuals to classes and explicit modeling of admixture (22). In contrast, PCA makes no assumptions regarding discrete versus clinal population structure and is well suited for describing the principal axes of genetic variation between populations. In practice, STRUCTURE and PCA usually reveal very similar patterns of genetic variation (22).
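To make the PCA step concrete, the sketch below computes the first principal component of a small genotype matrix in plain Python. It assumes biallelic markers coded as 0/1/2 allele counts and a commonly used normalization (center each marker, then divide by the square root of p(1 − p)); it is not Eigensoft's implementation, and the function names and toy genotypes are hypothetical.

```python
import math

def normalize(genotypes):
    """Center and scale each marker; rows are individuals, columns markers."""
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        col = [g[j] for g in genotypes]
        mu = sum(col) / n
        p = mu / 2                      # estimated allele frequency
        if p <= 0 or p >= 1:
            continue                    # monomorphic markers carry no signal
        sd = math.sqrt(p * (1 - p))
        cols.append([(x - mu) / sd for x in col])
    return [list(row) for row in zip(*cols)]

def first_pc(genotypes, iters=200):
    """Leading eigenvector of the individual-by-individual covariance matrix,
    found by power iteration from a fixed (deterministic) start vector."""
    x = normalize(genotypes)
    n, m = len(x), len(x[0])
    cov = [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)]
           for i in range(n)]
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(cov[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v

# Two made-up populations of three dogs typed at four SNPs
pop1 = [[0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1]]
pop2 = [[2, 2, 0, 0], [2, 2, 0, 1], [2, 1, 0, 0]]
print([round(v, 3) for v in first_pc(pop1 + pop2)])
```

In this toy example the sign of the first component separates the two populations, which is the kind of axis that PC1 and PC2 describe in Fig. 3.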

Before running these clustering methods, we removed markers in high LD with other markers [r² > 0.5, see (23)] using Arlequin v3.11 (24) and removed 9 village dogs that showed high relatedness to another dog in the genotyping panel (πhat > 0.3). All STRUCTURE runs used the admixture model with correlated allele frequencies, no prior population information, and default parameter settings, with a burn-in period of 100,000 iterations followed by 500,000 MCMC repetitions. We performed 10 runs per K and averaged them using CLUMPP v1.1.2 (25). In contrast, PCA was carried out separately for the SNP and microsatellite markers. Microsatellite loci with n > 2 alleles were recoded as n − 1 biallelic loci before running PCA in Eigensoft.
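
Converting a microsatellite locus with n alleles into n − 1 biallelic loci can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the choice to drop the last allele in sorted order, and the example repeat lengths are assumptions. Each pseudo-locus counts the copies (0, 1, or 2) of one allele per dog, and one allele is omitted because its count is implied by the others.

```python
from typing import Dict, List, Tuple


def recode_locus(genotypes: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Recode one multiallelic locus as n-1 biallelic pseudo-loci.

    Returns a mapping from allele to per-individual copy counts (0, 1, 2).
    The last allele in sorted order is dropped (an arbitrary choice here),
    since its count equals 2 minus the sum of the other counts.
    """
    alleles = sorted({a for pair in genotypes for a in pair})
    return {
        allele: [pair.count(allele) for pair in genotypes]
        for allele in alleles[:-1]
    }


# Hypothetical locus with three alleles (repeat lengths) in four dogs.
locus = [(150, 154), (154, 154), (150, 158), (158, 158)]
recoded = recode_locus(locus)
```

Here three alleles yield two pseudo-loci, one counting allele 150 and one counting allele 154, which can then be treated like SNP genotypes in a PCA input matrix.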

Expected heterozygosity (h) was calculated in Arlequin after removing 10 dogs that appeared to be related at r ≈ 0.5. Fst based on SNP loci was computed with a custom C++ implementation of Eq. 6 from (26); microsatellite Fst was computed using Arlequin. Unless otherwise noted, statistical tests were performed in R v2.6.2 (27). STRUCTURE results were plotted using Distruct v1.1 (28).
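
For readers unfamiliar with the statistic, expected heterozygosity at a single locus can be sketched as below. The sketch assumes the standard unbiased gene diversity estimator, h = n/(n − 1) · (1 − Σ pᵢ²), where n is the number of sampled gene copies and pᵢ is the frequency of allele i. The paper does not state the exact estimator or settings used in Arlequin, so the formula choice, the function name, and the toy sample are assumptions.

```python
from collections import Counter
from typing import List


def expected_heterozygosity(alleles: List[str]) -> float:
    """Unbiased gene diversity for one locus.

    `alleles` lists every sampled gene copy (two per diploid dog).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    counts = Counter(alleles)
    sum_sq = sum((c / n) ** 2 for c in counts.values())  # sum of p_i^2
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample: 8 gene copies, two alleles at equal frequency.
sample = ["A", "A", "A", "G", "G", "G", "G", "A"]
h = expected_heterozygosity(sample)
```

With both alleles at frequency 0.5, the uncorrected value 1 − Σ pᵢ² is 0.5, and the small-sample factor 8/7 raises the estimate to 4/7.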

We thank numerous volunteers and animal shelters for their assistance in gathering samples, including Leonard Kuwale, Ahmed Samaha, Kazhila Chinsembu, Animal Care in Egypt (Luxor), Animal Friends Shelter (Giza), Albergue de Animales Villa Michelle (Mayaguez), and Albergue La Gabriella (Ponce); Jason Mezey, Fengfei Wang, Katarzyna Bryc, and Andy Reynolds for their assistance with lab and computational resources; Bob Wayne, Niels Pedersen, Ben Sacks, Sarah Brown, and Peter Savolainen for helpful comments and discussion; and the intramural program of the National Human Genome Research Institute. This work was supported by the Center for Vertebrate Genomics, Department of Clinical Sciences and Baker Institute of Animal Health, Cornell University; National Institutes of Health Center for Scientific Review and R24 research grant program; National Science Foundation Grant 0516310; and a Sloan Foundation research fellowship.


Conflict of interest statement: For some of this project, we utilized the Wisdom MX product (MARS Inc.) for detecting breed-admixed ancestry. P.G.J. was an employee of MARS overseeing Wisdom development, C.D.B. was a paid consultant for MARS during its development, and E.A.O. is a licenser of the patent.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. GQ375164–GQ375213).

This article contains supporting information online at www.pnas.org/cgi/content/full/0902129106/DCSupplemental.


1. Wayne R. Consequences of domestication: Morphological diversity of the dog. In: Ruvinsky A, Sampson J, editors. The Genetics of the Dog. Oxon, UK: CABI Publishing; 2001. pp. 43–60.
2. Clutton-Brock J. Origins of the dog: Domestication and early history. In: Serpell J, editor. The Domestic Dog, Its Evolution, Behavior and Interactions with People. Cambridge, UK: Cambridge Univ Press; 1995. pp. 7–20.
3. Vilà C, Maldonado J, Wayne R. Phylogenetic relationships, evolution, and genetic diversity of the domestic dog. J Hered. 1999;90:71–77.
4. Germonpré M, et al. Fossil dogs and wolves from Palaeolithic sites in Belgium, the Ukraine and Russia: Osteometry, ancient DNA and stable isotopes. J Arch Sci. 2009;36:473–490.
5. Vilà C, et al. Multiple and ancient origins of the domestic dog. Science. 1997;276:1687–1689.
6. Savolainen P, Zhang Y, Luo J, Lundeberg J, Leitner T. Genetic evidence for an East Asian origin of domestic dogs. Science. 2002;298:1610–1613.
7. Coppinger R, Coppinger L. Dogs: A Startling New Understanding of Canine Origin, Behavior and Evolution. New York: Scribner; 2001.
8. Dobney K, Larson G. Genetics and animal domestication: New windows on an elusive process. J Zool. 2006;269:261–271.
9. Miklosi A. Dog Behaviour, Evolution, and Cognition. Oxford: Oxford Univ Press; 2008. p. 304.
10. Pires A, et al. Mitochondrial DNA sequence variation in Portuguese native breed dogs: Diversity and phylogenetic affinities. J Hered. 2006;97:318–330.
11. Irion D, Schaffer A, Grant S, Wilton A, Pedersen N. Genetic variation analysis of the Bali street dog using microsatellites. BMC Genet. 2005;6:6.
12. Runstadler J, Angles J, Pedersen N. Dog leucocyte antigen class II diversity and relationships among indigenous dogs of the island nations of Indonesia (Bali), Australia and New Guinea. Tissue Antigens. 2006;68:418–426.
13. Police Zone. Encyclopædia Britannica. 2008. Online Ed.
14. Pritchard J, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959.
15. Parker HG, et al. Genetic structure of the purebred domestic dog. Science. 2004;304:1160–1164.
16. Parker H, et al. Breed relationships facilitate fine-mapping studies: A 7.8-kb deletion cosegregates with Collie eye anomaly across multiple dog breeds. Genome Res. 2007;17:1562–1571.
17. Gundry R, et al. Mitochondrial DNA analysis of the domestic dog: Control region variation within and among breeds. J Forensic Sci. 2007;52:562–572.
18. Savolainen P, Leitner T, Wilton A, Matisoo-Smith E, Lundeberg J. A detailed picture of the origin of the Australian dingo, obtained from the study of mitochondrial DNA. Proc Natl Acad Sci USA. 2004;101:12387–12390.
19. Björnerfeldt S, Webster M, Vilà C. Relaxation of selective constraint on dog mitochondrial DNA following domestication. Genome Res. 2006;16:990–994.
20. Jones P, et al. Single-nucleotide-polymorphism-based association mapping of dog stereotypes. Genetics. 2008;179:1033–1044.
21. Purcell S, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575.
22. Patterson N, Price A, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190.
23. Kaeuffer R, Réale D, Coltman D, Pontier D. Detecting population structure using STRUCTURE software: Effect of background linkage disequilibrium. Heredity. 2007;99:374–380.
24. Excoffier L, Schneider S. Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evol Bioinform Online. 2005;1:47–50.
25. Jakobsson M, Rosenberg N. CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics. 2007;23:1801–1806.
26. Weir B, Cockerham C. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38:1358–1370.
27. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2008.
28. Rosenberg N. DISTRUCT: A program for the graphical display of population structure. Mol Ecol Notes. 2004;4:137–138.
29. Ewens W. The sampling theory of selectively neutral alleles. Theor Pop Biol. 1972;3:87–112.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences