Mol Biol Evol. Apr 2010; 27(4): 921–933.
Published online Dec 2, 2009. doi:  10.1093/molbev/msp298
PMCID: PMC2877540

The Divergence of Chimpanzee Species and Subspecies as Revealed in Multipopulation Isolation-with-Migration Analyses

Abstract

The divergence of bonobos and three subspecies of the common chimpanzee was examined under a multipopulation isolation-with-migration (IM) model with data from 73 loci drawn from the literature. A benefit of having a full multipopulation model, relative to conducting multiple pairwise analyses between sampled populations, is that a full model can reveal historical gene flow involving ancestral populations. An example of this was found in which gene flow is indicated between the western common chimpanzee subspecies and the ancestor of the central and the eastern common chimpanzee subspecies. The results of a full analysis on all four populations are strongly consistent with analyses on pairs of populations and generally similar to results from previous studies. The basal split between bonobos and common chimpanzees was estimated at 0.93 Ma (0.68–1.54 Ma, 95% highest posterior density interval), with the split among the ancestor of three common chimpanzee populations at 0.46 Ma (0.35–0.65), and the most recent split between central and eastern common chimpanzee populations at 0.093 Ma (0.041–0.157). Population size estimates mostly fell in the range from 5,000 to 10,000 individuals. The exceptions are the size of the ancestor of the common chimpanzee and the bonobo, at 17,000 (8,000–28,000) individuals, and the central common chimpanzee and its immediate ancestor with the eastern common chimpanzee, which have effective size estimates at 27,000 (16,000–44,000) and 32,000 (19,000–54,000) individuals, respectively.

Keywords: chimpanzee, bonobo, divergence population genetics, coalescent, gene flow, speciation

Introduction

Today, wild chimpanzees still live in several forested regions in the lowest latitudes of sub-Saharan Africa (fig. 1). Because they are our own species’ closest living relatives and because divergence among chimpanzee species and subspecies appears to be early in the speciation process, the divergence among chimpanzee taxa has frequently been a focus of phylogenetic and population genetic research (Miyamoto et al. 1987; Morin et al. 1992; Kaessmann, Wiebe, and Paabo 1999; Deinard and Kidd 2000; Stone et al. 2002; Yu et al. 2003; Won and Hey 2005; Fischer et al. 2006; Becquet and Przeworski 2007; Becquet et al. 2007; Caswell et al. 2008).

FIG. 1.
Geographic distribution of chimpanzee species and subspecies (Schwartz 1934; Hill 1969; Gonder et al. 2006).

The largest taxonomic distinction among chimpanzees lies between the gracile chimpanzee, or bonobo (Pan paniscus), and the robust or common chimpanzee, Pan troglodytes, and is based on both morphological and genetic data (Coolidge 1933; Ferris et al. 1981; Shea and Coolidge 1988; Gagneux et al. 1999; Kaessmann, Heissig, et al. 1999; Deinard and Kidd 2000). Within P. troglodytes, three subspecies have been recognized for some time (Groves 2001): the western common chimpanzee, P. troglodytes verus in West Africa; the central common chimpanzee, P. troglodytes troglodytes in Central Africa; and the eastern common chimpanzee, P. troglodytes schweinfurthii in East Africa. Although these subspecies are geographically separated, data supporting their distinction as subspecies are limited. Gene tree estimates are far from being monophyletic for subspecies (Yu et al. 2003), even for X chromosomal loci (Kaessmann, Wiebe, and Paabo 1999) and mitochondrial genes (Morin et al. 1994; Gagneux et al. 1999; Gonder 2000). At the morphometric level, there is also some overlap among subspecies, particularly between the central and eastern populations (Shea and Coolidge 1988; Lockwood et al. 2004; Pilbrow 2006).

In genetic studies of diverging populations, very often, a key question is the role that gene flow plays in the divergence process (Millicent and Thoday 1961; Maynard Smith 1966; Endler 1977; Felsenstein 1981; Rice and Hostert 1993; Barton 2001). Because moderate levels of gene flow can prevent divergence, at least in a model of selective neutrality, a finding that divergence has occurred despite gene flow can be a signal that natural selection is driving the divergence process. By contrasting the patterns of variation within and among species, for the various genes, it can be possible to develop a demographic model of the divergence process including, possibly, the movement of genes between populations. Isolation-with-migration (IM) models, which include parameters for population sizes, gene exchange, and time of population splitting, have become a common framework for statistical analyses of divergence (Nielsen and Wakeley 2001; Hey and Machado 2003; Hey and Nielsen 2004; Hey 2006; Noor and Feder 2006; Becquet and Przeworski 2007; Hey and Nielsen 2007; Beaumont 2008; Nosil 2008; Nosil et al. 2009). However, until recently, analyses have been limited to pairs of populations. A two-population IM analysis necessarily assumes that no gene exchange has occurred between the two populations under investigation and other populations, and it assumes that the ancestral population had a constant size indefinitely into the past. An IM model with multiple populations and a population phylogeny can allow for complex histories that violate the assumptions of a two-population model.

This study applies a recently developed method for the analysis of divergence of multiple closely related populations to the bonobo and the three subspecies of the common chimpanzee (Hey, 2010). The method requires a phylogenetic tree with population splitting events ordered in time. Recent studies of chimpanzee divergence using genomic-scale data or large numbers of loci indicate that the phylogeny represented as a Newick string, for these four populations, is: (((eastern, central), western), bonobo) (Becquet et al. 2007; Caswell et al. 2008). This is also the phylogeny that was supported by mitochondrial (Morin et al. 1994) and Y-chromosomal loci (Stone et al. 2002).

Recently, a fourth population or subspecies of common chimpanzee, which lives in eastern Nigeria and western Cameroon north of the Sanaga River, has been recognized on the basis of mitochondrial and dental evidence (Gonder et al. 1997; Kormos et al. 2003; Gonder et al. 2006; Pilbrow 2006). This population had been called P. troglodytes vellerosus (Gonder 2000; Gonder et al. 2006); however, a recent reexamination of the collection records for the type specimen of P. troglodytes vellerosus indicates that it came from Gabon and not from the north of the Sanaga (Oates 2006). Oates et al. (2009) suggest the name P. troglodytes ellioti for the population in eastern Nigeria and western Cameroon. So far, the only published genetic data for this population come from the mitochondria (Gonder et al. 1997; Gonder et al. 2006), and it has not been included in this study.

Methods

Data

In addition to the demographic assumptions of an IM model, several assumptions are made of the data to which the model is applied:

  • Individuals are sampled at random from the populations.
  • Patterns of genetic variation follow a neutral model in which mutations are neutral or deleterious (Kimura 1983). Under this model, the overall substitution rate will be the neutral mutation rate and, if recombination rates are high between loci, polymorphism levels within populations will be proportional to the neutral mutation rate (Charlesworth et al. 1993).
  • Individual loci have not experienced intralocus recombination in the history of the species under investigation.
  • Separate loci are freely recombining with respect to each other.

Data from several studies that had reported DNA sequence data for the study of chimpanzee divergence were used for the present study. The large majority of loci are for noncoding regions of the genome, and none of the loci showed evidence of natural selection, as reported in the original papers. Yu et al. (2003) sequenced 50 noncoding autosomal loci from 9 bonobos and 17 common chimpanzees (6 western, 5 central, and 2 eastern). Fischer et al. (2006) sequenced an additional 19 noncoding autosomal regions from 18 bonobos and 20 individuals from each of the 3 subspecies of the common chimpanzee. Fischer et al. also extended the sequenced region for seven of the loci studied by Yu et al. (2003), and for these loci, we used the data of Fischer et al. (2006).

Because the data for these 69 loci were obtained by sequencing DNA amplified from diploid individuals, they often include multiple heterozygous positions. To estimate two separate sequences in these cases, the PHASE program (Stephens et al. 2001) was run, assuming no recombination, on each population, in each case estimating two haplotypes for each individual at each locus. After estimating haplotypes, loci were examined for evidence of recombination. For those loci that showed evidence of recombination since the common ancestor of the chimpanzee sequences, as revealed by the four-gamete test (Hudson and Kaplan 1985), the largest portion of the data that did not reveal evidence of recombination was used (Hey and Nielsen 2004).
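The four-gamete test mentioned above can be sketched as follows. This is a minimal illustration, assuming phased haplotypes stored as equal-length strings; the function name and the toy data are hypothetical. It only flags incompatible site pairs and does not reproduce the step of choosing the largest nonrecombining block.

```python
from itertools import combinations

def four_gamete_violations(haplotypes):
    """Return pairs of site indices at which all four two-site gametes occur.

    `haplotypes` is a list of equal-length strings, one per phased sequence.
    Only biallelic sites are compared.
    """
    n_sites = len(haplotypes[0])
    # Keep only sites with exactly two observed alleles.
    biallelic = [i for i in range(n_sites)
                 if len({h[i] for h in haplotypes}) == 2]
    violations = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        # Without recombination or recurrent mutation, at most three of the
        # four possible two-site combinations can be present.
        if len(gametes) == 4:
            violations.append((i, j))
    return violations

# Toy data: sites 1 and 3 show all four gametes (A/T, A/C, T/T, T/C).
toy_haplotypes = ["AAGT", "AAGC", "ATGT", "ATGC", "CTGT"]
print(four_gamete_violations(toy_haplotypes))
```

A locus with an empty result shows no evidence of recombination under this test; otherwise the flagged pairs mark intervals that must be split.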

Five other loci for which multiple individuals of most of the chimpanzee taxa were available were also included: portions of the apolipoprotein B (APOB) and HOXB6 loci (Deinard and Kidd 2000); a portion of the X-linked locus Xq13.3 (Kaessmann, Wiebe, and Paabo 1999); a portion of the nonrecombining portion of the Y chromosome (Stone et al. 2002); and the ND2 gene from the mitochondria (Stone et al. 2002). These individual X-linked, Y-linked, and mitochondrial loci were assigned inheritance scalars of 0.75, 0.25, and 0.25, respectively (Hey and Nielsen 2004).
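The inheritance scalars above correspond to the number of gene copies carried per breeding pair relative to an autosomal locus. A minimal sketch, assuming equal numbers of breeding males and females (an assumption not stated in the text) and using hypothetical names:

```python
# Gene copies per male-female pair for each inheritance mode,
# assuming an equal sex ratio among breeders (an assumption).
COPIES_PER_PAIR = {
    "autosomal": 4,      # two copies in each sex
    "x_linked": 3,       # two in females, one in males
    "y_linked": 1,       # one copy, in males only
    "mitochondrial": 1,  # maternally transmitted, one effective copy
}

def inheritance_scalar(mode):
    """Expected effective size of a locus relative to an autosomal locus."""
    return COPIES_PER_PAIR[mode] / COPIES_PER_PAIR["autosomal"]

for mode in COPIES_PER_PAIR:
    print(mode, inheritance_scalar(mode))
```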

One of the loci of Yu et al. (2003) revealed no variation and so was excluded. In total, there were 73 loci with an average total sequence length per individual of 45,276 bp. The average number of variable sites per locus is 7.72, excepting the ND2 gene, which had 96 variable positions. In analyses with just two closely related populations, some loci had zero variation, in which case they were excluded from that analysis. Because each locus receives its own mutation rate scalar (Hey and Nielsen 2004), the effect on the analysis of excluding a locus with zero variation depends primarily on the prior distribution of mutation rate scalars. In other words, if the prior were such that, had the locus been included, the results would indicate that the locus is expected to show more variation than was observed, then excluding the locus would bias the results. Because most loci in this study had low amounts of variation and because the prior distribution is uniform on a log scale over eight orders of magnitude (Hey and Nielsen 2004), excluding loci with zero variation should have a negligible effect.

Working with Population Migration Rates

Migration rate parameters in IM analyses are scaled by the mutation rate, that is, m = M/u, where M is the migration rate per generation per gene copy. However, it is often easier to think of migration in units of the effective number of migrant gene copies per generation (i.e., the population migration rate) rather than the actual migration rate per gene copy or per mutation event. For example, one way to estimate the rate at which population 1 has received migrants from population 2 is to calculate the quantity $2N_1M_{21} = (4N_1u \times M_{21}/u)/2$ using the estimated values of the parameters $4N_1u$ and $M_{21}/u$. A better way to assess 2NM, which permits likelihood-ratio tests and estimates of confidence intervals (CIs), is to estimate the marginal posterior density for 2NM by an appropriate integration over the joint posterior density for the population size and migration parameters (Hey, 2010).
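The simple point-estimate conversion described above amounts to one line of arithmetic. The sketch below uses hypothetical names and covers only the product of point estimates, not the marginal posterior density approach of Hey (2010).

```python
def population_migration_rate(theta, m):
    """Return 2NM from theta = 4Nu and m = M/u, i.e. (4Nu * M/u) / 2."""
    return theta * m / 2.0

# With both parameters at values of 4.0 and 1.0, the product is 2.0.
print(population_migration_rate(4.0, 1.0))
```

Because the mutation rate u cancels, no rate calibration is needed to express migration in units of migrant gene copies per generation.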

Exponential Prior Distributions for Migration Parameters

Nielsen and Wakeley (2001) originally developed their method using uniform (i.e., constant) parameter priors for each of the population size, migration, and splitting time parameters, leaving the investigator to select an upper bound for each parameter (and setting the lower bound at zero). Uniform priors are simple and they lead to posterior densities that are directly proportional to the likelihood over the range of the prior distribution, thus opening the door to likelihood-based analyses such as likelihood-ratio tests of nested models (Nielsen and Wakeley 2001; Hey and Nielsen 2007). In the paper describing IM analyses for multiple populations, exponentially distributed priors for migration were introduced (Hey, 2010). Exponential distributions proceed from zero to positive infinity and have their highest density at zero. One reason for considering an exponential prior is that, because divergence is not expected unless gene flow is low, IM analyses on populations that already exhibit some divergence begin with prior evidence of limited gene flow. A second reason is that many analyses with limited data and high upper bounds on migration and splitting time tend to return estimates suggestive of an island model with gene flow and splitting time estimates at the upper limit of the prior distribution. An exponential prior with the mean of the mutation-scaled migration rate set to 0.5 was used as a prior distribution in a four-population model, and the results were compared with those for uniform priors on m.
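To make the contrast between the two kinds of migration prior concrete, the following sketch evaluates an exponential density with mean 0.5 alongside a uniform density on [0, m′]. The function names are hypothetical and the code illustrates only the prior shapes, not the IM program itself.

```python
import math

def exponential_prior_density(m, mean=0.5):
    """Density of an exponential prior on m with the given mean."""
    if m < 0:
        return 0.0
    return math.exp(-m / mean) / mean

def exponential_prior_cdf(m, mean=0.5):
    """Prior probability that the migration parameter is at most m."""
    if m < 0:
        return 0.0
    return 1.0 - math.exp(-m / mean)

def uniform_prior_density(m, upper):
    """Density of a uniform prior on [0, upper]."""
    return 1.0 / upper if 0.0 <= m <= upper else 0.0

# The exponential prior is highest at zero and has no hard upper bound,
# whereas the uniform prior is flat and excludes values above its bound.
print(exponential_prior_density(0.0), uniform_prior_density(0.0, 1.0))
print(exponential_prior_cdf(1.0))  # prior mass below m = 1
```

With a mean of 0.5, most of the prior mass lies below m = 1, but large values are down-weighted rather than excluded.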

Parameter Conversions

Converting estimates of the splitting time parameter t = Tu, to a time estimate in years, requires the geometric mean of the substitution rate for all or some of the loci used in the study (Hey and Nielsen 2004; Won and Hey 2005). All sequences were aligned and compared with their human counterparts and the substitution rate estimated assuming 6 My since the time of splitting of the ancestral species (Chen and Li 2001; Glazko and Nei 2003; Wildman et al. 2003). It is possible the actual divergence was more recent (Hobolth et al. 2007) or closer to 7–8 My (Brunet et al. 2002; Vignaud et al. 2002; Lebatard et al. 2008), in which case the time estimates obtained here can be rescaled accordingly. For estimating the effective population sizes from the population size parameter estimates, a generation time is also required. In a previous paper, 15 years per generation was assumed for the chimpanzees (Won and Hey 2005); however, this is probably an underestimate, and so, here a value of 20 years is used, consistent with estimates from the wild (Gage 1998). This is also the value used in most recent population genetic studies involving chimpanzees (Wooding et al. 2005; Fischer et al. 2006; Caswell et al. 2008).
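The conversions described above can be sketched as follows, assuming u is expressed per locus per year and that the population size parameter is 4Nu with u per generation. All function names and the example rates are hypothetical, not values from the study.

```python
import math

def geometric_mean(values):
    """Geometric mean of a list of positive rates."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def splitting_time_years(t, u_per_year):
    """Convert t = T*u to T in years, given the mutation rate per year."""
    return t / u_per_year

def effective_size(theta, u_per_year, generation_years=20.0):
    """Convert theta = 4*N*u to N, with u rescaled to a per-generation rate."""
    return theta / (4.0 * u_per_year * generation_years)

def rescale_time(time_estimate, assumed_split=6.0, new_split=7.0):
    """Rescale a time estimate for a different human-chimpanzee split time.

    Calibrated rates are inversely proportional to the assumed split time,
    so time estimates scale in direct proportion to it.
    """
    return time_estimate * new_split / assumed_split

# Hypothetical per-locus rates (per year), for illustration only.
u = geometric_mean([2e-6, 8e-6])
print(splitting_time_years(0.4, u))
print(effective_size(1.0, u, 15.0) / effective_size(1.0, u, 20.0))
print(rescale_time(0.9))
```

The second printed ratio shows why moving from 15 to 20 years per generation lowers effective size estimates by a factor of 3/4 for the same parameter estimate.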

Computations

To assess how results change as more populations are added to the model, the program for multipopulation IM analyses was run first on pairs of populations, then on three populations, and finally on all four sampled chimpanzee populations. Based on a previous study (Won and Hey 2005), upper bounds on population size parameters were set to 4.0 and for the oldest population splitting time, to 1.0. For the migration parameters, the upper bound can have a large effect on the analyses in cases where the true history has included migration and where there is not a very large amount of data (Hey, 2010). Analyses were begun with an upper bound of the migration parameter, m′, of 1.0 that should allow estimation of fairly high population migration rates (i.e., the product of 2NM for the maximal values of the population size and migration parameters is 2.0, i.e., 4.0 × 1.0/2). Higher values for m′ were also considered for three-population and four-population models.

Ensuring adequate mixing of the Markov chain is sometimes difficult, particularly for large data sets and particularly for histories that include gene exchange. For the analyses reported here, adequate mixing was ensured by using large numbers (between 40 and 120) of heated Metropolis-coupled Markov chains (Geyer 1991; Hey and Nielsen 2004) for each run and by allowing runs to proceed for sufficient durations to the point where individual runs appeared to have achieved stationarity and where multiple independent runs gave very similar results. Within runs, stationarity was assessed by 1) using autocorrelations of splitting time terms over the course of the run; 2) comparing parameter estimates generated using genealogies sampled in the first and second halves of the run; and 3) visually inspecting trend plots for splitting time terms. Each analysis was based on genealogies sampled from multiple (two to four) independent runs. Table 1 shows the burn-in duration, heating parameters used, and runtimes for each of the analyses.
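Two of the within-run checks listed above (autocorrelation of a sampled term and comparison of the first and second halves of a run) can be sketched generically. This is not the diagnostic code of the IM program; the function names and the toy trace are hypothetical.

```python
def autocorrelation(trace, lag):
    """Sample autocorrelation of a sampled parameter trace at a given lag."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((x - mean) ** 2 for x in trace)
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

def half_means(trace):
    """Means of the first and second halves of a trace."""
    mid = len(trace) // 2
    first, second = trace[:mid], trace[mid:]
    return sum(first) / len(first), sum(second) / len(second)

# Toy trace of a splitting time term, for illustration only.
toy_trace = [0.0, 1.0] * 50
print(autocorrelation(toy_trace, 1))
print(half_means(toy_trace))
```

Autocorrelations that decay toward zero at modest lags, together with similar estimates from the two halves, are consistent with (but do not prove) stationarity.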

Table 1.
Runtime Information.

Simulations

Simulated data sets were used to assess the overall quality of fit between the actual chimpanzee data and the IM model that was estimated using those data. For the analysis of all four chimpanzee populations, 200 data sets were simulated using the estimated parameter values, each identical in number of loci, sample sizes, and mutation models to the original data set. To compare real and simulated data, counts were made of each of the four types of polymorphic site for each pair of species: shared polymorphisms, fixed differences, polymorphisms restricted to one population, and polymorphisms restricted to the second population (Wakeley and Hey 1997). Counts of these four types of polymorphisms, taken together, are known to be sensitive to divergence history, including gene flow (Wakeley and Hey 1997; Wang et al. 1997; Becquet and Przeworski 2007). For the four-population data set, there are six species pairs, for a total of 24 statistics. To measure the overall distance of a data set from the mean pattern, the mean of each summary statistic was calculated for the 200 simulated data sets. A chi-square statistic was used to indicate the overall distance of a data set from the mean:

[Equation not reproduced in this copy.]

The distance of the actual data from the mean of the simulated data sets was then compared with the distribution of distances found for the simulated data sets.
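The comparison of real and simulated data can be sketched as below. The site classification follows the four categories of Wakeley and Hey (1997) for one pair of populations. The exact form of the chi-square statistic is not recoverable from this copy, so the usual form, the sum over statistics of (x − mean)²/mean, is assumed here; all names are hypothetical.

```python
def classify_site(alleles1, alleles2):
    """Classify one site for a pair of populations (assumes biallelic sites)."""
    s1, s2 = set(alleles1), set(alleles2)
    if len(s1) > 1 and len(s2) > 1:
        return "shared"
    if len(s1) > 1:
        return "only_pop1"
    if len(s2) > 1:
        return "only_pop2"
    if s1 != s2:
        return "fixed"
    return None  # invariant site

def site_type_counts(haps1, haps2):
    """Count the four site types across an alignment of two populations."""
    counts = {"shared": 0, "fixed": 0, "only_pop1": 0, "only_pop2": 0}
    for i in range(len(haps1[0])):
        kind = classify_site([h[i] for h in haps1], [h[i] for h in haps2])
        if kind is not None:
            counts[kind] += 1
    return counts

def column_means(datasets):
    """Mean of each summary statistic across simulated data sets."""
    return [sum(col) / len(datasets) for col in zip(*datasets)]

def chi_square_distance(stats, means):
    """Assumed form: sum of (x - mean)^2 / mean, skipping zero means."""
    return sum((x - m) ** 2 / m for x, m in zip(stats, means) if m > 0)

def n_simulations_closer(observed, simulated):
    """Number of simulated data sets with a smaller distance than observed."""
    means = column_means(simulated)
    d_obs = chi_square_distance(observed, means)
    return sum(1 for s in simulated if chi_square_distance(s, means) < d_obs)

print(site_type_counts(["AGTA", "AGCA"], ["CGTA", "CGCT"]))
print(n_simulations_closer([12, 25], [[10, 20], [14, 20], [12, 20]]))
```

In the study's setting each data set would contribute 24 statistics (four site types for each of six population pairs), and the count returned by the last function gives the position of the real data within the simulated distribution.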

These simulated data sets provide a simplified kind of posterior predictive check of the fit between data and model. Under a full posterior predictive check, the parameter values that are used for each simulated data set are a random draw from the estimated joint posterior density. However, a four-population IM model with 73 loci has a total of 101 parameters (i.e., 7 population size parameters, 18 migration rate parameters, 3 splitting times, and 73 mutation rate scalars), and only the density for the population size and migration parameters can be estimated jointly (Hey and Nielsen 2007). We cannot know for sure what the effect will be of basing simulations on parameter estimates from the marginal posterior densities; however, because of the lack of variance in parameter values used, the variance among simulated data sets will probably be lower than would be observed under a full posterior predictive check.
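The parameter count quoted above can be checked by enumeration. The sketch below assumes that each population (sampled or ancestral) has one size parameter and that each pair of populations coexisting in time has two migration parameters, one per direction. The interval encoding and names are hypothetical.

```python
from itertools import combinations

# Existence intervals in units of splitting-event order for the tree
# (((eastern, central), western), bonobo): (first period, end period).
POPULATIONS = {
    "eastern": (0, 1),
    "central": (0, 1),
    "western": (0, 2),
    "bonobo": (0, 3),
    "anc(eastern,central)": (1, 2),
    "anc(common chimpanzee)": (2, 3),
    "anc(all)": (3, float("inf")),
}

def coexist(a, b):
    """True if two half-open intervals overlap."""
    return a[0] < b[1] and b[0] < a[1]

def count_parameters(populations, n_loci):
    """Count IM model parameters for a population tree with ordered splits."""
    pairs = [(p, q) for p, q in combinations(populations, 2)
             if coexist(populations[p], populations[q])]
    counts = {
        "sizes": len(populations),
        "migration": 2 * len(pairs),           # one rate per direction
        "splits": (len(populations) - 1) // 2,  # k - 1 splits for 2k - 1 populations
        "scalars": n_loci,                      # one mutation rate scalar per locus
    }
    counts["total"] = sum(counts.values())
    return counts

print(count_parameters(POPULATIONS, 73))
```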

Results

Two-Population Analyses

The chimpanzee analysis was begun by first examining all six pairs of species in a two-population IM model. In order to summarize results in a visually accessible way, a computer program was written to scan the output files of the IM analyses and to generate a diagram of the estimates and CIs of the model parameters. Figure 2 shows the results in graphical form for all six pairs of populations, with the pairs of common chimpanzee subspecies in the top row (A–C) and comparisons involving the bonobo on the bottom row (D–F). These figures show parameter estimates and CIs (95% highest posterior density estimates) for population sizes and splitting times. For population migration rates (i.e., 2NM), an arrow is depicted if a rate of zero is rejected at the level of P < 0.05 or less. These are the likelihood-ratio tests proposed by Nielsen and Wakeley (2001), which were shown to be useful, albeit fairly conservative, for 2NM (Hey, 2010). Summarizing some of the main points that emerge:

  • The divergence time estimates are fairly consistent with each other and with the reported phylogenetic tree for these four populations (((eastern, central), western), bonobo) (Becquet et al. 2007; Caswell et al. 2008).
  • The divergence times between the central and western, and between these and the bonobo, are quite similar to estimates previously obtained under the IM model with a data set of 48 of the loci used here. Won and Hey (2005) estimated splitting times between the bonobo and the western and central common chimpanzees at 859,000 and 899,000 years, respectively, and between the western and central common chimpanzees at 422,000 years.
  • Migration is indicated between the central and western populations in both directions, with a higher estimated rate and a higher log-likelihood-ratio (LLR) statistic for gene flow (forward in time) from the western into the central. In a previous IM analysis, only the latter migration was detected (Won and Hey 2005). It is also important to recognize that estimated values of 2NM are nonzero in a number of the analyses; however, in these other cases, the estimated probability that the migration rate was zero was also fairly high, and the LLR statistic had a low value (full results are given in Supplementary Material online).
  • The larger estimated size for the central population in the analysis with the eastern, relative to the size estimate in the analysis with the western population, is in the direction expected in a two-population analysis where one population has received genes from a third population. Under this interpretation, the central population appears larger when analyzed with the eastern because it has received genes from the western population and because in the analysis with the eastern population, there are no parameters to account for this gene flow.
  • Population size estimates are smaller for the central and western populations and the bonobo than were found in similar analyses with fewer loci (Won and Hey 2005). However, this is explained by the fact that the earlier study used a generation time of 15 years, whereas this study uses 20 years. In other respects, the estimated population sizes in the two studies are similar.
  • Population sizes vary but are mostly consistent across the different comparisons. The widths of the boxes in figure 2 are all scaled in the same way, and so by comparing these widths for a population in each of the three contrasts in which it appears, we gain an impression of the effect of imposing a two-population model on the estimation process. Consistently, the central population is estimated to have the largest population size.
FIG. 2.
Histories for all six population pairs are represented as boxes (for sampled and ancestral populations), horizontal lines (for splitting times) and curved arrows (for migration). Time is represented on the vertical axis in each figure, with the sampled ...

Common Chimpanzee Three-Population Analyses

Figure 3 shows the results for a three-population model of the common chimpanzee populations considered under different priors for migration. The upper panel shows the results for an upper bound on migration of m′ = 1.0, whereas the lower panel shows the results for a migration rate upper bound of m′ = 2.0:

  • The population size and splitting time estimates are similar in both parts of figure 3 and similar to those in figure 2.
  • The central population and the ancestor of the eastern and central populations are estimated to have been roughly three times larger than other populations.
  • Given the similar sizes for the central population and its immediate ancestor, it appears as if the population we currently recognize as the central population has persisted as a large population since before the origin of the eastern population and that this large population may have given rise to the eastern population by a founder event.
  • The CIs on population sizes, and especially splitting times, are larger when the upper bound on migration is higher (fig. 3B).
  • Both panels A and B indicate nonzero migration from the western population into the ancestor of the eastern and central populations and from the western population into the eastern population, something that was not observed in the two-population analysis. In addition, statistically significant migration is estimated from the central to the eastern population when the migration rate upper bound is higher (panel B).
FIG. 3.
IM analyses for the three subspecies of common chimpanzee. Results are shown for an upper bound on the migration parameter, m, of 1.0 (A) and an upper bound of 2.0 (B). See figure 2 for further explanation of the meaning of symbols.

The evidence of migration from the western to the eastern population seems unlikely given the present day geography (fig. 1). However, most of the pairs of sampled populations showed some evidence of migration, and migration rate estimates for these three-population models are clearly sensitive to the upper bound of the migration rate. Figure 4 shows the posterior densities for all six m and 2NM terms for period 1, for two different upper bounds on the migration rate. Five of the six curves, for both m and 2NM regardless of the prior on m, have nonzero peaks, and two of the curves for m have peaks at the upper bound of m. Note that whereas the curves for 2NM fall well within the plotted range, this is partly a result of the well-defined posterior densities for the population size parameters. None of the migration parameters have estimated posterior probabilities for m that approach zero as m approaches 1, and the posteriors for two of the migration rates suggest an ever-increasing relationship with the upper bound on m (i.e., western→eastern and central→eastern). These are also the two migration parameters for which a rate of zero is rejected for 2NM when the upper bound on m is 2.0 (fig. 3).

FIG. 4.
Estimated marginal posterior densities for m and 2NM for period 1 in three-population models for the common chimpanzee. Curves for m are shown on the left and for 2NM on the right. Curves generated under a uniform prior with an upper bound of m ...

Four-Population Analyses

The results of a four-population IM analysis with m′ = 1 are summarized in figure 5. The overall picture for splitting times and population sizes is quite consistent with the histories described in figures 2 and 3. For migration, the only significant value of 2NM in the four-population model (of 18 population migration rates) is for migration from the western into the ancestor of the central and eastern populations, as observed for the three-population analyses.

FIG. 5.
Four populations in IM analyses with an upper bound on the migration parameter, m′ = 1. See figure 2 for further explanation of the meaning of symbols.

Figure 6 shows the results for four populations with three different types of prior distributions on the migration rates (6A, m′ = 2; 6B, m′ = 5; and 6C, an exponential prior with a mean of 0.5). Unlike the case with the three-population model, in which increasing the upper bound on migration had a moderate effect on the estimated history, for four populations, increasing the migration rate upper bound changes things quite a lot. In figure 6A relative to figure 5, the splitting time estimates have increased, the CIs for splitting times and population sizes have increased, and the single significant population migration rate has been replaced by two others. In figure 6B with a much higher upper bound on migration, the estimated model has little resemblance to those generated with smaller upper bounds. The Markov chain Monte Carlo (MCMC) mixing was very poor under this model, and in order to obtain estimates, the upper bound on splitting times was reduced from 1.0 to 0.7 and the estimate of the oldest splitting time falls at this upper bound (which is why there is no upper CI for this splitting time in fig. 6B). Figure 6A and particularly 6B offer a tale of caution with regard to the dependence of the results on migration prior distributions. In cases where data are limiting and do not dominate the posteriors for migration (which will often be the case with models of multiple populations), the choice of migration priors can have a large effect.

FIG. 6.
Results for four-population models under different prior distributions for m. (A) Uniform prior with m′ = 2. (B) Uniform prior with m′ = 5. (C) Exponential prior with a mean of 0.5. See figure 2 for further explanation of the meaning of symbols.

When an exponential prior distribution for migration is used, the estimated model is similar to that for m′ = 1; however, the CIs are wider, and one additional population migration rate is found to be statistically significant. This is migration from the central to the western population, which had also been found to be significant in a two-population model with m′ = 1 (fig. 2C).

The Quality of Fit between Data and Model

Based on the four-population analysis summarized in figure 5, 200 data sets were simulated using the estimated values of the demographic and mutation scalar parameters. Each simulated data set included 73 loci and was the same size and used the same mutation models as the actual data. Twenty-four summary statistics were measured for the data and for each of the simulated data sets (table 2). To assess the degree to which the simulated data resemble the real data, a chi-square statistic of departure from the mean of the simulations was calculated for each data set. The value of this statistic was 44.9, which placed it at position 81 in the distribution of 200 simulated values. In other words, 40% of the simulations had a lower chi-square statistic (better fit) than that for the real data, whereas 59% had a higher value (worse fit).

Table 2.
Counts of Wakeley and Hey (1997) Statistics and Results of Simulations of a Four-Population Model.

Discussion

The different subspecies of the common chimpanzee share much of their genetic variation (Deinard and Kidd 1999; Kaessmann, Wiebe, and Paabo 1999; Deinard and Kidd 2000; Yu et al. 2003; Fischer et al. 2006) so that until recently, the branching history of the common chimpanzee subspecies has not been well resolved. With much larger data sets, including a data set of over 300 microsatellite loci (Becquet et al. 2007) and some shotgun resequencing of genomes (Caswell et al. 2008), it is now clear that the most closely related of the studied populations are the central and eastern populations of the common chimpanzee. This splitting event was followed, back in time, by the separation of the ancestor of these two populations and the western common chimpanzee population. Finally, the oldest split is, as expected, between the bonobo and the common chimpanzee. This history, with multiple splitting events that are well separated by hundreds of thousands of years, is well reflected in all the analyses shown here, including the multiple pairwise analyses and the multipopulation analyses.

For population sizes and splitting times, the major picture that emerges from the two-, three-, and four-population analyses is a consistent one. Indeed, it is not difficult to imagine estimating the phylogeny for all four populations simply on the basis of the pairwise analyses shown in figure 2. The general portrait that emerges is that the chimpanzee radiation dates to roughly 900,000 years ago (this changes to 1.05 Ma if a human/chimpanzee speciation time of 7 Ma is used for mutation rate calibration) and that effective population sizes have mostly been between 5,000 and 10,000 individuals. The exceptions to this range of population sizes are larger estimates for the central population (≈27,000), the ancestor it shares with the eastern population (≈32,000), and the ancestor of all four populations (≈17,000).

Gene Flow

In many contexts where gene flow and divergence are studied together, the populations being investigated are sympatric or parapatric. In the case of chimpanzees, the borders between populations are primarily large rivers and it is possible, given that chimpanzees are poor swimmers (Angus 1971) and notwithstanding their adjacent geographies, that chimpanzee populations have diverged as essentially allopatric populations. However, the multipopulation analyses suggest that there has been gene flow from the western population into the ancestor of the eastern and central populations. This gene flow parameter is statistically significant in the three- and four-population models (figs. 3 and 5), and it seems likely that the signal of gene flow identified from the western to the central population in a two-population analysis (fig. 2C) reflects this same history. When the upper bound on the migration prior is set to 1.0, the estimate of 2NM for genes moving from the western population into the ancestor of the eastern and central populations (as time moves forward) is consistently about 0.4. Using Wright’s formula relating Fst to 2NM in a diploid population, this value corresponds to an Fst of 0.556 (Wright 1951). This parameter also appears as significant when an exponential prior is used with an estimated value of 0.85 (fig. 6C). However, the finding of statistical significance for this particular gene flow parameter is also clearly sensitive to the model and to the migration prior that is being used. The level of significance varies among analyses (e.g., P < 0.05 in fig. 3A and P < 0.01 in fig. 5), and for higher upper bounds in a four-population model (fig. 6A and B), this term is not statistically significant, although it is close to significance when the upper bound on migration is 2.0 as in figure 6A (results not shown).
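
The conversion from 2NM to Fst quoted above can be checked directly. This sketch assumes the usual island-model form of Wright's relation, Fst = 1/(1 + 4Nm), rewritten in terms of 2NM. The function name is illustrative.

```python
def fst_from_2nm(two_nm):
    """Wright's approximation Fst = 1 / (1 + 4Nm), expressed via 2NM.

    Assumption: diploid island model at equilibrium, so 4Nm = 2 * (2NM).
    """
    return 1.0 / (1.0 + 2.0 * two_nm)


# 2NM of about 0.4 for western -> ancestor of eastern and central
print(round(fst_from_2nm(0.4), 3))  # 0.556
```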

Other migration rate terms show even more variability among models, such as in the case of gene flow from western to central in a two-population model (fig. 2C), which does not appear in other models, or gene flow from western to eastern and central to eastern that appears in a three-population model (fig. 3) but not a four-population model (fig. 5). Some of this can be attributed to statistical significance being a threshold observation. Thus, for example, gene flow from western to eastern, which is significant at the P < 0.05 level in three-population models (fig. 3), is nearly significant in a four-population model (results not shown). However, models that differ in the number of sampled populations also differ greatly in the number of parameters, and there is probably considerable potential for correlations among parameters to change depending on the number of populations, and these could be contributing to the variability in findings of statistically significant migration rates.

In general, the migration results are less clear than for population sizes and splitting times, not only in terms of wider CIs for parameter estimates but also in terms of sensitivity to prior distributions. Given the population size estimates, the choice of an upper bound on migration of 1.0 (as used in analyses for figs. 2, 3A, and 5) is sufficient to obtain estimates of moderate migration rates. For example, estimates of 4Nu range from about 0.2 to 1.5 (see Supplementary Material online), in light of which an upper bound on migration of 1.0 corresponds roughly to an upper bound on 2NM in the range of 0.1–0.75. Such values would represent substantial gene flow but not “high” gene flow (e.g., 2NM ≥ 1 would be considered fairly high because it is at this level where divergence is considerably limited in the absence of selection, Wright 1931). However, to be able to make clearer statements on the history of gene flow during chimpanzee divergence and to adequately investigate multipopulation IM models that include histories with higher rates of gene flow (i.e., higher upper bounds) will require substantially more data than were used here.
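
The correspondence between the migration prior and 2NM can be sketched numerically. The example assumes that the migration prior is placed on the mutation-scaled rate M/u, so that 2NM = (4Nu)(M/u)/2. That parameterization is an assumption here, chosen because it reproduces the range quoted above, and the function name is illustrative.

```python
def two_nm_from_scaled(theta, m_over_u):
    """Population migration rate 2NM from theta = 4Nu and m = M/u.

    Assumption: migration is scaled by the mutation rate, so
    2NM = (4Nu) * (M/u) / 2.
    """
    return theta * m_over_u / 2.0


upper_bound = 1.0  # upper bound of the uniform migration prior
for theta in (0.2, 1.5):  # approximate range of 4Nu estimates
    print(theta, two_nm_from_scaled(theta, upper_bound))
```

With these inputs the implied upper bounds on 2NM are 0.1 and 0.75, matching the range given in the text.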

One potential way to handle the difficulties that arise in selecting a migration prior is to use an exponential prior on migration. An exponential prior should make tests of migration even more conservative and will shift estimated migration rates to lower values unless the data really dominate the prior distribution, but it does offer a way to consider high migration rates even when there are not a lot of data. In the case explored, with a mean value of migration on the prior distribution of [inline expression not recoverable from source], a strong signal of gene flow was found from the western to the ancestor of the eastern and central populations. Interestingly, significant gene flow was also observed (fig. 6C) from the central to the western population, something that was also indicated in the pairwise analyses (fig. 2C).

Comparisons with Other Studies

Table 3 compares the estimates reported here with those of the previous studies of Won and Hey (2005), Becquet and Przeworski (2007), and Caswell et al. (2008). All numbers in table 3 are scaled assuming a human/chimpanzee divergence time of 6 Ma and a generation time of 20 years. The numbers in table 3 for the present study are the same as those used for figure 6 and are similar to those reported by Won and Hey (2005) based on pairwise studies using a subset of 48 of the loci used for the present study.

Table 3.
Splitting Time, Effective Population Size Estimates and CIs in Different Studies.

Becquet and Przeworski (2007) developed a method for studying an IM model for two populations that share a single symmetric migration parameter; the method uses summary statistics of data from multiple loci. Appropriate data are those that fit an infinite-sites mutation model (Kimura 1969), and the summary statistics are those of Wakeley and Hey (1997) (the same that are used here to check the quality of the fit of the four-population model using simulated data). The method of Becquet and Przeworski assigns mutation scalars for the MCMC simulation directly from relative levels of outgroup divergence rather than allowing them to vary as parameters as is the case in the method used here (Hey and Nielsen 2004). However, their method has the advantage of being applicable to loci that have had histories of intralocus recombination, unlike the present method or those in the other studies included in table 3. The estimates of Becquet and Przeworski are qualitatively similar to those found here, although their estimated population sizes tend to be larger and their estimated time for the most recent population split is over twice what is reported here (table 3). They also found evidence of gene flow (significantly nonzero on the basis of reported CIs) between all three pairs of common chimpanzee populations. Much of the data that Becquet and Przeworski (2007) used for their chimpanzee study were the same as those used for the present study (Yu et al. 2003; Fischer et al. 2006), and so, it seems likely that the differences in the estimates are a function of the differences in the methods of analysis.
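
For readers unfamiliar with the Wakeley and Hey (1997) statistics, the sketch below counts them for two samples of aligned sequences: polymorphisms exclusive to each population, shared polymorphisms, and fixed differences. It assumes biallelic sites consistent with an infinite-sites model. The function name, the labels S1, S2, Ss, and Sf, and the toy sequences are illustrative and do not come from the programs used in any of the studies compared here.

```python
def wakeley_hey_counts(pop1, pop2):
    """Classify every site in two samples of equal-length sequences.

    S1, S2: polymorphic in only population 1 or only population 2
    Ss:     polymorphic in both populations (shared polymorphism)
    Sf:     monomorphic in both, but for different bases (fixed difference)
    Assumption: at most two bases segregate at any site.
    """
    counts = {"S1": 0, "S2": 0, "Ss": 0, "Sf": 0}
    for i in range(len(pop1[0])):
        alleles1 = {seq[i] for seq in pop1}
        alleles2 = {seq[i] for seq in pop2}
        poly1, poly2 = len(alleles1) > 1, len(alleles2) > 1
        if poly1 and poly2:
            counts["Ss"] += 1
        elif poly1:
            counts["S1"] += 1
        elif poly2:
            counts["S2"] += 1
        elif alleles1 != alleles2:
            counts["Sf"] += 1
    return counts


# Toy data: site 0 is a fixed difference, site 1 is shared,
# site 2 is invariant, site 3 is polymorphic only in population 1.
pop_a = ["AAGT", "AAGC", "ACGC"]
pop_b = ["TAGC", "TCGC"]
print(wakeley_hey_counts(pop_a, pop_b))
```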

Caswell et al. (2008) collected genomic shotgun sequences from a bonobo and an eastern chimpanzee and considered these together with previously reported genomic data on western and central chimpanzees. They generated alignments for many short regions of the genome, each with data from four or five species, and then estimated population sizes and splitting times using a series of moment estimators on branch length estimates. Their study includes a large amount of data; however, their method for estimating demographic history does not include migration parameters, and their approach is very different from the likelihood-based method used here or the approximation to likelihood that was used by Becquet and Przeworski (2007). Results of Caswell et al. resemble those found here and those of the other studies in table 3, particularly when CIs are considered. However, they report a splitting time for the bonobo and common chimpanzee that is about 20% higher than that found here (1.1 Ma, after adjusting for the fact that Caswell et al. used a mutation rate based on a 7-My divergence between humans and chimpanzees), and their population size estimates are consistently larger than those reported here. In particular, the Caswell et al. estimate of the size of the central population is over four times the estimate reported here, and the CIs of the two studies for this population do not overlap (table 3). This contrast is noteworthy given our estimate that the immediate ancestor of this population had experienced gene flow from the western population. If a sampled or ancestral population had been receiving genes in a way that was not accounted for by the model, then we expect that the estimated size of that population would be elevated by the additional unaccounted for variation that was introduced by that gene flow (Beerli 2004; Slatkin 2005; Won et al. 2005). In a separate analysis, Caswell et al. did find evidence of gene flow from central to western (as observed in the two-population analyses in fig. 2C) using simulations and patterns of differential single nucleotide polymorphism sharing among populations.

Some portion of the difference in effective population size estimates between the current study and the studies of Caswell et al. and Becquet and Przeworski may be due to a tendency of the current method to underestimate effective population sizes. Particularly for ancestral population sizes, and for smaller data sets, the current method exhibits a bias toward underestimates of population sizes in simulation studies (Hey 2010).

Considering Intragenic Recombination

The assumption of zero intragenic recombination, within the genealogy of the sampled loci, is required by the methodology and yet is probably false for many data sets. Here, we have applied the usual practice of pruning the data for a locus to conform to a bifurcating genealogy by deleting all but one incongruent haplotype block (Hey and Nielsen 2004). This practice is widespread when preparing data for IM analyses, and yet it necessarily leads to a biased sample of loci. Recently, Strasburg and Rieseberg (2009) assessed the performance of the IMa program (Hey and Nielsen 2007) in the face of failed assumptions, including that of intragenic recombination. They simulated data with intragenic recombination and then applied the four-gamete criterion to identify haplotype blocks for inclusion in the IM analysis. As expected, data simulated with recombination, but then pruned to apparently nonrecombining blocks, lead to estimates of population sizes for the sampled populations, and especially the ancestral population, which are biased downwards. Splitting times and migration rate estimates were not much affected by this type of data pruning (Strasburg and Rieseberg 2009). The bias found by Strasburg and Rieseberg for ancestral population sizes may explain why the estimates obtained here are consistently lower than those obtained by Becquet and Przeworski (table 3) using a method that does not assume zero intragenic recombination.
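
The four-gamete criterion mentioned above (Hudson and Kaplan 1985) can be illustrated with a short sketch. It assumes biallelic sites coded as "0"/"1" strings and simply finds the longest run of contiguous sites in which no pair of sites shows all four gametes. This is an illustrative simplification, not the pruning procedure used for the present data nor the one used by Strasburg and Rieseberg (2009), and all function names are hypothetical.

```python
from itertools import combinations


def four_gamete_violation(haplotypes, i, j):
    """True if all four two-site gametes occur at sites i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def incompatible_pairs(haplotypes):
    """All site pairs that fail the four-gamete test."""
    n_sites = len(haplotypes[0])
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if four_gamete_violation(haplotypes, i, j)]


def longest_compatible_block(haplotypes):
    """Inclusive (start, end) of the longest contiguous run of sites
    containing no pair that fails the four-gamete test."""
    n_sites = len(haplotypes[0])
    best = (0, 0)
    for start in range(n_sites):
        end = start
        # Extend the block while the next site is compatible with
        # every site already in the block.
        while end + 1 < n_sites and not any(
            four_gamete_violation(haplotypes, k, end + 1)
            for k in range(start, end + 1)
        ):
            end += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


# Toy sample: sites 0 and 1 show all four gametes; site 2 is
# compatible with both of them.
sample = ["000", "010", "100", "111"]
print(incompatible_pairs(sample))        # [(0, 1)]
print(longest_compatible_block(sample))  # (1, 2)
```

Retaining only the block returned by such a procedure discards the sites that carry evidence of recombination, which is the source of the downward bias in population size estimates discussed above.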

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work was supported by the United States National Oceanic and Atmospheric Administration Northwest Fisheries Science Center and by the National Institutes of Health (GM078204). This work benefited considerably from discussions with Rasmus Nielsen, Sang Chul Choi, and Yong Wang.

References

  • Angus S. Water-contact behavior of chimpanzees. Folia Primatol. 1971;14:51–58. [PubMed]
  • Barton NH. The role of hybridization in evolution. Mol Ecol. 2001;10:551–568. [PubMed]
  • Beaumont M. Joint determination of topology, divergence time, and immigration in population trees. In: Matsumura S, Forster P, Renfrew C, editors. Simulation, genetics, and human prehistory. Cambridge, United Kingdom: McDonald Institute for Archaeological Research; 2008. pp. 135–154.
  • Becquet C, Patterson N, Stone AC, Przeworski M, Reich D. Genetic structure of chimpanzee populations. PLoS Genet. 2007;3:e66. [PMC free article] [PubMed]
  • Becquet C, Przeworski M. A new approach to estimate parameters of speciation models with application to apes. Genome Res. 2007;17:1505–1519. [PMC free article] [PubMed]
  • Beerli P. Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations. Mol Ecol. 2004;13:827–836. [PubMed]
  • Brunet M, Guy F, Pilbeam D, et al. (38 co-authors) A new hominid from the Upper Miocene of Chad, Central Africa. Nature. 2002;418:145–151. [PubMed]
  • Caswell JL, Mallick S, Richter DJ, Neubauer J, Schirmer C, Gnerre S, Reich D. Analysis of chimpanzee history based on genome sequence alignments. PLoS Genet. 2008;4:e1000057. [PMC free article] [PubMed]
  • Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular evolution. Genetics. 1993;134:1289–1303. [PMC free article] [PubMed]
  • Chen FC, Li WH. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am J Hum Genet. 2001;68:444–456. [PMC free article] [PubMed]
  • Coolidge HJ. Pan paniscus: pygmy chimpanzee from south of the Congo River. Am J Phys Anthropol. 1933;18:2–57.
  • Deinard AS, Kidd K. Evolution of a HOXB6 intergenic region within the great apes and humans. J Hum Evol. 1999;36:687–703. [PubMed]
  • Deinard AS, Kidd K. Identifying conservation units within captive chimpanzee populations. Am J Phys Anthropol. 2000;111:25–44. [PubMed]
  • Endler JA. Geographic variation, speciation, and clines. Princeton (NJ): Princeton University Press; 1977. [PubMed]
  • Felsenstein J. Skepticism towards Santa Rosalia, or why are there so few kinds of animals. Evolution. 1981;35:124–138.
  • Ferris SD, Brown WM, Davidson WS, Wilson AC. Extensive polymorphism in the mitochondrial DNA of apes. Proc Natl Acad Sci USA. 1981;78:6319–6323. [PMC free article] [PubMed]
  • Fischer A, Pollack J, Thalmann O, Nickel B, Paabo S. Demographic history and genetic differentiation in apes. Curr Biol. 2006;16:1133–1138. [PubMed]
  • Gage TB. The comparative demography of primates: with some comments on the evolution of life histories. Ann Rev Anthropol. 1998;27:197–221. [PubMed]
  • Gagneux P, Wills C, Gerloff U, Tautz D, Morin PA, Boesch C, Fruth B, Hohmann G, Ryder OA, Woodruff DS. Mitochondrial sequences show diverse evolutionary histories of African hominoids. Proc Natl Acad Sci USA. 1999;96:5077–5082. [PMC free article] [PubMed]
  • Geyer CJ. Markov chain Monte Carlo maximum likelihood. In: Keramidas EM, editor. Computing science and statistics, Proceedings of the 23rd Symposium on the Interface. Seattle (WA): Interface Foundation of North America; 1991. pp. 156–163.
  • Glazko GV, Nei M. Estimation of divergence times for major lineages of primate species. Mol Biol Evol. 2003;20:424–434. [PubMed]
  • Gonder MK. Evolutionary genetics of chimpanzees in Nigeria and Cameroon. New York: Department of Anthropology, City University of New York; 2000.
  • Gonder MK, Disotell T, Oates J. New genetic evidence on the evolution of chimpanzee populations and implications for taxonomy. Int J Primatol. 2006;27:1103–1127.
  • Gonder MK, Oates JF, Disotell TR, Forstner MR, Morales JC, Melnick DJ. A new west African chimpanzee subspecies? Nature. 1997;388:337. [PubMed]
  • Groves C. Primate taxonomy. Washington, DC: Smithsonian Institution Press; 2001.
  • Hey J. Recent advances in assessing gene flow between diverging populations and species. Curr Opin Genet Dev. 2006;16:592–596. [PubMed]
  • Hey J. The divergence of chimpanzee species and subspecies as revealed in multi-population isolation-with-migration analyses. Mol Biol Evol. 2010 Advance access published December 2, 2009, doi:10.1093/molbev/msp298. [PMC free article] [PubMed]
  • Hey J, Machado CA. The study of structured populations—new hope for a difficult and divided science. Nat Rev Genet. 2003;4:535–543. [PubMed]
  • Hey J, Nielsen R. Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. persimilis. Genetics. 2004;167:747–760. [PMC free article] [PubMed]
  • Hey J, Nielsen R. Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics. Proc Natl Acad Sci USA. 2007;104:2785–2790. [PMC free article] [PubMed]
  • Hill WCO. The nomenclature, taxonomy and distribution of chimpanzees. In: Bourne GH, editor. The chimpanzee. New York: Karger; 1969. pp. 22–49.
  • Hobolth A, Christensen OF, Mailund T, Schierup MH. Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet. 2007;3:e7. [PMC free article] [PubMed]
  • Hudson RR, Kaplan NL. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics. 1985;111:147–164. [PMC free article] [PubMed]
  • Kaessmann H, Heissig F, von Haeseler A, Paabo S. DNA sequence variation in a non-coding region of low recombination on the human X chromosome. Nat Genet. 1999;22:78–81. [PubMed]
  • Kaessmann H, Wiebe V, Paabo S. Extensive nuclear DNA sequence diversity among chimpanzees. Science. 1999;286:1159–1162. [PubMed]
  • Kimura M. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 1969;61:893–903. [PMC free article] [PubMed]
  • Kimura M. The neutral theory of molecular evolution. Cambridge, United Kingdom: Cambridge University Press; 1983.
  • Kormos R, Boesch C, Bakarr MI, Butynski TM. West African chimpanzees: status survey and conservation action plan. Cambridge, United Kingdom: IUCN Publication Unit; 2003.
  • Lebatard AE, Bourlès DL, Duringer P, Jolivet M, Braucher R, Carcaillet J, Schuster M, Arnaud N, Monié P, Lihoreau F. Cosmogenic nuclide dating of Sahelanthropus tchadensis and Australopithecus bahrelghazali: Mio-Pliocene hominids from Chad. Proc Natl Acad Sci USA. 2008;105:3226. [PMC free article] [PubMed]
  • Lockwood CA, Kimbel WH, Lynch JM. Morphometrics and hominoid phylogeny: support for a chimpanzee-human clade and differentiation among great ape subspecies. Proc Natl Acad Sci USA. 2004;101:4356–4360. [PMC free article] [PubMed]
  • Maynard Smith J. Sympatric speciation. Am Nat. 1966;100:637–650.
  • Millicent E, Thoday JM. Effects of disruptive selection. Heredity. 1961;16:199–217.
  • Miyamoto MM, Slightom JL, Goodman M. Phylogenetic relations of humans and African apes from DNA sequences in the psi eta-globin region. Science. 1987;238:369–373. [PubMed]
  • Morin PA, Moore JJ, Chakraborty R, Jin L, Goodall J, Woodruff DS. Kin selection, social structure, gene flow, and the evolution of chimpanzees. Science. 1994;265:1193–1201. [PubMed]
  • Morin PA, Moore JJ, Woodruff DS. Identification of chimpanzee subspecies with DNA from hair and allele-specific probes. Proc R Soc Lond B. 1992;249:293–297. [PubMed]
  • Nielsen R, Wakeley J. Distinguishing migration from isolation. A Markov chain Monte Carlo approach. Genetics. 2001;158:885–896. [PMC free article] [PubMed]
  • Noor MAF, Feder JL. Speciation genetics: evolving approaches. Nat Rev Genet. 2006;7:851–861. [PubMed]
  • Nosil P. Speciation with gene flow could be common. Mol Ecol. 2008;17:2103–2106. [PubMed]
  • Nosil P, Funk DJ, Ortiz-Barrientos D. Divergent selection and heterogeneous genomic divergence. Mol Ecol. 2009;18:375–402. [PubMed]
  • Oates J. Is the chimpanzee, Pan troglodytes, an endangered species? It depends on what “endangered” means. Primates. 2006;47:102–112. [PubMed]
  • Oates J, Groves CP, Jenkins PD. The type locality of Pan troglodytes vellerosus (Gray, 1862), and implications for the nomenclature of West African chimpanzees. Primates. 2009;50:78–80. [PubMed]
  • Pilbrow V. Population systematics of chimpanzees using molar morphometrics. J Hum Evol. 2006;51:646–662. [PubMed]
  • Rice WR, Hostert EF. Laboratory experiments on speciation: what have we learned in 40 years. Evolution. 1993;47:1637–1653.
  • Schwartz E. On the local races of the chimpanzee. Ann Mag Nat Hist Lond. 1934;13:576–583.
  • Shea BT, Coolidge HJ. Craniometric differentiation and systematics in the genus Pan. J Hum Evol. 1988;17:671–685.
  • Slatkin M. Seeing ghosts: the effect of unsampled populations on migration rates estimated for sampled populations. Mol Ecol. 2005;14:67–73. [PubMed]
  • Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001;68:978–989. [PMC free article] [PubMed]
  • Stone AC, Griffiths RC, Zegura SL, Hammer MF. High levels of Y-chromosome nucleotide diversity in the genus Pan. Proc Natl Acad Sci USA. 2002;99:43–48. [PMC free article] [PubMed]
  • Strasburg JL, Rieseberg LH. How robust are “Isolation with Migration” analyses to violations of the IM model? A simulation study. Mol Biol Evol. 2009 Advance access published September 30, 2009, doi:10.1093/molbev/msp233. [PMC free article] [PubMed]
  • Vignaud P, Duringer P, Mackaye HT, et al. (21 co-authors) Geology and palaeontology of the Upper Miocene Toros-Menalla hominid locality, Chad. Nature. 2002;418:152–155. [PubMed]
  • Wakeley J, Hey J. Estimating ancestral population parameters. Genetics. 1997;145:847–855. [PMC free article] [PubMed]
  • Wang RL, Wakeley J, Hey J. Gene flow and natural selection in the origin of Drosophila pseudoobscura and close relatives. Genetics. 1997;147:1091–1106. [PMC free article] [PubMed]
  • Wildman DE, Uddin M, Liu G, Grossman LI, Goodman M. Implications of natural selection in shaping 99.4% nonsynonymous DNA identity between humans and chimpanzees: enlarging genus Homo. Proc Natl Acad Sci USA. 2003;100:7181–7188. [PMC free article] [PubMed]
  • Won YJ, Hey J. Divergence population genetics of chimpanzees. Mol Biol Evol. 2005;22:297–307. [PubMed]
  • Won YJ, Sivasundar A, Wang Y, Hey J. On the origin of Lake Malawi cichlid species: a population genetic analysis of divergence. Proc Natl Acad Sci USA. 2005;102:6581–6586. [PMC free article] [PubMed]
  • Wooding S, Stone AC, Dunn DM, Mummidi S, Jorde LB, Weiss RK, Ahuja S, Bamshad MJ. Contrasting effects of natural selection on human and chimpanzee CC chemokine receptor 5. Am J Hum Genet. 2005;76:291–301. [PMC free article] [PubMed]
  • Wright S. Evolution in Mendelian populations. Genetics. 1931;16:97–159. [PMC free article] [PubMed]
  • Wright S. The genetical structure of populations. Ann Eugen. 1951;15:323–354. [PubMed]
  • Yu N, Jensen-Seaman MI, Chemnick L, Kidd JR, Deinard AS, Ryder O, Kidd KK, Li WH. Low nucleotide diversity in chimpanzees and bonobos. Genetics. 2003;164:1511–1518. [PMC free article] [PubMed]

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press