Population interconnectivity over the past 120,000 years explains distribution and diversity of Central African hunter-gatherers

Significance We combined ethnographic, archaeological, genetic, and paleoclimatic data to model the dynamics of Central African hunter-gatherer populations over the past 120,000 years. We show, against common assumptions, that their distribution and density are explained by changing environments rather than by a displacement following recent farming expansions, and that they have maintained large population sizes and genetic diversity, despite fluctuations in niche availability. Our results provide insights into the evolution of genetic and cultural diversity in Homo sapiens.


Supplementary Information Text Environmental Niche Modelling of distribution of CAHG using Favourability function
Besides their simplicity, GLMs are extremely powerful for modelling the distribution of species (or in our case populations) as a function of environmental conditions and have been previously shown to be useful in this particular study setting (1). They also allow us to calculate the favourability function (2)(3), which is based on the output of GLMs but cancels out uneven proportions of presences and absences in the modelled data. It thus assesses the extent to which the environmental conditions change the probability of occurrence of an organism with respect to its overall prevalence in the study area. This is necessary because when the proportions of presences and absences are not equal within the sample, the logistic regression output within the function's domain is not symmetrical, but rather deviates towards the extreme that has a greater number of cases (4).
In order to obtain predicted favourability values across our study area, we obtained the probability of presence (P) across the region using a forward-backward stepwise logistic regression. Subsequently, the favourability value (F) for each square was calculated applying the function described by Real  where n1 and n0 are the numbers of presences and absences, respectively. This function yields values ranging from 0 to 1, which are levelled in relation to the prevalence of the species and hence are different from probability or suitability values (5). For the extrapolation of F values, the following equation was used (4): where y is the logit link of the logistic regression equation, and e is the base of Napierian logarithms.
Predicted favourability values were converted to presence/absence predictions using a threshold value of F=0.5, which has been shown to be biologically meaningful, whilst also approximating the value that maximises sensitivity and specificity (4)( Table S10).

Estimation of camp and CAHG population density in the present and past using Favourability
Following other studies that have found a positive relationship between population density and environmental favourability (1,6), we also examined the association between hunter-gatherer population density and environmental favourability in the 50 grid cells (N = 75 camps) for which camp-size data were available.
After calculating population densities at each of the grid cell, removing outliers, and confirming the expected wedge-shaped relationship (see Methods, Main text, Fig.S4), we fitted linear quantile regressions to the 50 th , 55 th , 60 th , 65 th , 70 th , 75 th , 80 th , 90 th , 95 th and 99 th percentiles, and the R1 measure (weighted sum of absolute residuals) was calculated in each percentile as a local measure of goodness-of-fit (7).
Then, we estimated metapopulation sizes by first dividing the range of environmental favourability into three distinct categories, unfavourable: <0.2, medium: 0.2-0.5, favourable: >0.5. We then calculated the average CAHG population size empirically observed in grid cells for which population sizes were available (after removing outliers). Using these figures, we then calculated the potential population size (PPS) for every grid cell in the study area, according to their favourability values. Finally, we summed all PPS values for the entire study area, but applied the following correction to take territoriality into account: where the metapopulation is the net potential population size; GPPS is the gross potential population size resulting from the sum of the PPS values; GCS is the size of a grid cell (i.e. 123 km2); and ASA is the average subsistence area estimated for CAHG (i.e. 1,079 km2) (Fig.S19).

Estimation of past distributions of CAHGs from Favourability GLMs and validation with archaeological assemblages
As for suitability, low values of favourability indicate that conditions are environmentally unsuitable for the presence of hunter-gatherer camps, whereas high values indicate that conditions are suitable. Hence, we also projected our Favourability GLM into each of the 1000-or 2000-year time slice from the present to 120,000BP to obtain probability maps in our area of interest for each time period.
However, it is important to note that it is more theoretically justified to extrapolate suitability values than favourability values to the past as the purpose of such extrapolation is to characterise past environments according to how suitable they are for hosting hunter-gatherer populations (i.e. how likely they are to have presences) rather than designing a discriminatory algorithm to discern where these population would or would have not been at specific points in time.
We then determined whether our estimated probability landscapes in the past matched areas actually occupied by hunter-gatherer populations in the same way as for our favourability model but using the threshold of F=0.5 to binarize favourability values at the cells containing archaeological assemblages, and therefore to determine the number of archaeological sites located in cells with predicted presences.

Assessment of the relationship between ecology and rural population density
Any observed relationship between bioclimatic or ecological variables and CAHG presence and/or density could be the product of environmental contraints acting specifically on huntergatherers populations or instead of more general constraints on human adaptability that apply to all populations regardless of their subsistence style.
To contrast these possibilities, we ran a general additive model to determine the relationship between rural population density and the predictors used in our ENM models to predict the presence and density of Central African Hunter Gatherers. We chose K=4 knots to prevent overfitting (8) and applied smoothing splines to all continuous parameters and included "Biome" as parametric term.

Additional datasets of 14 C dates
To assess the robusticity of our results concerning the ability of our ENMs to predict the location and date of archaeological sites to particular choices regarding the inclusion criteria of particular 14 C dates, we created two additional datasets.
First, although the 14 C dates included from the sites located at Gombe Point and Rivière Denis are generally regarded as reliable and have been included in recent published surveys of Late Stone Age material in central Africa (9)(10)(11), some authors in the past expressed concerns due to them coming from bulk samples and potentially having been subjected to vertical disturbances (12). We therefore created an additional dataset excluding all dates from Gombe Point as well as Rivière Denis from the final dataset described in the Main Text (N=159).
Although worldwide the fabrication and use of ceramic pots by hunter-gatherers is both widespread and ancient (see for example Budja  We then applied the same filtering procedures as described in the main text to these two additional datasets to aggregate multiple dates from each grid cell and 1000-or 2000-year time interval.

Private allelic diversity within and across CAHG groups
To determine the extent of gene-flow between hunter-gatherer groups across Central Africa, we also calculated, for each population its allelic richness (number of distinct alleles in the population), private allelic richness (number of alleles private to the population) and the mean number of private alleles shared with every other CAHG population using ADZE (18). Since ADZE is particularly designed to standardise datasets in which different groups are unevenly sampled, we ran our analyses for maximum standardized sample size values between 2-9. See also Schlebusch et al. (19) for an example of the use of this same method to evaluate private allele sharing within and between Khoe-San populations.

Dataset S1. (separate file)
Hunter-gatherer camps included in our study.

Dataset S2. (separate file)
Hunter-gatherer archaeological sites included in our study.

Dataset S3. (separate file)
Initial list of all 14 C dates from our area of interest from the Late Pleistocene to the present.

Dataset S4. (separate file)
Final list of archaeological sites included in our study after merging multiple dates within the same grid cell and 1000-or 2000-year time interval.