**Grid of similarity coefficients**. Each square in the grid represents a similarity coefficient between 0 and 1 (0 corresponds to white, 1 to black) that measures the similarity between a *structure* run with a subset of the autosomal dataset and the entire dataset used by Rosenberg *et al.* The subsets varied in the number of loci used (labelled on the vertical axis) and the number of individuals used per region (on the horizontal axis). For region *i*, *N* individuals per region corresponds to min(*N*, *S*_{i}) individuals, where *S*_{i} is the total sample size of region *i*. Thus, five individuals per region corresponds to 35 individuals overall, ten per region to 70, 15 per region to 105, 20 per region to 140, 25 per region to 175, 35 per region to 245, 50 per region to 339, 75 per region to 489, 100 per region to 639 and 200 per region to 1,005 individuals total. Ten runs of each subset of the data were performed, and the median similarity coefficient between the best subset run and ten runs of the entire dataset was used to generate a given square. Squares below the white line have a similarity coefficient of 0.5 or higher with the entire dataset. With 150 or more loci and 200 or more individuals per region, runs had similarity coefficients ranging from 0.87 to 0.98.
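
The min(*N*, *S*_{i}) rule in the caption can be sketched in a few lines of Python. This is only an illustration: the function name `total_individuals` and the list `HYPOTHETICAL_REGION_SIZES` are assumptions introduced here, and the per-region sample sizes are placeholder values chosen so that the totals match those quoted in the caption. They are not the actual regional sample sizes, which the caption does not list.

```python
def total_individuals(n_per_region, region_sizes):
    """Total individuals when region i contributes min(N, S_i) of them."""
    return sum(min(n_per_region, size) for size in region_sizes)


# Placeholder sizes for seven regions (hypothetical, NOT taken from the paper).
# They are picked only so the totals below agree with the caption.
HYPOTHETICAL_REGION_SIZES = [39, 161, 161, 161, 161, 161, 161]

# Values of N (individuals per region) listed in the caption.
for n in (5, 10, 15, 20, 25, 35, 50, 75, 100, 200):
    print(n, total_individuals(n, HYPOTHETICAL_REGION_SIZES))
```

While every region has at least *N* individuals, the total grows as 7 × *N*. Once *N* exceeds the size of the smallest region, that region is capped at its full sample size, which is why the totals at 50 or more individuals per region fall short of 7 × *N*.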
