Demographic history and rare allele sharing among human populations.
Altshuler DL, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Collins FS, De la Vega FM, Donnelly P, Egholm M, Flicek P, Gabriel SB, Gibbs RA, Knoppers BM, Lander ES, Lehrach H, Mardis ER, McVean GA, Nickerson DA, Peltonen L, Schafer AJ, Sherry ST, Wang J, Wilson RK, Gibbs RA, Deiros D, Metzker M, Muzny D, Reid J, Wheeler D, Wang J, Li J, Jian M, Li G, Li R, Liang H, Tian G, Wang B, Wang J, Wang W, Yang H, Zhang X, Zheng H, Lander ES, Altshuler DL, Ambrogio L, Bloom T, Cibulskis K, Fennell TJ, Gabriel SB, Jaffe DB, Shefler E, Sougnez CL, Bentley DR, Gormley N, Humphray S, Kingsbury Z, Koko-Gonzales P, Stone J, McKernan KJ, Costa GL, Ichikawa JK, Lee CC, Sudbrak R, Lehrach H, Borodina TA, Dahl A, Davydov AN, Marquardt P, Mertes F, Nietfeld W, Rosenstiel P, Schreiber S, Soldatov AV, Timmermann B, Tolzmann M, Egholm M, Affourtit J, Ashworth D, Attiya S, Bachorski M, Buglione E, Burke A, Caprio A, Celone C, Clark S, Conners D, Desany B, Gu L, Guccione L, Kao K, Kebbel A, Knowlton J, Labrecque M, McDade L, Mealmaker C, Minderman M, Nawrocki A, Niazi F, Pareja K, Ramenani R, Riches D, Song W, Turcotte C, Wang S, Mardis ER, Wilson RK, Dooling D, Fulton L, Fulton R, Weinstock G, Durbin RM, Burton J, Carter DM, Churcher C, Coffey A, Cox A, Palotie A, Quail M, Skelly T, Stalker J, Swerdlow HP, Turner D, De Witte A, Giles S, Gibbs RA, Wheeler D, Bainbridge M, Challis D, Sabo A, Yu F, Yu J, Wang J, Fang X, Guo X, Li R, Li Y, Luo R, Tai S, Wu H, Zheng H, Zheng X, Zhou Y, Li G, Wang J, Yang H, Marth GT, Garrison EP, Huang W, Indap A, Kural D, Lee WP, Leong WF, Quinlan AR, Stewart C, Stromberg MP, Ward AN, Wu J, Lee C, Mills RE, Shi X, Daly MJ, DePristo MA, Altshuler DL, Ball AD, Banks E, Bloom T, Browning BL, Cibulskis K, Fennell TJ, Garimella KV, Grossman SR, Handsaker RE, Hanna M, Hartl C, Jaffe DB, Kernytsky AM, Korn JM, Li H, Maguire JR, McCarroll SA, McKenna A, Nemesh JC, Philippakis AA, Poplin RE, Price A, Rivas MA, Sabeti PC, Schaffner SF, Shefler E, Shlyakhter IA, Cooper DN, Ball EV, Mort M, Phillips AD, Stenson PD, Sebat J, Makarov V, Ye K, Yoon SC, Bustamante CD, Clark AG, Boyko A, Degenhardt J, Gravel S, Gutenkunst RN, Kaganovich M, Keinan A, Lacroute P, Ma X, Reynolds A, Clarke L, Flicek P, Cunningham F, Herrero J, Keenen S, Kulesha E, Leinonen R, McLaren WM, Radhakrishnan R, Smith RE, Zalunin V, Zheng-Bradley X, Korbel JO, Stütz AM, Humphray S, Bauer M, Cheetham RK, Cox T, Eberle M, James T, Kahn S, Murray L, Chakravarti A, Ye K, De la Vega FM, Fu Y, Hyland FC, Manning JM, McLaughlin SF, Peckham HE, Sakarya O, Sun YA, Tsung EF, Batzer MA, Konkel MK, Walker JA, Sudbrak R, Albrecht MW, Amstislavskiy VS, Herwig R, Parkhomchuk DV, Sherry ST, Agarwala R, Khouri HM, Morgulis AO, Paschall JE, Phan LD, Rotmistrovsky KE, Sanders RD, Shumway MF, Xiao C, McVean GA, Auton A, Iqbal Z, Lunter G, Marchini JL, Moutsianas L, Myers S, Tumian A, Desany B, Knight J, Winer R, Craig DW, Beckstrom-Sternberg SM, Christoforides A, Kurdoglu AA, Pearson JV, Sinari SA, Tembe WD, Haussler D, Hinrichs AS, Katzman SJ, Kern A, Kuhn RM, Przeworski M, Hernandez RD, Howie B, Kelley JL, Melton SC, Abecasis GR, Li Y, Anderson P, Blackwell T, Chen W, Cookson WO, Ding J, Kang HM, Lathrop M, Liang L, Moffatt MF, Scheet P, Sidore C, Snyder M, Zhan X, Zöllner S, Awadalla P, Casals F, Idaghdour Y, Keebler J, Stone EA, Zilversmit M, Jorde L, Xing J, Eichler EE, Aksay G, Alkan C, Hajirasouliha I, Hormozdiari F, Kidd JM, Sahinalp SC, Sudmant PH, Mardis ER, Chen K, Chinwalla A, Ding L, Koboldt DC, McLellan MD, Dooling D, Weinstock G, Wallis JW, Wendl MC, Zhang Q, Durbin RM, Albers CA, Ayub Q, Balasubramaniam S, Barrett JC, Carter DM, Chen Y, Conrad DF, Danecek P, Dermitzakis ET, Hu M, Huang N, Hurles ME, Jin H, Jostins L, Keane TM, Le SQ, Lindsay S, Long Q, MacArthur DG, Montgomery SB, Parts L, Stalker J, Tyler-Smith C, Walter K, Zhang Y, Gerstein MB, Snyder M, Abyzov A, Balasubramanian S, Bjornson R, Du J, Grubert F, Habegger L, Haraksingh R, Jee J, Khurana E, Lam HY, Leng J, Mu XJ, Urban AE, Zhang Z, Li Y, Luo R, Marth GT, Garrison EP, Kural D, Quinlan AR, Stewart C, Stromberg MP, Ward AN, Wu J, Lee C, Mills RE, Shi X, McCarroll SA, Banks E, DePristo MA, Handsaker RE, Hartl C, Korn JM, Li H, Nemesh JC, Sebat J, Makarov V, Ye K, Yoon SC, Degenhardt J, Kaganovich M, Clarke L, Smith RE, Zheng-Bradley X, Korbel JO, Humphray S, Cheetham RK, Eberle M, Kahn S, Murray L, Ye K, De la Vega FM, Fu Y, Peckham HE, Sun YA, Batzer MA, Konkel MK, Walker JA, Xiao C, Iqbal Z, Desany B, Blackwell T, Snyder M, Xing J, Eichler EE, Aksay G, Alkan C, Hajirasouliha I, Hormozdiari F, Kidd JM, Chen K, Chinwalla A, Ding L, McLellan MD, Wallis JW, Hurles ME, Conrad DF, Walter K, Zhang Y, Gerstein MB, Snyder M, Abyzov A, Du J, Grubert F, Haraksingh R, Jee J, Khurana E, Lam HY, Leng J, Mu XJ, Urban AE, Zhang Z, Gibbs RA, Bainbridge M, Challis D, Coafra C, Dinh H, Kovar C, Lee S, Muzny D, Nazareth L, Reid J, Sabo A, Yu F, Yu J, Marth GT, Garrison EP, Indap A, Leong WF, Quinlan AR, Stewart C, Ward AN, Wu J, Cibulskis K, Fennell TJ, Gabriel SB, Garimella KV, Hartl C, Shefler E, Sougnez CL, Wilkinson J, Clark AG, Gravel S, Grubert F, Clarke L, Flicek P, Smith RE, Zheng-Bradley X, Sherry ST, Khouri HM, Paschall JE, Shumway MF, Xiao C, McVean GA, Katzman SJ, Abecasis GR, Blackwell T, Mardis ER, Dooling D, Fulton L, Fulton R, Koboldt DC, Durbin RM, Balasubramaniam S, Coffey A, Keane TM, MacArthur DG, Palotie A, Scott C, Stalker J, Tyler-Smith C, Gerstein MB, Balasubramanian S, Chakravarti A, Knoppers BM, Abecasis GR, Bustamante CD, Gharani N, Gibbs RA, Jorde L, Kaye JS, Kent A, Li T, McGuire AL, McVean GA, Ossorio PN, Rotimi CN, Su Y, Toji LH, Tyler-Smith C, Brooks LD, Felsenfeld AL, McEwen JE, Abdallah A, Juenger CR, Clemm NC, Collins FS, Duncanson A, Green ED, Guyer MS, Peterson JL, Schafer AJ, Abecasis GR, Altshuler DL, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA.
Abstract
High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
Fig. 1.
The two-population joint SFS from panels of Chinese individuals from Beijing (CHB) and Yoruba individuals from Ibadan, Nigeria (YRI) for variants occurring in less than 15 of 100 sequenced chromosomes in both panels. Of the 3,366 variants in the overlap of the two panels, all but 194 sites are private to a single population.
Proc Natl Acad Sci U S A. 2011 Jul 19;108(29):11983-11988.
Fig. 2.
Joint allele SFSs (all sites) for selected pairs of populations from the exome sequencing panel (Left) compared with expected spectra under site by site population label permutation (Center). Shown are sites occurring in, at most, 15 of 100 chromosomes. All population pairs, including two different panels of Chinese individuals sampled in Beijing, China and Denver, Colorado (CHB and CHD), as well as two groups of European origin (CEU and Tuscans from Italy, or TSI), show substantial residuals for rare variants (Right), consistent with reduced sharing. White bins contain less than one count.
Proc Natl Acad Sci U S A. 2011 Jul 19;108(29):11983-11988.
Fig. 3.
The probability that two individuals carrying an allele of given minor frequency come from different populations, normalized by the expected frequency in a panmictic population, using the seven panels of the exome capture dataset. Sharing decreases dramatically as frequency approaches zero. The reduction in sharing at 50% frequency in some population pairs is caused by low overall numbers in that bin, and a single site (rs6662929) that exhibits inconsistent calls between different calling platforms and most likely has an incorrect homozygous reference call in some populations. Sites were binned by frequency: dots indicate the center of each bin, and solid lines are to guide the eye. Note that singletons are not shown, because there can be no sharing for such sites.
Proc Natl Acad Sci U S A. 2011 Jul 19;108(29):11983-11988.
Fig. 4.
An illustration of the inferred demographic model, with line width corresponding to population size and time flowing from left to right. The width of the red arrows is proportional to the migration intensity. Model details are provided in and ref. .
Proc Natl Acad Sci U S A. 2011 Jul 19;108(29):11983-11988.
Fig. 5.
Observed and projected numbers of synonymous and nonsynonymous variants in CEU, CHB + JPT, and YRI as a function of the sample size (two times the number of individuals sequenced). Long and short dashes correspond to jackknife and model-based projections for synonymous sites, respectively. The dotted lines are jackknife projections for nonsynonymous sites. Discrepancies between projections are accounted for by the difference between the model prediction and observed number of singletons in the data.
Proc Natl Acad Sci U S A. 2011 Jul 19;108(29):11983-11988.
Publication type
MeSH terms
Grant support
Full Text Sources
Other Literature Sources
Medical