Proc Natl Acad Sci U S A. Jul 19, 2011; 108(29): 11983–11988.
Published online Jul 5, 2011. doi:  10.1073/pnas.1019276108
PMCID: PMC3142009
Genetics

Demographic history and rare allele sharing among human populations

Simon Gravel,a Brenna M. Henn,a Ryan N. Gutenkunst,b Amit R. Indap,c Gabor T. Marth,c Andrew G. Clark,d Fuli Yu,e Richard A. Gibbs,e The 1000 Genomes Project,e and Carlos D. Bustamantea,1
aDepartment of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5120;
bDepartment of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721;
cDepartment of Biology, Boston College, Chestnut Hill, MA 02467;
dDepartment of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853; and
eHuman Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030
David L. Altshuler, Co-Chair, Richard M. Durbin, Co-Chair, Gonçalo R. Abecasis, David R. Bentley, Aravinda Chakravarti, Andrew G. Clark, Francis S. Collins, Francisco M. De La Vega, Peter Donnelly, Michael Egholm, Paul Flicek, Stacey B. Gabriel, Richard A. Gibbs, Bartha M. Knoppers, Eric S. Lander, Hans Lehrach, Elaine R. Mardis, Gil A. McVean, Debbie A. Nickerson, Leena Peltonen, Alan J. Schafer, Stephen T. Sherry, Jun Wang, Richard K. Wilson, Richard A. Gibbs, Principal Investigator, David Deiros, Mike Metzker, Donna Muzny, Jeff Reid, David Wheeler, Jun Wang, Principal Investigator, Jingxiang Li, Min Jian, Guoqing Li, Ruiqiang Li, Huiqing Liang, Geng Tian, Bo Wang, Jian Wang, Wei Wang, Huanming Yang, Xiuqing Zhang, Huisong Zheng, Eric S. Lander, Principal Investigator, David L. Altshuler, Lauren Ambrogio, Toby Bloom, Kristian Cibulskis, Tim J. Fennell, Stacey B. Gabriel, Co-Chair, David B. Jaffe, Erica Shefler, Carrie L. Sougnez, David R. Bentley, Principal Investigator, Niall Gormley, Sean Humphray, Zoya Kingsbury, Paula Koko-Gonzales, Jennifer Stone, Kevin J. McKernan, Principal Investigator, Gina L. Costa, Jeffry K. Ichikawa, Clarence C. Lee, Ralf Sudbrak, Project Leader, Hans Lehrach, Principal Investigator, Tatiana A. Borodina, Andreas Dahl, Alexey N. Davydov, Peter Marquardt, Florian Mertes, Wilfiried Nietfeld, Philip Rosenstiel, Stefan Schreiber, Aleksey V. Soldatov, Bernd Timmermann, Marius Tolzmann, Michael Egholm, Principal Investigator, Jason Affourtit, Dana Ashworth, Said Attiya, Melissa Bachorski, Eli Buglione, Adam Burke, Amanda Caprio, Christopher Celone, Shauna Clark, David Conners, Brian Desany, Lisa Gu, Lorri Guccione, Kalvin Kao, Andrew Kebbel, Jennifer Knowlton, Matthew Labrecque, Louise McDade, Craig Mealmaker, Melissa Minderman, Anne Nawrocki, Faheem Niazi, Kristen Pareja, Ravi Ramenani, David Riches, Wanmin Song, Cynthia Turcotte, Shally Wang, Elaine R. Mardis, Co-Chair, Co-Principal Investigator, Richard K. 
Wilson, Co-Principal Investigator, David Dooling, Lucinda Fulton, Robert Fulton, George Weinstock, Richard M. Durbin, Principal Investigator, John Burton, David M. Carter, Carol Churcher, Alison Coffey, Anthony Cox, Aarno Palotie, Michael Quail, Tom Skelly, James Stalker, Harold P. Swerdlow, Daniel Turner, Anniek De Witte, Shane Giles, Richard A. Gibbs, Principal Investigator, David Wheeler, Matthew Bainbridge, Danny Challis, Aniko Sabo, Fuli Yu, Jin Yu, Jun Wang, Principal Investigator, Xiaodong Fang, Xiaosen Guo, Ruiqiang Li, Yingrui Li, Ruibang Luo, Shuaishuai Tai, Honglong Wu, Hancheng Zheng, Xiaole Zheng, Yan Zhou, Guoqing Li, Jian Wang, Huanming Yang, Gabor T. Marth, Principal Investigator, Erik P. Garrison, Weichun Huang, Amit Indap, Deniz Kural, Wan-Ping Lee, Wen Fung Leong, Aaron R. Quinlan, Chip Stewart, Michael P. Stromberg, Alistair N. Ward, Jiantao Wu, Charles Lee, Principal Investigator, Ryan E. Mills, Xinghua Shi, Mark J. Daly, Principal Investigator, Mark A. DePristo, Project Leader, David L. Altshuler, Aaron D. Ball, Eric Banks, Toby Bloom, Brian L. Browning, Kristian Cibulskis, Tim J. Fennell, Kiran V. Garimella, Sharon R. Grossman, Robert E. Handsaker, Matt Hanna, Chris Hartl, David B. Jaffe, Andrew M. Kernytsky, Joshua M. Korn, Heng Li, Jared R. Maguire, Steven A. McCarroll, Aaron McKenna, James C. Nemesh, Anthony A. Philippakis, Ryan E. Poplin, Alkes Price, Manuel A. Rivas, Pardis C. Sabeti, Stephen F. Schaffner, Erica Shefler, Ilya A. Shlyakhter, David N. Cooper, Principal Investigator, Edward V. Ball, Matthew Mort, Andrew D. Phillips, Peter D. Stenson, Jonathan Sebat, Principal Investigator, Vladimir Makarov, Kenny Ye, Seungtai C. Yoon, Carlos D. Bustamante, Co-Principal Investigator, Andrew G. Clark, Co-Principal Investigator, Adam Boyko, Jeremiah Degenhardt, Simon Gravel, Ryan N. 
Gutenkunst, Mark Kaganovich, Alon Keinan, Phil Lacroute, Xin Ma, Andy Reynolds, Laura Clarke, Project Leader, Paul Flicek, Co-Chair, DCC, Principal Investigator, Fiona Cunningham, Javier Herrero, Stephen Keenen, Eugene Kulesha, Rasko Leinonen, William M. McLaren, Rajesh Radhakrishnan, Richard E. Smith, Vadim Zalunin, Xiangqun Zheng-Bradley, Jan O. Korbel, Principal Investigator, Adrian M. Stütz, Sean Humphray, Project Leader, Markus Bauer, R. Keira Cheetham, Tony Cox, Michael Eberle, Terena James, Scott Kahn, Lisa Murray, Aravinda Chakravarti, Kai Ye, Francisco M. De La Vega, Principal Investigator, Yutao Fu, Fiona C. L. Hyland, Jonathan M. Manning, Stephen F. McLaughlin, Heather E. Peckham, Onur Sakarya, Yongming A. Sun, Eric F. Tsung, Mark A. Batzer, Principal Investigator, Miriam K. Konkel, Jerilyn A. Walker, Ralf Sudbrak, Project Leader, Marcus W. Albrecht, Vyacheslav S. Amstislavskiy, Ralf Herwig, Dimitri V. Parkhomchuk, Stephen T. Sherry, Co-Chair, DCC, Principal Investigator, Richa Agarwala, Hoda M. Khouri, Aleksandr O. Morgulis, Justin E. Paschall, Lon D. Phan, Kirill E. Rotmistrovsky, Robert D. Sanders, Martin F. Shumway, Chunlin Xiao, Gil A. McVean, Co-Chair, Population Genetics, Principal Investigator, Adam Auton, Zamin Iqbal, Gerton Lunter, Jonathan L. Marchini, Loukas Moutsianas, Simon Myers, Afidalina Tumian, Brian Desany, Project Leader, James Knight, Roger Winer, David W. Craig, Principal Investigator, Steve M. Beckstrom-Sternberg, Alexis Christoforides, Ahmet A. Kurdoglu, John V. Pearson, Shripad A. Sinari, Waibhav D. Tembe, David Haussler, Principal Investigator, Angie S. Hinrichs, Sol J. Katzman, Andrew Kern, Robert M. Kuhn, Molly Przeworski, Co-Chair, Population Genetics, Principal Investigator, Ryan D. Hernandez, Bryan Howie, Joanna L. Kelley, S. Cord Melton, Gonçalo R. Abecasis, Co-Chair Principal Investigator, Yun Li, Project Leader, Paul Anderson, Tom Blackwell, Wei Chen, William O. 
Cookson, Jun Ding, Hyun Min Kang, Mark Lathrop, Liming Liang, Miriam F. Moffatt, Paul Scheet, Carlo Sidore, Matthew Snyder, Xiaowei Zhan, Sebastian Zöllner, Philip Awadalla, Principal Investigator, Ferran Casals, Youssef Idaghdour, John Keebler, Eric A. Stone, Martine Zilversmit, Lynn Jorde, Principal Investigator, Jinchuan Xing, Evan E. Eichler, Principal Investigator, Gozde Aksay, Can Alkan, Iman Hajirasouliha, Fereydoun Hormozdiari, Jeffrey M. Kidd, S. Cenk Sahinalp, Peter H. Sudmant, Elaine R. Mardis, Co-Principal Investigator, Ken Chen, Asif Chinwalla, Li Ding, Daniel C. Koboldt, Mike D. McLellan, David Dooling, George Weinstock, John W. Wallis, Michael C. Wendl, Qunyuan Zhang, Richard M. Durbin, Principal Investigator, Cornelis A. Albers, Qasim Ayub, Senduran Balasubramaniam, Jeffrey C. Barrett, David M. Carter, Yuan Chen, Donald F. Conrad, Petr Danecek, Emmanouil T. Dermitzakis, Min Hu, Ni Huang, Matt E. Hurles, Hanjun Jin, Luke Jostins, Thomas M. Keane, Si Quang Le, Sarah Lindsay, Quan Long, Daniel G. MacArthur, Stephen B. Montgomery, Leopold Parts, James Stalker, Chris Tyler-Smith, Klaudia Walter, Yujun Zhang, Mark B. Gerstein, Co-Principal Investigator, Michael Snyder, Co-Principal Investigator, Alexej Abyzov, Suganthi Balasubramanian, Robert Bjornson, Jiang Du, Fabian Grubert, Lukas Habegger, Rajini Haraksingh, Justin Jee, Ekta Khurana, Hugo Y. K. Lam, Jing Leng, Xinmeng Jasmine Mu, Alexander E. Urban, Zhengdong Zhang, Yingrui Li, Ruibang Luo, Gabor T. Marth, Principal Investigator, Erik P. Garrison, Deniz Kural, Aaron R. Quinlan, Chip Stewart, Michael P. Stromberg, Alistair N. Ward, Jiantao Wu, Charles Lee, Co-Chair Principal Investigator, Ryan E. Mills, Xinghua Shi, Steven A. McCarroll, Project Leader, Eric Banks, Mark A. DePristo, Robert E. Handsaker, Chris Hartl, Joshua M. Korn, Heng Li, James C. Nemesh, Jonathan Sebat, Principal Investigator, Vladimir Makarov, Kenny Ye, Seungtai C. 
Yoon, Jeremiah Degenhardt, Mark Kaganovich, Laura Clarke, Project Leader, Richard E. Smith, Xiangqun Zheng-Bradley, Jan O. Korbel, Sean Humphray, Project Leader, R. Keira Cheetham, Michael Eberle, Scott Kahn, Lisa Murray, Kai Ye, Francisco M. De La Vega, Principal Invesigator, Yutao Fu, Heather E. Peckham, Yongming A. Sun, Mark A. Batzer, Principal Investigator, Miriam K. Konkel, Jerilyn A. Walker, Chunlin Xiao, Zamin Iqbal, Brian Desany, Tom Blackwell, Project Leader, Matthew Snyder, Jinchuan Xing, Evan E. Eichler, Co-Chair, Principal Investigator, Gozde Aksay, Can Alkan, Iman Hajirasouliha, Fereydoun Hormozdiari, Jeffrey M. Kidd, Ken Chen, Asif Chinwalla, Li Ding, Mike D. McLellan, John W. Wallis, Matt E. Hurles, Co-Chair, Principal Investigator, Donald F. Conrad, Klaudia Walter, Yujun Zhang, Mark B. Gerstein, Co-Principal Investigator, Michael Snyder, Co-Principal Investigator, Alexej Abyzov, Jiang Du, Fabian Grubert, Rajini Haraksingh, Justin Jee, Ekta Khurana, Hugo Y. K. Lam, Jing Leng, Xinmeng Jasmine Mu, Alexander E. Urban, Zhengdong Zhang, Richard A. Gibbs, Co-Chair, Principal Investigator, Matthew Bainbridge, Danny Challis, Cristian Coafra, Huyen Dinh, Christie Kovar, Sandy Lee, Donna Muzny, Lynne Nazareth, Jeff Reid, Aniko Sabo, Fuli Yu, Jin Yu, Gabor T. Marth, Co-Chair, Principal Investigator, Erik P. Garrison, Amit Indap, Wen Fung Leong, Aaron R. Quinlan, Chip Stewart, Alistair N. Ward, Jiantao Wu, Kristian Cibulskis, Tim J. Fennell, Stacey B. Gabriel, Kiran V. Garimella, Chris Hartl, Erica Shefler, Carrie L. Sougnez, Jane Wilkinson, Andrew G. Clark, Co-Principal Investigator, Simon Gravel, Fabian Grubert, Laura Clarke, Project Leader, Paul Flicek, Principal Investigator, Richard E. Smith, Xiangqun Zheng-Bradley, Stephen T. Sherry, Principal Investigator, Hoda M. Khouri, Justin E. Paschall, Martin F. Shumway, Chunlin Xiao, Gil A. McVean, Sol J. Katzman, Gonçalo R. Abecasis, Principal Investigator, Tom Blackwell, Elaine R. 
Mardis, Principal Investigator, David Dooling, Lucinda Fulton, Robert Fulton, Daniel C. Koboldt, Richard M. Durbin, Principal Investigator, Senduran Balasubramaniam, Allison Coffey, Thomas M. Keane, Daniel G. MacArthur, Aarno Palotie, Carol Scott, James Stalker, Chris Tyler-Smith, Mark B. Gerstein, Principal Investigator, Suganthi Balasubramanian, Aravinda Chakravarti, Co-Chair, Bartha M. Knoppers, Co-Chair, Gonçalo R. Abecasis, Carlos D. Bustamante, Neda Gharani, Richard A. Gibbs, Lynn Jorde, Jane S. Kaye, Alastair Kent, Taosha Li, Amy L. McGuire, Gil A. McVean, Pilar N. Ossorio, Charles N. Rotimi, Yeyang Su, Lorraine H. Toji, Chris TylerSmith, Lisa D. Brooks, Adam L. Felsenfeld, Jean E. McEwen, Assya Abdallah, Christopher R. Juenger, Nicholas C. Clemm, Francis S. Collins, Audrey Duncanson, Eric D. Green, Mark S. Guyer, Jane L. Peterson, Alan J. Schafer, Gonçalo R. Abecasis, David L. Altshuler, Adam Auton, Lisa D. Brooks, Richard M. Durbin, Richard A. Gibbs, Matt E. Hurles, and Gil A. McVean

Abstract

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2–4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ~1,000 sequenced chromosomes per population, whereas ~2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.

Keywords: demographic inference, genetic drift, population genetics, human evolution

The Thousand Genomes Project (1000G) is the most extensive study to date of human genomic diversity (1). The pilot phase of the project consisted of whole-genome, low-coverage sequencing of 179 samples from four HapMap populations at 2–4× coverage, an exon pilot experiment that targeted exons from over 800 genes in 697 samples across seven HapMap populations with ~50× coverage, and a trio pilot focusing on two mother–father–child trios (1). In this article, we present an approach for combining the low-coverage and the exon pilot data, and use it to estimate the joint allele frequency spectrum for individuals of European origin in Utah (CEU), Han Chinese individuals in Beijing (CHB), Japanese individuals in Tokyo (JPT), and Yoruba individuals in Ibadan, Nigeria (YRI). Our motivation for this analysis is that the two pilot projects provide complementary information: the low-coverage pilot captures most of the common variation in the populations sequenced across the accessible human genome at the cost of missing some of the rarer variants, whereas the target capture data provide a more complete picture of rare variants on an interesting subset of the data. We are interested in leveraging the strengths of the exon and low-coverage pilots to obtain accurate estimates of population genetic parameters. We will focus in particular on the P-population site frequency spectrum Φ, a P-dimensional histogram that records the joint distribution of diallelic SNPs as displayed in Fig. 1.

Fig. 1.
The two-population joint SFS from panels of Chinese individuals from Beijing (CHB) and Yoruba individuals from Ibadan, Nigeria (YRI) for variants occurring in less than 15 of 100 sequenced chromosomes in both panels. Of the 3,366 variants in the overlap ...

More specifically, the value Φ(f1, f2, …, fP) of bin (f1, f2, …, fP) is the number of SNPs that occur in f1 chromosomes from population 1, f2 chromosomes from population 2, etc. Because the allele count in diploid population i ranges from 0 to 2ni, where ni is the number of individuals sequenced in this population, Φ is a (2n1 + 1) × (2n2 + 1) × … × (2nP + 1) array. Because the number of individuals who are successfully sequenced at any given site may vary, ni is, in practice, chosen to be somewhat smaller than the total number of individuals sequenced, and each site with n > ni sequenced individuals contributes to bin f in proportion to the probability that one finds f derived alleles in a random selection of ni of the n samples.
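The down-sampling step described above amounts to a hypergeometric projection. The following is a minimal Python sketch under stated assumptions: counts are expressed in chromosomes rather than individuals, each site is summarized as a (derived count, called chromosomes) pair, and the function names `project_site` and `projected_sfs` are hypothetical rather than taken from the 1000G pipeline or any published package.

```python
from math import comb

def project_site(derived, n_called, n_target):
    """Weights that one site adds to bins 0..n_target of the projected spectrum.

    derived  : derived-allele count observed among the called chromosomes
    n_called : chromosomes successfully called at this site
    n_target : projected sample size (assumed <= n_called)
    """
    if not 0 <= derived <= n_called or n_target > n_called:
        raise ValueError("need 0 <= derived <= n_called and n_target <= n_called")
    total = comb(n_called, n_target)
    # Hypergeometric probability of drawing f derived alleles in n_target draws.
    return [comb(derived, f) * comb(n_called - derived, n_target - f) / total
            for f in range(n_target + 1)]

def projected_sfs(sites, n_target):
    """Sum the per-site weights; sites with too few calls are skipped."""
    sfs = [0.0] * (n_target + 1)
    for derived, n_called in sites:
        if n_called < n_target:
            continue
        for f, weight in enumerate(project_site(derived, n_called, n_target)):
            sfs[f] += weight
    return sfs
```

For a singleton observed among four called chromosomes and a target of two chromosomes, `project_site(1, 4, 2)` splits the site evenly between bins 0 and 1, so each site always contributes a total weight of one.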

The one-population site frequency spectrum (SFS) is a staple of population genetics and is commonly used to reveal broad patterns of selection (2, 3) and demography (3–5). The multiple-population SFS has received increased attention recently (6–10), because it provides additional information about between-population structure. Many standard population genetic statistics, such as FST and Tajima's D, are summaries of the multiple-population SFS.

In this article, we study SFSs derived from the 1000G pilot project data. We develop methods to precisely estimate SFSs from high-throughput sequencing data and use this information to estimate demographic parameters for a detailed Out-of-Africa demographic model by using ∂a∂i (6), a software package that uses diffusion approximation to calculate expected SFSs across multiple populations (6, 10). We use these parameters to predict the number of variants to be discovered as the number of sequenced samples in the 1000G is increased. We also present a jackknife-based approach to the prediction of the number of undiscovered variants and compare the predictions of the two approaches.

Theory

Linear Error Model for SFSs from Low-Coverage Data.

Data of the kind collected by the 1000G low-coverage pilot (1) (2–4× coverage across 179 individuals) provide a large volume of data from which precise demographic inference can be drawn. However, the low coverage leads to biases that must be addressed to ensure accuracy of the inference (11, 12). We use an empirical approach to tune an error model for low-coverage sequencing based on a direct comparison of the SFSs generated by whole genome and capture experiments on the part of the genome sequenced by both experiments.

The usefulness of this approach relies on two observations. First, demographic inference does not require the knowledge of the particular sites that are variable, but rather requires statistical averages over all sites. Although it is impossible to infer which variable sites were missed, the average number of missed sites can be estimated directly. Second, the most significant bias caused by low coverage in the 1000G data is an elevated false-negative rate for rare variant genotype calls (1). Because the majority of genetic variants are rare, it is possible to infer error rates for such variants based on high-quality sequence data from a relatively small subset of the genome.

Because the SFS does not keep track of linkage information, we use an error model that acts independently on each genomic site. We suppose that the underlying true SFS $\hat{S}_{i'}$ and the observed SFS $S_i$ for population p are related by a linear error model: $S_i = \sum_{i'} \epsilon^{p}_{ii'} \hat{S}_{i'}$. In this model, $\epsilon^{p}_{ii'}$ represents the proportion of sites with true frequency i′ that are assigned to frequency i. In the three-population case, which we will consider below, this model generalizes to

$$S_{ijk} = \sum_{i',j',k'} \epsilon_{ijk;\,i'j'k'}\, \hat{S}_{i'j'k'}. \qquad [1]$$

If we have Nc frequency bins per population, the number of parameters in this model is $N_c^6$. We therefore need to introduce additional simplifying assumptions (justifications are provided below):

  • i) The errors occur independently in each population: $\epsilon_{ijk;\,i'j'k'} = \epsilon^{1}_{ii'}\,\epsilon^{2}_{jj'}\,\epsilon^{3}_{kk'}$.
  • ii) The probability $\epsilon^{p}_{0i}$ of missing a site decays exponentially with the number of variants present in the population: $\epsilon^{p}_{0i} = \alpha_p e^{-\beta_p i}$.
  • iii) If a site is found to be variable in one population, its frequency is estimated accurately: $\epsilon^{p}_{ii'} = (1 - \epsilon^{p}_{0i'})\,\delta_{ii'}$ for $i \geq 1$.
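To make the three assumptions concrete, the single-population error matrix can be written out explicitly. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, bins are indexed by derived-allele count, and it additionally assumes that truly non-variable sites are never called variable (no false positives), which is how a pure false-negative model is read here.

```python
from math import exp

def error_matrix(alpha, beta, n_bins):
    """eps[i][j]: proportion of sites with true count j that are observed at count i."""
    eps = [[0.0] * n_bins for _ in range(n_bins)]
    eps[0][0] = 1.0  # assumption: no false positives
    for j in range(1, n_bins):
        miss = alpha * exp(-beta * j)   # assumption ii: exponential decay of misses
        eps[0][j] = miss                # missed sites are recorded as non-variable
        eps[j][j] = 1.0 - miss          # assumption iii: detected sites keep their count
    return eps

def apply_error(eps, true_sfs):
    """Observed spectrum as a linear transform of the true spectrum."""
    n = len(true_sfs)
    return [sum(eps[i][j] * true_sfs[j] for j in range(n)) for i in range(n)]
```

Under assumption i, the three-population error tensor is the product of three such matrices, one per population, which is why only an amplitude and a decay rate per population need to be estimated.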

The resulting model has six parameters (amplitudes αp and error decay rates βp) and captures the bulk of the discrepancy between the high- and low-coverage data. It is motivated by the following observations:

  • i) The low-coverage SNP calls were made population by population. We assume, first, that the leading source of error is an insufficient number of variant reads to confidently call a variant and, second, that in a low-coverage experiment, the uncorrelated sampling fluctuations in read numbers play the largest role in the variation in read numbers.
  • ii) Variant calls in 1000G require multiple independent observations of a variant across a population to rule out read errors and call a variant genotype. This stringency strongly reduces the rate of false-positive calls, but it results in missing actual variants at a rate that depends on the expected number of nonreference reads observed at a given position across a population (1). The decay in the probability of detecting fewer than a fixed, small number c of reads for a variant present in i of N chromosomes sequenced at depth d is dominated by $e^{-id/2}$. We therefore fit a heuristic exponential error model $\alpha e^{-\beta i}$, where the effective read depth 2β accounts for fluctuations in read depth and read quality across the genome (Table 1).
    Table 1.
    Parameter values in the error model (Eq. 1)
  • iii) After enough variant reads are found in a 1000G population to justify a variant call, a single variant read is sufficient to make a genotype variant call, hence a reduced false-negative rate and low systematic bias in estimated frequencies when a variant has been identified.
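The exponential decay invoked in observation ii can be illustrated with a simple read-sampling calculation. The sketch below rests on an assumption that is not stated in the text: the number of variant-carrying reads across the population is taken to be Poisson distributed with mean i·d/2, where d is the per-individual (diploid) depth. The function name `miss_probability` is hypothetical.

```python
from math import exp, factorial, log

def miss_probability(i, depth, c):
    """P(fewer than c variant reads) when the variant sits on i chromosomes.

    Assumes Poisson read sampling with mean i * depth / 2 (illustrative only).
    """
    lam = i * depth / 2.0
    return exp(-lam) * sum(lam ** k / factorial(k) for k in range(c))

def log_decay(i, depth, c):
    """Change in log miss probability when the allele count goes from i to i + 1."""
    return log(miss_probability(i + 1, depth, c)) - log(miss_probability(i, depth, c))
```

For c = 1 the decay per additional variant chromosome is exactly -d/2. For larger c a polynomial prefactor slows the decay slightly, which is one reason to treat the amplitude and decay rate as fitted, heuristic quantities rather than fixed by the nominal depth.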

Estimating the Parameters of the Error Model.

Because errors are assumed to occur independently in each population, the error rates can be inferred directly from error rates in single-population SFSs. We compute the single-population SFSs at sites that are found to be variable in the exon pilot data and with at least 80 genotype calls in all three populations. The exon pilot and low-coverage pilot are then compared, and the optimal parameters αp and βp are obtained through a linear fit using the first three frequency bins of the compared spectra.
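One plausible reading of this fitting step is a least-squares line through the logarithm of the observed miss fraction in the first three frequency bins. The sketch below follows that reading and should be taken as an assumption rather than the authors' exact procedure; `fit_error_model` is a hypothetical name, and the spectra are lists indexed by allele count for a single population.

```python
from math import exp, log

def fit_error_model(high_sfs, low_sfs, bins=(1, 2, 3)):
    """Estimate (alpha, beta) from log(miss fraction) = log(alpha) - beta * i.

    high_sfs : counts per bin from the high-coverage (exon pilot) calls
    low_sfs  : counts per bin from the low-coverage calls at the same sites
    """
    xs = list(bins)
    # Fraction of high-coverage sites in bin i that the low-coverage calls miss.
    ys = [log(1.0 - low_sfs[i] / high_sfs[i]) for i in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return exp(intercept), -slope
```

The fit requires the low-coverage count to be strictly below the high-coverage count in each bin used; bins where this fails would need to be excluded or handled separately.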

Note that this error model can be inverted to give a correction model for the SFS, which does not require the knowledge of the number of fixed variants (SI Appendix). However, the correction model may involve the subtraction of large numbers and has non-Poisson uncertainties. When inferring demographic parameters by maximum likelihood of a Poisson Random Field (6), we therefore incorporate the error model in our demographic model rather than attempt to correct the SFS.
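Folding the error model into the expectation, rather than correcting the data, keeps the observed counts Poisson distributed around the model prediction. A minimal single-population sketch of that idea follows; the function names are hypothetical, the monomorphic bin (index 0) is excluded from the likelihood by assumption, and nothing here reproduces the API of ∂a∂i.

```python
from math import exp, lgamma, log

def with_false_negatives(model_sfs, alpha, beta):
    """Expected observed counts per bin; index = derived-allele count.

    Bin 0 is passed through unchanged and is ignored by the likelihood below.
    """
    return [m if i == 0 else m * (1.0 - alpha * exp(-beta * i))
            for i, m in enumerate(model_sfs)]

def poisson_loglik(observed, expected):
    """Poisson composite log-likelihood summed over variable bins (index >= 1)."""
    return sum(o * log(e) - e - lgamma(o + 1.0)
               for o, e in zip(observed[1:], expected[1:]))
```

In use, a demographic model would supply `model_sfs`, and the likelihood would be evaluated as `poisson_loglik(data, with_false_negatives(model_sfs, alpha, beta))`, so the data are never modified and no subtraction of large numbers is required.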

Prediction of the Rate of Variant Discovery.

One practical use of inferring a demographic model is the ability to predict the number of variants that will be discovered in subsequent experiments. To study the impact of model choice on such predictions, we propose an alternate predictor of discovery rate based on sampling theory and inspired by an analogy with capture–recapture approaches to estimating animal population sizes (13,15) (let us consider rabbits for definiteness). In this analogy, a rabbit is akin to a SNP, a field trip is akin to an individual sequenced, and a rabbit capture is akin to the identification of a variant in a sequenced individual. In the absence of measurement errors, the probability of identifying a variant in a randomly chosen sequenced individual is proportional to the frequency of the variant in the population. This distribution of probabilities is akin to the variability in rabbit capture probability; a common SNP is akin to a trap-happy rabbit, and a rare SNP is akin to a trap-shy rabbit.

We propose a population genetics analog of the Burnham–Overton jackknife (16, 17) to estimate the total number V(N) of segregating sites in a sample of N chromosomes based on a subsample of n-sequenced chromosomes. This jackknife estimator uses the assumption that

equation image

where [inline formula, image pnas.1019276108i10.jpg] for a fixed jackknife order p. Explicit expressions for the [inline symbol, image pnas.1019276108i11.jpg] as well as performance benchmarking and additional discussion of this estimator are provided in SI Appendix.
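For orientation, the classical Burnham–Overton jackknife from the capture–recapture literature can be written in a few lines. This is the ecological estimator of total population size, shown only to illustrate the analogy; it is not the population genetics analog proposed here, whose expressions are given in SI Appendix. The function name and argument names are hypothetical.

```python
def burnham_overton(s_obs, f1, f2, k, order=1):
    """Classical jackknife estimate of the total number of distinct animals.

    s_obs : number of distinct animals captured at least once
    f1    : number captured on exactly one occasion
    f2    : number captured on exactly two occasions
    k     : number of capture occasions (field trips)
    """
    if order == 1:
        return s_obs + (k - 1) / k * f1
    if order == 2:
        return s_obs + (2 * k - 3) / k * f1 - (k - 2) ** 2 / (k * (k - 1)) * f2
    raise ValueError("only orders 1 and 2 are implemented in this sketch")
```

In the analogy of the text, animals captured on exactly one field trip play the role of variants seen in a single sequenced individual, and the correction term grows with the number of such rarely observed items.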

Results

After filtering away the exon pilot calls based on less than 15× coverage and individuals with substantial discrepancy with HapMap (SI Appendix), we compared the joint SFSs with the expected spectra obtained if each individual had been assigned to a population randomly in the independent sites model. That is, given an N × N spectrum [var phi](i, j), we have an expected spectrum of

equation image
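The permutation expectation described above can be sketched numerically. The version below makes simplifying assumptions that may differ from the original equation: both panels contain the same number n of chromosomes, the spectrum is an (n + 1) × (n + 1) array indexed by allele counts 0..n, and population labels are shuffled at the level of chromosomes, independently at each site. Under these assumptions the total count i + j is preserved and is split between panels hypergeometrically. The function name is hypothetical.

```python
from math import comb

def permuted_expectation(phi):
    """Expected two-population spectrum under random relabeling of chromosomes.

    phi[i][j] : number of sites with count i in panel 1 and j in panel 2
    """
    n = len(phi) - 1
    # Number of sites at each combined allele count s = i + j.
    totals = [0.0] * (2 * n + 1)
    for i, row in enumerate(phi):
        for j, count in enumerate(row):
            totals[i + j] += count
    # Redistribute each combined count between the two panels.
    return [[totals[i + j] * comb(n, i) * comb(n, j) / comb(2 * n, i + j)
             for j in range(n + 1)]
            for i in range(n + 1)]
```

As a small check, 12 singletons that are all private to the first panel are expected to be split 6 and 6 between the panels after permutation, and the total number of sites is unchanged.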

Figs. 2 and 3 indicate that, even for pairs of closely related populations, we find a substantial reduction in allele sharing for rare variants compared with a single randomly mixing population. In particular, Fig. 2, Right shows the Anscombe residuals between expectation and data. Blue strips along the axis correspond to a significant excess of variants private to one panel in the data, and they are accompanied by a reduction in shared variants (red). The residuals are larger (darker colors) for rare variants not only because of a larger number of sites but also because of reduced sharing. Indeed, we see in Fig. 3 that the amount of sharing, expressed as a proportion of the expectation in a panmictic population, is only a few percent between continental populations for variants present at 2% minor allele frequency (MAF) and about 60% for variants at 20% MAF. More closely related populations, such as CHB and JPT, still exhibit a 50% reduction in sharing at 2% MAF but barely any reduction for variants at 20% MAF. Interestingly, even closely related populations, such as CHB and CHD, exhibit a 20% reduction in sharing for 2% MAF. This finding is consistent with recent population structure, but although this analysis used only genotype calls with high-coverage data, such a reduction in sharing could also be partly explained by differences in the sequencing platform between the two populations.

Fig. 2.
Joint allele SFSs (all sites) for selected pairs of populations from the exome sequencing panel (Left) compared with expected spectra under site by site population label permutation (Center). Shown are sites occurring in, at most, 15 of 100 chromosomes. ...
Fig. 3.
The probability that two individuals carrying an allele of given minor frequency come from different populations, normalized by the expected frequency in a panmictic population, using the seven panels of the exome capture dataset. Sharing decreases dramatically ...

To increase the number of sites available for estimating joint SFS, we turned to the low-coverage pilot data. Direct comparison of low-coverage and exon capture genotype calls at sites called in the exon capture pilot shows a significant discrepancy for rare variants (SI Appendix, Figs. S1–S3) because of elevated rates of false-negative variant calls in low-coverage data. This finding results in biased estimates of the distribution of allele frequencies.

The bulk of the systematic discrepancy between high- and low-coverage SNP calls could be described using the simple false-negative model described above (Table 1 and SI Appendix, Figs. S1–S4). The most substantial discrepancy between this model and the data is in the CHB + JPT, possibly because this group was the metapopulation with the lowest coverage. In this case, the high-coverage singleton counts are 634% higher than the uncorrected low-coverage counts. After error correction, a discrepancy of 19% remains, with the corrected low-coverage site predicting more counts (SI Appendix, Fig. S3). Despite the high false-negative rate for singleton calls in the low-coverage data, its sheer volume provides an advantage in estimation precision over the much smaller exon pilot dataset. Similarly, the false-negative model for multiple-population SFSs (Eq. 1) was found to account for the bulk of the discrepancy between the multiple-population SFS derived from low- and high-coverage SNP calls (SI Appendix, Fig. S4).

We modeled the joint SFS for synonymous sites in African (YRI), Asian (CHB and JPT), and European (CEU) data sequenced in the low-coverage pilot using the 13-parameter demographic model used in ref. 6 (Fig. 4 and Table 2), taking into account the expected error model. The SFS was calculated using n = 40 samples per panel (80 chromosomes). We obtained maximum composite likelihood estimates for the 13 parameters using ∂a∂i, a diffusion-approximation-based package for estimating expected SFSs resulting from various demographic models (6) (Table 2). Our maximum likelihood parameters are broadly consistent with previously reported values using National Institute of Environmental Health Sciences (NIEHS) data (6). However, the resulting confidence intervals, determined by conventional bootstrap (likelihood profiles are provided in SI Appendix), are substantially narrower than those intervals resulting from NIEHS or high-coverage data alone. As an example, using a 25 y generation time, we find a time of split between African and Eurasian populations of TB = 51 thousand years ago (kya; 95% confidence interval = 45–69 kya). By contrast, the NIEHS data (6) resulted in a maximum likelihood estimate of TB = 140 kya (95% confidence interval = 40–270 kya). The inference based on the exon pilot alone yields TB = 98 kya (95% confidence interval = 43–210 kya). In general, the gain in precision was strongest for the parameters involved in more ancient events. Inference based on uncorrected low-coverage data yielded an unrealistic TB = 14 kya split.

Fig. 4.
An illustration of the inferred demographic model, with line width corresponding to population size and time flowing from left to right. The width of the red arrows is proportional to the migration intensity. Model details are provided in Table 2 and ...
Table 2.
Parameter estimates obtained using the NIEHS data (6), 1000G exon and low-coverage data, and 1000G exon pilot data only (this work)

Beyond their fundamental interest as descriptors of human history, these parameters allow for a number of experimental predictions; given a demographic model, we can predict, for example, the number of synonymous variants to be discovered in samples of larger size that are currently in the process of being sequenced. We predicted the number of variants to be discovered in each of the three populations considered (CEU, CHB + JPT, and YRI) as the sample size is increased using both the inferred demographic model and the jackknife estimator of the number of undiscovered variants presented in Methods (Fig. 5). Because the jackknife does not rely on assumptions about demography and selection, we also used it to predict the number of nonsynonymous sites to be discovered. The jackknife approach predicts that, as sample size is increased, the total number of segregating sites in CEU and CHB + JPT panels should overtake the number of segregating sites in the YRI population.

Fig. 5.
Observed and projected numbers of synonymous and nonsynonymous variants in CEU, CHB + JPT, and YRI as a function of the sample size (two times the number of individuals sequenced). Long and short dashes correspond to jackknife and model-based projections ...

Discussion

Our results illustrate that the vast majority of human variable sites are rare and that the majority of rare variants exhibit, at most, very little sharing among continental populations. We also find reduced sharing for rare variants compared with common variants among more closely related populations, such as CHB-JPT, CEU-TSI, and CHB-CHD. This lack of sharing can be explained by population divergence, and we expect that the fraction of newly discovered variable sites that are population-specific will keep increasing with sample size. This finding poses a formidable challenge for the reproduction of genome-wide association studies for rare functional variants across diverse populations, because the statistical difficulties caused by variant rarity within a population combine with increased between-population divergence.

We also show how sequencing a large number of individuals at low coverage is an efficient strategy not only for discovering the maximum number of variable sites but also for estimating demographic parameters, at least when error rates can be estimated. Different statistical methods have been proposed that include read depth information and models of sequencing errors to reduce biases in allele frequency estimation (11, 12). Because of the availability of high-coverage data for a subset of the genome in the populations studied here, we used direct comparison with high-coverage data to estimate and correct biases caused by low coverage. A significant advantage of the direct comparison approach is simplicity and computational efficiency; it can use existing curated genotype calls rather than require a full analysis of an error model at the individual read level. This advantage is particularly useful for data generated by 1000G, because multiple sequencing platforms and calling pipelines with different error modes have been used jointly. In general, the two approaches are not mutually exclusive, and when practical, a statistically corrected low-coverage SFS could be further corrected by comparison with targeted high-coverage data. Here, we used, as a reference, an exon capture dataset with >50× coverage and validation rates of 96.8% overall and 93.8% for singletons. We also restricted our analysis to a high-quality subset of the data (by selecting individuals with good coverage and HapMap concordance and selecting sites with sufficient coverage). The false-negative rate in the exon capture data was estimated to be below 5% for variants of at least 1% in frequency and 26% for variants below 1% in frequency. To avoid resulting biases in the frequency-dependent false-negative estimates for the low-coverage data, we restricted the comparison to sites where a high-coverage variant call had been made.

We found that the bulk of the discrepancy between high- and low-coverage data could be described by a simple model that uses only two parameters per population, and in which error rates decay exponentially with MAF, frequencies of detected variant sites are accurately determined, and errors occur independently in each population. The latter assumption is perhaps the most debatable: we expect at least some correlations in the coverage at a given site for different populations. An error model taking into account such correlations would, therefore, be desirable. However, given the limited data available to infer the parameters of the error model, the independence assumption is a reasonable tradeoff that allows for the capture of the bulk of the error patterns. Finally, the error rates likely differ between different genomic regions (such as coding vs. noncoding DNA), motivating our focus on exonic regions where high-coverage data were available. This finding emphasizes the importance of obtaining high-quality genotype data through sequencing or chip genotyping for representative noncoding regions.
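The structure of such an error model can be illustrated with a minimal sketch. The functional form a·exp(−b·MAF), the parameter names `a` and `b`, and the helper names below are assumptions made for illustration; the text states only that each population has two parameters, that error rates decay exponentially with MAF, and that errors are independent across populations.

```python
import math

def false_negative_rate(maf, a, b):
    """Hypothetical two-parameter model: probability of missing a variant
    with minor allele frequency `maf` is a * exp(-b * maf)."""
    return a * math.exp(-b * maf)

def correct_sfs(observed, n_chrom, a, b):
    """Rescale observed SFS counts by the modeled detection probability.

    observed[i] is the number of sites with minor allele count i + 1
    in a sample of n_chrom chromosomes. Requires a < 1 so that the
    detection probability stays positive.
    """
    corrected = []
    for i, count in enumerate(observed):
        maf = (i + 1) / n_chrom
        detect = 1.0 - false_negative_rate(maf, a, b)
        corrected.append(count / detect)
    return corrected

def joint_detection(mafs, params):
    """Under the independence assumption, the probability that a site is
    detected in every population is the product of per-population terms."""
    p = 1.0
    for maf, (a, b) in zip(mafs, params):
        p *= 1.0 - false_negative_rate(maf, a, b)
    return p
```

A model with correlated coverage across populations would replace the product in `joint_detection` with a joint term, at the cost of additional parameters.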

The demographic model discussed in this paper was introduced in Gutenkunst et al. (6), where it was used to analyze the NIEHS intergenic data. Despite differences in putatively neutral sites (selected intergenics vs. synonymous), sequencing technology (Sanger vs. high throughput), and panel choice (CHB only vs. CHB + JPT), the inferred parameters are in broad agreement (Table 2). Inference based only on capture data provides overlapping 95% confidence intervals, with the single exception of Europe–Asia migration rate (1.8 − 3.9 × 10−5 vs. 4.1 − 8.2 × 10−5). The main difference between these three sets of parameter estimates is the width of the confidence intervals. The inference based on exon capture data provides reduced uncertainty compared with the NIEHS data, despite a comparable number of variable sites in the SFS; the additional number of samples per site results in more accurate frequency estimates that further constrain the demographic model. A much greater reduction in the confidence intervals is obtained by considering the low-coverage and exon capture data jointly (a 90% reduction of the confidence interval for the Out-of-Africa split time compared with a 27% reduction with the exon data only). Our estimate of the Out-of-Africa split time using the low-coverage data, 51 kya, is also in better agreement with both prior genetic and archaeological estimates of the modern human expansion out of Africa (18). It should be emphasized that, because we use a single Western African population as our African panel, the divergence described by our model might have occurred earlier than the actual Out-of-Africa event.

The narrow confidence intervals on some of the parameters should not obscure the fact that the parameter estimates are model-dependent. As a simple example, a model that does not allow for migration would require more recent split times to produce similar levels of population divergence. The demographic history of the four populations considered is much more eventful than what is accounted for by our model. Additional geographically intermediate populations from the Near East and Central Asia that were not included in our analysis might contribute significantly to the allele frequency distribution as ghost populations (19). Incorporating an appropriate number of source populations for estimates of migration has been a general limitation of two- and three-population models under isolation migration coalescent, approximate Bayesian computation, and diffusion-based approaches. This limitation might explain why our estimate of the divergence between East Asians and Europeans is more recent than estimates based on archaeological evidence (18), but is comparable with estimates of 23 kya (20) under an approximate Bayesian computation approach and 25 kya under an isolation migration approach with mtDNA, X, and Y sequence data (21).

Similarly, the current population sizes inferred from our model (15,500, 35,900, and 49,000 for YRI, CEU, and CHB, respectively) are still significantly lower than census sizes. Because our model accounts for some population size changes, these are expected to be in closer relationship to census sizes compared with the classical effective population size, but additional model refinement [such as structure within populations, generation overlap, and a recent increase in growth rate, which was observed in the work by Coventry et al. (22), in a sample of 10,422 European-Americans] will be needed to close the gap.

Predictions based on the demographic model and the jackknife approach differ as to the number of new variants to be discovered, particularly for CHB + JPT (Fig. 5). This difference is easily understood by considering the differences in the two approaches. The demographic model attempts to fit the complete SFS at the cost of model assumptions that might bias the results. By contrast, the jackknife approach focuses on the rare variants, and the model assumptions are weaker. The difference can be traced to the fact that the maximum likelihood demographic model predicts a number of singletons somewhat lower than the observed number (SI Appendix, Fig. S5). If this discrepancy is due to limitations in the model that fail to account for an excess of rare variants, we expect the jackknife estimator to be more accurate. By contrast, if the difference is because of inaccurate singleton frequency estimation (from sequencing errors leading to 6.2% of false-positive variants in the high-coverage data) or limitations of our correction model (SI Appendix, Figs. S1–S3), the demographic model is expected to provide more robust estimates.

Nonetheless, both methods predict at least 50,000 synonymous variants in the human genome when sequencing 1,000 individuals for the CEU and CHB populations, substantially more than would be predicted from population genetic models of constant size. The jackknife approach applied directly to the seven target capture populations shows similar patterns, with some variation within continents, in JPT samples showing fewer rare variants than the Chinese populations, and in TSI samples showing more rare variants than CEU (SI Appendix, Fig. S7). These results highlight the importance, for the planning of medical sequencing experiments, of accurate demographic models of human populations and the dramatic impact that recent human population growth has had on the structure of genetic variation. Specifically, our prediction that most genetic variants are rare and highly diverged suggests that genome-wide association studies aiming to correlate common disease susceptibility with rare variants may need extraordinarily large sample sizes and precise definitions of population samples to accurately compare frequencies in cases and controls. Eventually, a clear tradeoff will ensue between cataloging variants and genotyping vs. completely sequencing human genomes and comparing them among populations of cases and controls.
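For comparison, the constant-size baseline mentioned above can be sketched with the standard neutral expectation, in which the number of segregating sites in a sample of n chromosomes is θ times the harmonic sum 1 + 1/2 + ... + 1/(n − 1). This is the textbook constant-size result, not the demographic model or the jackknife estimator used in the paper, and the function names are hypothetical.

```python
def harmonic_sum(n_chrom):
    """Sum of 1/i for i = 1 .. n_chrom - 1."""
    return sum(1.0 / i for i in range(1, n_chrom))

def expected_segregating_sites(theta, n_chrom):
    """Expected number of segregating sites under a constant-size
    neutral model with population-scaled mutation rate theta."""
    return theta * harmonic_sum(n_chrom)

def project_constant_size(s_observed, n_observed, n_new):
    """Project an observed count from n_observed to n_new chromosomes,
    assuming constant population size (theta cancels in the ratio)."""
    return s_observed * harmonic_sum(n_new) / harmonic_sum(n_observed)
```

Because the harmonic sum grows only logarithmically with sample size, a constant-size projection adds variants slowly as more individuals are sequenced, which is why an excess of rare variants from recent growth leads to larger predicted counts.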

Methods

Numerics.

The unprecedented size of the 1000 Genomes data created challenges for the numerical solution of the diffusion equation. Namely, the number of grid points required to accurately estimate the three-population SFS grows rapidly with the number of samples in each population. We optimized ∂a∂i and released version 1.5.0, in which the number of grid points necessary to achieve a given accuracy is reduced. As in ref. 6, we obtained SFSs with three different grid sizes (60, 70, and 80) and extrapolated to infinite grid size. Each likelihood evaluation took between 1 and 2 min on a 2.26-GHz processor. Optimization required hundreds to thousands of likelihood evaluations. Likelihoods were computed using the folded SFS to avoid biases caused by ancestral misidentification. Convergence of the maximum likelihood optimization process was ensured by restarting the search with modified initial conditions. The maximal likelihood parameters were chosen, but differences in parameter estimates from the different restarts were, on average, much smaller than the reported confidence intervals.
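The idea of extrapolating to infinite grid size can be sketched as follows. This is a generic quadratic extrapolation in h = 1/(grid size), evaluated at h = 0 with Lagrange interpolation; it is an illustrative assumption and not necessarily the exact scheme implemented in the software used here. The function name is hypothetical.

```python
def extrapolate_to_infinite_grid(grid_sizes, values):
    """Fit the unique polynomial through the points (1/grid_size, value)
    and evaluate it at h = 0, i.e., at infinite grid size.

    With three grid sizes this is a quadratic extrapolation. `values`
    holds one SFS entry (or any scalar) computed at each grid size.
    """
    hs = [1.0 / g for g in grid_sizes]
    result = 0.0
    for i, (h_i, y_i) in enumerate(zip(hs, values)):
        # Lagrange basis polynomial for point i, evaluated at h = 0.
        weight = 1.0
        for j, h_j in enumerate(hs):
            if j != i:
                weight *= (0.0 - h_j) / (h_i - h_j)
        result += y_i * weight
    return result
```

In practice the same extrapolation would be applied to each entry of the SFS computed at the three grid sizes, so three moderately sized solves stand in for one much finer grid.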

Conversion from Genetic to Physical Units.

The different parameters involved in the diffusion equation solved by ∂a∂i are normalized by the ancestral population size Na during the likelihood maximization. The optimal value of Na is calculated using the fact that the total number of segregating sites in a sample of n individuals is proportional to NaLμ, where μ is the mutation rate and L is the effective length sequenced. For this analysis, we used L = 5,007,837, the number of autosomal fourfold degenerate sites that passed quality control in all three populations, and μ = 2.36 × 10−8. For estimates based on exon data alone, we fixed the effective sequencing length to 68% of the target length by requesting equal values for Na in the corrected low-coverage and exon pilot estimates. The remaining 32% is composed of called sites that failed quality controls and sites for which no genotype call has been made. When performing bootstrap analysis, the total number of fourfold degenerate sites varied from bootstrap sample to bootstrap sample and was adjusted accordingly. Finally, to convert generation time to years, we used a generation time of 25 y. Estimated parameters are shown in Table 2.
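A minimal sketch of this conversion is given below. The values of L, μ, and the generation time are taken from the text. The scaling conventions (θ = 4NaμL, times in units of 2Na generations, and migration rates in units of 2Na times the per-generation rate) are assumptions based on the usual conventions for diffusion-based SFS inference, and the function names are hypothetical.

```python
MU = 2.36e-8            # mutation rate per site per generation (from the text)
L_EFF = 5_007_837       # effective sequenced length in sites (from the text)
GENERATION_YEARS = 25   # generation time in years (from the text)

def ancestral_size(observed_segregating, model_segregating_at_unit_theta):
    """Estimate Na from the ratio of observed segregating sites to the
    number predicted by the model at theta = 1, assuming theta = 4*Na*mu*L."""
    theta = observed_segregating / model_segregating_at_unit_theta
    return theta / (4.0 * MU * L_EFF)

def time_to_years(t_scaled, n_anc):
    """Convert a time in units of 2*Na generations (assumed convention)
    into years."""
    return t_scaled * 2.0 * n_anc * GENERATION_YEARS

def migration_per_generation(m_scaled, n_anc):
    """Convert a scaled migration rate M = 2*Na*m (assumed convention)
    into a per-generation migration probability m."""
    return m_scaled / (2.0 * n_anc)
```

Because every physical quantity scales with Na, an error in the effective length L or in μ propagates directly to the inferred population sizes and times, which is why L is recomputed for each bootstrap sample.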

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1019276108/-/DCSupplemental.

Contributor Information

David L. Altshuler, The 1000 Genomes Project.

Richard M. Durbin, The 1000 Genomes Project.

Gonçalo R. Abecasis, The 1000 Genomes Project.

David R. Bentley, The 1000 Genomes Project.

Aravinda Chakravarti, The 1000 Genomes Project.

Andrew G. Clark, The 1000 Genomes Project.

Francis S. Collins, The 1000 Genomes Project.

Francisco M. De La Vega, The 1000 Genomes Project.

Peter Donnelly, The 1000 Genomes Project.

Michael Egholm, The 1000 Genomes Project.

Paul Flicek, The 1000 Genomes Project.

Stacey B. Gabriel, The 1000 Genomes Project.

Richard A. Gibbs, The 1000 Genomes Project.

Bartha M. Knoppers, The 1000 Genomes Project.

Eric S. Lander, The 1000 Genomes Project.

Hans Lehrach, The 1000 Genomes Project.

Elaine R. Mardis, The 1000 Genomes Project.

Gil A. McVean, The 1000 Genomes Project.

Debbie A. Nickerson, The 1000 Genomes Project.

Leena Peltonen, The 1000 Genomes Project.

Alan J. Schafer, The 1000 Genomes Project.

Stephen T. Sherry, The 1000 Genomes Project.

Jun Wang, The 1000 Genomes Project.

Richard K. Wilson, The 1000 Genomes Project.

Richard A. Gibbs, The 1000 Genomes Project.

David Deiros, The 1000 Genomes Project.

Mike Metzker, The 1000 Genomes Project.

Donna Muzny, The 1000 Genomes Project.

Jeff Reid, The 1000 Genomes Project.

David Wheeler, The 1000 Genomes Project.

Jun Wang, The 1000 Genomes Project.

Jingxiang Li, The 1000 Genomes Project.

Min Jian, The 1000 Genomes Project.

Guoqing Li, The 1000 Genomes Project.

Ruiqiang Li, The 1000 Genomes Project.

Huiqing Liang, The 1000 Genomes Project.

Geng Tian, The 1000 Genomes Project.

Bo Wang, The 1000 Genomes Project.

Jian Wang, The 1000 Genomes Project.

Wei Wang, The 1000 Genomes Project.

Huanming Yang, The 1000 Genomes Project.

Xiuqing Zhang, The 1000 Genomes Project.

Huisong Zheng, The 1000 Genomes Project.

Eric S. Lander, The 1000 Genomes Project.

David L. Altshuler, The 1000 Genomes Project.

Lauren Ambrogio, The 1000 Genomes Project.

Toby Bloom, The 1000 Genomes Project.

Kristian Cibulskis, The 1000 Genomes Project.

Tim J. Fennell, The 1000 Genomes Project.

Stacey B. Gabriel, The 1000 Genomes Project.

David B. Jaffe, The 1000 Genomes Project.

Erica Shefler, The 1000 Genomes Project.

Carrie L. Sougnez, The 1000 Genomes Project.

David R. Bentley, The 1000 Genomes Project.

Niall Gormley, The 1000 Genomes Project.

Sean Humphray, The 1000 Genomes Project.

Zoya Kingsbury, The 1000 Genomes Project.

Paula Koko-Gonzales, The 1000 Genomes Project.

Jennifer Stone, The 1000 Genomes Project.

Kevin J. McKernan, The 1000 Genomes Project.

Gina L. Costa, The 1000 Genomes Project.

Jeffry K. Ichikawa, The 1000 Genomes Project.

Clarence C. Lee, The 1000 Genomes Project.

Ralf Sudbrak, The 1000 Genomes Project.

Hans Lehrach, The 1000 Genomes Project.

Tatiana A. Borodina, The 1000 Genomes Project.

Andreas Dahl, The 1000 Genomes Project.

Alexey N. Davydov, The 1000 Genomes Project.

Peter Marquardt, The 1000 Genomes Project.

Florian Mertes, The 1000 Genomes Project.

Wilfiried Nietfeld, The 1000 Genomes Project.

Philip Rosenstiel, The 1000 Genomes Project.

Stefan Schreiber, The 1000 Genomes Project.

Aleksey V. Soldatov, The 1000 Genomes Project.

Bernd Timmermann, The 1000 Genomes Project.

Marius Tolzmann, The 1000 Genomes Project.

Michael Egholm, The 1000 Genomes Project.

Jason Affourtit, The 1000 Genomes Project.

Dana Ashworth, The 1000 Genomes Project.

Said Attiya, The 1000 Genomes Project.

Melissa Bachorski, The 1000 Genomes Project.

Eli Buglione, The 1000 Genomes Project.

Adam Burke, The 1000 Genomes Project.

Amanda Caprio, The 1000 Genomes Project.

Christopher Celone, The 1000 Genomes Project.

Shauna Clark, The 1000 Genomes Project.

David Conners, The 1000 Genomes Project.

Brian Desany, The 1000 Genomes Project.

Lisa Gu, The 1000 Genomes Project.

Lorri Guccione, The 1000 Genomes Project.

Kalvin Kao, The 1000 Genomes Project.

Andrew Kebbel, The 1000 Genomes Project.

Jennifer Knowlton, The 1000 Genomes Project.

Matthew Labrecque, The 1000 Genomes Project.

Louise McDade, The 1000 Genomes Project.

Craig Mealmaker, The 1000 Genomes Project.

Melissa Minderman, The 1000 Genomes Project.

Anne Nawrocki, The 1000 Genomes Project.

Faheem Niazi, The 1000 Genomes Project.

Kristen Pareja, The 1000 Genomes Project.

Ravi Ramenani, The 1000 Genomes Project.

David Riches, The 1000 Genomes Project.

Wanmin Song, The 1000 Genomes Project.

Cynthia Turcotte, The 1000 Genomes Project.

Shally Wang, The 1000 Genomes Project.

Elaine R. Mardis, The 1000 Genomes Project.

Richard K. Wilson, The 1000 Genomes Project.

David Dooling, The 1000 Genomes Project.

Lucinda Fulton, The 1000 Genomes Project.

Robert Fulton, The 1000 Genomes Project.

George Weinstock, The 1000 Genomes Project.

Richard M. Durbin, The 1000 Genomes Project.

John Burton, The 1000 Genomes Project.

David M. Carter, The 1000 Genomes Project.

Carol Churcher, The 1000 Genomes Project.

Alison Coffey, The 1000 Genomes Project.

Anthony Cox, The 1000 Genomes Project.

Aarno Palotie, The 1000 Genomes Project.

Michael Quail, The 1000 Genomes Project.

Tom Skelly, The 1000 Genomes Project.

James Stalker, The 1000 Genomes Project.

Harold P. Swerdlow, The 1000 Genomes Project.

Daniel Turner, The 1000 Genomes Project.

Anniek De Witte, The 1000 Genomes Project.

Shane Giles, The 1000 Genomes Project.

Richard A. Gibbs, The 1000 Genomes Project.

David Wheeler, The 1000 Genomes Project.

Matthew Bainbridge, The 1000 Genomes Project.

Danny Challis, The 1000 Genomes Project.

Aniko Sabo, The 1000 Genomes Project.

Fuli Yu, The 1000 Genomes Project.

Jin Yu, The 1000 Genomes Project.

Jun Wang, The 1000 Genomes Project.

Xiaodong Fang, The 1000 Genomes Project.

Xiaosen Guo, The 1000 Genomes Project.

Ruiqiang Li, The 1000 Genomes Project.

Yingrui Li, The 1000 Genomes Project.

Ruibang Luo, The 1000 Genomes Project.

Shuaishuai Tai, The 1000 Genomes Project.

Honglong Wu, The 1000 Genomes Project.

Hancheng Zheng, The 1000 Genomes Project.

Xiaole Zheng, The 1000 Genomes Project.

Yan Zhou, The 1000 Genomes Project.

Guoqing Li, The 1000 Genomes Project.

Jian Wang, The 1000 Genomes Project.

Huanming Yang, The 1000 Genomes Project.

Gabor T. Marth, The 1000 Genomes Project.

Erik P. Garrison, The 1000 Genomes Project.

Weichun Huang, The 1000 Genomes Project.

Amit Indap, The 1000 Genomes Project.

Deniz Kural, The 1000 Genomes Project.

Wan-Ping Lee, The 1000 Genomes Project.

Wen Fung Leong, The 1000 Genomes Project.

Aaron R. Quinlan, The 1000 Genomes Project.

Chip Stewart, The 1000 Genomes Project.

Michael P. Stromberg, The 1000 Genomes Project.

Alistair N. Ward, The 1000 Genomes Project.

Jiantao Wu, The 1000 Genomes Project.

Charles Lee, The 1000 Genomes Project.

Ryan E. Mills, The 1000 Genomes Project.

Xinghua Shi, The 1000 Genomes Project.

Mark J. Daly, The 1000 Genomes Project.

Mark A. DePristo, The 1000 Genomes Project.

David L. Altshuler, The 1000 Genomes Project.

Aaron D. Ball, The 1000 Genomes Project.

Eric Banks, The 1000 Genomes Project.

Toby Bloom, The 1000 Genomes Project.

Brian L. Browning, The 1000 Genomes Project.

Kristian Cibulskis, The 1000 Genomes Project.

Tim J. Fennell, The 1000 Genomes Project.

Kiran V. Garimella, The 1000 Genomes Project.

Sharon R. Grossman, The 1000 Genomes Project.

Robert E. Handsaker, The 1000 Genomes Project.

Matt Hanna, The 1000 Genomes Project.

Chris Hartl, The 1000 Genomes Project.

David B. Jaffe, The 1000 Genomes Project.

Andrew M. Kernytsky, The 1000 Genomes Project.

Joshua M. Korn, The 1000 Genomes Project.

Heng Li, The 1000 Genomes Project.

Jared R. Maguire, The 1000 Genomes Project.

Steven A. McCarroll, The 1000 Genomes Project.

Aaron McKenna, The 1000 Genomes Project.

James C. Nemesh, The 1000 Genomes Project.

Anthony A. Philippakis, The 1000 Genomes Project.

Ryan E. Poplin, The 1000 Genomes Project.

Alkes Price, The 1000 Genomes Project.

Manuel A. Rivas, The 1000 Genomes Project.

Pardis C. Sabeti, The 1000 Genomes Project.

Stephen F. Schaffner, The 1000 Genomes Project.

Erica Shefler, The 1000 Genomes Project.

Ilya A. Shlyakhter, The 1000 Genomes Project.

David N. Cooper, The 1000 Genomes Project.

Edward V. Ball, The 1000 Genomes Project.

Matthew Mort, The 1000 Genomes Project.

Andrew D. Phillips, The 1000 Genomes Project.

Peter D. Stenson, The 1000 Genomes Project.

Jonathan Sebat, The 1000 Genomes Project.

Vladimir Makarov, The 1000 Genomes Project.

Kenny Ye, The 1000 Genomes Project.

Seungtai C. Yoon, The 1000 Genomes Project.

Carlos D. Bustamante, The 1000 Genomes Project.

Andrew G. Clark, The 1000 Genomes Project.

Adam Boyko, The 1000 Genomes Project.

Jeremiah Degenhardt, The 1000 Genomes Project.

Simon Gravel, The 1000 Genomes Project.

Ryan N. Gutenkunst, The 1000 Genomes Project.

Mark Kaganovich, The 1000 Genomes Project.

Alon Keinan, The 1000 Genomes Project.

Phil Lacroute, The 1000 Genomes Project.

Xin Ma, The 1000 Genomes Project.

Andy Reynolds, The 1000 Genomes Project.

Laura Clarke, The 1000 Genomes Project.

Paul Flicek, The 1000 Genomes Project.

Fiona Cunningham, The 1000 Genomes Project.

Javier Herrero, The 1000 Genomes Project.

Stephen Keenen, The 1000 Genomes Project.

Eugene Kulesha, The 1000 Genomes Project.

Rasko Leinonen, The 1000 Genomes Project.

William M. McLaren, The 1000 Genomes Project.

Rajesh Radhakrishnan, The 1000 Genomes Project.

Richard E. Smith, The 1000 Genomes Project.

Vadim Zalunin, The 1000 Genomes Project.

Xiangqun Zheng-Bradley, The 1000 Genomes Project.

Jan O. Korbel, The 1000 Genomes Project.

Adrian M. Stütz, The 1000 Genomes Project.

Sean Humphray, The 1000 Genomes Project.

Markus Bauer, The 1000 Genomes Project.

R. Keira Cheetham, The 1000 Genomes Project.

Tony Cox, The 1000 Genomes Project.

Michael Eberle, The 1000 Genomes Project.

Terena James, The 1000 Genomes Project.

Scott Kahn, The 1000 Genomes Project.

Lisa Murray, The 1000 Genomes Project.

Aravinda Chakravarti, The 1000 Genomes Project.

Kai Ye, The 1000 Genomes Project.

Francisco M. De La Vega, The 1000 Genomes Project.

Yutao Fu, The 1000 Genomes Project.

Fiona C. L. Hyland, The 1000 Genomes Project.

Jonathan M. Manning, The 1000 Genomes Project.

Stephen F. McLaughlin, The 1000 Genomes Project.

Heather E. Peckham, The 1000 Genomes Project.

Onur Sakarya, The 1000 Genomes Project.

Yongming A. Sun, The 1000 Genomes Project.

Eric F. Tsung, The 1000 Genomes Project.

Mark A. Batzer, The 1000 Genomes Project.

Miriam K. Konkel, The 1000 Genomes Project.

Jerilyn A. Walker, The 1000 Genomes Project.

Ralf Sudbrak, The 1000 Genomes Project.

Marcus W. Albrecht, The 1000 Genomes Project.

Vyacheslav S. Amstislavskiy, The 1000 Genomes Project.

Ralf Herwig, The 1000 Genomes Project.

Dimitri V. Parkhomchuk, The 1000 Genomes Project.

Stephen T. Sherry, The 1000 Genomes Project.

Richa Agarwala, The 1000 Genomes Project.

Hoda M. Khouri, The 1000 Genomes Project.

Aleksandr O. Morgulis, The 1000 Genomes Project.

Justin E. Paschall, The 1000 Genomes Project.

Lon D. Phan, The 1000 Genomes Project.

Kirill E. Rotmistrovsky, The 1000 Genomes Project.

Robert D. Sanders, The 1000 Genomes Project.

Martin F. Shumway, The 1000 Genomes Project.

Chunlin Xiao, The 1000 Genomes Project.

Gil A. McVean, The 1000 Genomes Project.

Adam Auton, The 1000 Genomes Project.

Zamin Iqbal, The 1000 Genomes Project.

Gerton Lunter, The 1000 Genomes Project.

Jonathan L. Marchini, The 1000 Genomes Project.

Loukas Moutsianas, The 1000 Genomes Project.

Simon Myers, The 1000 Genomes Project.

Afidalina Tumian, The 1000 Genomes Project.

Brian Desany, The 1000 Genomes Project.

James Knight, The 1000 Genomes Project.

Roger Winer, The 1000 Genomes Project.

David W. Craig, The 1000 Genomes Project.

Steve M. Beckstrom-Sternberg, The 1000 Genomes Project.

Alexis Christoforides, The 1000 Genomes Project.

Ahmet A. Kurdoglu, The 1000 Genomes Project.

John V. Pearson, The 1000 Genomes Project.

Shripad A. Sinari, The 1000 Genomes Project.

Waibhav D. Tembe, The 1000 Genomes Project.

David Haussler, The 1000 Genomes Project.

Angie S. Hinrichs, The 1000 Genomes Project.

Sol J. Katzman, The 1000 Genomes Project.

Andrew Kern, The 1000 Genomes Project.

Robert M. Kuhn, The 1000 Genomes Project.

Molly Przeworski, The 1000 Genomes Project.

Ryan D. Hernandez, The 1000 Genomes Project.

Bryan Howie, The 1000 Genomes Project.

Joanna L. Kelley, The 1000 Genomes Project.

S. Cord Melton, The 1000 Genomes Project.

Gonçalo R. Abecasis, The 1000 Genomes Project.

Yun Li, The 1000 Genomes Project.

Paul Anderson, The 1000 Genomes Project.

Tom Blackwell, The 1000 Genomes Project.

Wei Chen, The 1000 Genomes Project.

William O. Cookson, The 1000 Genomes Project.

Jun Ding, The 1000 Genomes Project.

Hyun Min Kang, The 1000 Genomes Project.

Mark Lathrop, The 1000 Genomes Project.

Liming Liang, The 1000 Genomes Project.

Miriam F. Moffatt, The 1000 Genomes Project.

Paul Scheet, The 1000 Genomes Project.

Carlo Sidore, The 1000 Genomes Project.

Matthew Snyder, The 1000 Genomes Project.

Xiaowei Zhan, The 1000 Genomes Project.

Sebastian Zöllner, The 1000 Genomes Project.

Philip Awadalla, The 1000 Genomes Project.

Ferran Casals, The 1000 Genomes Project.

Youssef Idaghdour, The 1000 Genomes Project.

John Keebler, The 1000 Genomes Project.

Eric A. Stone, The 1000 Genomes Project.

Martine Zilversmit, The 1000 Genomes Project.

Lynn Jorde, The 1000 Genomes Project.

Jinchuan Xing, The 1000 Genomes Project.

Evan E. Eichler, The 1000 Genomes Project.

Gozde Aksay, The 1000 Genomes Project.

Can Alkan, The 1000 Genomes Project.

Iman Hajirasouliha, The 1000 Genomes Project.

Fereydoun Hormozdiari, The 1000 Genomes Project.

Jeffrey M. Kidd, The 1000 Genomes Project.

S. Cenk Sahinalp, The 1000 Genomes Project.

Peter H. Sudmant, The 1000 Genomes Project.

Elaine R. Mardis, The 1000 Genomes Project.

Ken Chen, The 1000 Genomes Project.

Asif Chinwalla, The 1000 Genomes Project.

Li Ding, The 1000 Genomes Project.

Daniel C. Koboldt, The 1000 Genomes Project.

Mike D. McLellan, The 1000 Genomes Project.

David Dooling, The 1000 Genomes Project.

George Weinstock, The 1000 Genomes Project.

John W. Wallis, The 1000 Genomes Project.

Michael C. Wendl, The 1000 Genomes Project.

Qunyuan Zhang, The 1000 Genomes Project.

Richard M. Durbin, The 1000 Genomes Project.

Cornelis A. Albers, The 1000 Genomes Project.

Qasim Ayub, The 1000 Genomes Project.

Senduran Balasubramaniam, The 1000 Genomes Project.

Jeffrey C. Barrett, The 1000 Genomes Project.

David M. Carter, The 1000 Genomes Project.

Yuan Chen, The 1000 Genomes Project.

Donald F. Conrad, The 1000 Genomes Project.

Petr Danecek, The 1000 Genomes Project.

Emmanouil T. Dermitzakis, The 1000 Genomes Project.

Min Hu, The 1000 Genomes Project.

Ni Huang, The 1000 Genomes Project.

Matt E. Hurles, The 1000 Genomes Project.

Hanjun Jin, The 1000 Genomes Project.

Luke Jostins, The 1000 Genomes Project.

Thomas M. Keane, The 1000 Genomes Project.

Si Quang Le, The 1000 Genomes Project.

Sarah Lindsay, The 1000 Genomes Project.

Quan Long, The 1000 Genomes Project.

Daniel G. MacArthur, The 1000 Genomes Project.

Stephen B. Montgomery, The 1000 Genomes Project.

Leopold Parts, The 1000 Genomes Project.

James Stalker, The 1000 Genomes Project.

Chris Tyler-Smith, The 1000 Genomes Project.

Klaudia Walter, The 1000 Genomes Project.

Yujun Zhang, The 1000 Genomes Project.

Mark B. Gerstein, The 1000 Genomes Project.

Michael Snyder, The 1000 Genomes Project.

Alexej Abyzov, The 1000 Genomes Project.

Suganthi Balasubramanian, The 1000 Genomes Project.

Robert Bjornson, The 1000 Genomes Project.

Jiang Du, The 1000 Genomes Project.

Fabian Grubert, The 1000 Genomes Project.

Lukas Habegger, The 1000 Genomes Project.

Rajini Haraksingh, The 1000 Genomes Project.

Justin Jee, The 1000 Genomes Project.

Ekta Khurana, The 1000 Genomes Project.

Hugo Y. K. Lam, The 1000 Genomes Project.

Jing Leng, The 1000 Genomes Project.

Xinmeng Jasmine Mu, The 1000 Genomes Project.

Alexander E. Urban, The 1000 Genomes Project.

Zhengdong Zhang, The 1000 Genomes Project.

Yingrui Li, The 1000 Genomes Project.

Ruibang Luo, The 1000 Genomes Project.

Gabor T. Marth, The 1000 Genomes Project.

Erik P. Garrison, The 1000 Genomes Project.

Deniz Kural, The 1000 Genomes Project.

Aaron R. Quinlan, The 1000 Genomes Project.

Chip Stewart, The 1000 Genomes Project.

Michael P. Stromberg, The 1000 Genomes Project.

Alistair N. Ward, The 1000 Genomes Project.

Jiantao Wu, The 1000 Genomes Project.

Charles Lee, The 1000 Genomes Project.

Ryan E. Mills, The 1000 Genomes Project.

Xinghua Shi, The 1000 Genomes Project.

Steven A. McCarroll, The 1000 Genomes Project.

Eric Banks, The 1000 Genomes Project.

Mark A. DePristo, The 1000 Genomes Project.

Robert E. Handsaker, The 1000 Genomes Project.

Chris Hartl, The 1000 Genomes Project.

Joshua M. Korn, The 1000 Genomes Project.

Heng Li, The 1000 Genomes Project.

James C. Nemesh, The 1000 Genomes Project.

Jonathan Sebat, The 1000 Genomes Project.

Vladimir Makarov, The 1000 Genomes Project.

Kenny Ye, The 1000 Genomes Project.

Seungtai C. Yoon, The 1000 Genomes Project.

Jeremiah Degenhardt, The 1000 Genomes Project.

Mark Kaganovich, The 1000 Genomes Project.

Laura Clarke, The 1000 Genomes Project.

Richard E. Smith, The 1000 Genomes Project.

Xiangqun Zheng-Bradley, The 1000 Genomes Project.

Jan O. Korbel, The 1000 Genomes Project.

Sean Humphray, The 1000 Genomes Project.

R. Keira Cheetham, The 1000 Genomes Project.

Michael Eberle, The 1000 Genomes Project.

Scott Kahn, The 1000 Genomes Project.

Lisa Murray, The 1000 Genomes Project.

Kai Ye, The 1000 Genomes Project.

Francisco M. De La Vega, The 1000 Genomes Project.

Yutao Fu, The 1000 Genomes Project.

Heather E. Peckham, The 1000 Genomes Project.

Yongming A. Sun, The 1000 Genomes Project.

Mark A. Batzer, The 1000 Genomes Project.

Miriam K. Konkel, The 1000 Genomes Project.

Jerilyn A. Walker, The 1000 Genomes Project.

Chunlin Xiao, The 1000 Genomes Project.

Zamin Iqbal, The 1000 Genomes Project.

Brian Desany, The 1000 Genomes Project.

Tom Blackwell, The 1000 Genomes Project.

Matthew Snyder, The 1000 Genomes Project.

Jinchuan Xing, The 1000 Genomes Project.

Evan E. Eichler, The 1000 Genomes Project.

Gozde Aksay, The 1000 Genomes Project.

Can Alkan, The 1000 Genomes Project.

Iman Hajirasouliha, The 1000 Genomes Project.

Fereydoun Hormozdiari, The 1000 Genomes Project.

Jeffrey M. Kidd, The 1000 Genomes Project.

Ken Chen, The 1000 Genomes Project.

Asif Chinwalla, The 1000 Genomes Project.

Li Ding, The 1000 Genomes Project.

Mike D. McLellan, The 1000 Genomes Project.

John W. Wallis, The 1000 Genomes Project.

Matt E. Hurles, The 1000 Genomes Project.

Donald F. Conrad, The 1000 Genomes Project.

Klaudia Walter, The 1000 Genomes Project.

Yujun Zhang, The 1000 Genomes Project.

Mark B. Gerstein, The 1000 Genomes Project.

Michael Snyder, The 1000 Genomes Project.

Alexej Abyzov, The 1000 Genomes Project.

Jiang Du, The 1000 Genomes Project.

Fabian Grubert, The 1000 Genomes Project.

Rajini Haraksingh, The 1000 Genomes Project.

Justin Jee, The 1000 Genomes Project.

Ekta Khurana, The 1000 Genomes Project.

Hugo Y. K. Lam, The 1000 Genomes Project.

Jing Leng, The 1000 Genomes Project.

Xinmeng Jasmine Mu, The 1000 Genomes Project.

Alexander E. Urban, The 1000 Genomes Project.

Zhengdong Zhang, The 1000 Genomes Project.

Richard A. Gibbs, The 1000 Genomes Project.

Matthew Bainbridge, The 1000 Genomes Project.

Danny Challis, The 1000 Genomes Project.

Cristian Coafra, The 1000 Genomes Project.

Huyen Dinh, The 1000 Genomes Project.

Christie Kovar, The 1000 Genomes Project.

Sandy Lee, The 1000 Genomes Project.

Donna Muzny, The 1000 Genomes Project.

Lynne Nazareth, The 1000 Genomes Project.

Jeff Reid, The 1000 Genomes Project.

Aniko Sabo, The 1000 Genomes Project.

Fuli Yu, The 1000 Genomes Project.

Jin Yu, The 1000 Genomes Project.

Gabor T. Marth, The 1000 Genomes Project.

Erik P. Garrison, The 1000 Genomes Project.

Amit Indap, The 1000 Genomes Project.

Wen Fung Leong, The 1000 Genomes Project.

Aaron R. Quinlan, The 1000 Genomes Project.

Chip Stewart, The 1000 Genomes Project.

Alistair N. Ward, The 1000 Genomes Project.

Jiantao Wu, The 1000 Genomes Project.

Kristian Cibulskis, The 1000 Genomes Project.

Tim J. Fennell, The 1000 Genomes Project.

Stacey B. Gabriel, The 1000 Genomes Project.

Kiran V. Garimella, The 1000 Genomes Project.

Chris Hartl, The 1000 Genomes Project.

Erica Shefler, The 1000 Genomes Project.

Carrie L. Sougnez, The 1000 Genomes Project.

Jane Wilkinson, The 1000 Genomes Project.

Andrew G. Clark, The 1000 Genomes Project.

Simon Gravel, The 1000 Genomes Project.

Fabian Grubert, The 1000 Genomes Project.

Laura Clarke, The 1000 Genomes Project.

Paul Flicek, The 1000 Genomes Project.

Richard E. Smith, The 1000 Genomes Project.

Xiangqun Zheng-Bradley, The 1000 Genomes Project.

Stephen T. Sherry, The 1000 Genomes Project.

Hoda M. Khouri, The 1000 Genomes Project.

Justin E. Paschall, The 1000 Genomes Project.

Martin F. Shumway, The 1000 Genomes Project.

Chunlin Xiao, The 1000 Genomes Project.

Gil A. McVean, The 1000 Genomes Project.

Sol J. Katzman, The 1000 Genomes Project.

Gonçalo R. Abecasis, The 1000 Genomes Project.

Tom Blackwell, The 1000 Genomes Project.

Elaine R. Mardis, The 1000 Genomes Project.

David Dooling, The 1000 Genomes Project.

Lucinda Fulton, The 1000 Genomes Project.

Robert Fulton, The 1000 Genomes Project.

Daniel C. Koboldt, The 1000 Genomes Project.

Richard M. Durbin, The 1000 Genomes Project.

Senduran Balasubramaniam, The 1000 Genomes Project.

Allison Coffey, The 1000 Genomes Project.

Thomas M. Keane, The 1000 Genomes Project.

Daniel G. MacArthur, The 1000 Genomes Project.

Aarno Palotie, The 1000 Genomes Project.

Carol Scott, The 1000 Genomes Project.

James Stalker, The 1000 Genomes Project.

Chris Tyler-Smith, The 1000 Genomes Project.

Mark B. Gerstein, The 1000 Genomes Project.

Suganthi Balasubramanian, The 1000 Genomes Project.

Aravinda Chakravarti, The 1000 Genomes Project.

Bartha M. Knoppers, The 1000 Genomes Project.

Gonçalo R. Abecasis, The 1000 Genomes Project.

Carlos D. Bustamante, The 1000 Genomes Project.

Neda Gharani, The 1000 Genomes Project.

Richard A. Gibbs, The 1000 Genomes Project.

Lynn Jorde, The 1000 Genomes Project.

Jane S. Kaye, The 1000 Genomes Project.

Alastair Kent, The 1000 Genomes Project.

Taosha Li, The 1000 Genomes Project.

Amy L. McGuire, The 1000 Genomes Project.

Gil A. McVean, The 1000 Genomes Project.

Pilar N. Ossorio, The 1000 Genomes Project.

Charles N. Rotimi, The 1000 Genomes Project.

Yeyang Su, The 1000 Genomes Project.

Lorraine H. Toji, The 1000 Genomes Project.

Chris Tyler-Smith, The 1000 Genomes Project.

Lisa D. Brooks, The 1000 Genomes Project.

Adam L. Felsenfeld, The 1000 Genomes Project.

Jean E. McEwen, The 1000 Genomes Project.

Assya Abdallah, The 1000 Genomes Project.

Christopher R. Juenger, The 1000 Genomes Project.

Nicholas C. Clemm, The 1000 Genomes Project.

Francis S. Collins, The 1000 Genomes Project.

Audrey Duncanson, The 1000 Genomes Project.

Eric D. Green, The 1000 Genomes Project.

Mark S. Guyer, The 1000 Genomes Project.

Jane L. Peterson, The 1000 Genomes Project.

Alan J. Schafer, The 1000 Genomes Project.

Gonçalo R. Abecasis, The 1000 Genomes Project.

David L. Altshuler, The 1000 Genomes Project.

Adam Auton, The 1000 Genomes Project.

Lisa D. Brooks, The 1000 Genomes Project.

Richard M. Durbin, The 1000 Genomes Project.

Richard A. Gibbs, The 1000 Genomes Project.

Matt E. Hurles, The 1000 Genomes Project.

Gil A. McVean, The 1000 Genomes Project.

References

1. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073.
2. Boyko AR, et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 2008;4:e1000083.
3. Williamson SH, et al. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc Natl Acad Sci USA. 2005;102:7882–7887.
4. Adams AM, Hudson RR. Maximum-likelihood estimation of demographic parameters using the frequency spectrum of unlinked single-nucleotide polymorphisms. Genetics. 2004;168:1699–1712.
5. Marth GT, Czabarka E, Murvai J, Sherry ST. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics. 2004;166:351–372.
6. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009;5:e1000695.
7. Nielsen R, et al. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 2009;19:838–849.
8. Sawyer SA, Hartl DL. Population genetics of polymorphism and divergence. Genetics. 1992;132:1161–1176.
9. Bustamante CD, Wakeley J, Sawyer S, Hartl DL. Directional selection and the site-frequency spectrum. Genetics. 2001;159:1779–1788.
10. Yi X, et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 2010;329:75–78.
11. Lynch M. Estimation of allele frequencies from high-coverage genome-sequencing projects. Genetics. 2009;182:295–301.
12. Li Y, et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat Genet. 2010;42:969–972.
13. Ionita-Laza I, Lange C, Laird NM. Estimating the number of unseen variants in the human genome. Proc Natl Acad Sci USA. 2009;106:5008–5013.
14. Ionita-Laza I, Laird NM. On the optimal design of genetic variant discovery studies. Stat Appl Genet Mol Biol. 2010;9:33.
15. Bunge J, Fitzpatrick M. Estimating the number of species: A review. J Am Stat Assoc. 1993;88:364–373.
16. Burnham K, Overton W. Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika. 1978;65:625–633.
17. Burnham K, Overton W. Robust estimation of population size when capture probabilities vary among animals. Ecology. 1979;60:927–936.
18. Klein RG, Hublin JJ. The Human Career. Human Biological and Cultural Origins. Chicago: University of Chicago Press; 1999.
19. Beerli P. Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations. Mol Ecol. 2004;13:827–836.
20. Laval G, Patin E, Barreiro LB, Quintana-Murci L. Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS One. 2010;5:e10284.
21. Garrigan D, et al. Inferring human population sizes, divergence times and rates of gene flow from mitochondrial, X and Y chromosome resequencing data. Genetics. 2007;177:2195–2207.
22. Coventry A, et al. Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nat Commun. 2010;1:131.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences