- We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

- Journal List
- Genetics
- v.181(2); 2009 Feb
- PMC2644958

# Correcting Estimators of θ and Tajima's *D* for Ascertainment Biases Caused by the Single-Nucleotide Polymorphism Discovery Process

^{*}Departament de Ciències de la Salut i de la Vida, Universitat Pompeu Fabra, 08003 Barcelona, Catalonia, Spain,

^{†}Departments of Integrative Biology and Statistics, University of California, Berkeley, California 94720-3140 and

^{‡}Department of Biology, University of Copenhagen, 2100 Kbh Ø, Copenhagen, Denmark

^{1}

*Corresponding author:*Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Doctor Aiguader 88, 08003 Barcelona, Spain. E-mail: ude.fpu@zerimar.anna

## Abstract

Most single-nucleotide polymorphism (SNP) data suffer from an ascertainment bias caused by the process of SNP discovery followed by SNP genotyping. The final genotyped data are biased toward an excess of common alleles compared to directly sequenced data, making standard genetic methods of analysis inapplicable to this type of data. We here derive corrected estimators of the fundamental population genetic parameter θ = 4*N*_{e}*μ* (*N*_{e}, effective population size; *μ*, mutation rate) on the basis of the average number of pairwise differences and on the basis of the number of segregating sites. We also derive the variances and covariances of these estimators and provide a corrected version of Tajima's *D* statistic. We reanalyze a human genomewide SNP data set and find substantial differences in the results with or without ascertainment bias correction.

THE HapMap data (International HapMap Consortium 2007) and other genomewide single-nucleotide polymorphism (SNP) data sets provide a valuable resource for population genetic analysis. Much interest in the analysis of such data has focused on estimating demographic parameters or inferring natural selection (*e.g.*, Bamshad and Wooding 2003; Wooding 2004; Carlson *et al.* 2006; Sabeti *et al.* 2006; Voight *et al.* 2006; Wang *et al.* 2006; Tang *et al.* 2007; Williamson *et al.* 2007). However, many of the studies of genomewide SNP data have been challenged by the fact that the SNP genotyping data have been obtained by a process in which SNPs are first discovered in a small panel of individuals and subsequently typed in a much larger panel (*e.g.*, Picoult-Newberg *et al.* 1999; Altshuler *et al.* 2000; Mead *et al.* 2003). Although this procedure provides a much faster and cheaper way of generating data than direct sequencing of the full panel, it also produces data with a relative excess of alleles of intermediate frequencies compared to directly sequenced data. Rare SNPs are more easily discovered in large panels than in small panels, so an initial discovery process based on a small panel produces an excess of high-frequency alleles in the genotyped sample. As a consequence, the data will be different from what is assumed in standard population genetic models with respect to allele frequency distribution (*e.g.*, Nielsen 2000; Wakeley *et al.* 2001), patterns of linkage disequilibrium (Nielsen and Signorovitch 2003), and level of population subdivision (Nielsen 2004). This ascertainment bias toward high-frequency alleles can have serious consequences when standard population genetic tools (*e.g.*, Tajima 1989; Fu and Li 1993; Fay and Wu 2000; Ramos-Onsins and Rozas 2002) are used for the analysis of the data. For example, Kreitman and Di Rienzo (2004) and Soldevila *et al.* (2005) showed that the apparent effects of balancing selection detected in the prion protein gene (*PRPN*) by Mead *et al.* (2003) in fact were an artifact caused by this type of ascertainment bias.

Three different approaches have been used to address the problem of ascertainment biases in studies of real data: (i) applying methods that may be more robust to the effect of ascertainment bias, such as methods based on haplotype structure (*e.g.*, Sabeti *et al.* 2002), (ii) simulating data under the ascertainment procedure to derive appropriate critical values and confidence intervals using a distribution that directly takes ascertainment into account (*e.g.*, Carlson *et al.* 2004; Voight *et al.* 2006), and (iii) directly correcting the statistical estimators and statistics for the ascertainment bias (*e.g.*, Nielsen 2000; Wakeley *et al.* 2001; Nielsen and Signorovitch 2003; Polanski and Kimmel 2003; Marth *et al.* 2004; Nielsen *et al.* 2004) in specific models. However, hitherto there have been no ascertainment correction methods available for some of the most basic population genetic tools. Here we derive ascertainment corrected estimators of the fundamental population genetic parameter θ = 4*N*_{e}*μ* (*N*_{e}, effective population size; *μ*, mutation rate) and an ascertainment corrected version of the popular statistic used for detecting selection: Tajima's *D*. Our results are for a neutral locus, without recombination, sampled from a panmictic population of constant size.

## THEORY AND METHODS

#### Estimators of θ:

Tajima's *D* (Tajima 1989) is calculated as the difference between Tajima's estimator of θ, θ_{T} (Tajima 1989), and Watterson's estimator of θ, θ_{w} (Watterson 1975). Tajima's estimator is based on the average number of pairwise differences (*π*) and is given by

where η_{i} is the number of derived alleles segregating at a frequency of *i/n*, in a sample of *n* chromosomes. The calculation of is identical for arbitrarily labeled alleles; however, we use the definition on the basis of knowing which allele is derived, to keep a consistent notation throughout. Watterson's estimator is given by

where is the number of segregating sites.

We assume an ascertainment model in which a subset of *d* chromosomes has been chosen independently among the *n* chromosomes for ascertainment, as the data to be analyzed later are the result of this same discovery procedure. We further assume that the chromosomes chosen for ascertainment are independent among SNPs; that is, each SNP has been ascertained from a different set of chromosomes. This procedures simulates the data obtained from procedures such as shotgun or array-based resequencing (used in Perlegen data), where different individuals are sequenced and the fragments obtained are aligned using a reference sequence. The probability of ascertainment of a SNP with alleles of frequencies *i/n* and (*n − i)*/*n* is then

(Nielsen 2004), where we use the definition if *k > n*. The final sample after ascertainment is denoted the *genotyped sample*.

The expected number of segregating sites in the genotyped sample under this ascertainment scheme, *S*^{(A)}, is then simply the sum over all allelic classes of the expected number of segregating sites of that allelic class (*E*[η_{i}] = θ/*i*; Tajima 1989; Fu 1995) multiplied by the probability of ascertainment of the allelic class:

An unbiased, ascertainment corrected method-of-moments estimator of θ, similar to Watterson's estimator, is then given by

The expected number of pairwise differences in the genotyped sample is similarly given by the sum over all allelic classes of the expected contribution to the pairwise differences of the allelic class multiplied by the probability of ascertainment of the allelic class:

An unbiased, ascertainment corrected method-of-moments estimator similar to Tajima's estimator is then given by

Note that these estimators are identical to the traditional estimators, and , when there is no ascertainment bias; *i.e.*, .

#### Variances of the estimators:

We use notation and some results from Durrett (2008, Chap. 2), to derive covariance and variances of these estimators assuming no intralocus recombination. In the absence of any ascertainment bias (Fu 1995; Durrett 2008),

where equals

equals

and

Using the conditional variance formula

with and , we get

Also, from Equation 8 and from the independence among SNPs of the ascertainment probabilities, we have

Then, recalling that , we obtain

We can then easily get the variance of *S*^{(A)}:

The variance of the ascertainment corrected estimator of θ based on the number of segregating sites is then given by

The variance of the estimator based on the average number of pairwise differences becomes

#### Covariances and Tajima's *D*:

Defining the coefficients

we have

by using Equations 10 and 11 and expanding the covariance of the sums as the sum of the covariances.

Also

We now define an ascertainment corrected Tajima's *D* as

To calculate for real data we need to know the value of θ and θ^{2}. We estimate θ using , similarly to the usual use of for calculating the classical Tajima's *D* statistic. We estimate θ^{2} as

The *D*_{C} statistic is identical to the traditional Tajima's *D* in the absence of an ascertainment, *i.e.*, when .

#### Simulations:

Simulated data were generated using the standard coalescent simulation program ms (Hudson 2002) with 10,000 and/or 1,000,000 replicates. We explored three different values of θ: 2.23, 22.33, and 89.30, corresponding to the estimates of θ based on Watterson's estimator calculated from the minimum, average, and maximum number of segregating sites found in the genes represented in the SeattleSNP database (http://pga.gs.washington.edu/, Crawford *et al.* 2005). We also explored results for an extreme value of θ, θ = 150. To generate ascertainment samples from the simulated data, we subsampled *d* (= 2, 5, or 10) gene copies from each segregating site in the sample of size *n* (= 20 or 50). Moreover, we have also generated a set of samples of size *n* = 100 to explore the relationship between *d* and the variance in the estimators. If the segregating site was polymorphic in the subsample, it was included in the final sample; otherwise it was ignored. In all cases, the recombination rate was set to 0.

#### Perlegen data:

Genotype data from Perlegen were obtained from http://genome.perlegen.com/browser/download.html, and we used information regarding the ascertainment protocol discussed in Clark *et al.* (2005) and Hinds *et al.* (2005). For each SNP, the number of individuals that have been included in the discovery panel is known for 69% of the SNPs (ascertainment panel A), and only these SNPs are included in our analysis. Ascertainment of SNPs in this panel was done by genomewide shotgun resequencing. Data from all populations were pooled, and Tajima's *D* was calculated chromosome by chromosome through a sliding window of 100 and 500 kb, sliding by 10 kb at a time. To take into account varying sample size (*n*) and varying ascertainment sample size (*d*), for each window, we use

where is the average sample size in the window, *f*(*d*) is the proportion of SNPs with ascertainment sample size *d*, and *d*_{max} and *d*_{min} are the maximal and minimal values of *d* observed in the window. We use this approach instead of a SNP-by-SNP correction to reduce the computational complexity of the problem. Only those windows that contained at least 10 class A SNPs were included in the analysis. To examine the effect of the ascertainment bias, we have included results for both the uncorrected and the corrected values of Tajima's *D*.

## RESULTS

We evaluate the corrections of and , their variances and covariances, and Tajima's *D* using coalescent simulations in the first three subsections. Subsequently, we apply the corrected Tajima's *D* on the Perlegen data set.

#### Correction of the estimators of θ:

Figure 1 shows the distribution of the estimates based on the uncorrected estimators and , in the presence of an ascertainment bias (*d* = 5) and without an ascertainment bias (*d* = *n* = 50), and the corresponding distributions of the corrected estimates, and , in the presence of an ascertainment bias for *n* = 50 and θ = 150. For , the average estimate of θ is 69.82 with and 150.07 without an ascertainment bias, respectively. However, the ascertainment corrected estimate is = 150.11. For , the average estimate of θ is 104.18 with and 150.11 without an ascertainment bias, respectively, and the ascertainment corrected estimate is = 150.14. For θ = (2.23, 22.33, and 89.20) we also found large differences between the average estimates of θ with and without an ascertainment bias, while the true value was recovered under ascertainment when the corrected estimators were used (see supplemental data A for more details). This shows that the traditional estimators, as expected, are biased in the presence of an ascertainment bias, but that the ascertainment corrected estimators derived here recover an unbiased estimate.

#### Correction of the variances and the covariance:

As seen in Figure 1, the variance in the corrected estimates of θ is increased in the presence of an ascertainment bias when the number of SNPs in the data set is held constant. Equations 13 and 14 quantify the variance in the estimate and have been verified by simulations (not shown).

Figure 2 shows the relationship between *d* and the variance in the estimators for *n* = 100. When the ascertainment sample size is small compared to the size of the sample, the variances and covariances are greatly increased [for *d* = 2 the variance of Tajima's θ () is nearly doubled, and the variance of is nearly multiplied by four]. However, when *d* approaches *n*/2, the difference between the real variance and the estimated variance is drastically reduced.

#### Correction of Tajima's *D*:

Figure 3 shows the distribution of Tajima's *D* and *D*_{C} values for *n* = 50, *d* = 5, and θ = 150. When there is no ascertainment bias, the distribution of Tajima's *D* values using Equation 17 is identical to the one obtained using the standard method, with mean = −0.1103 in both cases, while when there is ascertainment bias and we do not apply the corrected formula, the distribution is greatly skewed toward positive values (mean = 1.5170). If the correction is applied to the simulated data suffering from the ascertainment bias, the nonascertained distribution is approximately recovered and its mean, −0.2497, gets closer to the nonascertainment one. However, because the correction is nonlinear, it does not match the original distribution exactly but is slightly skewed toward negative values compared to the original distribution and has a slightly larger variance. Neither the original Tajima's *D* in the absence of an ascertainment bias nor the current ascertainment corrected estimator in the presence of ascertainment bias has expectation equal to zero. Both rely on a ratio of two correlated statistics, so even though the numerator has expectation equal to zero, the expectation of the ratio is not equal to zero. Also, it is not surprising that the variance is slightly larger for the ascertainment corrected statistic. It suggests that some information has been lost by the ascertainment process. The same tendencies can be seen for the other values of θ explored (see supplemental data A).

#### Analysis of Perlegen data:

To illustrate the use of the correction of Tajima's *D*, we applied it to a Perlegen data set (Hinds *et al.* 2005), previously analyzed by Clark *et al.* (2005) without correcting for ascertainment bias. The Perlegen data were analyzed chromosome by chromosome, taking windows of 100 spanning 10 kb obtaining, on average, 12,221 windows per chromosome. A total of 74.47% of the windows have ≥10 SNPs and are, therefore, included for the comparison between the corrected and the uncorrected Tajima's *D* values.

An example of the result, using windows of 500 kb on chromosome 1, is shown in Figure 4. Positive Tajima's *D* values (1.9) are found in the area containing the genes *TMEM57*, *MAN1C1*, and *LDLRAP1*. The former is a transmembrane protein and the second a mannosidase. The latter encodes for a cytosolic protein that interacts with the LDL receptor, and mutations in it have cause hypercholesterolemia, an autosomal recessive disorder (Mishra *et al.* 2005; Quagliarini *et al.* 2007). Negative Tajima's *D* values ~ −2 were found in windows containing *HIST2H**, *FCGR1A*, and *PPIAL4*, a histone cluster, a fragment of the IgG receptor, and the peptidylprolyl isomerase A, respectively. *D* values of −1.6 were found around the *SRGAP2* gene, whose mRNA has been found in melanoma, germ cell tumors, chondrosarcoma, and retinoblastoma (Katoh and Katoh 2003).

*D*on chromosome 1 in the human genome based on the Perlegen data. The genes with the most extreme

*D*values are also indicated.

Figure 5 shows the correlation of Tajima's *D* results with and without correction for all chromosomes. As expected, the *D* values are higher than the *D*_{C} values. We examine windows with extreme values of Tajima's *D*, which we have arbitrarily defined as those with values < −2 or >2, in more detail. While there are 210 windows with *D*_{C} ≤ −2, there are only 17 windows with *D* ≤ −2. Likewise, there are 99 windows with *D*_{C} ≥ 2 and 8317 with *D* ≥ 2. Table 1 summarizes the information about the 50 windows with the most extreme values of *D*_{C} (25 lowest and 25 highest). Of the 25 windows with lowest values of *D*_{C}, 3 would not be found among the 25 most significant windows using *D*, and 8, including the *GPC3* gene, would be excluded on the basis of the *D* ≤ −2 criterion. Among the 25 most significant windows with positive values of *D*_{C}, 10 of them are not included in the set of the 25 most extreme genes based on *D*. Among these windows there are genes such as *BRCA1* or *NF1*. Fourteen of 15 genes are located on the X chromosome. A possible explanation is that selection on the X chromosome is more efficient because recessive mutations are exposed to selection in males (see, *e.g.*, Schaffner 2004).

*D*results from Perlegen data with and without correction for all chromosomes.

## DISCUSSION

We have here derived estimators of the population genetic parameter θ, and the variances and covariances of the estimators, under a model with ascertainment bias. This leads us to an ascertainment correction of Tajima's *D*. We note that similar corrections could easily be derived for other statistics as well, particularly if they can be written as functions of site frequency spectrum, *i.e.*, *η*_{i}, *i* = 1, 2, …, *n* − 1. Statistics such as Fu and Li's *D* (Fu and Li 1993) and Fay and Wu's *H* (Fay and Wu 2000) are included in this category. We also emphasize that while the ascertainment scheme here is quite specific, and the results may therefore not always apply to real data, all results are expressed in terms of the probability of ascertainment of a SNP as a function of its frequency, . It is, therefore, quite trivial to extend this work to other ascertainment schemes, including the ones considered in Nielsen *et al.* (2005), as long as appropriate ascertainment information is available.

The methods applied here assume that there is no intralocus recombination, as expressions explicitly incorporating recombination are not tractable for Tajima's *D*. There is a tradition for applying Tajima's *D* and other similar statistics that are derived assuming no recombination, even in the presence of recombination. As recombination tends to reduce the variance of Tajima's *D* among regions, such applications are considered conservative (*e.g.*, Ramírez-Soriano *et al.* 2008).

The analysis of the Perlegen data illustrates that ascertainment bias correction is of great importance when analyzing SNP genotyping data. Even when just applying outlier approaches in studies of natural selection, the ranking of different genes is likely to change with and without ascertainment bias correction. Likewise, any study aimed at quantifying variability on the basis of typical SNP data will be challenged by the ascertainment bias. It is, therefore, highly desirable that SNP genotyping projects keep close track of the SNP discovery/selection protocols used. Only when such detailed data regarding these protocols are available will it be possible to make accurate ascertainment bias corrections of the data.

A computer program implementing the ascertainment bias corrections discussed in this article can be downloaded from http://www.snpator.com/public/downloads/aRamirez/tajimasDCorrector/. A list of corrected Tajima's *D* values for different regions of the human genome can be found in supplemental data B.

## Acknowledgments

We thank Marta Melé and Francesc Calafell for their comments on this manuscript. This work was supported by National Institutes of Health grant U01HL084706 and by the Danish National Science Council.

## References

- Altshuler, D., V. J. Pollara, C. R. Cowles, W. J. Van Etten, J. Baldwin
*et al.*, 2000. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407 513–516. [PubMed] - Bamshad, M., and S. P. Wooding, 2003. Signatures of natural selection in the human genome. Nat. Rev. Genet. 4 99–111. [PubMed]
- Carlson, C. S., M. A. Eberle, L. Kruglyak and D. A. Nickerson, 2004. Mapping complex disease loci in whole-genome association studies. Nature 429 446–452. [PubMed]
- Carlson, C. S., J. D. Smith, I. B. Stanaway, M. J. Rieder and D. A. Nickerson, 2006. Direct detection of null alleles in SNP genotyping data. Hum. Mol. Genet. 15 1931–1937. [PubMed]
- Clark, A. G., M. J. Hubisz, C. D. Bustamante, S. H. Williamson and R. Nielsen, 2005. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res. 15 1496–1502. [PMC free article] [PubMed]
- Crawford, D. C., D. T. Akey and D. A. Nickerson, 2005. The patterns of natural variation in human genes. Annu. Rev. Genomics Hum. Genet. 6 287–312. [PubMed]
- Durrett, R., 2008.
*Probability Models for DNA Sequence Evolution (Probability and Its Applications).*Springer, Berlin/Heidelberg, Germany/New York. - Fay, J. C., and C. I. Wu, 2000. Hitchhiking under positive Darwinian selection. Genetics 155 1405–1413. [PMC free article] [PubMed]
- Fu, Y. X., 1995. Statistical properties of segregating sites. Theor. Popul. Biol. 48 172–197. [PubMed]
- Fu, Y. X., and W. H. Li, 1993. Statistical tests of neutrality of mutations. Genetics 133 693–709. [PMC free article] [PubMed]
- Hinds, D. A., L. L. Stuve, G. B. Nilsen, E. Halperin, E. Eskin
*et al.*, 2005. Whole-genome patterns of common DNA variation in three human populations. Science 307 1072–1079. [PubMed] - Hudson, R. R., 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18 337–338. [PubMed]
- International HapMap Consortium, 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449 851–861. [PMC free article] [PubMed]
- Katoh, M., and M. Katoh, 2003. FNBP2 gene on human chromosome 1q32.1 encodes ARHGAP family protein with FCH, FBH, RhoGAP and SH3 domains. Int. J. Mol. Med. 11 791–797. [PubMed]
- Kreitman, M., and A. Di Rienzo, 2004. Balancing claims for balancing selection. Trends Genet. 20 300–304. [PubMed]
- Marth, G. T., E. Czabarka, J. Murvai and S. T. Sherry, 2004. The allele frequency spectrum in genomewide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166 351–372. [PMC free article] [PubMed]
- Mead, S., M. P. Stumpf, J. Whitfield, J. A. Beck, M. Poulter
*et al.*, 2003. Balancing selection at the prion protein gene consistent with prehistoric kurulike epidemics. Science 300 640–643. [PubMed] - Mishra, S. K., P. A. Keyel, M. A. Edeling, A. L. Dupin, D. J. Owen
*et al.*, 2005. Functional dissection of an AP-2 beta2 appendage-binding sequence within the autosomal recessive hypercholesterolemia protein. J. Biol. Chem. 280 19270–19280. [PubMed] - Nielsen, R., 2000. Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154 931–942. [PMC free article] [PubMed]
- Nielsen, R., 2004. Population genetic analysis of ascertained SNP data. Hum. Genomics 1 218–224. [PMC free article] [PubMed]
- Nielsen, R., and J. Signorovitch, 2003. Correcting for ascertainment biases when analyzing SNP data: applications to the estimation of linkage disequilibrium. Theor. Popul. Biol. 63 245–255. [PubMed]
- Nielsen, R., M. J. Hubisz and A. G. Clark, 2004. Reconstituting the frequency spectrum of ascertained single-nucleotide polymorphism data. Genetics 168 2373–2382. [PMC free article] [PubMed]
- Nielsen, R., S. Williamson, Y. Kim, M. J. Hubisz, A. G. Clark
*et al.*, 2005. Genomic scans for selective sweeps using SNP data. Genome Res. 15 1566–1575. [PMC free article] [PubMed] - Picoult-Newberg, L., T. E. Ideker, M. G. Pohl, S. L. Taylor, M. A. Donaldson
*et al.*, 1999. Mining SNPs from EST databases. Genome Res. 9 167–174. [PMC free article] [PubMed] - Polanski, A., and M. Kimmel, 2003. New explicit expressions for relative frequencies of single-nucleotide polymorphisms with application to statistical inference on population growth. Genetics 165 427–436. [PMC free article] [PubMed]
- Ramírez-Soriano, A., S. E. Ramos-Onsins, J. Rozas, F. Calafell and A. Navarro, 2008. Statistical power analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. Genetics 179 555–567. [PMC free article] [PubMed]
- Quagliarini, F., J. C. Vallve, F. Campagna, A. Alvaro, F. J. Fuentes-Jimenez
*et al.*, 2007. Autosomal recessive hypercholesterolemia in Spanish kindred due to a large deletion in the ARH gene. Mol. Genet. Metab. 92 243–248. [PubMed] - Ramos-Onsins, S. E., and J. Rozas, 2002. Statistical properties of new neutrality tests against population growth. Mol. Biol. Evol. 19 2092–2100. [PubMed]
- Sabeti, P. C., D. E. Reich, J. M. Higgins, H. Z. Levine, D. J. Richter
*et al.*, 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419 832–837. [PubMed] - Sabeti, P. C., S. F. Schaffner, B. Fry, J. Lohmueller, P. Varilly
*et al.*, 2006. Positive natural selection in the human lineage. Science 312 1614–1620. [PubMed] - Schaffner, S. F., 2004. The X chromosome in population genetics. Nat. Rev. Genet. 5 43–51. [PubMed]
- Soldevila, M., F. Calafell, A. Helgason, K. Stefansson and J. Bertranpetit, 2005. Assessing the signatures of selection in PRNP from polymorphism data: results support Kreitman and di Rienzo's opinion. Trends Genet. 21 389–391. [PubMed]
- Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123 585–595. [PMC free article] [PubMed]
- Tang, K., K. R. Thornton and M. Stoneking, 2007. A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 5 e171. [PMC free article] [PubMed]
- Voight, B. F., S. Kudaravalli, X. Wen and J. K. Pritchard, 2006. A map of recent positive selection in the human genome. PLoS Biol. 4 e72. [PMC free article] [PubMed]
- Wakeley, J., R. Nielsen, S. N. Liu-Cordero and K. Ardlie, 2001. The discovery of single-nucleotide polymorphisms–and inferences about human demographic history. Am. J. Hum. Genet. 69 1332–1347. [PMC free article] [PubMed]
- Wang, Y., L. P. Zhao and S. Dudoit, 2006. A fine-scale linkage-disequilibrium measure based on length of haplotype sharing. Am. J. Hum. Genet. 78 615–628. [PMC free article] [PubMed]
- Watterson, G. A., 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7 256–276. [PubMed]
- Williamson, S. H., M. J. Hubisz, A. G. Clark, B. A. Payseur, C. D. Bustamante
*et al.*, 2007. Localizing recent adaptive evolution in the human genome. PLoS Genet. 3 e90. [PMC free article] [PubMed] - Wooding, S., 2004. Natural selection: sign, sign, everywhere a sign. Curr. Biol. 14 R700–R701. [PubMed]

**Genetics Society of America**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (344K)

- Correcting for ascertainment biases when analyzing SNP data: applications to the estimation of linkage disequilibrium.[Theor Popul Biol. 2003]
*Nielsen R, Signorovitch J.**Theor Popul Biol. 2003 May; 63(3):245-55.* - Ascertainment bias and the pattern of nucleotide diversity at the human ALDH2 locus in a Japanese population.[J Mol Evol. 2007]
*Brown BT, Woerner A, Wilder JA.**J Mol Evol. 2007 Mar; 64(3):375-85. Epub 2007 Jan 16.* - Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data.[BMC Bioinformatics. 2013]
*Korneliussen TS, Moltke I, Albrechtsen A, Nielsen R.**BMC Bioinformatics. 2013 Oct 2; 14:289. Epub 2013 Oct 2.* - Population genetic analysis of ascertained SNP data.[Hum Genomics. 2004]
*Nielsen R.**Hum Genomics. 2004 Mar; 1(3):218-24.* - Toward positional cloning with SNPs.[Curr Opin Mol Ther. 2002]
*Morton NE, Collins A.**Curr Opin Mol Ther. 2002 Jun; 4(3):259-64.*

- A bioinformatics workflow for detecting signatures of selection in genomic data[Frontiers in Genetics. ]
*Cadzow M, Boocock J, Nguyen HT, Wilcox P, Merriman TR, Black MA.**Frontiers in Genetics. 5293* - Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data[BMC Bioinformatics. ]
*Korneliussen TS, Moltke I, Albrechtsen A, Nielsen R.**BMC Bioinformatics. 14289* - Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes[Science (New York, N.Y.). 2012]
*Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM.**Science (New York, N.Y.). 2012 Jul 6; 337(6090)64-69* - Rare variant density across the genome and across populations[BMC Proceedings. ]
*Raska P, Zhu X.**BMC Proceedings. 5(Suppl 9)S39* - Deciphering the genetic basis of animal domestication[Proceedings of the Royal Society B: Biologi...]
*Wiener P, Wilkinson S.**Proceedings of the Royal Society B: Biological Sciences. 2011 Nov 7; 278(1722)3161-3170*

- Correcting Estimators of θ and Tajima's D for Ascertainment Biases Caused by the...Correcting Estimators of θ and Tajima's D for Ascertainment Biases Caused by the Single-Nucleotide Polymorphism Discovery ProcessGenetics. Feb 2009; 181(2)701PMC

Your browsing activity is empty.

Activity recording is turned off.

See more...