Genome Res. Nov 2005; 15(11): 1553–1565.
PMCID: PMC1310643

Genomic regions exhibiting positive selection identified from dense genotype data

Abstract

The allele frequency spectrum of polymorphisms in DNA sequences can be used to test for signatures of natural selection that depart from the expected frequency spectrum under the neutral theory. We observed a significant (P < 0.001) correlation between the Tajima's D test statistic in full resequencing data and Tajima's D in a dense, genome-wide data set of genotyped polymorphisms for a set of 179 genes. Based on this, we used a sliding window analysis of Tajima's D across the human genome to identify regions putatively subject to strong, recent, selective sweeps. This survey identified seven Contiguous Regions of Tajima's D Reduction (CRTRs) in an African-descent population (AD), 23 in a European-descent population (ED), and 29 in a Chinese-descent population (XD). Only four CRTRs overlapped between populations: three between ED and XD and one between AD and ED. Full resequencing of eight genes within six CRTRs demonstrated frequency spectra inconsistent with neutral expectations for at least one gene within each CRTR. Identifying the functional polymorphism (and/or haplotype) responsible for the selective sweep within each CRTR may provide insights into the strongest selective pressures experienced by the human genome over recent evolutionary history.

According to the theory of neutral molecular evolution (Kimura 1983), the vast majority of DNA sequence-level polymorphism is selectively neutral in a population, and therefore, most of the observed diversity represents a balance between the introduction of new polymorphism by mutation and the extinction of existing polymorphism by genetic drift. Under the mutation/drift model as well as appropriate demographic assumptions regarding population size, random mating, recombination, and mutation rate, it is possible to predict the expected site frequency spectrum (SFS) of a region (Watterson 1975; Ewens 1979).

A number of statistical tests have been devised that compare an observed SFS against neutral theory predictions. One of the most frequently used is Tajima's D (Tajima 1989), which compares nucleotide diversity estimated from the number of polymorphic sites observed in a given set of chromosomes against nucleotide diversity estimated from the allele frequencies of those polymorphic sites. Other tests of the SFS against neutral expectations exist, including Fu and Li's D (Fu and Li 1993), which is based upon the number of singleton derived alleles observed, and Fay and Wu's H (Fay and Wu 2000), which is based upon the frequency spectrum of nonancestral (derived) alleles. Using these tests, a substantial number of genes have been identified where the observed SFS is inconsistent with neutrality, including ABO blood group (Seltsam et al. 2003), the major histocompatibility antigens (HLA) (Hughes and Yeager 1998), lactase (Bersaglieri et al. 2004), and TRPV6 (Akey et al. 2004; Stajich and Hahn 2005). Genes with an excess of high-frequency variation (observed as a positive Tajima's D, e.g., ABO, HLA) are consistent with balancing selection, while genes with an excess of low-frequency variation (significantly negative Tajima's D, e.g., lactase, TRPV6) are consistent with positive selective pressure, where an advantageous variant (haplotype) has recently replaced most of the variation in a region. Genes subjected to a recent selective sweep in which the advantageous allele has not yet become the major allele (e.g., Duffy [Hamblin and Di Rienzo 2000; Hamblin et al. 2002] and CCR5 [Libert et al. 1998]) do not show departures that are easily detected with Tajima's D, but they may be detected by using Fay and Wu's test. Genes under recent balancing selection (e.g., G6PD [Verrelli et al. 2002], β-globin [Straus and Taylor 1981]) are more difficult to detect and may not be detectable by any standard nucleotide diversity test.
Other simple tests of neutrality exist, such as the Ewens-Watterson haplotype diversity test (Ewens 1972), haplotype lineage diversity tests (Verrelli et al. 2002), and haplotype extent tests (Sabeti et al. 2002). In addition, tests for geographically restricted selective pressure based on divergence between populations (Fst) are available (Weir and Cockerham 1984). Several recent studies have evaluated the utility of combining multiple tests to identify genes showing signatures of natural selection and have introduced more sophisticated approaches to evaluate how robust these analyses are to the assumptions of the neutral model. Notably, in these studies all genes with significantly negative Tajima's D under the simplest neutral models were robust to a range of demographic parameters (Akey et al. 2004; Stajich and Hahn 2005), although questions about some of the tested models have been raised (Thornton 2005).
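Because Tajima's D is the statistic used throughout this study, a short worked version may help. The Python sketch below is a minimal implementation of the standard formulas from Tajima (1989): D = (π − θ_W) / √(e1·S + e2·S·(S − 1)), where S is the number of segregating sites, θ_W = S/a1 is Watterson's estimator, and π is the average number of pairwise differences. The function name and input layout (one allele count per site among n sampled chromosomes) are our own illustrative choices; since π depends only on c(n − c), it does not matter which allele is counted.

```python
import math


def tajimas_d(allele_counts, n):
    """Tajima's D for one region (Tajima 1989).

    allele_counts: count of one allele at each site among n sampled chromosomes.
    Sites where the count is 0 or n are monomorphic in this sample and are dropped.
    Returns None when the statistic is undefined (no segregating sites, or n < 4).
    """
    counts = [c for c in allele_counts if 0 < c < n]
    s = len(counts)
    if s == 0 or n < 4:
        return None

    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # pi: average number of pairwise differences, summed over sites.
    pi = sum(2 * c * (n - c) for c in counts) / (n * (n - 1))
    # Watterson's theta: number of segregating sites scaled by a1.
    theta_w = s / a1
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Excess of rare variants (all singletons) pushes D negative;
# excess of intermediate-frequency variants pushes D positive.
print(tajimas_d([1] * 10, 20) < 0)   # True
print(tajimas_d([10] * 10, 20) > 0)  # True
```

An excess of rare variants pulls π below θ_W and D below zero, which is the signature scanned for in this study; an excess of intermediate-frequency variants does the opposite.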

The identification of >10 million single nucleotide polymorphisms (SNPs) across the human genome and the emergence of large-scale data sets of genotypes at a subset of these SNPs in multiple populations are leading to many insights into the patterns of sequence variation across the human genome (Sabeti et al. 2002; Clark et al. 2003a,b; Nielsen et al. 2005; Williamson et al. 2005). Here we describe the application of Tajima's D to a dense genotype data set (the Perlegen data set) (Hinds et al. 2005). We compared Tajima's D in completely resequenced human genes (SeattleSNPs, http://pga.gs.washington.edu) with the Perlegen data set and observed strong correlations in both African American and European American data for coding and non-coding regions. Based on these correlations, we used an empirically derived sliding window distribution of Tajima's D to examine the autosomal regions of the human genome. We identified large regions of the genome with an excess of rare variation in each of three populations, consistent with strong and recent selective sweeps that would lead to rapid increases in the frequency of an advantageous polymorphism and/or haplotype within each region (Smith and Haigh 1974). To test the validity of this approach, eight genes (defined as known genes or expressed sequence tags [ESTs]) from six such regions were selected for directed resequencing to obtain the full SFS and directly assess Tajima's D. Our results are consistent with a strong, recent, positive selective pressure in each of these resequenced regions, and they demonstrate the utility of dense genotype data in identifying such regions across the genome.

Results

To assess whether the Perlegen data can be used to estimate Tajima's D, we compared them against genes resequenced by SeattleSNPs in the same African-descent (AD) and European-descent (ED) individuals. Tajima's D was compared for all autosomal genes where at least five SNPs within 10 kb of the transcript were polymorphic in the Perlegen data set. The final data set consisted of 179 genes meeting this criterion in at least one population, with 178 genes in the AD population and 173 genes in the ED population. The mean value of Tajima's D in the Perlegen data (0.94 for AD, 1.25 for ED) was substantially higher than in the SeattleSNPs data (–0.54 for AD, 0.26 for ED), as expected given an ascertainment bias toward high-frequency SNPs in the Perlegen data. A significant correlation in Tajima's D between these data sets was observed in both populations (Fig. 1A,B for AD and ED, respectively). The correlation was stronger in the ED population (R² = 0.59) than in the AD population (R² = 0.28) but was significant in both (P < 0.001 by Student's t-test). Because the comparison is based on genic regions, it is unclear how well the correlation extrapolates to large intergenic regions, where selective pressures (and therefore site frequency spectra) could differ. However, the SeattleSNPs data consist of 6% coding sequence, 4% UTR sequence, 70% intronic sequence, and 20% flanking intergenic sequence, and no significant differences in diversity were observed between intronic sequence and the flanking intergenic sequence, so the observed correlation applies at least to proximal intergenic sequences.
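The reported R² values can be turned back into t statistics as a consistency check. This assumes that the Student's t-test mentioned above is the usual test that a Pearson correlation is zero, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom; that reading is ours, and the helper names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_correlation(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# R^2 values and gene counts reported in the text (the correlation is positive).
t_ad = t_for_correlation(math.sqrt(0.28), 178)  # AD: 178 genes
t_ed = t_for_correlation(math.sqrt(0.59), 173)  # ED: 173 genes
print(f"AD: t = {t_ad:.1f}, ED: t = {t_ed:.1f}")
```

With the reported values, t is roughly 8.3 for AD (176 df) and 15.7 for ED (171 df), both corresponding to P values far smaller than 0.001. In a full analysis, `pearson_r` would be applied to the paired per-gene D values from the two data sets.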

Figure 1.
Comparison of Tajima's D between Perlegen and SeattleSNPs data sets. For each gene, Tajima's D was calculated from complete resequencing data in the SeattleSNPs data set, or from the region spanning 10 kb upstream of the transcript, the full transcript, ...

The correlation between these data sets suggested that the Perlegen data can be used to survey the genome for regions exhibiting extreme values of Tajima's D, so we applied a sliding window analysis to all three populations genotyped by Perlegen. The distribution of Tajima's D values for 100-kbp sliding windows is shown in Figure 2. The average windowed Tajima's D in the AD population was 1.20 (±0.73 SD), with a range of –2.41 to 4.03. In the ED population, the average was 1.40 (±1.01 SD), with a range of –2.80 to 4.34, while in the Chinese-descent (XD) population the average was 1.45 (±1.13 SD), with a range of –3.11 to 4.42. These distributions were substantially skewed in all three populations, with a heavy tail at low values.
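A sliding-window scan of this kind can be sketched in a few lines of Python. The window width (100 kbp) and step (10 kbp) follow the Figure 3 legend; the minimum of five polymorphic sites per window is borrowed, as an assumption, from the gene-level filter described earlier, since the authors' windowing rules are given in the Methods and are not reproduced here. Function names and the input layout are hypothetical.

```python
import math
from bisect import bisect_left


def tajimas_d(counts, n):
    """Tajima's D (Tajima 1989) from per-site allele counts among n chromosomes."""
    s = len(counts)
    if s == 0 or n < 4:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1, c2 = b1 - 1 / a1, b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    pi = sum(2 * c * (n - c) for c in counts) / (n * (n - 1))
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def sliding_window_d(positions, counts, n, chrom_len,
                     window=100_000, step=10_000, min_snps=5):
    """Tajima's D in overlapping half-open windows [start, start + window).

    positions: sorted SNP coordinates; counts[i]: allele count at positions[i].
    Returns (start, end, D) per window; D is None if the window holds fewer
    than min_snps sites that are polymorphic in this population.
    """
    results = []
    for start in range(0, max(chrom_len - window, 0) + 1, step):
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, start + window)
        seg = [c for c in counts[lo:hi] if 0 < c < n]  # polymorphic here
        d = tajimas_d(seg, n) if len(seg) >= min_snps else None
        results.append((start, start + window, d))
    return results


# Synthetic 500-kb chromosome: rare variants in the first 200 kb,
# intermediate-frequency variants elsewhere (n = 20 chromosomes).
positions = list(range(500, 500_000, 1000))
counts = [1 if p < 200_000 else 10 for p in positions]
windows = sliding_window_d(positions, counts, 20, 500_000)
print(len(windows), windows[0][2] < 0, windows[-1][2] > 0)  # 41 True True
```

Because successive windows overlap by 90 kb, a region 300 to 400 kb long contains roughly 20 to 30 windows that lie entirely inside it.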

Figure 2.
A probability density plot of the distribution of Tajima's D in the sliding windows is shown for each population. All three distributions depart significantly from a normal distribution, most noticeably in the heavy tail at low values in each population. ...

Results from the sliding window analysis across a 50-megabase segment of chromosome 1 (chr1, 1–50,000,000) are shown in Figure 3; results were similar across the rest of the genome. Tracks displaying the sliding window data for each chromosome are available on the UCSC Genome Browser (Kent et al. 2002). To identify regions recently subject to strong selective pressure, we developed a qualitative algorithm for identifying Contiguous Regions of Tajima's D Reduction (CRTRs) in the windowed data, as described in the Methods, and applied it independently to each population. The CRTRs identified by this approach are listed in Table 1: seven in the AD population, 23 in the ED population, and 29 in the XD population. Four CRTRs overlapped between populations: one on chr20 (20360000–20690000) was shared by the AD and ED populations, while those on chr11 (37980000–38290000), chr16 (46050000–46340000), and chr18 (28640000–29150000) were shared by the XD and ED populations. Most CRTRs spanned 300,000–400,000 bp (20–30 windows), although several large CRTRs spanned more than half a megabase in either the ED or the XD population. For example, nearly a megabase of chromosome 1 near the CLSPN gene (chr1, 35220000–36210000) formed a single CRTR in the ED population, and CRTRs spanning >600 kb were observed on chromosomes 1 (chr1, 92220000–93030000) and 2 (chr2, 108350000–109120000) in the XD population.
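The authors describe their CRTR-calling algorithm as qualitative and give its details in the Methods, which are not part of this section. As a hypothetical simplification only, the sketch below flags windows whose Tajima's D falls below a genome-wide empiric threshold (the 1% threshold referred to later in this section), merges overlapping flagged windows into runs, and keeps runs spanning at least 300 kb. Requiring every window in a run to be flagged is our assumption and may be stricter than the published definition; all names are illustrative.

```python
def empirical_threshold(values, q=0.01):
    """Value at the q-th quantile (nearest lower rank) of genome-wide window D values."""
    ranked = sorted(v for v in values if v is not None)
    return ranked[int(q * (len(ranked) - 1))]


def call_low_regions(windows, threshold, min_span=300_000):
    """Merge overlapping low-D windows into regions at least min_span bp long.

    windows: (start, end, D) tuples sorted by start; D may be None.
    """
    regions = []
    run_start = run_end = None

    def flush():
        # Keep the current run only if it is long enough.
        if run_start is not None and run_end - run_start >= min_span:
            regions.append((run_start, run_end))

    for start, end, d in windows:
        low = d is not None and d < threshold
        if low and run_start is not None and start <= run_end:
            run_end = max(run_end, end)          # extend the current run
        elif low:
            flush()                              # close any earlier run
            run_start, run_end = start, end      # start a new run
        else:
            flush()
            run_start = run_end = None
    flush()
    return regions


# 50 windows (100 kb wide, 10-kb steps); windows 10-35 and 40-42 have low D.
windows = [(i * 10_000, i * 10_000 + 100_000,
            -3.0 if 10 <= i <= 35 or 40 <= i <= 42 else 1.0) for i in range(50)]
print(call_low_regions(windows, threshold=-2.0))  # [(100000, 450000)]
```

In practice the threshold would come from `empirical_threshold` applied to all windows of one population genome-wide, and `call_low_regions` would then be run chromosome by chromosome; the short 120-kb run in the example is discarded by the 300-kb minimum.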

Figure 3.
Tajima's D in 100-kbp sliding windows with 10-kbp steps is shown across the first 50 megabases of chromosome 1. Several CRTRs are visible, including a region near 35M in the ED population containing CLSPN (large blue arrowhead) and a region near 41M in ...

A major assumption in interpreting the SFS is that ascertainment of the genotyped SNPs is unbiased, or at least consistently biased, so that the positive shift in Tajima's D is similar across the genome. The SNPs genotyped in the Perlegen map were drawn from three sources: Perlegen's internal resequencing project (class A), dbSNP validated SNPs (class B), and dbSNP unvalidated SNPs (class C). These three classes differ in their SFS, with classes A and C enriched for rare variants relative to class B, so each class would be expected to bias the observed SFS differently. In theory, it is possible to correct an observed SFS for the ascertainment scheme used to select SNPs (Nielsen et al. 2004), but such a correction would be inappropriate here, because the SFS was assessed within specific ethnic groups while the SNPs were ascertained in a global mixture of ethnic groups (R. Nielsen, pers. commun.). To investigate whether a biased SFS from the three classes could account for the identified CRTRs, we compared the relative frequency of each SNP class within the CRTRs against genome-wide averages. Genome-wide, 79.9% of SNPs were class A, 18.5% were class B, and 1.6% were class C. Because SNP class data were mapped to the hg16 build of the human genome, we remapped the CRTR results from hg17 to hg16. One of the 55 unique CRTRs mapped to two regions in the hg16 build and was excluded from further analysis. Among the 54 CRTRs uniquely mapped to hg16, we found a modest but significant excess of class B (20.6%, P < 0.001 by χ² test) and class C SNPs (1.9%, P = 0.006 by χ² test).
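The genome-wide comparison of SNP classes can be sketched as a one-degree-of-freedom χ² goodness-of-fit test of one class against all others, using the genome-wide fractions quoted above as expectations. The exact form of the authors' χ² test is not stated here, so this is an assumption; the SNP counts in the example are hypothetical, and only the 20.6% and 18.5% figures come from the text. For one degree of freedom the upper-tail probability has the closed form erfc(√(x/2)), which avoids a statistics-library dependency.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail P for a chi-square statistic with 1 degree of freedom."""
    # For 1 df, X = Z^2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def class_excess_test(n_class, n_total, genome_fraction):
    """Goodness-of-fit test of one SNP class against its genome-wide fraction."""
    observed = (n_class, n_total - n_class)
    expected = (n_total * genome_fraction, n_total * (1 - genome_fraction))
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_1df_pvalue(stat)


# Hypothetical counts: 10,000 SNPs inside CRTRs, 2,060 of them class B (20.6%),
# tested against the genome-wide class B fraction of 18.5%.
stat, p = class_excess_test(2_060, 10_000, 0.185)
print(f"chi2 = {stat:.1f}, P = {p:.2g}")
```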

To further assess whether SNP ascertainment bias might account for the CRTRs, we examined each CRTR independently (Table 1). Because class C SNPs are rare, fewer than five were expected in most CRTRs, so we merged classes A and C for this comparison. The proportion of class B SNPs ranged from 1.8% to 85.3%, with only six CRTRs exceeding 50%. In 28 CRTRs we observed a significant departure in the frequency of class B SNPs from the genome-wide average after Bonferroni correction for 54 tests. Tellingly, 16 of these departures were toward an excess of class B, and 12 were toward an excess of the non-B classes. Thus, although class B SNPs are modestly enriched in the CRTRs on average, only a minority of CRTRs are enriched for such SNPs, and nearly as many CRTRs are enriched for class A and C SNPs. We interpret this as evidence that SNP class (and thus SNP ascertainment bias) has not substantially biased CRTR identification.
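The per-CRTR screen can be sketched the same way: pool classes A and C, test each CRTR's class B fraction against the genome-wide 18.5%, apply a Bonferroni cutoff of α/54, and record the direction of any significant departure. The α of 0.05, the function name, and the CRTR counts are hypothetical; only the pooling, the 18.5% baseline, and the 54-test correction come from the text.

```python
import math


def chi2_1df_pvalue(stat):
    # P(X > x) for a chi-square variable with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def screen_crtrs(crtr_counts, p_b=0.185, alpha=0.05, n_tests=None):
    """Flag CRTRs whose class B fraction departs from the genome-wide fraction.

    crtr_counts: {name: (n_class_b, n_total)}; classes A and C are pooled as
    the complement. Uses a Bonferroni cutoff of alpha / n_tests.
    """
    n_tests = n_tests or len(crtr_counts)
    cutoff = alpha / n_tests
    calls = {}
    for name, (n_b, n_total) in crtr_counts.items():
        expected = (n_total * p_b, n_total * (1 - p_b))
        observed = (n_b, n_total - n_b)
        stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        if chi2_1df_pvalue(stat) >= cutoff:
            calls[name] = "no departure"
        elif n_b / n_total > p_b:
            calls[name] = "excess of class B"
        else:
            calls[name] = "excess of classes A+C"
    return calls


# Hypothetical CRTRs; the Bonferroni correction is for 54 tests as in the text.
example = {"crtr_1": (85, 100), "crtr_2": (2, 110), "crtr_3": (19, 100)}
print(screen_crtrs(example, n_tests=54))
```

With these hypothetical inputs the first CRTR is called as enriched for class B, the second as enriched for classes A and C, and the third as showing no departure.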

To further examine the possible effects of SNP class SFS bias on the identification of CRTRs, we reanalyzed the genome using only class A SNPs, which make up ~80% of the map overall. Fully 75% of the CRTRs detected using all SNPs were also detected as CRTRs using only class A SNPs. Even among the CRTRs that did not meet the CRTR definition using only class A SNPs, all but two (chr3, 89690000–90110000, in ED and chr2, 194650000–194990000, in XD, both highly enriched for class B) still showed a dramatic excess of rare variants, with between 50% and 75% of windows below the 1% empiric threshold. Thus, the majority of CRTRs are robust to potential bias introduced by SNP classes B and C, and nearly all show consistent trends toward an unusual SFS using only class A SNPs.

It is also possible that our survey was biased toward low-diversity regions that coincide with low recombination rates. The average recombination rate for regions flanking the CRTRs was 0.67 cM/Mb (Table 1), significantly lower than the genome-wide average of 1.13 cM/Mb, although only four of the regions fell into recombination deserts with an estimated recombination rate of 0.0 cM/Mb. Thus, there does appear to be a modest bias in the data toward identification of CRTRs in regions with low recombination rates. However, this association is also expected if the CRTRs are the product of selective pressure, because the region swept along with an advantageous allele will be larger in regions of low recombination.

To assess how well CRTRs in the Perlegen data predict Tajima's D in resequencing data, we selected eight targets from six population-specific CRTRs for resequencing (Table 2). In each CRTR, targets were chosen for one or more of the following reasons: dramatic allele frequency differences between populations in the Perlegen data (EDAR, CLSPN), central position within the CRTR (EDAR, CLSPN, SCMH1, FLJ23878), important gene function (GCG), or the target was a spliced EST in a CRTR without any known genes (AW183861, BX115137). CTPS and FLJ23878 were included because these genes, together with SCMH1, accounted for all of the known genes within a single CRTR in the AD population. Values of Tajima's D below –2 in resequencing data are significant under the simplest neutral models (Tajima 1989) and have been asserted to be robustly inconsistent with neutrality under a variety of demographic models (Akey et al. 2004; Stajich and Hahn 2005), although questions regarding the bottleneck models have been raised (Thornton 2005). In four of the six CRTRs resequenced, at least one gene had a Tajima's D below –2 in the appropriate population, and the other two CRTRs also showed a substantial excess of low-frequency variation in the SFS. In AW183861 the observed Tajima's D was –1.92 in the ED population, placing it in the same range as DCN, which has previously been suggested as a target of selective pressure (Akey et al. 2004; Stajich and Hahn 2005). The other gene with a marginal Tajima's D was GCG, which was selected because of its importance in glucose metabolism but lay at the boundary of the CRTR. In this case, the promoter and transcript of the gene (bases 1–11349) lay within the CRTR and showed a significant excess of rare polymorphism in the SFS (Tajima's D = –2.48), while the region 3′ of the transcript lay beyond the CRTR and showed a less extreme SFS (Tajima's D = –0.45). Thus, in all six resequenced regions we observed a strong trend toward a negative Tajima's D.

Table 2.
Regions selected for targeted resequencing

Although CRTRs identified from the Perlegen data predict extreme Tajima's D in resequencing data, dramatic departures from the expected SFS could also arise by chance somewhere in the genome, so we investigated whether the underlying nucleotide diversity of these regions was also consistent with selective pressure. Tajima's D detects departures from the expected SFS under neutral assumptions by comparing two estimates of nucleotide diversity, θ and π. The absolute values of these statistics can also provide evidence for selective pressure, as both are reduced as an advantageous allele nears fixation. Across the 179 SeattleSNPs genes previously compared against the Perlegen data, the average π was 9.02 (±3.80) × 10⁻⁴ in AD and 7.17 (±4.00) × 10⁻⁴ in ED populations, and the average θ was 10.44 (±3.14) × 10⁻⁴ in AD and 6.48 (±2.68) × 10⁻⁴ in ED populations. All of the CRTR genes resequenced show trends toward reduced π in the appropriate population, and most also show reduced θ (Table 2), which is consistent with selection, although low absolute nucleotide diversity might also be attributable to reduced mutation rates in these regions.
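Per-base-pair θ and π for a resequenced region, and a comparison with the gene-set averages quoted above, can be computed as follows. Expressing the comparison as a z-score against the SeattleSNPs mean and SD is our illustrative choice (the text reports only trends), and the example region is hypothetical.

```python
def per_bp_diversity(allele_counts, n, length_bp):
    """Watterson's theta and pi per base pair for one resequenced region.

    allele_counts: count of one allele at each site among n chromosomes;
    sites monomorphic in the sample are ignored.
    """
    counts = [c for c in allele_counts if 0 < c < n]
    a1 = sum(1 / i for i in range(1, n))
    theta_w = len(counts) / a1 / length_bp
    pi = sum(2 * c * (n - c) for c in counts) / (n * (n - 1)) / length_bp
    return theta_w, pi


def z_score(value, mean, sd):
    """How many standard deviations value lies from the gene-set mean."""
    return (value - mean) / sd


# SeattleSNPs gene-set average for pi in AD, from the text: 9.02 (+/- 3.80) x 10^-4.
AD_PI_MEAN, AD_PI_SD = 9.02e-4, 3.80e-4

# Hypothetical 10-kb region in 20 chromosomes: eight singletons, two doubletons.
theta_w, pi = per_bp_diversity([1] * 8 + [2] * 2, 20, 10_000)
print(f"theta_W = {theta_w:.2e}, pi = {pi:.2e}, "
      f"z(pi) = {z_score(pi, AD_PI_MEAN, AD_PI_SD):.2f}")
```

For this hypothetical region, π is about 1.2 × 10⁻⁴ per bp, roughly two standard deviations below the AD gene-set mean.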

We further examined whether positive selective pressure might account for the resequenced CRTRs by calculating Fay and Wu's H statistic for each region (Fay and Wu 2000). H compares the nucleotide diversity estimated from heterozygosity (π) against nucleotide diversity estimated from the frequency of the derived (nonancestral) allele at each position (θH). Significantly negative values of H indicate an excess of high-frequency derived alleles, consistent with recent positive selection. The significance of a given H depends upon the number of samples and the number of polymorphisms analyzed, so for each resequenced region and in each of the three populations, we simulated 10,000 sample data sets under a standard neutral model with constant population size and no recombination. In five of the six regions (EDAR, GCG, CLSPN, BX115137, and AW183861), significantly negative H statistics (P < 0.05) were observed in the appropriate populations, and in the AD CRTR (CTPS, FLJ23878, and SCMH1), the negative H statistic approached significance for both FLJ23878 and SCMH1 (Table 3). Taken together, the co-occurrence of (1) an unusually low Tajima's D, (2) an unusually high proportion of monomorphic sites in the Perlegen data (Table 2), (3) reduced absolute nucleotide diversity, and (4) significantly negative Fay and Wu's H values suggests that selective pressure probably accounts for most of the CRTRs.
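The statistic and the kind of null simulation described here can be sketched in Python. H = π − θ_H, where θ_H = Σᵢ 2i²Sᵢ / (n(n − 1)) and Sᵢ is the number of sites whose derived allele is carried by i of the n chromosomes, so the input must be derived-allele counts (which requires knowing the ancestral state). The null model is a standard coalescent with constant population size and no recombination, as in the text. Holding the number of polymorphisms fixed by dropping exactly S mutations onto each simulated tree, with probability proportional to branch length, is our assumption about how the simulations were conditioned; the function names are hypothetical, and the one-sided empirical P is the fraction of simulated H values at or below the observed one.

```python
import random


def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H = pi - theta_H from derived-allele counts."""
    denom = n * (n - 1)
    pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return pi - theta_h


def simulate_derived_counts(n, s, rng):
    """Derived-allele counts for s mutations on one neutral coalescent tree.

    Kingman coalescent: constant population size, no recombination.
    """
    lineages = [1] * n      # sampled descendants carried by each lineage
    lengths, sizes = [], []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)  # time while k lineages remain
        lengths.extend([t] * k)
        sizes.extend(lineages)
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        merged = lineages.pop(i) + lineages.pop(j)  # pop larger index first
        lineages.append(merged)
    # A mutation on a branch is carried by every sample below that branch.
    return rng.choices(sizes, weights=lengths, k=s)


def h_pvalue(derived_counts, n, reps=10_000, seed=1):
    """One-sided empirical P: fraction of simulated H <= observed H."""
    rng = random.Random(seed)
    h_obs = fay_wu_h(derived_counts, n)
    s = len(derived_counts)
    hits = sum(
        fay_wu_h(simulate_derived_counts(n, s, rng), n) <= h_obs
        for _ in range(reps)
    )
    return hits / reps


# Hypothetical region: 12 derived-allele counts among 40 chromosomes.
observed = [38, 37, 36, 39, 35, 2, 38, 37, 1, 39, 36, 38]
print(fay_wu_h(observed, 40), h_pvalue(observed, 40, reps=1_000))
```

A site contributes negatively to H whenever its derived allele is carried by more than half of the sample, which is why derived alleles that hitchhiked to high frequency with a sweep drive H down.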

Table 3.
Fay and Wu's H in resequenced regions

Examination of the resequenced region from CLSPN (Fig. 4) revealed a coding SNP (Ser525Asn, position 10710 in Fig. 4B, dbSNP rs7537203) with an extremely low serine allele frequency in the ED population (4%) and an extremely high serine allele frequency in the XD population (83%). This coding polymorphism was also typed in phase I of the HapMap, with similarly extreme differences in allele frequency between Asian and European populations. Close inspection of the CRTR spanning CLSPN in the ED population identified an extreme recombination hotspot, conserved among all three populations, at the telomeric end of the CRTR (Fig. 5), with a relative recombination rate >1000-fold above background as inferred by LDhat (McVean et al. 2004). The centromeric end of the CRTR spans a larger region with modestly elevated recombination. This corresponds to the observed patterns of Tajima's D: at the telomeric boundary, diversity returns to average levels immediately, while at the centromeric boundary the recovery is more gradual.

Figure 4.
(A) A visual genotype for 1.5 Mbp spanning the CLSPN CRTR in the Perlegen data. Each row corresponds to an individual, and each column corresponds to a polymorphic site, with genotypes color coded as follows: Common allele homozygotes are shown in blue, ...
Figure 5.
A close-up of the CLSPN CRTR from the UCSC genome browser is shown, with the Tajima's D tracks as well as a set of tracks showing the inferred relative recombination rate from LDhat for each population in grayscale (track label, LDhat log RR AD/ED/XD): ...

In contrast to the CLSPN CRTR, complete resequencing of the coding regions from CTP synthase (CTPS), FLJ23878 (a predicted gene with supporting spliced EST evidence), and Sex Comb on midleg homolog 1 (SCMH1) failed to identify a single coding variant that was significantly enriched in the AD population. Also in contrast to the CLSPN CRTR, very little recombination was observed across the 300 kilobases containing these three genes in any of the three populations (|D′| = 1 in >95% of pairwise comparisons), so this region is likely to be a recombination coldspot. However, the small amount of observed recombination in the AD population revealed a clear trend across the CRTR, with the lowest Tajima's D region spanning SCMH1. Thus, if a selectively advantageous allele exists in this CRTR, it may lie within the SCMH1 transcript region, but it does not appear to be a coding polymorphism.
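The |D′| summary used here can be made concrete with Lewontin's D′, which equals ±1 whenever both SNPs are polymorphic and at most three of the four possible two-SNP haplotypes are observed, that is, when no recombinant haplotype is seen. The sketch assumes phased haplotypes encoded as strings of 0/1 alleles; this input format and the function names are hypothetical and do not describe how the authors obtained haplotypes.

```python
from collections import Counter
from itertools import combinations


def d_prime(n11, n10, n01, n00):
    """Lewontin's D' for two biallelic SNPs from phased haplotype counts.

    n11 counts haplotypes carrying allele 1 at both SNPs, n10 allele 1 at the
    first SNP and 0 at the second, and so on. Returns None if either SNP is
    monomorphic in the sample.
    """
    n = n11 + n10 + n01 + n00
    p1 = (n11 + n10) / n          # frequency of allele 1 at the first SNP
    q1 = (n11 + n01) / n          # frequency of allele 1 at the second SNP
    d = n11 / n - p1 * q1
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    if d_max == 0:
        return None
    return d / d_max


def fraction_complete_ld(haplotypes, tol=1e-9):
    """Fraction of informative SNP pairs with |D'| = 1.

    haplotypes: equal-length strings of '0'/'1', one per phased chromosome.
    """
    informative = complete = 0
    for i, j in combinations(range(len(haplotypes[0])), 2):
        pairs = Counter((h[i], h[j]) for h in haplotypes)
        dp = d_prime(pairs["1", "1"], pairs["1", "0"], pairs["0", "1"], pairs["0", "0"])
        if dp is None:
            continue
        informative += 1
        if abs(abs(dp) - 1) < tol:
            complete += 1
    return complete / informative if informative else None


# Three haplotypes and no recombinant: every pair of SNPs is in complete LD.
print(fraction_complete_ld(["000", "110", "111", "000"]))  # 1.0
# All four gametes present at equal frequency: D' = 0.
print(fraction_complete_ld(["00", "01", "10", "11"]))      # 0.0
```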

EDAR was selected for resequencing because it showed reduced diversity in the XD population, as well as strikingly high levels of Fst between ED and XD populations at a few SNPs. Resequencing confirmed this, revealing four SNPs with an allele frequency difference of >85% between populations (SNPs 173, 1663, 2531, 93981; details available at the SeattleSNPs Web site) and three other SNPs with a difference of >50% (SNPs 429, 1158, and 62347). Among the four SNPs with the largest allele frequency differences, the 93891 SNP changes an amino acid, but the change is quite conservative (Val370Ala): it substitutes one small, nonpolar amino acid for another and is predicted to have “benign” effects by Polyphen (Sunyaev et al. 2001). The other three SNPs with the largest allele frequency differences fell within 2000 bp of the transcription start site, suggesting a possible regulatory function for one or more of them. Although no clear CRTR was detected in this region in the ED population, nucleotide diversity at EDAR was also significantly reduced in ED. This suggests that a CRTR smaller than our definition (<300 kb across) may exist in the ED population, consistent with a weaker selective force or an older event in that population.
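The two summaries used for EDAR, an absolute allele-frequency difference between populations and Fst, can be illustrated for a single SNP. The Fst function is a simple two-population, Nei-style (H_T − H_S)/H_T for one biallelic SNP, not the Weir and Cockerham (1984) estimator cited earlier; the SNP names and frequencies are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic SNP in two populations.

    p1, p2: frequency of the same allele in each population. This is a simple
    illustration, not the Weir and Cockerham (1984) estimator.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                        # pooled heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def highly_differentiated(freqs, min_diff=0.85):
    """SNP names whose allele-frequency difference between populations exceeds min_diff."""
    return [snp for snp, (p1, p2) in freqs.items() if abs(p1 - p2) > min_diff]


# Hypothetical frequencies (ED, XD) for three SNPs.
freqs = {"snp_a": (0.05, 0.95), "snp_b": (0.30, 0.85), "snp_c": (0.50, 0.55)}
print(highly_differentiated(freqs))           # ['snp_a']
print(round(fst_two_pops(0.05, 0.95), 2))     # 0.81
```

Under this simple definition an allele-frequency difference of 0.9 already corresponds to an Fst of about 0.8, which is why SNPs passing a >85% difference criterion stand out so sharply.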

The majority of identified CRTRs spanned more than one known gene, although 20% of the CRTRs (11 of 55) did not contain a “Known Gene” in the UCSC Genome Browser track. For example, the CRTR on chromosome 11 in the XD population (chr11, 37820000–38290000) did not contain any known genes or RefSeq entries and had only one spliced EST with multiple exemplars (GenBank BX115137 at chr11, 37,916,727–37,932,789). Resequencing the region spanning BX115137 confirmed the significantly low diversity observed in this region in the Perlegen data set (Tajima's D = –2.60 in the XD population) (Table 2). This CRTR might reflect direct selective pressure upon this EST, or it might reflect selection upon a long-range regulatory element affecting expression of genes outside the CRTR in the XD population. Notably, two of the closest genes to this CRTR are RAG1 and RAG2 (chr11, 36,546,150–36,557,871, and chr11, 36,570,070–36,576,362, respectively), which are essential for adaptive immunity through the rearrangement of T Cell Receptor genes (Fugmann 2001). Long-range regulatory elements affecting the function of these genes could therefore be subject to strong selective pressure.
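Checking which annotated features fall inside a CRTR, and how far the nearest genes lie outside it, is an interval-overlap query. The sketch uses the chr11 coordinates quoted above; treating them as half-open intervals in a single genome build is an assumption (genome browsers differ in 0- versus 1-based conventions, which changes distances by at most one base), and the helper names are illustrative.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b share any base."""
    return a[0] < b[1] and b[0] < a[1]


def gap_bp(a, b):
    """Distance in bp between two intervals (0 if they overlap)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


# Coordinates quoted in the text (chr11).
crtr = (37_820_000, 38_290_000)
features = {
    "BX115137": (37_916_727, 37_932_789),
    "RAG1": (36_546_150, 36_557_871),
    "RAG2": (36_570_070, 36_576_362),
}
inside = [name for name, iv in features.items() if overlaps(crtr, iv)]
distances = {name: gap_bp(crtr, iv) for name, iv in features.items() if name not in inside}
print(inside)       # ['BX115137']
print(distances)    # {'RAG1': 1262129, 'RAG2': 1243638}
```

On these coordinates RAG1 and RAG2 lie roughly 1.2 to 1.3 Mb outside the CRTR, which is the scale any long-range regulatory explanation would have to span.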

Discussion

Identifying regions of the human genome that have experienced substantial selective pressure can provide insights into the location of functionally important polymorphisms and may help prioritize targets for association mapping (Sabeti et al. 2002; Clark et al. 2003b). This is especially true in genomic regions where selective pressure has been experienced in a geographically restricted manner, where large allele frequency differences in functional variation can exist between geographic subpopulations (Akey et al. 2004; Stajich and Hahn 2005). We demonstrated a significant correlation between Tajima's D in full resequencing data from SeattleSNPs and dense genotyping data from Perlegen. This correlation was exploited to identify large regions of the genome where strong selective pressures have recently been experienced by at least one of the three populations surveyed: AD, ED, and XD. A 100-kbp sliding window analysis was used to identify CRTRs in each population, representing large regions with SFS enriched for low-frequency polymorphisms. We selected eight candidate genes or ESTs within a subset of these CRTRs for resequencing analysis, and although the parameters used to define a CRTR were not theoretically derived, they appear to be conservative in detecting positive selective pressure, given the resequencing results. Thus, we demonstrate that SFSs from dense genotyping data are a useful way to screen the genome for regions that have probably experienced recent and strong selective pressure.

The observed distributions of the windowed Tajima's D values were remarkably similar in the ED and XD populations (Fig. 2), suggesting not only that the demographic histories of these populations are similar, as has previously been suggested (Yu et al. 2001; Marth et al. 2004), but also that the degree of selective pressure on these populations may have been similar. However, the precise regions of the genome that fell into the tails of the distribution were generally not the same in these two populations. Thus, to the extent that the distribution of Tajima's D was affected by selective pressure in these populations, the selective pressures appear to have acted on independent regions in each population, with a few exceptions, including the three ED/XD shared CRTRs identified in Table 1. In contrast to the ED and XD populations, the mean of the Tajima's D distribution was lower and its range substantially narrower in the AD population, suggesting a unique demographic and/or selective history for this population. Closer inspection of the AD distribution demonstrates less correlation between adjacent regions, as well as a more restricted range of Tajima's D, consistent with reports of shorter-range LD in this population attributable to a larger effective population size (Harpending and Rogers 2000; Carlson et al. 2003; Marth et al. 2004).

Data from these three major populations showed significant differences in the number of CRTRs between populations. Overall, relatively few CRTRs were observed in the AD population, and those observed were generally smaller than those in the ED and XD populations. This could be due to less dramatic selective pressure on African populations in recent evolutionary history, but we consider it unlikely that selective pressure from pathogens or diet is substantially weaker in this population. If anything, the pathogen load might be expected to be highest in regions where humans have lived the longest, although the degree of mortality from pathogens may have been attenuated. Alternatively, admixed European chromosomes might obscure Africa-specific selective sweeps, but this possibility also seems unlikely. Because Fst between these populations is low and few if any polymorphisms have become fixed between them, admixture between European and African populations tends to reduce Tajima's D by increasing the number of relatively rare SNPs, so European admixture should, if anything, have enhanced detection of AD-specific CRTRs. Demographic events such as a population bottleneck in the Eurasian populations (Marth et al. 2004) may also have enhanced detection of selective events in the non-African populations. Also, the higher average Tajima's D in the ED population may simply facilitate the identification of CRTRs in ED relative to AD. A final possibility is that the relatively larger effective population size of the AD population allows greater opportunity for recombination between a selectively advantageous allele and neighboring regions during the course of a selective sweep. In a larger population, the ancestral haplotype bearing the advantageous allele is exposed to recombination in a larger number of individuals, and therefore the segment of ancestral haplotype that eventually becomes fixed will be shorter than in a smaller population.
This hypothesis is supported by the observation that several genes with reportedly significant negative Tajima's D values in the AD population (FY [Hamblin and Di Rienzo 2000; Hamblin et al. 2002] and APOA2 [Fullerton et al. 2002]) were not observed within CRTRs in the AD population at a 1% empiric threshold, or at an even less stringent 5% threshold. Taken together with the relative infrequency of CRTRs in the AD population, this suggests that CRTRs in the AD population may cover shorter segments of the genome than in the ED and XD populations, and will therefore require smaller windows for detection. As the density of genotyping data sets increases (e.g., phase 2 of the HapMap will type millions of additional SNPs, to add to the more than one million SNPs typed in phase 1) (Gibbs et al. 2003), the sliding window size of this type of analysis can be reduced, and smaller CRTRs may become detectable in African populations.

Genes within the CRTR regions may provide important targets for genotype/phenotype studies. For example, CYP3A4 and CYP3A5 play a central role in the metabolism of some prescribed drugs, lie within a CRTR in the ED population, and have been shown to have significantly low Tajima's D in European samples (Thompson et al. 2004). Another example is VKORC1, which has been linked to human warfarin dosing and shown to have low haplotype diversity in Asian populations (Rieder et al. 2005), and is located in a CRTR on chromosome 16 in the XD population. Although we report CRTRs at a 1% empiric threshold, several previously reported genes with significantly negative Tajima's D in European populations could be identified in CRTRs at a less stringent 5% empiric threshold. For example, lactase (LCT) lies within a ~1 Mbp CRTR in the ED population at the 5% threshold and has been suggested as a target of selective pressure in the European population (Bersaglieri et al. 2004). Furthermore, a cluster of genes previously suggested as subject to natural selection in Europeans (KEL, TRPV5, TRPV6, and EPHB6) (Akey et al. 2004; Stajich and Hahn 2005) exhibited low Tajima's D in the ED population, but not across a large enough segment to meet our definition of a CRTR. Based on this combined evidence, CRTRs observed at the more stringent 1% empiric threshold (Table 1) seem quite likely to represent selective sweeps.

Resequencing of candidate genes within a number of the CRTRs provides further support for this hypothesis: In every CRTR selected for resequencing, at least one gene with a dramatic departure from the expected distribution of Tajima's D under neutrality was observed (Table 2), and in most CRTRs, a significant departure from the expected distribution of Fay and Wu's H was also observed (Table 3). In addition to theoretical evidence, the genes resequenced in the CRTRs also fell within the bottom 5% of the empirical distribution of Tajima's D in >170 genes resequenced by SeattleSNPs (Fig. 1). For example, the resequencing-based Tajima's D for SCMH1 was the lowest Tajima's D value that we have ever observed in the AD population, compared with data from 179 resequenced genes. Furthermore, the Tajima's D for genes selected from ED CRTRs was similar to the Tajima's D for genes previously demonstrated to be robustly incompatible with neutrality in several studies (Akey et al. 2004; Stajich and Hahn 2005).

Within each CRTR, it is apparent that a single common haplotype has recently increased dramatically in frequency, at the expense of all other haplotypes within the CRTR. However, it is not yet clear whether the fitness advantage is attributable to a genotype at a single SNP or to a haplotype of multiple SNPs. Haplotype effects are more plausible when a single transcript spans the majority of a CRTR (e.g., PHKB on chromosome 16 in XD) or when the CRTR contains a group of functionally related genes (e.g., the olfactory receptor gene cluster contained within the AD CRTR on chromosome 11). However, most CRTRs contain multiple genes without clearly related functions. We have therefore not pursued analyses of gene function within the CRTRs, because many, if not most, of the genes within them probably represent hitchhiking events, in which the advantageous allele within a single gene swept the neighboring genes along with it as it increased in frequency (Fay and Wu 2000). Identification of viable candidates for these selective effects would at a minimum require complete resequencing of all functional regions within a CRTR and follow-up of any potentially interesting variants. As more complete resequencing data become available within each CRTR, other SFS test statistics, such as the Fay and Wu's H test, can potentially be applied to each CRTR to narrow the candidate interval containing the advantageous variant.

The pattern of CRTR concordance between the ED and XD populations was also interesting: An overlap of three CRTRs between these populations is quite striking, given that the CRTRs comprise <1% of the genome. Because the Tajima's D values from the resequencing data fall in the range of genes previously reported to be inconsistent with neutrality under a range of demographic parameters, we believe that the shared CRTRs probably represent selective pressures shared between these populations. However, not every shared CRTR necessarily represents a single selective sweep in multiple populations. For example, Tajima's D was significantly low at EDAR in both the XD and ED populations, but the genomic extent of the CRTR is substantially greater in XD than in ED. This could reflect sweeps that occurred at different times, but the extreme Fst at a series of polymorphisms in EDAR is consistent with either a divergent sweep, with one haplotype favored in ED and a different haplotype favored in XD, or parallel sweeps favoring an allele shared by both haplotypes but not by other African haplotypes (e.g., site 96563 in the 3′ UTR, rs1478517). No CRTRs were shared among all three populations, but this would be consistent with the ascertainment bias toward common SNPs: If a region lacked common SNPs in all three populations, then no SNPs from that region would have been available in the Perlegen data. Therefore, although no global CRTRs were observed, such regions may be present as large gaps in the genotype data of the Perlegen data set.
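As a rough check on why three shared ED/XD CRTRs is striking, the sketch below compares an analytic approximation with a Monte Carlo placement of regions at random positions. It assumes, for illustration only, 23 ED and 29 XD regions (the CRTR counts reported for this study), a uniform region length of 300 kb, and a single contiguous autosomal genome of roughly 2.9 Gb; real CRTRs vary in length and cannot span chromosome boundaries, so the result is indicative only. The function names are hypothetical.

```python
import random


def expected_overlaps(k1, k2, length, genome):
    """Approximate expected number of overlapping (region1, region2) pairs.

    Two intervals of equal length overlap when their start positions differ by
    less than `length`; for length << genome this has probability ~2*length/genome.
    """
    return k1 * k2 * 2 * length / genome


def simulated_overlaps(k1, k2, length, genome, trials=1000, seed=0):
    """Monte Carlo estimate: place both sets of regions uniformly at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        starts1 = [rng.uniform(0, genome - length) for _ in range(k1)]
        starts2 = [rng.uniform(0, genome - length) for _ in range(k2)]
        # count every cross-population pair of intervals that overlaps
        total += sum(abs(a - b) < length for a in starts1 for b in starts2)
    return total / trials


# Illustrative (assumed) inputs: 23 ED and 29 XD regions of 300 kb in ~2.9 Gb.
print(round(expected_overlaps(23, 29, 300_000, 2.9e9), 3))  # prints 0.138
```

Under these assumptions, fewer than one overlapping pair would be expected by chance, compared with the three observed.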

Considering each of the regions resequenced, identification of the specific SNP or SNPs conferring a selective advantage is not trivial. For example, although the patterns of SFS suggest that SCMH1 is likely to harbor the advantageous allele responsible for the selective sweep that created the CRTR in the AD population, no coding polymorphism was identified in the AD population with significantly enriched allele frequency in either SCMH1 or the other two genes. Given that we resequenced all of the known coding regions in this CRTR, if a polymorphism within SCMH1 drove the sweep, then the function of the polymorphism was probably regulatory rather than structural. In contrast, the CLSPN and EDAR resequencing data identified interesting candidate cSNPs, and selective pressure on these cSNPs could conceivably account for the EDAR and CLSPN CRTRs. More extensive resequencing within each CRTR is required to determine whether other candidate SNPs exist in neighboring genes.

In conclusion, the availability of adequately dense genotyping data sets clearly facilitates the identification of regions of the human genome with unusual SFS, which may have been subjected to strong positive selective pressure in the recent past. Current data appear to be adequate to identify such regions in ED and XD populations, but denser data will be necessary for analysis of AD populations, probably due to the larger effective population size of this population. Detection of regions subject to balancing selection (e.g., HLA) or with less complete selective sweeps (e.g., FY) will probably require a substantially denser data set than is currently available. Although most CRTRs span multiple genes, within each CRTR the selective sweep favored only one haplotype at the expense of all others, so a single selectively advantageous polymorphism in a single gene could conceivably account for each CRTR, with the reduced diversity in flanking regions representing a hitchhiking event. Dissection of the underlying functional variant (or variants) within each CRTR may require comprehensive resequencing within the CRTR to identify candidate functional variation, but where it is feasible, functional analysis of a priori functional variants (e.g., the CLSPN Ser525Asn cSNP) should substantially accelerate this process.

Methods

Samples

Twenty-four individuals from each of three populations were resequenced: 24 African American individuals from the Coriell HD100AA diversity panel (population AD), 24 CEPH individuals (population ED), and 24 Chinese Americans from the Coriell HD100A diversity panel (population XD). All ED individuals overlap with the Perlegen European panel (dbSNP population 1371); all but one of the AD individuals overlap with the Perlegen African American panel (dbSNP population 1372); and all XD individuals overlap with the Perlegen Chinese panel (dbSNP population 1373). Coriell accession numbers are as follows: AD population, NA17101–NA17116, NA17133–NA17140; ED population, NA06990, NA07019, NA07348, NA07349, NA10830, NA10831, NA10842, NA10843, NA10844, NA10845, NA10848, NA10850, NA10851, NA10852, NA10853, NA10854, NA10857, NA10858, NA10860, NA10861, NA12547, NA12548, NA12560, NA17201; and XD population, NA17733–NA17747, NA17749, NA17752–NA17757, NA17759, NA17761.

Perlegen data

All genotype data for populations 1371 (ED), 1372 (AD), and 1373 (XD) from build 124 of dbSNP were downloaded and parsed for analysis. Several quality controls were applied to the data set to verify that the data were completely and accurately retrieved. Comparison of the 46 samples overlapping between Perlegen and SeattleSNPs identified 95,354 concordant genotypes out of 96,170 genotypes compared (99.2%; consistent with Hinds et al. 2005). We also observed good concordance between the total number of Perlegen SNPs retrieved from dbSNP (1.58 million) and the numbers reported by Hinds et al. The Perlegen SNP class data for each SNP were downloaded for each autosome from the Perlegen Web site (http://genome.perlegen.com/browser/download.html).
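A genotype concordance check of this kind can be sketched as follows. This is a minimal illustration, assuming each data set is held as a dict keyed by (sample, SNP) with unordered two-letter genotype strings and None for missing calls; this data layout and the function names are hypothetical, not the Perlegen or SeattleSNPs file formats.

```python
def normalize(genotype):
    """Treat 'AG' and 'GA' as the same unordered genotype."""
    return "".join(sorted(genotype))


def genotype_concordance(calls_a, calls_b):
    """Count concordant genotypes among (sample, SNP) pairs called in both data sets.

    calls_a, calls_b: dicts mapping (sample_id, snp_id) -> genotype string,
    with None for a missing call.
    """
    compared = concordant = 0
    for key in calls_a.keys() & calls_b.keys():
        a, b = calls_a[key], calls_b[key]
        if a is None or b is None:
            continue  # only genotypes called in both data sets are compared
        compared += 1
        concordant += normalize(a) == normalize(b)
    return concordant, compared
```

With the counts reported above, 95,354 / 96,170 ≈ 0.9915, or about 99.2%.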

Sequencing analysis

Methods for sequence analysis of candidate genes obtained from SeattleSNPs (http://pga.gs.washington.edu) have previously been described in detail (Carlson et al. 2003, 2004; Livingston et al. 2004). Briefly, PCR amplicons were tiled across the full genomic sequence of a gene or selected genomic regions and sequenced by using standard BigDye Terminator v3.1 methods. The amplification primers and sequencing protocol are available at http://pga.gs.washington.edu. Sequence data from the ABI 3730XL instrument were base called with Phred (Ewing et al. 1998) and assembled onto a reference sequence with the add reads feature in Consed (Gordon et al. 1998). Polymorphisms were identified by PolyPhred v4.29 (Nickerson et al. 1997), and Consed was used to visualize the sequences and confirm polymorphisms.

Nucleotide diversity analysis

There are several statistics that can be used to describe nucleotide diversity, including $\theta_S$ (equation 1), $\pi$ (equation 2), and $\theta_H$ (equation 3). These statistics can be calculated for a given resequencing data set by using the following parameters: $n$ is the number of chromosomes resequenced, $S_n$ is the number of polymorphic sites observed, $p_i$ is the derived (nonancestral) allele frequency of the $i$th SNP, and $q_i = 1 - p_i$ is the ancestral allele frequency of the $i$th SNP.

$$\theta_S = \frac{S_n}{\sum_{j=1}^{n-1} \frac{1}{j}} \tag{1}$$

$$\pi = \frac{n}{n-1} \sum_{i=1}^{S_n} 2 p_i q_i \tag{2}$$

$$\theta_H = \frac{n}{n-1} \sum_{i=1}^{S_n} 2 p_i^2 \tag{3}$$
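Equations 1–3 translate directly into code. The sketch below is a minimal illustration, assuming the derived allele frequency $p_i$ of each of the $S_n$ segregating sites has already been computed as derived count divided by $n$; the function names are hypothetical.

```python
def theta_s(n, num_segregating):
    """Equation 1: Watterson's estimator from the number of segregating sites."""
    a_n = sum(1 / j for j in range(1, n))
    return num_segregating / a_n


def pi_hat(n, derived_freqs):
    """Equation 2: average number of pairwise differences, from derived frequencies."""
    return n / (n - 1) * sum(2 * p * (1 - p) for p in derived_freqs)


def theta_h(n, derived_freqs):
    """Equation 3: Fay and Wu's theta_H, which up-weights high-frequency derived alleles."""
    return n / (n - 1) * sum(2 * p * p for p in derived_freqs)
```

For example, with $n = 4$ chromosomes and derived frequencies 0.25, 0.5, and 0.75, these return $\theta_S = 18/11$, $\pi = 5/3$, and $\theta_H = 7/3$; a single singleton gives $\pi = 0.5$, matching the three of six chromosome pairs that differ.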

There are many statistics that can evaluate departures from the expected patterns of neutral variation. One of these is Tajima's D (Tajima 1989), equation 4:

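$$D = \frac{\pi - \theta_S}{\sqrt{e_1 S_n + e_2 S_n (S_n - 1)}} \tag{4}$$

where

$$a_1 = \sum_{j=1}^{n-1} \frac{1}{j}, \quad a_2 = \sum_{j=1}^{n-1} \frac{1}{j^2}, \quad b_1 = \frac{n+1}{3(n-1)}, \quad b_2 = \frac{2(n^2+n+3)}{9n(n-1)},$$

$$c_1 = b_1 - \frac{1}{a_1}, \quad c_2 = b_2 - \frac{n+2}{a_1 n} + \frac{a_2}{a_1^2}, \quad e_1 = \frac{c_1}{a_1}, \quad e_2 = \frac{c_2}{a_1^2 + a_2}.$$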

The theoretical distribution of Tajima's D (95% confidence interval between –2 and +2) assumes that polymorphism ascertainment is independent of allele frequency. High values of Tajima's D suggest an excess of common variation in a region, which can be consistent with balancing selection or population contraction. Negative values of Tajima's D, on the other hand, indicate an excess of rare variation, consistent with population growth or positive selection. In theory, population admixture can lead to either high or low Tajima's D values. Demographic parameters would be expected to affect the genome more evenly than selective pressures, so previous analyses have suggested that using the empiric distribution of Tajima's D from a collection of regions across the genome provides advantages in assessing whether selection or demography might explain an observed deviation from expectation (Akey et al. 2004; Stajich and Hahn 2005). Because of the ascertainment bias toward common polymorphism in the Perlegen data set, extremely positive Tajima's D values are difficult to interpret, and modeling ascertainment is difficult. However, given that the ascertainment bias raises the mean of the distribution, extreme negative values in extended regions can be useful in qualitatively identifying interesting regions for full resequencing and more rigorous theoretical analysis of nucleotide diversity.
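The claim that ascertainment toward common SNPs raises Tajima's D can be illustrated with a small sketch of equation 4. It assumes an idealized neutral site frequency spectrum (about θ/k sites at derived count k, with θ = 1000 chosen arbitrarily), for which D is close to zero, and uses a 5% minor-allele-frequency filter as a crude stand-in for ascertainment; it does not model the actual Perlegen ascertainment scheme. Function names are hypothetical.

```python
import math


def tajimas_d(n, derived_counts):
    """Equation 4, computed from the derived-allele count k_i = n * p_i at each site."""
    s = len(derived_counts)
    a1 = sum(1 / j for j in range(1, n))
    a2 = sum(1 / j**2 for j in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    pi = sum(2 * k * (n - k) for k in derived_counts) / (n * (n - 1))  # equation 2
    theta_s = s / a1                                                    # equation 1
    return (pi - theta_s) / math.sqrt(e1 * s + e2 * s * (s - 1))


def neutral_sites(n, theta=1000):
    """Idealized neutral spectrum: about theta/k sites carrying derived count k."""
    sites = []
    for k in range(1, n):
        sites.extend([k] * round(theta / k))
    return sites


def drop_rare(sites, n, min_maf=0.05):
    """Crude ascertainment: keep only sites with minor allele frequency >= min_maf."""
    return [k for k in sites if min(k, n - k) / n >= min_maf]


full = neutral_sites(48)
print(tajimas_d(48, full))                  # close to 0 for the neutral spectrum
print(tajimas_d(48, drop_rare(full, 48)))   # clearly positive once rare SNPs are removed
```

Removing rare variants lowers $\theta_S$ much more than $\pi$, which is why ascertained data shift the whole distribution of D upward.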

For genic comparisons between SeattleSNPs data and Perlegen data, Tajima's D in the SeattleSNPs data was calculated for all observed polymorphic sites. The median transcript size in the SeattleSNPs data set is 14,649 bp, and on average an additional 3000 bp of flanking sequence was also resequenced, for a median analyzed region of 17.5 kbp. Given the density of the Perlegen map, the median number of polymorphic Perlegen SNPs in the resequenced region was only six in the ED population and seven in the AD population, a rather small number of polymorphisms for estimating Tajima's D. In the AD population, 119 of the 179 genes analyzed had five or more polymorphic Perlegen SNPs within the resequenced region; in the ED population, 107 genes did. Significant linkage disequilibrium routinely extends over 10 kb, even in the AD population, so patterns of nucleotide diversity should be conserved over similar distances. We therefore extended the region analyzed in the Perlegen data to include 10 kb upstream and 10 kb downstream of the transcript, expecting this to increase the number of sites per gene and therefore the accuracy of the Tajima's D estimate. Expanding the region raised the median number of polymorphic sites per gene to 16 in ED and 19 in AD. As expected, the larger number of sites per gene substantially improved the correlation between the SeattleSNPs and Perlegen data. For the 119 AD genes with five or more polymorphic SNPs in the transcript, R² = 0.29 for the extended region versus R² = 0.04 for the transcript alone; for the corresponding 107 ED genes, R² = 0.66 versus R² = 0.15. Thus, for the genic comparison in Figure 1, Tajima's D in the Perlegen data was calculated from all observed polymorphic sites within 10 kb of the longest reported transcript in Entrez Gene. Within each population, only genes with five or more polymorphic SNPs in the Perlegen data were included in the comparison of data sets, which yielded 178 genes in the AD population and 173 genes in the ED population.
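The gene-selection and comparison steps above can be sketched as follows. This minimal illustration assumes 1-based inclusive coordinates, genes given as hypothetical (name, transcript start, transcript end) tuples, and SNPs given as hypothetical (position, minor allele count in the population) tuples; the 10-kb flank and the five-SNP minimum come from the text, and the function names are hypothetical.

```python
def extended_region(tx_start, tx_end, flank=10_000):
    """Transcript coordinates padded by `flank` bp on each side (1-based, inclusive)."""
    return max(1, tx_start - flank), tx_end + flank


def polymorphic_positions(snps, start, end):
    """Positions in [start, end] of SNPs polymorphic in this population (minor count > 0)."""
    return [pos for pos, minor_count in snps if start <= pos <= end and minor_count > 0]


def genes_to_compare(genes, snps, min_snps=5):
    """Keep genes with at least `min_snps` polymorphic genotyped SNPs in the extended region."""
    kept = []
    for name, tx_start, tx_end in genes:
        start, end = extended_region(tx_start, tx_end)
        if len(polymorphic_positions(snps, start, end)) >= min_snps:
            kept.append(name)
    return kept


def r_squared(xs, ys):
    """Squared Pearson correlation, e.g. between resequencing and genotype-based D values."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

Applying `r_squared` to the per-gene Tajima's D values from the two data sets gives the R² figures quoted above.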

For the windowed analysis of the genome, Tajima's D was calculated independently in each population. Sliding windows of 100 kb were analyzed across all autosomal regions in the Perlegen data, stepping by 10 kb. Thus, the first window evaluated on chromosome 1 spanned genome coordinates chr1, 1–100,000; the second window spanned chr1, 10,001–110,000; and so forth. Because adjacent windows overlap, the genome browser track reports the Tajima's D of each window at the coordinates of its central 10 kb. Thus, the observed Tajima's D for window chr1, 1–100,000, is reported at chr1, 45,001–55,000. These data have been made available as a track in the UCSC genome browser (http://genome.ucsc.edu/).
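The window layout can be sketched as follows, assuming 1-based inclusive coordinates and that windows running past the chromosome end are dropped (this section does not say how chromosome ends were handled). The function names are hypothetical; in practice, each window's SNPs would then be passed to a Tajima's D calculation.

```python
from bisect import bisect_left, bisect_right


def sliding_windows(chrom_length, window=100_000, step=10_000):
    """Yield (start, end, center_start, center_end) in 1-based inclusive coordinates."""
    start = 1
    while start + window - 1 <= chrom_length:
        end = start + window - 1
        center_start = start + (window - step) // 2   # central `step` bp, used for reporting
        yield start, end, center_start, center_start + step - 1
        start += step


def snps_per_window(positions, chrom_length, window=100_000, step=10_000):
    """Count SNPs (sorted 1-based positions) falling inside each window."""
    counts = []
    for start, end, _, _ in sliding_windows(chrom_length, window, step):
        counts.append(bisect_right(positions, end) - bisect_left(positions, start))
    return counts
```

The first window yielded is (1, 100000, 45001, 55000), matching the reporting convention described above.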

The empirically determined distribution of Tajima's D within the sliding windows was used to identify CRTRs, defined as a region of ≥20 contiguous windows in which >75% of the windows were in the bottom 1% of the empirical Tajima's D distribution. The empirically determined bottom 1% of Tajima's D values corresponded to Tajima's D < –0.70 in the AD population, Tajima's D < –1.49 in the ED population, and Tajima's D < –1.743 in the XD population. Window size was chosen to provide a reasonably large number of SNPs per window (on average 54.3 SNPs per window in AD, 47.1 in ED, and 42.8 in XD). Under the restriction that 75% of all windows must be below threshold within a contiguous region, the distribution of contiguous region sizes for each population is shown in Supplemental Figure 1: In each population, <10% of all such contiguous regions exceeded 20 windows in length, so we used a threshold of 20 adjacent windows to define a CRTR. Because each window overlaps its neighbor by 90,000 bp, a stretch of 20 contiguous windows spans roughly 300,000 bp (100,000 bp for the first window plus 10,000 bp for each of the 19 subsequent windows, or 290,000 bp).
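The CRTR definition can be sketched as below. This section does not spell out how a contiguous region is grown, so the greedy scan here is one plausible interpretation, not the authors' procedure: starting at a below-threshold window, extend while more than 75% of the windows so far are below threshold, end the region on its last below-threshold window, and keep it if it spans at least 20 windows. Function names are hypothetical.

```python
def empirical_threshold(values, fraction=0.01):
    """Cutoff such that values strictly below it form the bottom `fraction` of the distribution."""
    ordered = sorted(values)
    return ordered[int(fraction * len(ordered))]


def find_crtrs(values, threshold, min_windows=20, min_fraction=0.75):
    """Greedy scan of per-window D values (in chromosome order) for CRTR-like runs.

    Returns (first_index, last_index) pairs for runs spanning >= min_windows windows.
    """
    below = [v < threshold for v in values]
    regions = []
    i = 0
    while i < len(below):
        if not below[i]:
            i += 1
            continue
        hits, last_hit = 0, i
        for j in range(i, len(below)):
            hits += below[j]
            if hits / (j - i + 1) <= min_fraction:
                break              # the run has become too diluted; stop extending
            if below[j]:
                last_hit = j       # runs always end on a below-threshold window
        if last_hit - i + 1 >= min_windows:
            regions.append((i, last_hit))
        i = last_hit + 1
    return regions


def run_span_bp(num_windows, window=100_000, step=10_000):
    """Genomic span from the start of the first window to the end of the last."""
    return window + (num_windows - 1) * step
```

For example, with a hypothetical list `ad_values` of per-window AD estimates, `find_crtrs(ad_values, -0.70)` would apply the AD cutoff quoted above, and `run_span_bp(20)` returns 290000.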

Fay and Wu's H analysis

The chimpanzee allele was used to determine the ancestral human allele within the eight resequenced regions. Only SNPs at which the chimpanzee alignment was unambiguous and matched one of the existing human alleles were used in this analysis. In each population, 24 samples (48 chromosomes) were resequenced; Fay and Wu's H (H = π – θH) was calculated within each population and then compared against simulated data. Simulations were run for each region in each of the three populations under the conservative assumption of no recombination. Simulations were performed by using mksamples (Hudson 2002), assuming 48 chromosomes and the appropriate number of SNPs with ancestral allele information (Sn; Table 3). For each analysis, 10,000 simulations were performed, and the proportion of simulations with more negative H than the observed data (equivalent to a P-value) is shown in Table 3.
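The polarization and simulation steps can be sketched in pure Python. The authors used mksamples (Hudson 2002); the code below is not that program but a minimal stand-in under the same stated assumptions: one neutral coalescent tree per replicate (no recombination), 48 chromosomes, and a fixed number of segregating sites placed on branches in proportion to branch length. The SNP tuple layout (allele1, allele2, count of allele1, chimpanzee allele) and all function names are hypothetical.

```python
import random


def derived_counts(snps, n):
    """Derived-allele counts for SNPs whose chimpanzee allele matches one human allele."""
    counts = []
    for allele1, allele2, count1, chimp in snps:
        if chimp == allele1:
            counts.append(n - count1)   # allele1 is ancestral, allele2 is derived
        elif chimp == allele2:
            counts.append(count1)       # allele2 is ancestral, allele1 is derived
        # otherwise the chimpanzee allele is ambiguous or novel: skip the SNP
    return counts


def fay_wu_h(counts, n):
    """H = pi - theta_H, with p = derived count / n at each site."""
    total = 0.0
    for c in counts:
        p = c / n
        total += 2 * p * (1 - p) - 2 * p * p
    return n / (n - 1) * total


def simulate_counts(n, s, rng):
    """Derived counts for s mutations dropped on one neutral coalescent tree."""
    active = [(1, 0.0)] * n        # (samples below lineage, time the lineage began)
    sizes, lengths = [], []
    now = 0.0
    while len(active) > 1:
        k = len(active)
        now += rng.expovariate(k * (k - 1) / 2)    # waiting time to the next coalescence
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        merged = 0
        for idx in (i, j):          # pop the larger index first so the other stays valid
            size, began = active.pop(idx)
            sizes.append(size)
            lengths.append(now - began)
            merged += size
        active.append((merged, now))
    # With S fixed, each mutation falls on a branch with probability proportional to its length
    return rng.choices(sizes, weights=lengths, k=s)


def h_p_value(observed_h, n, s, replicates=10_000, seed=1):
    """Proportion of neutral simulations with H more negative than observed."""
    rng = random.Random(seed)
    hits = sum(fay_wu_h(simulate_counts(n, s, rng), n) < observed_h for _ in range(replicates))
    return hits / replicates
```

For a hypothetical `snps` list, `counts = derived_counts(snps, 48)` followed by `h_p_value(fay_wu_h(counts, 48), 48, len(counts))` returns the simulated proportion.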

Recombination rate analysis

Recombination was addressed in two ways. Large-scale recombination rates for the region spanning each CRTR were estimated from the deCODE sex-averaged recombination rate (Kong et al. 2002) for the flanking megabase: directly when the CRTR fell within a single deCODE interval, and as a weighted average over the flanking megabase when the CRTR spanned a boundary between deCODE intervals. At a much finer scale, the relative recombination rate per nucleotide was estimated across 3 Mb flanking the CLSPN CRTR by using the pairwise option within the LDhat program (McVean et al. 2004). This program first estimates the coalescent likelihood of observing each pair of segregating sites, treating each pair as independent, and then estimates the recombination rate for the entire region over a grid. Polymorphism data from chr1, 34,500,000–37,500,000, were analyzed independently for each population. The estimated fine-scale recombination rates across this region for each population are shown in Figure 5.
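A length-weighted average of map intervals, as used when a CRTR spans a deCODE interval boundary, can be sketched as follows. The sketch assumes the map is supplied as hypothetical (start, end, rate) tuples with half-open coordinates and rates in cM/Mb; the example rates are made up, the exact averaging window is left to the caller, and the fine-scale LDhat analysis is not reproduced.

```python
def length_weighted_rate(region_start, region_end, intervals):
    """Average recombination rate over [region_start, region_end), weighting each
    map interval by how much of the region it covers.

    intervals: iterable of (start, end, rate) with half-open [start, end) coordinates.
    """
    covered = 0
    weighted = 0.0
    for start, end, rate in intervals:
        overlap = min(end, region_end) - max(start, region_start)
        if overlap > 0:
            covered += overlap
            weighted += overlap * rate
    if covered == 0:
        raise ValueError("no map interval overlaps the region")
    return weighted / covered


# Made-up map: 1.0 cM/Mb in the first megabase, 3.0 cM/Mb in the second.
toy_map = [(0, 1_000_000, 1.0), (1_000_000, 2_000_000, 3.0)]
print(length_weighted_rate(500_000, 1_500_000, toy_map))  # prints 2.0
```

When the region lies inside a single interval, the function simply returns that interval's rate.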

Acknowledgments

This work was supported by a Program for Genomic Applications grant from the National Heart, Lung, and Blood Institute (HL66682 and HL66642 to D.N. and M.R.). D.T. was supported by grants from the National Human Genome Research Institute (1P41HG02371 and HG02238 to David Haussler). We thank Dana Crawford, Alex Reiner, and Eric Torskey for comments on the manuscript, as well as the entire SeattleSNPs resequencing team for their extraordinary efforts on this project.

Notes

[Supplemental material is available online at www.genome.org.]

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.4326505. Freely available online through the Genome Research Immediate Open Access option.

References

  • Akey, J.M., Eberle, M.A., Rieder, M.J., Carlson, C.S., Shriver, M.D., Nickerson, D.A., and Kruglyak, L. 2004. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2: e286.
  • Bersaglieri, T., Sabeti, P.C., Patterson, N., Vanderploeg, T., Schaffner, S.F., Drake, J.A., Rhodes, M., Reich, D.E., and Hirschhorn, J.N. 2004. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74: 1111–1120.
  • Carlson, C.S., Eberle, M.A., Rieder, M.J., Smith, J.D., Kruglyak, L., and Nickerson, D.A. 2003. Additional SNPs and linkage-disequilibrium analyses are necessary for whole-genome association studies in humans. Nat. Genet. 33: 518–521.
  • Carlson, C.S., Eberle, M.A., Rieder, M.J., Yi, Q., Kruglyak, L., and Nickerson, D.A. 2004. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 74: 106–120.
  • Clark, A.G., Glanowski, S., Nielsen, R., Thomas, P., Kejariwal, A., Todd, M.J., Tanenbaum, D.M., Civello, D., Lu, F., Murphy, B., et al. 2003a. Positive selection in the human genome inferred from human–chimp–mouse orthologous gene alignments. Cold Spring Harb. Symp. Quant. Biol. 68: 471–477.
  • Clark, A.G., Glanowski, S., Nielsen, R., Thomas, P., Kejariwal, A., Todd, M.J., Tanenbaum, D.M., Civello, D., Lu, F., Murphy, B., et al. 2003b. Inferring nonneutral evolution from human–chimp–mouse orthologous gene trios. Science 302: 1960–1963.
  • Ewens, W.J. 1972. The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3: 87–112.
  • Ewens, W.J. 1979. Mathematical population genetics. Springer-Verlag, New York.
  • Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base-calling of automated sequencer traces using phred, I: Accuracy assessment. Genome Res. 8: 175–185.
  • Fay, J.C. and Wu, C.I. 2000. Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413.
  • Fu, Y.X. and Li, W.H. 1993. Statistical tests of neutrality of mutations. Genetics 133: 693–709.
  • Fugmann, S.D. 2001. RAG1 and RAG2 in V(D)J recombination and transposition. Immunol. Res. 23: 23–39.
  • Fullerton, S.M., Clark, A.G., Weiss, K.M., Taylor, S.L., Stengard, J.H., Salomaa, V., Boerwinkle, E., and Nickerson, D.A. 2002. Sequence polymorphism at the human apolipoprotein AII gene (APOA2): Unexpected deficit of variation in an African-American sample. Hum. Genet. 111: 75–87.
  • Gibbs, R.A., Belmont, J.W., Hardenbol, P., Willis, T.D., Yu, F., Yang, H., Ch'ang, L.Y., Huang, W., Liu, B., Shen, Y., et al. 2003. The International HapMap Project. Nature 426: 789–796.
  • Gordon, D., Abajian, C., and Green, P. 1998. Consed: A graphical tool for sequence finishing. Genome Res. 8: 195–202.
  • Hamblin, M.T. and Di Rienzo, A. 2000. Detection of the signature of natural selection in humans: Evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66: 1669–1679.
  • Hamblin, M.T., Thompson, E.E., and Di Rienzo, A. 2002. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. 70: 369–383.
  • Harpending, H. and Rogers, A. 2000. Genetic perspectives on human origins and differentiation. Annu. Rev. Genomics Hum. Genet. 1: 361–385.
  • Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.G., Frazer, K.A., and Cox, D.R. 2005. Whole-genome patterns of common DNA variation in three human populations. Science 307: 1072–1079.
  • Hudson, R.R. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.
  • Hughes, A.L. and Yeager, M. 1998. Natural selection and the evolutionary history of major histocompatibility complex loci. Front. Biosci. 3: d509–d516.
  • Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. 2002. The human genome browser at UCSC. Genome Res. 12: 996–1006.
  • Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, UK.
  • Kong, A., Gudbjartsson, D.F., Sainz, J., Jonsdottir, G.M., Gudjonsson, S.A., Richardsson, B., Sigurdardottir, S., Barnard, J., Hallbeck, B., Masson, G., et al. 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31: 241–247.
  • Libert, F., Cochaux, P., Beckman, G., Samson, M., Aksenova, M., Cao, A., Czeizel, A., Claustres, M., de la Rua, C., Ferrari, M., et al. 1998. The Δccr5 mutation conferring protection against HIV-1 in Caucasian populations has a single and recent origin in Northeastern Europe. Hum. Mol. Genet. 7: 399–406.
  • Livingston, R.J., von Niederhausern, A., Jegga, A.G., Crawford, D.C., Carlson, C.S., Rieder, M.J., Gowrisankar, S., Aronow, B.J., Weiss, R.B., and Nickerson, D.A. 2004. Pattern of sequence variation across 213 environmental response genes. Genome Res. 14: 1821–1831.
  • Marth, G.T., Czabarka, E., Murvai, J., and Sherry, S.T. 2004. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166: 351–372.
  • McVean, G.A., Myers, S.R., Hunt, S., Deloukas, P., Bentley, D.R., and Donnelly, P. 2004. The fine-scale structure of recombination rate variation in the human genome. Science 304: 581–584.
  • Nickerson, D.A., Tobe, V.O., and Taylor, S.L. 1997. PolyPhred: Automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res. 25: 2745–2751.
  • Nielsen, R., Hubisz, M.J., and Clark, A.G. 2004. Reconstituting the frequency spectrum of ascertained single-nucleotide polymorphism data. Genetics 168: 2373–2382.
  • Nielsen, R., Bustamante, C., Clark, A.G., Glanowski, S., Sackton, T.B., Hubisz, M.J., Fledel-Alon, A., Tanenbaum, D.M., Civello, D., White, T.J., et al. 2005. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 3: e170.
  • Rieder, M.J., Reiner, A.P., Gage, B.F., Nickerson, D.A., Eby, C.S., McLeod, H.L., Blough, D.K., Thummel, K.E., Veenstra, D.L., and Rettie, A.E. 2005. Effect of VKORC1 haplotypes on transcriptional regulation and warfarin dose. N. Engl. J. Med. 352: 2285–2293.
  • Sabeti, P.C., Reich, D.E., Higgins, J.M., Levine, H.Z., Richter, D.J., Schaffner, S.F., Gabriel, S.B., Platko, J.V., Patterson, N.J., McDonald, G.J., et al. 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837.
  • Seltsam, A., Hallensleben, M., Kollmann, A., and Blasczyk, R. 2003. The nature of diversity and diversification at the ABO locus. Blood 102: 3035–3042.
  • Smith, J.M. and Haigh, J. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35.
  • Stajich, J.E. and Hahn, M.W. 2005. Disentangling the effects of demography and selection in human history. Mol. Biol. Evol. 22: 63–73.
  • Straus, D.S. and Taylor, C.E. 1981. Hitchhiking and linkage disequilibrium between hemoglobin S and nearby restriction sites. Hum. Hered. 31: 348–352.
  • Sunyaev, S., Ramensky, V., Koch, I., Lathe III, W., Kondrashov, A.S., and Bork, P. 2001. Prediction of deleterious human alleles. Hum. Mol. Genet. 10: 591–597.
  • Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.
  • Thompson, E.E., Kuttab-Boulos, H., Witonsky, D., Yang, L., Roe, B.A., and Di Rienzo, A. 2004. CYP3A variation and the evolution of salt-sensitivity variants. Am. J. Hum. Genet. 75: 1059–1069.
  • Thornton, K. 2005. Recombination and the properties of Tajima's D in the context of approximate likelihood calculation. Genetics (in press).
  • Verrelli, B.C., McDonald, J.H., Argyropoulos, G., Destro-Bisol, G., Froment, A., Drousiotou, A., Lefranc, G., Helal, A.N., Loiselet, J., and Tishkoff, S.A. 2002. Evidence for balancing selection from nucleotide sequence analyses of human G6PD. Am. J. Hum. Genet. 71: 1112–1128.
  • Watterson, G.A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7: 256–276.
  • Weir, B.S. and Cockerham, C.C. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38: 1358–1370.
  • Williamson, S.H., Hernandez, R., Fledel-Alon, A., Zhu, L., Nielsen, R., and Bustamante, C.D. 2005. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. 102: 7882–7887.
  • Yu, N., Zhao, Z., Fu, Y.X., Sambuughin, N., Ramsay, M., Jenkins, T., Leskinen, E., Patthy, L., Jorde, L.B., Kuromori, T., et al. 2001. Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol. Biol. Evol. 18: 214–222.

Web site references

