
# A Note on Exact Tests of Hardy-Weinberg Equilibrium

^{1}Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor; and

^{2}Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore

## Abstract

Deviations from Hardy-Weinberg equilibrium (HWE) can indicate inbreeding, population stratification, and even problems in genotyping. In samples of affected individuals, these deviations can also provide evidence for association. Tests of HWE are commonly performed using a simple χ^{2} goodness-of-fit test. We show that this χ^{2} test can have inflated type I error rates, even in relatively large samples (e.g., samples of 1,000 individuals that include ~100 copies of the minor allele). On the basis of previous work, we describe exact tests of HWE together with efficient computational methods for their implementation. Our methods adequately control type I error in large and small samples and are computationally efficient. They have been implemented in freely available code that will be useful for quality assessment of genotype data and for the detection of genetic association or population stratification in very large data sets.

In the absence of migration, mutation, natural selection, and assortative mating, genotype frequencies at any locus are a simple function of allele frequencies. This phenomenon, now termed “Hardy-Weinberg equilibrium” (HWE), was first described in the early part of the twentieth century (Hardy 1908; Weinberg 1908). The original descriptions of HWE are an important landmark in the history of population genetics (Crow 1988), and it is now common practice to check whether observed genotypes conform to Hardy-Weinberg expectations. These expectations appear to hold for most human populations, and deviations from HWE at particular markers may suggest problems with genotyping or population structure or, in samples of affected individuals, an association between the marker and disease susceptibility.

Here, we describe efficient implementations of exact tests for HWE, which are suitable for use in large-scale studies of SNP data, even when hundreds of thousands of markers are examined. The availability of data on patterns of linkage disequilibrium across the genome (International HapMap Consortium 2003), interest in identifying susceptibility alleles for complex diseases (Cardon and Abecasis 2003), and advances in genotyping technology (Kwok 2001; Weber and Broman 2001) suggest that such large studies will be increasingly common. The principles and procedures used for testing HWE are well established (Levene 1949; Haldane 1954; Hernandez and Weir 1989; Wellek 2004), but the lack of a publicly available, efficient, and reliable implementation for exact tests has led many scientists to rely on asymptotic tests that can perform poorly with realistic sample sizes.

Consider a sample of SNP genotypes for *N* unrelated diploid individuals measured at an autosomal locus. The sample contains 2*N* alleles, of which *n*_{A} are copies of the rarer allele and *n*_{B} are copies of the common allele. Let the number of heterozygous AB genotypes be *n*_{AB}, and note that the numbers of AA and BB homozygous genotypes are *n*_{AA}=(*n*_{A}-*n*_{AB})/2 and *n*_{BB}=(*n*_{B}-*n*_{AB})/2. Note that there are (2*N*)!/(*n*_{A}!*n*_{B}!) possible arrangements for the alleles in the sample and that 2^{*n*_{AB}}*N*!/(*n*_{AA}!*n*_{AB}!*n*_{BB}!) of these arrangements correspond to exactly *n*_{AB} heterozygotes. Thus, under the assumption of HWE, the probability of observing exactly *n*_{AB} heterozygotes in a sample of *N* individuals with *n*_{A} minor alleles is

$$P(N_{AB}=n_{AB}\mid N,n_A)=\frac{2^{n_{AB}}\,N!}{n_{AA}!\,n_{AB}!\,n_{BB}!}\times\frac{n_A!\,n_B!}{(2N)!}\tag{1}$$

This equation holds for each possible number of heterozygotes, *n*_{AB}. When *n*_{A} is odd, possible numbers of heterozygotes are 1, 3, 5,…,*n*_{A}. When *n*_{A} is even, possible numbers of heterozygotes are 0, 2, 4,…,*n*_{A}. The expression for *P*(*n*_{AB}|*N*,*n*_{A}) given in equation (1) leads to natural tests for HWE. For example, one could define one-sided tests that focus on detection of a deficit of heterozygotes, by calculating the statistic *P*_{low}=*P*(*N*_{AB}≤*n*_{AB}|*N*,*n*_{A}), or detection of an excess of heterozygotes, by calculating the statistic *P*_{high}=*P*(*N*_{AB}≥*n*_{AB}|*N*,*n*_{A}). In each case, the statistic can be calculated by summing equation (1) over all possible values of *N*_{AB} that are no greater than (for *P*_{low}) or no smaller than (for *P*_{high}) the count observed in the actual data. A test for a deficit of heterozygotes in relation to Hardy-Weinberg expectations is appropriate when deviations from HWE due to inbreeding or population stratification are suspected, since both of these increase the proportion of homozygotes in the population. A test for an excess of heterozygotes is appropriate when one suspects problems in genotyping due to the existence of highly homologous regions in the genome, since these low-copy repeats often lead to an increase in the proportion of apparent heterozygotes in the sample. In other settings, it might be appropriate to use both tests. For example, many technologies score genotypes by clustering signals, and misspecified clusters can result in either vast excesses or vast deficits of heterozygotes.
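
As a minimal sketch of the one-sided tests described above, the following Python code evaluates the probability of each heterozygote count directly from the factorial expression (working on the log scale) and sums it to obtain *P*_{low} and *P*_{high}. The function names `het_probability` and `one_sided_p_values` are illustrative assumptions and are not taken from the authors' code.

```python
from math import exp, lgamma, log


def log_factorial(k):
    """Return log(k!) via the log-gamma function, which avoids overflow."""
    return lgamma(k + 1)


def het_probability(n_ab, n, n_a):
    """P(N_AB = n_ab | N = n, n_A = n_a) under HWE, evaluated directly."""
    n_b = 2 * n - n_a
    # Impossible configurations (wrong parity or out of range) get probability 0.
    if n_ab < 0 or n_ab > min(n_a, n_b) or (n_a - n_ab) % 2 != 0:
        return 0.0
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    log_p = (
        n_ab * log(2.0)
        + log_factorial(n)
        - log_factorial(n_aa) - log_factorial(n_ab) - log_factorial(n_bb)
        + log_factorial(n_a) + log_factorial(n_b) - log_factorial(2 * n)
    )
    return exp(log_p)


def one_sided_p_values(n_ab, n, n_a):
    """Return (P_low, P_high) for an observed heterozygote count n_ab."""
    max_het = min(n_a, 2 * n - n_a)
    # Heterozygote counts always share the parity of n_a.
    counts = range(n_a % 2, max_het + 1, 2)
    p_low = sum(het_probability(k, n, n_a) for k in counts if k <= n_ab)
    p_high = sum(het_probability(k, n, n_a) for k in counts if k >= n_ab)
    return p_low, p_high


if __name__ == "__main__":
    p_low, p_high = one_sided_p_values(13, 100, 21)
    print(f"P_low = {p_low:.4f}, P_high = {p_high:.4f}")
```

Direct evaluation is convenient for checking individual values, but it recomputes every factorial for every heterozygote count, which is the cost that the recurrence-based approach is meant to avoid.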

When neither an increase nor a decrease in the proportion of heterozygotes is specifically expected, one could perform two separate one-sided tests or, instead, use a two-sided test statistic (Weir 1996). A natural two-sided test statistic could be defined as *P*_{2α}=*min*(1.0,2*P*_{high},2*P*_{low}). This two-sided statistic is appealing because it leads to rejection of HWE at significance level 2α in instances in which the one-sided tests lead to the rejection of HWE at significance level α. However, because of the asymmetric nature of the distribution of heterozygote counts in a sample, the statistic is quite conservative in practice, and we do not recommend its use. Instead, an appealing approach, analogous to Fisher’s exact test for contingency tables (Fisher 1934), is to calculate the probability of observing a sample configuration that is at least as unlikely as the one being evaluated, conditional on the observed allele counts. This can be achieved using a statistic similar to the Monte Carlo statistic proposed by Guo and Thompson (1992) for multiallelic markers:

$$P_{HWE}=\sum_{n^{*}_{AB}} I\left[P(N_{AB}=n_{AB}\mid N,n_A)\ge P(N_{AB}=n^{*}_{AB}\mid N,n_A)\right]\times P(N_{AB}=n^{*}_{AB}\mid N,n_A)$$

In this definition, *I*[*x*] is an indicator function that is equal to 1 when the comparison is true and equal to 0 otherwise. The sum should be performed over all heterozygote counts *n*^{*}_{AB} that are compatible with the observed number of minor alleles, *n*_{A}.

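Most of the computational effort required for performing exact tests of HWE is spent evaluating the factorials in equation (1) for each possible value of *n*_{AB}. By use of a naive approach, evaluating equation (1) requires 5*N*–6*N* multiplications and one division for each possible value of *n*_{AB}. We simplify calculations by using the recurrence relationships previously recognized by Guo and Thompson (1992) in the implementation of their Markov chain–Monte Carlo sampler. With *n*_{AA} and *n*_{BB} denoting the homozygote counts that accompany *n*_{AB} heterozygotes:

$$\begin{aligned}
P(N_{AB}=n_{AB}+2\mid N,n_A)&=P(N_{AB}=n_{AB}\mid N,n_A)\,\frac{4\,n_{AA}\,n_{BB}}{(n_{AB}+2)(n_{AB}+1)},\\
P(N_{AB}=n_{AB}-2\mid N,n_A)&=P(N_{AB}=n_{AB}\mid N,n_A)\,\frac{n_{AB}(n_{AB}-1)}{4\,(n_{AA}+1)(n_{BB}+1)}.
\end{aligned}\tag{2}$$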

In this way, evaluating the probability for each possible number of heterozygotes takes only four multiplications and one division, whatever the sample size *N*. To avoid underflow, it is best to first calculate the probability of observing the expected number of heterozygotes (in this case, the most likely outcome) and then use the recurrence relationships to calculate probabilities for all other outcomes. A further reduction of computational effort is possible by noting that one need only calculate relative probabilities for each outcome and then scale these to ensure that their sum is 1.0. This means that the probability of observing the expected number of heterozygotes can be replaced with an arbitrary constant when using the recurrence relations in equation (2), provided that the final result is scaled.
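
The procedure just described can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' released code: the function name `hwe_exact` and its argument order are hypothetical, and the recurrence ratios written in the comments are the ones that follow from taking ratios of the factorial expression for adjacent heterozygote counts.

```python
def hwe_exact(n_ab, n, n_a):
    """Return (P_low, P_high, P_HWE) for n_ab observed heterozygotes,
    n genotyped individuals, and n_a copies of one allele."""
    n_b = 2 * n - n_a
    if n_a > n_b:  # make n_a the minor-allele count
        n_a, n_b = n_b, n_a
    if n_ab < 0 or n_ab > n_a or (n_a - n_ab) % 2 != 0:
        raise ValueError("heterozygote count incompatible with allele counts")

    # Start at the expected heterozygote count, adjusted to the parity of n_a.
    mid = n_a * n_b // (2 * n)
    if mid % 2 != n_a % 2:
        mid += 1

    probs = [0.0] * (n_a + 1)
    probs[mid] = 1.0  # arbitrary constant; everything is rescaled below

    # Downward: P(k - 2) = P(k) * k (k - 1) / (4 (n_aa + 1) (n_bb + 1))
    het, hom_a, hom_b = mid, (n_a - mid) // 2, (n_b - mid) // 2
    while het >= 2:
        probs[het - 2] = (probs[het] * het * (het - 1.0)
                          / (4.0 * (hom_a + 1.0) * (hom_b + 1.0)))
        het -= 2
        hom_a += 1
        hom_b += 1

    # Upward: P(k + 2) = P(k) * 4 n_aa n_bb / ((k + 2) (k + 1))
    het, hom_a, hom_b = mid, (n_a - mid) // 2, (n_b - mid) // 2
    while het <= n_a - 2:
        probs[het + 2] = (probs[het] * 4.0 * hom_a * hom_b
                          / ((het + 2.0) * (het + 1.0)))
        het += 2
        hom_a -= 1
        hom_b -= 1

    # Rescale the relative probabilities so that they sum to 1.
    total = sum(probs)
    probs = [p / total for p in probs]

    p_low = sum(probs[: n_ab + 1])
    p_high = sum(probs[n_ab:])
    # Two-sided statistic: total probability of outcomes no more likely
    # than the observed one.
    p_hwe = min(1.0, sum(p for p in probs if p <= probs[n_ab]))
    return p_low, p_high, p_hwe


if __name__ == "__main__":
    print(hwe_exact(13, 100, 21))
```

Entries of `probs` with the wrong parity stay at zero, so they contribute nothing to any of the sums; a memory-conscious version could store only every second entry.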

Table 1 illustrates the performance of the statistics for a sample of 100 individuals in which 21 copies of the minor allele are present. The observed number of heterozygotes will vary from 1 to 21 and must be odd. Note that only a small number of distinct sample configurations are possible, and each of these is associated with a specific probability for the exact tests. If the desired significance level α does not correspond exactly to one of these discrete outcomes, then the exact test statistics will be conservative (Hernandez and Weir 1989). For example, at the significance level α=0.05, the *P*_{HWE} and *P*_{low} statistics both reject the hypothesis of HWE if ≤13 heterozygotes are observed in this setting. Since the probability of observing ≤13 heterozygotes is 0.010, the tests are conservative. In contrast, the asymptotic χ^{2} test statistic results in rejection of HWE when ≤15 heterozygotes are observed (for 15 heterozygotes, the χ^{2} test statistic corresponds to an asymptotic *P*≈.045). This results in an inflated type I error rate of 0.070 and therefore is inappropriate. In this sample, it is not possible to reject HWE because of an excess of heterozygous individuals: the probability of observing the maximum of 21 heterozygotes is 0.31, and none of the test statistics gives a *P* value <.05 for this extreme configuration. Additional examples of the performance of exact test statistics for HWE can be found in the work by Vithayasai (1973).
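
The comparison in this example can be reproduced with a short Python sketch. It assumes the χ^{2} test is the uncorrected one-degree-of-freedom goodness-of-fit test with the allele frequency estimated from the sample, and that a test rejects when its *P* value is at most α; the function names are illustrative only.

```python
from math import erfc, exp, lgamma, log, sqrt


def het_distribution(n, n_a):
    """Return {n_ab: P(N_AB = n_ab | N, n_A)} under HWE (n_a = minor-allele count)."""
    n_b = 2 * n - n_a
    dist = {}
    for n_ab in range(n_a % 2, n_a + 1, 2):
        n_aa = (n_a - n_ab) // 2
        n_bb = (n_b - n_ab) // 2
        log_p = (n_ab * log(2.0) + lgamma(n + 1)
                 - lgamma(n_aa + 1) - lgamma(n_ab + 1) - lgamma(n_bb + 1)
                 + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))
        dist[n_ab] = exp(log_p)
    return dist


def chi2_p_value(n_ab, n, n_a):
    """Asymptotic P value of the 1-df chi-square goodness-of-fit test (no correction)."""
    p = n_a / (2.0 * n)
    q = 1.0 - p
    n_aa = (n_a - n_ab) // 2
    n_bb = n - n_aa - n_ab
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2.0 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2.0))


def type1_error_rates(n, n_a, alpha):
    """Return (chi-square, exact P_HWE) type I error rates at nominal level alpha."""
    dist = het_distribution(n, n_a)
    chi2_rate = sum(pr for k, pr in dist.items()
                    if chi2_p_value(k, n, n_a) <= alpha)
    exact_rate = 0.0
    for pr in dist.values():
        p_hwe = sum(other for other in dist.values() if other <= pr)
        if p_hwe <= alpha:
            exact_rate += pr
    return chi2_rate, exact_rate


if __name__ == "__main__":
    chi2_rate, exact_rate = type1_error_rates(100, 21, 0.05)
    print(f"chi-square: {chi2_rate:.3f}, exact: {exact_rate:.3f}")
```

Because the rejection region is a set of discrete outcomes, the true type I error rate of each test is simply the summed probability of the outcomes it rejects.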


In general, the exact test statistics are conservative when a small number of minor-allele copies are present in the sample, but they approximate nominal significance levels as the sample size (and number of minor-allele copies) increases. In contrast, the commonly used χ^{2} statistic can produce excessively small or large *P* values for specific outcomes (Hernandez and Weir 1989). To comprehensively evaluate the performance of the χ^{2} and exact test statistics, we calculated their type I error rates for specified significance levels of α=0.05, 0.01, or 0.001, for sample sizes of *N*=100 or *N*=1,000 individuals and varying minor-allele counts. The results are summarized in figure 1 (for samples in which <25% of chromosomes carry the minor allele) and figure 2 (for samples in which >10% of chromosomes carry the minor allele), and it is clear that the statistics exhibit some periodicity in their type I error rates. As expected, both the exact *P*_{HWE} statistic and the χ^{2} statistic perform better as the sample size and minor-allele counts increase. Nevertheless, one important difference is that the χ^{2} statistic can sometimes be extremely anticonservative (e.g., in a sample of 1,000 individuals, when nominal α=0.001, the true type I error rate can exceed 0.06 and is often >0.01 for minor-allele counts <100), whereas the exact statistic never exceeds the nominal significance level. In practical settings, the χ^{2} statistic could lead to many false rejections of HWE that depend on only the particular count of minor alleles in the sample.


To understand the periodicity of the statistics, it is important to consider the discrete nature of the data. For example, for a sample of *N*=100 individuals including 2–5 copies of the minor allele, we reject HWE at the α=0.05 significance level (fig. 1*A*) when there is at least one homozygote for the minor allele. The probability of observing at least one homozygote for the minor allele increases gradually from 0.0050 when there are two copies of the allele in the sample up to 0.0499 when there are five copies of the minor allele in the sample. When there are 6–13 copies of the minor allele in the sample, we reject HWE at the α=0.05 significance level (fig. 1*A*) when at least two homozygotes for the rare allele are observed. Again, the probability of such an event is quite low for small numbers of the rare allele (*P*=.0011 with six copies of the minor allele in the sample) but gradually increases if there are additional copies of the minor allele in the sample (*P*=.0482 with 13 copies of the minor allele).
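
The probabilities quoted in this paragraph can be checked with a few lines of Python. The sketch below uses exact integer factorials, which is practical for a sample of 100 individuals; the helper names are hypothetical.

```python
from math import factorial


def het_probability(n_ab, n, n_a):
    """P(N_AB = n_ab | N = n, n_A = n_a) under HWE, using integer factorials."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    numerator = 2 ** n_ab * factorial(n) * factorial(n_a) * factorial(n_b)
    denominator = (factorial(n_aa) * factorial(n_ab) * factorial(n_bb)
                   * factorial(2 * n))
    return numerator / denominator


def prob_at_least_k_minor_homozygotes(n, n_a, k):
    """P(n_AA >= k), i.e. the sum over heterozygote counts n_ab <= n_a - 2k."""
    return sum(het_probability(n_ab, n, n_a)
               for n_ab in range(n_a % 2, n_a - 2 * k + 1, 2))


if __name__ == "__main__":
    for n_a in range(2, 6):
        p = prob_at_least_k_minor_homozygotes(100, n_a, 1)
        print(f"n_A = {n_a:2d}: P(at least 1 homozygote)  = {p:.4f}")
    for n_a in range(6, 15):
        p = prob_at_least_k_minor_homozygotes(100, n_a, 2)
        print(f"n_A = {n_a:2d}: P(at least 2 homozygotes) = {p:.4f}")
```

Each time the minor-allele count crosses a value at which the rejection threshold moves by one homozygote, the attainable type I error rate drops and then climbs again, which produces the sawtooth pattern.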

In table 2, the overall type I error rates for each statistic are summarized for sample sizes of 100 or 1,000 individuals and various ranges of minor-allele counts. It is clear that, on average, the χ^{2} test approximates nominal significance levels as the number of minor alleles in the sample increases. Nevertheless, as illustrated in figure 1, this is achieved at the cost of inflated error rates for samples with specific numbers of minor alleles. Even in a sample of 1,000 individuals, the type I error rate at α = 0.001 for the χ^{2} test is inflated when there are <200 copies of the minor allele (corresponding to an allele frequency of ~10%). The exact tests approximate nominal significance levels with increasing sample size but remain conservative because of the discrete nature of the data.

χ^{2} Test Statistic and the *P*_{HWE} Test Statistic for Nominal Significance Level α = 0.01 or 0.001


As a final evaluation of our approach, we applied our method to a subset of the genotypes collected by the International HapMap Consortium (2003). We focused on a set of 18,460 SNP markers genotyped independently by two different centers with no discrepancies between the two sets of experimental results. For each of these markers, we evaluated evidence against HWE by using both the exact *P*_{HWE} statistic and the asymptotic χ^{2} statistic. Results were broadly similar for 14,889 markers with minor-allele frequencies ≥20%. However, we observed noticeable differences for 3,571 markers with minor-allele frequencies <20%. For example, the χ^{2} test rejected HWE for 71 of these markers at α=0.01 (twice as many as the 35 markers expected to fail this test by chance), whereas the exact test rejected HWE for only 33 markers. At the more stringent α=0.001 significance level, the χ^{2} test rejected HWE for 28 markers (rejection for 3 markers is expected by chance), whereas the exact *P*_{HWE} statistic rejected HWE for only 5 markers.

Although we focus on testing the agreement of observed genotypes with HWE proportions, computationally efficient exact tests can be constructed for any desired genotype proportions. In brief, let the expected proportion of heterozygotes be *p*_{AB} and the two homozygote proportions be *p*_{AA} and *p*_{BB}. For example, in a population with inbreeding coefficient *f*, we might expect the proportion of heterozygotes to be 2(1-*f*)*p*_{A}*p*_{B}. Define the quantity θ=*p*^{2}_{AB}/(*p*_{AA}*p*_{BB}) so that θ=4 when HWE holds. Then, the probability of observing *n*_{AB} heterozygotes is

$$P(N_{AB}=n_{AB}\mid N,n_A,\theta)=\frac{1}{C}\,\frac{N!}{n_{AA}!\,n_{AB}!\,n_{BB}!}\,\theta^{n_{AB}/2},$$

where

$$C=\sum_{n^{*}_{AB}}\frac{N!}{n^{*}_{AA}!\,n^{*}_{AB}!\,n^{*}_{BB}!}\,\theta^{n^{*}_{AB}/2}$$

(Wellek 2004). It is simple to verify that the recurrence relationships given in equation (2) can be extended to this setting by replacing the number 4 with the quantity θ in each expression.
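
A minimal Python sketch of this generalization is shown below. It assumes the recurrence takes the form obtained from the HWE case with 4 replaced by θ, as the text states; the function name `het_distribution` and its `theta` parameter are illustrative and not part of any published interface.

```python
def het_distribution(n, n_a, theta=4.0):
    """Return {n_ab: probability} given n individuals, n_a copies of one allele,
    and genotype-proportion ratio theta (theta = 4 corresponds to HWE)."""
    n_b = 2 * n - n_a
    if n_a > n_b:  # make n_a the minor-allele count
        n_a, n_b = n_b, n_a

    # Convenient starting point: the HWE-expected heterozygote count,
    # adjusted to the parity of n_a. Any valid count would do after rescaling.
    mid = n_a * n_b // (2 * n)
    if mid % 2 != n_a % 2:
        mid += 1
    weights = {mid: 1.0}

    # Downward: w(k - 2) = w(k) * k (k - 1) / (theta (n_aa + 1) (n_bb + 1))
    het, hom_a, hom_b = mid, (n_a - mid) // 2, (n_b - mid) // 2
    while het >= 2:
        weights[het - 2] = (weights[het] * het * (het - 1.0)
                            / (theta * (hom_a + 1.0) * (hom_b + 1.0)))
        het -= 2
        hom_a += 1
        hom_b += 1

    # Upward: w(k + 2) = w(k) * theta n_aa n_bb / ((k + 2) (k + 1))
    het, hom_a, hom_b = mid, (n_a - mid) // 2, (n_b - mid) // 2
    while het <= n_a - 2:
        weights[het + 2] = (weights[het] * theta * hom_a * hom_b
                            / ((het + 2.0) * (het + 1.0)))
        het += 2
        hom_a -= 1
        hom_b -= 1

    # Dividing by the sum of the weights plays the role of the constant C.
    total = sum(weights.values())
    return {k: w / total for k, w in sorted(weights.items())}


if __name__ == "__main__":
    for theta in (4.0, 2.0):
        dist = het_distribution(100, 21, theta)
        mean_het = sum(k * p for k, p in dist.items())
        print(f"theta = {theta}: expected heterozygotes = {mean_het:.2f}")
```

Values of θ below 4 shift probability toward configurations with fewer heterozygotes, as would be expected under inbreeding. For θ far from 4 the starting point may lie far from the mode, so a more careful implementation might choose the starting count differently to guard against underflow.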

The exact test statistics for HWE described here are accurate for a variety of allele frequencies and can be computed in an inexpensive manner. We recommend that they be used instead of the standard χ^{2} test statistic in all situations. For large data sets, rather than fixing an arbitrary threshold for rejecting HWE, we suggest that methods based on the false-discovery rate (Benjamini and Hochberg 1995) be used to identify a subset of markers whose genotypes do not conform to the expected equilibrium distribution.
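
As a sketch of how the false-discovery-rate suggestion might be applied to a list of per-marker *P* values, the following Python function implements the standard Benjamini-Hochberg step-up procedure. The function name and the example values are illustrative assumptions; the *P* values shown are made up and are not results from this study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected at false-discovery rate q."""
    m = len(p_values)
    # Indices of the P values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank with p_(rank) <= rank * q / m.
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Hypothetical per-marker P values, for illustration only.
    example = [0.5, 0.001, 0.9, 0.008]
    print(benjamini_hochberg(example, q=0.05))
```

Markers whose indices are returned would be flagged as departing from the expected equilibrium distribution, without fixing a single per-marker threshold in advance.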

The *P*_{HWE} test statistic described here is implemented in the Pedstats software package (see Pedstats Web site), which generates summaries and checks the integrity of genetic data. In addition, code for calculating *P*_{low}, *P*_{high}, and *P*_{HWE} in C/C++, R, and Fortran is available from the authors’ Web site. With appropriate citation, our code is freely available for use and can be incorporated into other programs. The HapMap Project genotype data are freely available at the HapMap Web site.

## Acknowledgments

We gratefully acknowledge grant support from the National Human Genome Research Institute and the National Eye Institute. The manuscript was improved by helpful comments from reviewers.

## Electronic-Database Information

The URLs for data presented herein are as follows:

## References

**American Society of Human Genetics**
