# Rational Inferences about Departures from Hardy-Weinberg Equilibrium

^{1}Human Genetics and ^{2}Medicine, The University of Chicago, Chicago

## Abstract

Previous studies have explored the use of departure from Hardy-Weinberg equilibrium (DHW) for fine mapping Mendelian disorders and for general fine mapping. Other studies have used Hardy-Weinberg tests for genotyping quality control. To enable investigators to make rational decisions about whether DHW is due to genotyping error or to underlying biology, we developed an analytic framework and software to determine the parameter values for which DHW might be expected for common diseases. We show analytically that, for a general disease model, the difference between population and Hardy-Weinberg–expected genotypic frequencies (Δ) at the susceptibility locus is a function of the susceptibility-allele frequency (*q*), heterozygote relative risk (β), and homozygote relative risk (γ). For unaffected control samples, Δ is a function of risk in nonsusceptible homozygotes (α), the population prevalence of disease (*K*_{P}), *q,* β, and γ. We used these analytic functions to calculate Δ and the number of cases or controls needed to detect DHW for a range of genetic models consistent with common diseases (1.1 ≤ γ ≤ 10 and 0.005 ≤ *K*_{P} ≤ 0.2). Results suggest that significant DHW can be expected in relatively small samples of patients over a range of genetic models. We also propose a goodness-of-fit test to aid investigators in determining whether a DHW observed in the context of a case-control study is consistent with a genetic disease model. We illustrate how the analytic framework and software can be used to help investigators interpret DHW in the context of association studies of common diseases.

## Introduction

Hardy-Weinberg equilibrium (HWE) has been used for more than a century to better understand genetic characteristics of populations. In 1903, William E. Castle noted that, if the selective removal from the general population of individuals who have a recessive genotype at a particular locus ceases, then the future generation will establish an equilibrium value of recessive alleles. His conclusions assume that one allele is dominant with respect to the other(s), that the alleles do not affect fertility, that there is no migration into or out of the population, and that the population is large and randomly mating (Castle 1903; Li 1967). Five years later, G. H. Hardy and W. Weinberg independently came to the same conclusion but noted that the population allele frequencies could be used to calculate the equilibrium-expected genotypic proportions (Hardy 1908; Weinberg 1908). If *p* is the frequency of one allele (*A*) and *q* is the frequency of the alternative allele (*a*) for a biallelic locus, then the HWE-expected frequency will be *p*^{2} for the *AA* genotype, 2*pq* for the *Aa* genotype, and *q*^{2} for the *aa* genotype. The three genotypic proportions should sum to 1, as should the allele frequencies (Hardy 1908; Weinberg 1908).

The most common way to assess HWE is through a goodness-of-fit χ^{2} test (Weir 1996). The null hypothesis is that alleles are chosen randomly, and the genotypic proportions thus follow HWE-expected proportions (i.e., *p*^{2}, 2*pq,* and *q*^{2}). Under the alternative hypothesis, the choice of the second allele depends on which allele was selected first, so that the genotypic proportions deviate from the HWE-expected proportions. The χ^{2} test for HWE has *k*(*k*−1)/2 df, where *k* is the number of alleles at the locus being studied (Weir 1996). Or, more intuitively, the degrees of freedom can be calculated by *g*−*k*, where *g* is the number of possible genotypes and *k* is the number of alleles. For example, a SNP that has two alleles and three possible genotypes yields 1 df for the χ^{2} test that assesses the significance of departure from HWE (DHW).
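
The test described above can be sketched in a few lines of Python. This is a minimal illustration, not the software described in this article; the function names and the genotype counts are hypothetical, and the *P* value uses the closed form for the upper tail of a χ^{2} variate with 1 df, so it applies only to biallelic markers.

```python
import math

def hwe_chi2(n_AA, n_Aa, n_aa):
    """Pearson goodness-of-fit chi-square for HWE at a biallelic locus (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # estimated frequency of allele A
    q = 1.0 - p                        # estimated frequency of allele a
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def hwe_df(k):
    """Degrees of freedom g - k, where g = k(k+1)/2 genotypes for k alleles."""
    g = k * (k + 1) // 2
    return g - k

# Hypothetical counts chosen only for illustration.
chi2, p_value = hwe_chi2(30, 40, 30)
print(round(chi2, 3), round(p_value, 4), hwe_df(2))
```

With these made-up counts both allele frequencies are 0.5, so the expected counts are 25, 50, and 25, and the statistic is 4.0 on 1 df.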

Testing for HWE is commonly used for quality control of large-scale genotyping and is one of the few ways to identify systematic genotyping errors in unrelated individuals (Gomes et al. 1999; Hosking et al. 2004). However, there is little consensus on the correct threshold for identifying DHW in the context of large-scale studies or on what to do with markers that fit all other quality-control criteria but show a significant DHW, given the study-specific threshold. Some association studies do not consider DHW in patients to indicate genotyping error but prefer to assume a biological explanation for DHW in a patient sample, while requiring control or random samples to be in HWE (Oka et al. 1999; Levecque et al. 2003; Nejentsev et al. 2003). Others require that both the patient and the control samples be in HWE for the marker to be used in further analyses (Martin et al. 2000; Xu et al. 2001; Wang et al. 2003). Bonferroni correction is commonly used to correct for multiple testing when many markers are tested for HWE but is a conservative approach if the markers are correlated (e.g., are in linkage disequilibrium [LD]). Another widespread approach is to assign a significance level, such as 5% or 1%, and to test each marker without regard for whether markers are independent. Sometimes, markers showing DHW are removed from a data set because subsequent genetic analysis requires HWE to be assumed. However, since the design of association studies is to partition cases and controls in the general population on the basis of their phenotype and genetic composition, cases and controls at a disease locus may be expected to show DHW under certain conditions. There is little guidance on how to distinguish between markers whose DHW is due to chance, genotyping error, or failure of one of the requisite assumptions of HWE from markers whose DHW is due to their proximity to an allele affecting a phenotype on which the sample is ascertained (Xu et al. 2002).

Our focus is to understand DHW observed in the context of association studies. When an investigator observes a marker with a significant DHW, a key question is “Are deviations from HWE due to genotyping error, chance, failure of assumptions underlying Hardy-Weinberg expectations, or, instead, to the underlying genetic disease model?” To address this question, we developed an analytic framework for assessing the difference between population and HWE-expected genotypic proportions (Δ), as well as software tools for calculating expectations over a range of single-locus genetic disease models. We use a single-locus context in our examination of DHW because it enables us to represent the marginal effects of the particular susceptibility locus being examined. For general disease models, we show that Δ in patients at the disease locus is determined by the population susceptibility-allele frequency (*q*), the homozygote relative risk (γ), and the heterozygote relative risk (β). For “true” control samples (i.e., those ascertained by being unaffected with respect to disease status), the value of Δ depends on the risk to nonsusceptible homozygotes (α), *q,* γ, and β. Using these formulations, we have examined a wide range of parameter values for genetic models commonly believed to be consistent with observations for complex disorders. Our results provide support for the notion that systematic examination of HWE has been underutilized as a tool for fine mapping. To aid investigators in distinguishing a DHW that is generated by the biological model underlying disease transmission from one that is generated by genotyping errors, chance, or failure of the requisite assumptions of HWE, we propose a general goodness-of-fit test to determine whether the observed genotype counts are significantly different from the expected genotype counts in patients and controls for the genetic disease model that best fits the observed data. 
We provide software for both the investigation of DHW in a generalized single-locus context and the examination of specific observations of DHW.

## Methods

### Common Disease Models

The prevalence of disease in the general population is defined as *K*_{P}=*p*^{2}α+2*pq*αβ+*q*^{2}αγ, where *p* is the population frequency of the wild-type allele (*A*), *q* is the frequency of the disease-susceptibility allele (*a*), and α is the baseline penetrance of disease in homozygotes without a risk allele at this locus. By modeling the population prevalence of disease in the general population in this manner, we explicitly assume that the susceptibility locus will be in HWE in the general population. Note that, although this parameterization is for a single-locus model, there may be many genetic and nongenetic risk factors, in addition to the particular region being studied, that will determine values of the parameters α, β, and γ. Thus, we use this approach to model the marginal effects of a particular model, with the full recognition that, in general, there will be nongenetic risk factors and many other loci with genetic variation that affect the phenotype.
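
As a small sketch of this parameterization, the prevalence equation can be evaluated directly, or inverted to obtain the baseline penetrance α from a stated *K*_{P}. The function names are illustrative assumptions, and the parameter values are simply those of one dominant model discussed in the Results.

```python
def prevalence(q, alpha, beta, gamma):
    """K_P = p^2*alpha + 2pq*alpha*beta + q^2*alpha*gamma, assuming HWE."""
    p = 1.0 - q
    return alpha * (p * p + 2 * p * q * beta + q * q * gamma)

def baseline_penetrance(q, k_p, beta, gamma):
    """Solve the prevalence equation for alpha, given K_P, q, beta, and gamma."""
    p = 1.0 - q
    return k_p / (p * p + 2 * p * q * beta + q * q * gamma)

# Dominant model (beta = gamma = 1.5) with q = 0.33 and K_P = 0.1.
alpha = baseline_penetrance(q=0.33, k_p=0.1, beta=1.5, gamma=1.5)
print(round(alpha, 4))
```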

We explored dominant, recessive, additive, and multiplicative genetic models with *K*_{P} and γ values that are characteristic of a variety of common diseases, for which *K*_{P} values range from 0.005 to 0.2 and γ is equal to 1.1, 1.3, 1.5, 2, 5, or 10. All dominant, recessive, and additive models examined correspond to sibling relative risk (λ_{s}) values <2.5, although most models have λ_{s} ≤ 1.1. Multiplicative models generally have higher λ_{s} values than dominant, recessive, and additive models with identical parameters. The maximum λ_{s} for a multiplicative model is 4.20, which occurs at the lowest population prevalence (*K*_{P}=0.005) and the highest homozygote relative risk (γ=10), but most multiplicative models considered have λ_{s} values <1.2.

### The Difference between Population and Expected Genotypic Frequencies (Δ)

Weir (1996) defined a variable, Δ, as the difference between population and HWE-expected genotypic frequencies. Δ is population specific (e.g., Δ_{p} for patients and Δ_{c} for unaffected controls), and it ranges from max(−*p*^{2}, −*q*^{2}) to *pq*, where *p* and *q* are the allele frequencies in the population being considered.

To better understand the magnitude and direction of DHW for different genetic disease models, we defined Δ in terms of key parameters of the genetic model, specified separately in patients and controls, and characterized how parameters interact to affect directionality of DHW (i.e., if Δ<0, there is a deficiency of homozygotes and an excess of heterozygotes; if Δ>0, there is an excess of homozygotes and a deficiency of heterozygotes). For patients in a general disease model,

$$\Delta_p = \frac{q^{2}(1-q)^{2}\,(\gamma-\beta^{2})}{\left[(1-q)^{2}+2q(1-q)\beta+q^{2}\gamma\right]^{2}}. \tag{1}$$

A more thorough derivation of the above equation and the simplified equations for the classic dominant, recessive, and additive models are given in appendix A. For a multiplicative model, Δ_{p} is equal to 0 and there is no expected DHW. Figure 1 highlights the direction and magnitude of Δ_{p} as the susceptibility-allele frequency at the susceptibility locus varies from 0 to 1.
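
A minimal sketch of Δ_{p} follows. It assumes the closed form Δ_{p} = *p*^{2}*q*^{2}(γ−β^{2})/(*p*^{2}+2*pq*β+*q*^{2}γ)^{2}, which follows from the prevalence model in the "Common Disease Models" section, and cross-checks it against the definition (frequency of susceptible homozygotes among patients minus the squared patient allele frequency). Function names are illustrative and are not taken from the DHW software.

```python
def delta_patients(q, beta, gamma):
    """Closed-form Delta_p for a general single-locus disease model."""
    p = 1.0 - q
    s = p * p + 2 * p * q * beta + q * q * gamma
    return (p * q) ** 2 * (gamma - beta ** 2) / s ** 2

def delta_patients_from_genotypes(q, beta, gamma):
    """Delta_p from its definition: g_aa minus the squared allele frequency in patients."""
    p = 1.0 - q
    s = p * p + 2 * p * q * beta + q * q * gamma
    g_Aa = 2 * p * q * beta / s      # heterozygotes among patients
    g_aa = q * q * gamma / s         # susceptible homozygotes among patients
    q_p = g_aa + g_Aa / 2            # susceptibility-allele frequency in patients
    return g_aa - q_p ** 2

recessive = delta_patients(0.4, beta=1.0, gamma=2.0)        # excess of homozygotes
dominant = delta_patients(0.4, beta=2.0, gamma=2.0)         # deficiency of homozygotes
multiplicative = delta_patients(0.4, beta=2.0, gamma=4.0)   # no DHW expected
print(round(recessive, 4), round(dominant, 4), multiplicative)
```

Note that the baseline penetrance α and the prevalence *K*_{P} cancel, which is why neither appears as an argument.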

Figure 1. Δ_{p} plotted versus the susceptibility-allele frequency for patients. *A, B,* and *D,* Data points are as follows: γ=1.1 (*blackened diamonds*), γ=1.3 (*unblackened triangles*), γ=1.5 (*blackened triangles*), γ=2 (*unblackened* ...

Similarly, for an unaffected control sample,

$$\Delta_c = \frac{\alpha\,q^{2}(1-q)^{2}\left[\,2\beta+\alpha\gamma-\gamma-1-\alpha\beta^{2}\,\right]}{(1-K_P)^{2}}, \tag{2}$$

and equations for the classic recessive, classic dominant, additive, and multiplicative models are given in appendix B. Figure 2 shows the direction and magnitude of Δ_{c} for several dominant, recessive, additive, and multiplicative models.

### Number of Individuals Needed to Detect DHW

The χ^{2} test of HWE for patients can be simplified to the following:

$$\chi^{2}_{p} = \frac{N_p\,\hat{\Delta}_p^{2}}{\hat{q}_p^{2}\,(1-\hat{q}_p)^{2}},$$

where *N*_{p} is the number of patients, $\hat{\Delta}_p$ is the estimate of Δ_{p}, and $\hat{q}_p$ is the susceptibility-allele frequency estimated in patients (Weir 1996). A χ^{2} value can also be determined for controls by using the susceptibility-allele frequency in controls ($\hat{q}_c$), Δ_{c}, and the number of controls in the sample (*N*_{c}). The power of the χ^{2} test can be accounted for by using a noncentral χ^{2} distribution with noncentrality parameter

$$\nu = \frac{N_p\,\Delta_p^{2}}{q_p^{2}\,(1-q_p)^{2}}$$

(Weir 1996).

Given the above equations, it is straightforward to determine the number of individuals required to detect DHW for a specified level of significance and power, for patients and controls under any disease model. In figure 3, results for a variety of models are graphed in terms of *N*_{p} or *N*_{c} and the overall susceptibility-allele frequency for patients or controls. The results presented in figure 3*A* (patients) and 3*B* (controls) use a central χ^{2} distribution corresponding to a significance level of 5% and 50% power, and the results presented in figure 3*C* (patients) and 3*D* (controls) use a noncentral χ^{2} distribution corresponding to a significance level of 5% and 80% power.
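
The sample-size calculation can be sketched as follows. The sketch assumes that the required χ^{2} (or noncentrality) value for a 1-df test is approximated by (*z*_{1−α/2} + *z*_{power})^{2}, which ignores the negligible contribution of the opposite tail and reduces to the central 5% critical value (about 3.84) at 50% power. The function names are illustrative, and the recessive model with γ=2 and *q*=0.4 is used only as a worked case.

```python
from statistics import NormalDist

def patient_allele_freq(q, beta, gamma):
    """Expected susceptibility-allele frequency among patients."""
    p = 1.0 - q
    s = p * p + 2 * p * q * beta + q * q * gamma
    return q * (q * gamma + p * beta) / s

def delta_patients(q, beta, gamma):
    """Expected Delta among patients for a general single-locus model."""
    p = 1.0 - q
    s = p * p + 2 * p * q * beta + q * q * gamma
    return (p * q) ** 2 * (gamma - beta ** 2) / s ** 2

def n_to_detect_dhw(delta, q_sample, alpha_level=0.05, power=0.5):
    """Sample size solving N * delta^2 / (q^2 (1-q)^2) = required chi-square (1 df)."""
    z = NormalDist()
    # Normal approximation to the required noncentrality (an assumption of this sketch).
    required = (z.inv_cdf(1 - alpha_level / 2) + z.inv_cdf(power)) ** 2
    return required * (q_sample * (1 - q_sample)) ** 2 / delta ** 2

# Recessive model (beta = 1) with gamma = 2 and q = 0.4.
d = delta_patients(0.4, 1.0, 2.0)
q_p = patient_allele_freq(0.4, 1.0, 2.0)
n50 = n_to_detect_dhw(d, q_p, power=0.5)
n80 = n_to_detect_dhw(d, q_p, power=0.8)
print(round(n50), round(n80))
```

Under these assumptions the two sample sizes come out close to the values of roughly 130 and 267 patients quoted in the Results for this model.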

Figure 3. *A,* Number of patients needed to detect DHW as the susceptibility-allele frequency changes, at a significance level of 5% and 50% power. Data points are as follows: dominant model with γ=1.3 (*unblackened triangles*), dominant model with γ=10 ...

For congruence with other studies that report results in terms of λ_{s} (the risk of disease in siblings of an affected individual relative to the risk in the general population), λ_{s} values are summarized in appendix C with respect to common disease models. We also summarize results with respect to λ_{s} and Δ in figure 4.

### Goodness-of-Fit Test

Given the analytic framework characterized above and a set of observations in patients and controls with DHW in one or both samples, we proceed to identify the genetic disease model with the best fit to the genotypic proportions observed in patients and controls (constrained by the known lifetime prevalence of disease). If that “best-fit” model is, nevertheless, a poor fit to the observed data, as assessed by a χ^{2} test with 1 df for a general model and 2 df for a restrictive (dominant, recessive, additive, or multiplicative) model (see appendix D), then it is unlikely that the underlying genetic disease model has generated the observed DHW; thus, alternative explanations for the DHW, including chance, genotyping error, and/or violations of the requisite assumptions of HWE, must be considered.

### Examples

We chose several examples from the literature that illustrate how our results and software can help investigators distinguish a DHW consistent with genetic models from one that is not. The first example involves a DHW in patients but not in controls, with data derived from an association study of Crohn disease and a rare frameshift polymorphism (Leu3020fsinsC) in *CARD15/NOD2* (Ogura et al. 2001; J. Cho, personal communication). The second example includes data from an association study that identified a highly significant association (*P*=3.3×10^{-6}) between *LTA* and myocardial infarction (MI) (Ozaki et al. 2002), attributable in part to the effect of significant DHW in both patients and controls in opposite directions. We also took advantage of several recent surveys of polymorphisms with DHW (Xu et al. 2002; Györffy et al. 2004; Kocsis et al. 2004*a*, 2004*b*; Osawa et al. 2004) to determine the proportion that are consistent with a genetic disease model (table 1).

## Results

### Magnitude of Δ

As noted elsewhere, the larger the value of Δ, the more the genotypic frequency departs from HWE expectations (Weir 1996). In patients, the magnitude of Δ_{p} is determined by *q,* γ, and β, as shown in equation (1) above. The maximum value for Δ_{p}, given a genetic disease model, occurs at the susceptibility-allele frequency that maximizes the HWE-expected proportion of heterozygotes affected with disease (*g*_{Aa}) and can be calculated by

$$q_{\max} = \frac{1}{1+\sqrt{\gamma}}.$$

For example, the maximum Δ_{p} for a dominant model with γ=1.5 occurs when the allele frequency is 0.45, which corresponds to the highest proportion of HWE-expected heterozygotes among patients (*g*_{Aa}=0.5505) under this model.
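
The dominant-model example can be checked numerically. The sketch below assumes that the maximizing allele frequency is *q* = 1/(1+√γ), which is what one obtains by maximizing *pq*/(*p*^{2}+2*pq*β+*q*^{2}γ) over *q* with β and γ held fixed; it then confirms the result with a coarse grid search. Function names are illustrative.

```python
import math

def q_at_max_delta(gamma):
    """Allele frequency maximizing |Delta_p| (assumed closed form 1 / (1 + sqrt(gamma)))."""
    return 1.0 / (1.0 + math.sqrt(gamma))

def het_proportion_in_patients(q, beta, gamma):
    """HWE-expected proportion of heterozygotes among patients (g_Aa)."""
    p = 1.0 - q
    s = p * p + 2 * p * q * beta + q * q * gamma
    return 2 * p * q * beta / s

gamma = 1.5                      # dominant model, so beta = gamma
q_max = q_at_max_delta(gamma)
g_Aa = het_proportion_in_patients(q_max, beta=gamma, gamma=gamma)

# Independent check: grid search over q in steps of 0.001.
grid = [i / 1000 for i in range(1, 1000)]
q_grid = max(grid, key=lambda q: het_proportion_in_patients(q, gamma, gamma))
print(round(q_max, 2), round(g_Aa, 4), q_grid)
```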

In controls, Δ_{c} is a function of *K*_{P}, α, β, γ, and *q.* Larger Δ_{c} values occur as the prevalence of disease in the general population increases, whereas, in patients, *K*_{P} has no direct effect on the size of Δ_{p}. For example, a dominant model with γ=1.5 and *q*=0.33 leads to a Δ_{c} of 0.00209 when *K*_{P}=0.1 and a Δ_{c} of 0.000193 when *K*_{P}=0.01. Larger Δ_{c} values also occur with higher γ values, as shown in figure 2. The susceptibility-allele frequency also affects the magnitude of Δ_{c}, with common allele frequencies (0.1 ≤ *q* ≤ 0.8) corresponding to larger Δ_{c} values and with allele frequencies at the tails of the frequency spectrum corresponding to smaller Δ_{c} values.
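
The two Δ_{c} values quoted above can be reproduced with a short sketch. It assumes that the genotype frequencies in unaffected controls are the HWE frequencies weighted by one minus the penetrance and divided by 1−*K*_{P}, which gives Δ_{c} = α*p*^{2}*q*^{2}(2β+αγ−γ−1−αβ^{2})/(1−*K*_{P})^{2}. The function name is an illustrative assumption.

```python
def delta_controls(q, k_p, beta, gamma):
    """Expected Delta among unaffected controls for a general single-locus model."""
    p = 1.0 - q
    s = p * p + 2 * p * q * beta + q * q * gamma
    alpha = k_p / s   # baseline penetrance implied by the prevalence equation
    bracket = 2 * beta + alpha * gamma - gamma - 1 - alpha * beta ** 2
    return alpha * (p * q) ** 2 * bracket / (1 - k_p) ** 2

# Dominant model (beta = gamma = 1.5) with q = 0.33 at two prevalences.
d_common = delta_controls(0.33, 0.1, 1.5, 1.5)
d_rare = delta_controls(0.33, 0.01, 1.5, 1.5)
print(f"{d_common:.5f} {d_rare:.6f}")
```

The roughly tenfold drop in Δ_{c} between the two prevalences illustrates why far more controls than patients are needed to detect DHW when the disease is uncommon.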

### Direction of Δ

The direction of Δ is vital to understanding whether an observed DHW can be due to the underlying genetic model, and, for patients, the disease model specifies the direction of Δ_{p} (table 2). The additive model differs from the dominant, recessive, and multiplicative models in that Δ_{p} can be positive, negative, or zero. Δ_{p} is positive when 2 < γ < 4 and negative when γ>4. However, when γ=4, the additive model is equivalent to the multiplicative model (β=2 and γ=4), and Δ_{p} is equal to 0 (i.e., no DHW is expected). For the general model, the direction depends on the value of β relative to γ: Δ_{p} is positive if γ>β^{2} and negative if γ<β^{2}.

The direction of Δ in controls is also dependent on the underlying genetic disease model, and, for dominant and recessive models, Δ_{c} is in the direction opposite that observed in patients. Interestingly, Δ_{c} for controls under the additive model does not change with γ, as observed for patients, but always shows an excess of heterozygotes (Δ_{c}<0). However, there is an asymmetry in the homozygote classes when Δ≠0 for both patients and controls, with patients showing an excess of disease-susceptible homozygotes and controls showing an excess of wild-type homozygotes. The multiplicative model generates Δ values for controls similar to those generated by the recessive and additive models, with Δ_{c}<0. The direction of DHW for a general disease model in controls is negative if γ+1+αβ^{2}>2β+αγ and positive if the inequality is reversed (i.e., γ+1+αβ^{2}<2β+αγ).
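
The sign rules in this section can be collected into a short sketch. It uses only the two inequalities stated in the text (γ versus β^{2} for patients; γ+1+αβ^{2} versus 2β+αγ for controls). The function names, the tolerance, and the parameter values (α=0.05 and the chosen γ values) are arbitrary illustrations.

```python
def sign(x, tol=1e-12):
    """Return +1, -1, or 0 for the sign of x, treating tiny values as zero."""
    return 0 if abs(x) < tol else (1 if x > 0 else -1)

def direction_patients(beta, gamma):
    """+1: excess of homozygotes; -1: deficiency of homozygotes; 0: no DHW."""
    return sign(gamma - beta ** 2)

def direction_controls(alpha, beta, gamma):
    """Sign of Delta_c, from the inequality comparing 2*beta + alpha*gamma with gamma + 1 + alpha*beta^2."""
    return sign((2 * beta + alpha * gamma) - (gamma + 1 + alpha * beta ** 2))

alpha = 0.05  # illustrative baseline penetrance
models = {
    "dominant (beta=gamma=2)": (2.0, 2.0),
    "recessive (beta=1, gamma=2)": (1.0, 2.0),
    "additive (gamma=3)": (1.5, 3.0),
    "additive (gamma=6)": (3.0, 6.0),
    "multiplicative (beta=2, gamma=4)": (2.0, 4.0),
}
directions = {
    name: (direction_patients(b, g), direction_controls(alpha, b, g))
    for name, (b, g) in models.items()
}
for name, (d_p, d_c) in directions.items():
    print(f"{name}: patients {d_p:+d}, controls {d_c:+d}")
```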

### Number of Individuals Needed to Detect DHW for a Specified Significance and Power

As might be expected, fewer patients are needed to detect DHW as the genotype relative risk increases. It is notable, however, that low γ values (e.g., 1.5) can produce DHW with relatively small samples of patients. For example, <500 patients are needed to detect DHW (at a significance level of 5% and 50% power) under a recessive model with γ=1.5 and over a range of susceptibility-allele frequencies from 0.29 to 0.63. As γ increases to 2, the range of susceptibility-allele frequencies increases to 0.11–0.81, with a minimum sample size of ~130 patients when *q*=0.4 with 50% power (fig. 3*A*) and 267 patients with 80% power (fig. 3*C*). An additive model, however, requires a minimum of ~750 patients at *q*=0.38 to detect DHW with γ=3, but the required sample size decreases to 176 patients with γ=2.2. As shown above, Δ_{p} increases for additive models as γ increases or decreases from 4. Therefore, at a significance level of 5% and 50% power, an additive model can lead to detection of DHW in 176 patients when either γ=2.2 or γ=7.56. The difference between the two observations, however, is the direction of the DHW, in which the former (γ=2.2) shows an excess of homozygotes and the latter (γ=7.56) shows a deficiency of homozygotes.

The number of controls needed to detect DHW is generally much larger but is often still within the range of commonly collected sample sizes for a wide range of models. As with patients, the number of controls required to detect DHW decreases as γ increases. But, unlike in the calculations for patients, *K*_{P} also determines the number of controls needed to detect DHW. For dominant models, <1,000 unaffected individuals are needed to detect DHW when *K*_{P}=0.2 and γ ≥ 5 with 50% power (fig. 3*B*), and <2,000 controls with 80% power (fig. 3*D*). Because of the specifications of the additive model, the same effect is observed in controls as in patients, whereby a decrease in the number of controls needed to detect DHW occurs as γ increases or decreases from 4. Recessive models with γ ≥ 5 can lead to detection of DHW in <200 controls with 50% power when *K*_{P}=0.2. When *K*_{P} decreases to as low as 0.05, ~1,000 controls are needed to detect DHW for the recessive model with γ ≥ 10 and *q*=0.23 (fig. 3*B*). Multiplicative models show DHW in a sample of <2,000 controls when *K*_{P} ≥ 0.1 and γ ≥ 10 and in a sample of <1,200 controls when *K*_{P} ≥ 0.2 and γ ≥ 5 (at 50% power and a significance level of 5%). Generally, larger Δ_{c} values are associated with models specified by high *K*_{P} values (e.g., *K*_{P} ≥ 0.2) and high γ values (e.g., γ ≥ 5).

### Sibling Relative Risk Values that Correspond to Expected DHW

We calculated λ_{s} for all models studied, to understand the results for these common disease models relative to those of previous studies that assessed power by using this parameter. As expected, λ_{s} increases as *K*_{P} decreases, and, overall, the resulting λ_{s} values (including those that are expected to give DHW in a small sample) are quite small. For dominant models or recessive models (γ=1.5), in which DHW is expected for samples of <500 patients, λ_{s} is <1.03 (fig. 4). For additive models, in which <1,400 patients are needed to detect DHW, λ_{s} values do not exceed 1.18. Thus, DHW can be observed for strikingly low λ_{s} values in sample sizes currently being used for large-scale association studies.

### Example 1: DHW in Patients Only

Ogura et al. (2001) conducted a case-control association study of Crohn disease to fine map *IBD1* in the pericentromeric region of chromosome 16. They identified a cytosine insertion (Leu3020fsinsC) in *CARD15/NOD2* that causes a truncated NOD2 protein and that results in deficient activation of NF-κB and the immune response. We calculated HWE for the Leu3020fsinsC (Cins) polymorphism in unrelated Jewish and European American patients with Crohn disease and in unrelated controls. Because of the low proportion of HWE-expected homozygotes, we used Fisher’s exact test to determine DHW (Weir 1996). The distribution of genotypes in Jewish patients was 121 wild-type (wt)/wt homozygotes, 15 Cins/wt heterozygotes, and 4 Cins/Cins homozygotes, which yields *P*=.0059. The genotype distribution in European American patients was 314 wt/wt homozygotes, 41 Cins/wt heterozygotes, and 9 Cins/Cins homozygotes. Fisher’s exact test generates *P*=1.28×10^{-4}, which is also highly significant. When all unrelated patients are tested for DHW (456 wt/wt homozygotes, 61 Cins/wt heterozygotes, and 13 Cins/Cins homozygotes), *P*=6.83×10^{-6}. All three populations of patients showed significant DHW, with an excess of homozygotes. The distribution of genotypes in all controls was 311 wt/wt homozygotes, 25 Cins/wt heterozygotes, and 0 Cins/Cins homozygotes, which is consistent with HWE.
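
When few homozygotes are expected, an exact test conditional on the observed allele counts can replace the χ^{2} approximation. The sketch below is one common form of such a test: it sums the probabilities of all heterozygote counts that are no more probable than the observed one. It is an assumption that this matches the variant used for the *P* values reported above, so the output for the Jewish patient sample may differ somewhat from the published value.

```python
from fractions import Fraction
from math import factorial

def hwe_exact_p(n_AA, n_Aa, n_aa):
    """Exact HWE test for a biallelic locus, conditional on the allele counts."""
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa   # copies of allele A
    n_a = 2 * n_aa + n_Aa   # copies of allele a

    def prob(hets):
        # Probability of `hets` heterozygotes given n_A and n_a under HWE.
        hom_A = (n_A - hets) // 2
        hom_a = (n_a - hets) // 2
        return Fraction(
            factorial(n) * 2 ** hets * factorial(n_A) * factorial(n_a),
            factorial(hom_A) * factorial(hets) * factorial(hom_a) * factorial(2 * n),
        )

    observed = prob(n_Aa)
    # Heterozygote counts must share the parity of the allele counts.
    rarer = min(n_A, n_a)
    candidates = range(rarer % 2, rarer + 1, 2)
    return float(sum(pr for pr in map(prob, candidates) if pr <= observed))

# Tiny case that can be verified by hand: two individuals, genotypes AA and aa.
p_toy = hwe_exact_p(1, 0, 1)
# Jewish patients with Crohn disease: 121 wt/wt, 15 Cins/wt, 4 Cins/Cins.
p_jewish = hwe_exact_p(121, 15, 4)
print(p_toy, p_jewish)
```

Exact rational arithmetic is used so that ties between equally probable configurations are compared without rounding error; for very large samples a log-factorial formulation would be faster.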

If the allele frequencies in both the patients (*q*_{p}=0.082) and the controls (*q*_{c}=0.037) are used to help delineate the general genetic disease model that best fits the observed genotypic distributions, we find that it can be characterized by *K*_{P}=0.002, *q*=0.0379, α=0.00186, β=1.696, and γ=18.338. If a χ^{2} test is performed to determine the goodness-of-fit of the expected number of patients and controls, given the model that best fits the observed genotypic counts, χ^{2}=0.478 (*P*=.827 with 1 df); thus, the best-fit model is a good fit to the observed data. Moreover, the model is consistent with what has been obtained elsewhere for other types of data (Hampe et al. 2001; Ogura et al. 2001).
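
The goodness-of-fit statistic for this example can be approximated from the reported best-fit parameters. The sketch assumes that the statistic is a Pearson χ^{2} summed over the three genotype classes in patients and the three in controls; the fitting procedure itself (appendix D) is not reproduced here, and the function names are illustrative. Only the rounded parameter values and pooled genotype counts given in the text are used.

```python
def expected_counts(q, alpha, beta, gamma, n_patients, n_controls):
    """Expected genotype counts (AA, Aa, aa) in patients and unaffected controls."""
    p = 1.0 - q
    hwe = (p * p, 2 * p * q, q * q)                   # population genotype frequencies
    penetrance = (alpha, alpha * beta, alpha * gamma)
    k_p = sum(f * w for f, w in zip(hwe, penetrance))  # implied prevalence
    patients = [n_patients * f * w / k_p for f, w in zip(hwe, penetrance)]
    controls = [n_controls * f * (1 - w) / (1 - k_p) for f, w in zip(hwe, penetrance)]
    return patients, controls

def pearson_chi2(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

obs_patients = (456, 61, 13)   # wt/wt, Cins/wt, Cins/Cins in all unrelated patients
obs_controls = (311, 25, 0)    # same genotype order, all controls
exp_patients, exp_controls = expected_counts(
    q=0.0379, alpha=0.00186, beta=1.696, gamma=18.338,
    n_patients=sum(obs_patients), n_controls=sum(obs_controls),
)
chi2_fit = pearson_chi2(obs_patients, exp_patients) + pearson_chi2(obs_controls, exp_controls)
print(round(chi2_fit, 3))
```

Nearly all of the statistic comes from the control homozygote class, where about half an individual is expected and none is observed.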

### Example 2: DHW in Opposite Directions in Patients and Controls

Ozaki et al. (2002) published a large-scale association study designed to identify genetic variation affecting risk of MI in a Japanese sample. The study involved an initial screen of >92,000 SNPs in a small sample and follow-up studies for those SNPs with evidence of association (*P*<.01 for a dominant or recessive model) performed in a larger sample (1,133 patients, 1,006 random controls, and 872 age-matched random controls). This led to more intensive studies of the *LTA* region (near *HLA* on 6p21), with 5 of 26 SNPs genotyped in the region showing highly significant association with MI (e.g., for *LTA* exon 1 polymorphism, *P*=3.3×10^{-6}).

Although Ozaki et al. (2002) reported that all 26 SNPs were in HWE, with *P*>.01, we note that each of the 5 SNPs for which genotype distributions are published shows a significant DHW in opposite directions for patients and controls. The *LTA* exon 1 polymorphism is the most significantly associated with MI, with a genotype distribution among the 1,133 patients of 416 GG homozygotes, 504 GA heterozygotes, and 213 AA homozygotes, which yields χ^{2}=7.40 (*P*=.0065), indicating an excess of homozygotes relative to HWE-expected genotype proportions. The genotype distribution among the 1,006 random controls is 378 GG homozygotes, 512 GA heterozygotes, and 116 AA homozygotes. This sample of controls generates χ^{2}=8.51 (*P*=.0035) but shows a deviation in the direction opposite to that in patients, with a deficiency of homozygotes. The age-matched random control group of 872 individuals has a genotype distribution of 344 GG homozygotes, 428 GA heterozygotes, and 100 AA homozygotes, corresponding to a nonsignificant χ^{2}=3.69 (*P*=.055) but also showing a deficiency of homozygotes similar to that seen for the 1,006 random controls.
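
The statistics quoted for the *LTA* exon 1 polymorphism can be recomputed from the genotype counts by estimating Δ directly and using the simplified form χ^{2} = *N*Δ^{2}/[*q*^{2}(1−*q*)^{2}]. The sketch below is illustrative (the function name is an assumption); the sign of the estimated Δ gives the direction of the departure.

```python
def dhw_summary(n_hom_major, n_het, n_hom_minor):
    """Estimated Delta and the simplified 1-df chi-square for a biallelic marker."""
    n = n_hom_major + n_het + n_hom_minor
    q_hat = (2 * n_hom_minor + n_het) / (2 * n)     # minor-allele frequency
    delta_hat = n_hom_minor / n - q_hat ** 2        # > 0 means excess of homozygotes
    chi2 = n * delta_hat ** 2 / (q_hat * (1 - q_hat)) ** 2
    return delta_hat, chi2

samples = {
    "patients": (416, 504, 213),
    "random controls": (378, 512, 116),
    "age-matched controls": (344, 428, 100),
}
results = {name: dhw_summary(*counts) for name, counts in samples.items()}
for name, (delta_hat, chi2) in results.items():
    print(f"{name}: delta = {delta_hat:+.4f}, chi2 = {chi2:.2f}")
```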

The first problem easily identified is that a significant DHW is observed in a sample that may be better characterized as a random sample than a control sample and thus would not be expected to show DHW, regardless of the underlying genetic disease model. If we assume that the control sample consists of true (unaffected) controls, rather than a random sample, the best-fit model is *K*_{P}=0.01, *q*=0.371, α=0.00932, β=1.024, and γ=1.469. The fit of this model to the observed data is significantly poor, with χ^{2}=5.024 (*P*=.025 with 1 df). Given that the best-fit model is not a good fit to what is observed in the 1,133 patients and 1,006 controls, the underlying genetic disease model for a susceptibility locus in that region is an unlikely explanation for the observed DHW.

### Survey of DHW

We identified 41 association studies with 60 polymorphisms that depart from HWE, from several recent reviews (Xu et al. 2002; Györffy et al. 2004; Kocsis et al. 2004*a*, 2004*b*; Osawa et al. 2004) (table 1). All *K*_{P} values were either given by the association study or were identified on the World Health Organization Web site. There are 35 polymorphisms that depart from HWE in patients only, 21 that depart in controls only, 2 that depart in the same direction in patients and controls, and 1 that departs in the opposite direction in patients and controls. Of those that have a dominant or recessive model as the best-fit genetic model, 35.3% (12 of 34) did not have a genetic model that fit the observed genotype distributions. Of those for which the best-fit model was a general model, 53.8% (14 of 26) did not have a model that fit the observed data well. Therefore, 43.3% (26 of 60) of the polymorphisms we identified from the reviews are inconsistent with a biological reason for the observed DHW. This highlights the importance not only of correctly assessing HWE for genotype data but also of understanding whether an observed DHW is consistent with a genetic model of disease susceptibility.

## Discussion

Investigators have taken surprisingly inconsistent pathways in the interpretation and use of DHW. In some cases, investigators report significant association between variation at candidate genes and complex disorders that is quite dependent on the existence of DHW (i.e., the associations are genotypic rather than allelic, and the genotypic differences are driven by DHW). In at least some situations, the observed DHW is implausible from a biological perspective, which calls into question the association result. On the other hand, some investigators believe that any marker showing a DHW is likely to be erroneous and/or misleading; as a result, they may be throwing away information valuable for mapping and identifying causal polymorphisms. With the analytic framework and software we have developed, we provide a way for investigators to assess markers with DHW in a more logical and systematic way by distinguishing those that could be attributed to the underlying genetic model at the susceptibility locus from those due to genotyping errors, chance, and/or violations of the assumptions of HWE, thereby improving the quality of scientific inferences.

Given the increasing sample sizes contemplated for large-scale association studies, it would seem that DHW is a neglected and potentially fruitful avenue for further research to improve signal localization. Results of studies of the *CARD15/NOD2* region provide support for this notion. A common coding polymorphism (P268S) (*q*=0.35 in patients with Crohn disease, and *q*=0.29 in unaffected controls) shows significant DHW, with an excess of homozygotes in both Jewish patients (*P*=.0277) and in patients overall (*P*=.00546) (J. Cho, personal communication). The best-fit model for this polymorphism provides a good fit to the observed genotypic distributions (χ^{2}=1.501×10^{-10}; *P*=1.0 with 1 df). Although this polymorphism is not thought to affect the risk of Crohn disease, three of the known susceptibility alleles (Leu3020fsinsC, Gly908Arg, and Arg702Trp) are in the same haplotype as the more rare allele at this site. If patients with any of those mutations are removed from the patient population and HWE is reassessed, the DHW is no longer significant (*P*=.2115 for Jewish patients, and *P*=.632 for patients overall). If none of the causative SNPs had been previously identified, the observation that the common polymorphism departed from HWE in a way that was consistent with a genetic model could have been used to support further investigation of the local region. This observation highlights the value of using DHW when fine mapping complex diseases, as originally hypothesized by Nielsen et al. (1998).

Given the relatively large sample sizes used in modern candidate-gene studies as well as in fine-mapping and positional cloning studies, and the fact that DHW should be expected for a wide range of genetic disease models (consistent with the modest genetic contributions likely to be relevant for complex disorders), it is perhaps surprising that we have not seen a larger number of markers reported to show DHW. Of 60 SNPs from association studies in which DHW has been identified in patients and/or controls and in which both patient and control genotype distributions are available, 34 (56.7%) have genotype distributions consistent with the expectations from a best-fit model (table 1). DHW in patients is never expected for a multiplicative model and is less likely to be detected as significant for a model with susceptibility-allele frequencies at the tails of the frequency spectrum. We have not examined the consequences for HWE in more-complex disease models with allelic or locus epistasis, profound sex effects, etc., and, if such models were standard for complex disorders, it is unclear what the consequences would be. Thus, the failure to more often observe DHW at polymorphisms hypothesized to affect susceptibility to complex disorders may reflect biological characteristics of the genetic disease model that make such DHW impossible or at least less likely.

It is also possible that investigators mistrust data with DHW and sometimes exclude polymorphisms showing DHW from their studies. Systematic errors in genotyping or nonrandom patterns of missing data may generate a relatively consistent pattern of DHW (e.g., disproportionate missing data in heterozygotes may lead to a consistent pattern of DHW, with an excess of homozygotes). Other types of error may be more sporadic. An unrecognized polymorphism in primer sequences used in subsequent PCR may lead to DHW, with an excess of homozygotes, particularly when the primer polymorphism is in LD with one of the test alleles. Genomic duplications or deletions can also lead to DHW. Because even the systematic errors will not be universal, error and nonrandom patterns of missing data may be detectable as DHW confined to a single marker within a set of markers that are in strong LD. In contrast, when markers have DHW due to violations of assumptions of HWE or to chance, markers in LD may show similar patterns of DHW. Independent samples will tend not to replicate chance DHW but may replicate DHW due to violations of HWE assumptions. Clearly, genomewide or large-scale studies offer additional information for interpretation of DHW, including, for example, the ability to examine potential population substructure. Many of the studies summarized in table 1, however, focus on one or a small number of candidate genes, and the approaches we propose offer a unique opportunity to improve the scientific inferences about observed DHW. It should also be noted that the fact that an observed DHW is consistent with genetic models does not mean that errors, missing data patterns, or violations of HWE assumptions did not generate or contribute to the observed DHW, and DHW can no doubt be attributable to a combination of factors. Finally, it should also be noted that investigators frequently underestimate the significance of observed DHW, probably because of a misunderstanding of the degrees of freedom associated with the test.
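The point about degrees of freedom can be illustrated with a minimal sketch of the usual Pearson χ^{2} test for HWE at a biallelic locus. With three genotype classes, one constraint from the fixed sample size, and one allele frequency estimated from the data, the test has 1 df. The function names and the example counts below are hypothetical, and the comparison with 2 df is only an assumption about the kind of misunderstanding meant here.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square statistic for departure from HWE at a biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p                      # estimated frequency of allele a
    # Assumes a polymorphic marker (0 < p < 1), so no expected count is zero.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and df = 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


if __name__ == "__main__":
    stat = hwe_chi_square(30, 40, 30)  # hypothetical genotype counts
    print("chi-square:", stat)
    print("P with 1 df (appropriate):", chi_square_p_value(stat, 1))
    print("P with 2 df (too conservative):", chi_square_p_value(stat, 2))
```

For these hypothetical counts, the same statistic falls below the conventional 0.05 threshold with 1 df but not with 2 df, which is the sense in which the significance of a DHW can be underestimated.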

## Acknowledgments

We acknowledge William Wen, for creating the DHW software, and Dr. Judy Cho, for giving us access to the association data on *NOD2/CARD15* and Crohn disease. We also acknowledge Drs. Mark Abney, Carole Ober, Dan Nicolae, Jonathan Pritchard, and Daniel Schaid, for their helpful discussions. This work was supported by grants DK-55889, DK-58026, and U01-GM61393. The DHW software is freely available and can be downloaded from the Web site given below.

## Appendix A: Δ in Patients

Weir (1996) defines Δ as the difference between the population and HWE-expected genotypic frequencies. Therefore, we first define these genotypic frequencies in terms of *q,* the susceptibility-allele frequency in the general population; α, the baseline penetrance for wild-type homozygotes; β, the relative risk to heterozygotes; γ, the relative risk to homozygotes for the susceptibility allele; and *K*_{P}, the population prevalence of disease. The expected genotypic proportions in patients, under the assumption of HWE in the general population, are as follows:

The expected susceptibility-allele frequency for patients is

The frequency of the wild-type allele in patients is

Therefore,

which then simplifies to equation (1).
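The steps of this derivation can be reconstructed from the definitions above. The following is a sketch, not the original typeset equations. It assumes penetrances α, αβ, and αγ for the *AA*, *Aa*, and *aa* genotypes, respectively, and the sign convention Δ_{p} = *g*_{aa-p} − *q*_{p}^{2}, so that a positive Δ_{p} indicates an excess of homozygotes. The symbols *q*_{p} and *p*_{p} for the allele frequencies in patients are labels introduced here.

$$
\begin{aligned}
K_P &= \alpha\left[(1-q)^2 + 2q(1-q)\beta + q^2\gamma\right],\\
g_{AA\text{-}p} &= \frac{(1-q)^2\alpha}{K_P},\qquad
g_{Aa\text{-}p} = \frac{2q(1-q)\alpha\beta}{K_P},\qquad
g_{aa\text{-}p} = \frac{q^2\alpha\gamma}{K_P},\\
q_p &= g_{aa\text{-}p} + \tfrac{1}{2}\,g_{Aa\text{-}p}
     = \frac{\alpha q\left[q\gamma + (1-q)\beta\right]}{K_P},\\
p_p &= g_{AA\text{-}p} + \tfrac{1}{2}\,g_{Aa\text{-}p}
     = \frac{\alpha (1-q)\left[(1-q) + q\beta\right]}{K_P},\\
\Delta_p &= g_{aa\text{-}p} - q_p^2
          = g_{aa\text{-}p}\,g_{AA\text{-}p} - \tfrac{1}{4}\,g_{Aa\text{-}p}^2
          = \frac{q^2(1-q)^2\left(\gamma-\beta^2\right)}{\left[(1-q)^2 + 2q(1-q)\beta + q^2\gamma\right]^2}.
\end{aligned}
$$

The baseline penetrance α cancels in the last step, so Δ_{p} depends only on *q*, β, and γ.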

For a classic recessive model, where β=1 and γ>1,

For a classic dominant model, where β=γ and γ>1,

For an additive model, where γ=2β and β>1,

As explained in the “Methods” section, the multiplicative model for patients, where β^{2}=γ and β>1, simplifies to Δ_{p}=0.
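The special cases above follow by direct substitution. As a sketch, assume the general form Δ_{p} = *q*^{2}(1−*q*)^{2}(γ−β^{2})/[(1−*q*)^{2}+2*q*(1−*q*)β+*q*^{2}γ]^{2}, which follows from the genotypic proportions defined in this appendix under the convention Δ_{p} = *g*_{aa-p} − *q*_{p}^{2} (an assumption about the sign convention). The restricted models then reduce to:

$$
\begin{aligned}
\text{recessive } (\beta=1):\quad & \Delta_p = \frac{q^2(1-q)^2(\gamma-1)}{\left[1+q^2(\gamma-1)\right]^2},\\
\text{dominant } (\beta=\gamma):\quad & \Delta_p = -\,\frac{q^2(1-q)^2\,\gamma(\gamma-1)}{\left[(1-q)^2+q(2-q)\gamma\right]^2},\\
\text{additive } (\gamma=2\beta):\quad & \Delta_p = \frac{q^2(1-q)^2\,\beta(2-\beta)}{\left[(1-q)^2+2q\beta\right]^2},\\
\text{multiplicative } (\gamma=\beta^2):\quad & \Delta_p = 0.
\end{aligned}
$$

Under this convention, and for γ>1, Δ_{p} is positive for the recessive model and negative for the dominant model; for the additive model, the sign depends on whether β is less than or greater than 2.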

## Appendix B: Δ in Controls

As defined for patients in appendix A, we define the genotypic proportions in controls for a general disease model as

The expected susceptibility-allele frequency in controls is

The frequency of the wild-type allele in controls is

Therefore,

which simplifies to equation (2).
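As for patients, the steps can be reconstructed from the definitions. The following is a sketch rather than the original typeset equations. It assumes that unaffected individuals with genotypes *AA*, *Aa*, and *aa* occur with probabilities 1−α, 1−αβ, and 1−αγ, respectively, that *K*_{P} = α[(1−*q*)^{2}+2*q*(1−*q*)β+*q*^{2}γ], and that Δ_{c} = *g*_{aa-c} − *q*_{c}^{2}. The symbols *q*_{c} and *p*_{c} for the allele frequencies in controls are labels introduced here.

$$
\begin{aligned}
g_{AA\text{-}c} &= \frac{(1-q)^2(1-\alpha)}{1-K_P},\qquad
g_{Aa\text{-}c} = \frac{2q(1-q)(1-\alpha\beta)}{1-K_P},\qquad
g_{aa\text{-}c} = \frac{q^2(1-\alpha\gamma)}{1-K_P},\\
q_c &= \frac{q\left[q(1-\alpha\gamma) + (1-q)(1-\alpha\beta)\right]}{1-K_P},\\
p_c &= \frac{(1-q)\left[(1-q)(1-\alpha) + q(1-\alpha\beta)\right]}{1-K_P},\\
\Delta_c &= g_{aa\text{-}c} - q_c^2
          = \frac{q^2(1-q)^2\left[(1-\alpha)(1-\alpha\gamma) - (1-\alpha\beta)^2\right]}{(1-K_P)^2}
          = \frac{\alpha\,q^2(1-q)^2\left[2\beta - 1 - \gamma + \alpha\left(\gamma-\beta^2\right)\right]}{(1-K_P)^2}.
\end{aligned}
$$

Unlike Δ_{p}, this expression retains α, both directly and through *K*_{P}.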

For a classic recessive model, where β=1 and γ>1,

For a classic dominant model, where β=γ and γ>1,

For an additive model, where γ=2β and β>1,

Finally, for a multiplicative model, where γ=β^{2} and β>1,
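The restricted-model cases for patients and controls can also be checked numerically. The sketch below computes Δ directly from the genotypic proportions rather than from closed-form expressions. It assumes penetrances α, αβ, and αγ, HWE in the general population, and the convention Δ = *g*_{aa} − *q*^{2} within each sample. The function names and the parameter values are hypothetical and are not taken from the DHW software.

```python
def genotype_proportions(q, alpha, beta, gamma):
    """Return (patients, controls, K_P); each sample is ordered (g_AA, g_Aa, g_aa)."""
    p = 1.0 - q
    hwe = (p * p, 2.0 * p * q, q * q)              # population proportions under HWE
    pen = (alpha, alpha * beta, alpha * gamma)     # penetrances for AA, Aa, aa
    if not all(0.0 <= f <= 1.0 for f in pen):
        raise ValueError("penetrances must lie in [0, 1]")
    k_p = sum(h * f for h, f in zip(hwe, pen))     # population prevalence
    patients = tuple(h * f / k_p for h, f in zip(hwe, pen))
    controls = tuple(h * (1.0 - f) / (1.0 - k_p) for h, f in zip(hwe, pen))
    return patients, controls, k_p


def delta(sample):
    """Delta = g_aa - (allele frequency of a in the sample)^2."""
    g_AA, g_Aa, g_aa = sample
    q_s = g_aa + 0.5 * g_Aa
    return g_aa - q_s ** 2


# Hypothetical (beta, gamma) pairs, one for each restricted model.
MODELS = {
    "recessive": (1.0, 4.0),
    "dominant": (4.0, 4.0),
    "additive": (1.5, 3.0),
    "multiplicative": (1.5, 2.25),
}


def deltas_by_model(q=0.2, alpha=0.05):
    """Map each model name to (Delta in patients, Delta in controls)."""
    results = {}
    for name, (beta, gamma) in MODELS.items():
        patients, controls, _ = genotype_proportions(q, alpha, beta, gamma)
        results[name] = (delta(patients), delta(controls))
    return results


if __name__ == "__main__":
    for name, (d_p, d_c) in deltas_by_model().items():
        print(f"{name:15s} delta_p={d_p:+.5f} delta_c={d_c:+.6f}")
```

With these values, the multiplicative model gives Δ = 0 in patients (up to rounding) but a small negative Δ in controls, which illustrates why the multiplicative case simplifies to zero only for patients.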

## Appendix C: λ_{s} Values

Given that

(Risch 1990), where *V*_{A} is the additive variance that can be calculated by

and *V*_{D} is the dominance variance that can be calculated by

λ_{s} under the classic dominant model can then be calculated as

λ_{s} under the classic recessive model can be calculated as

λ_{s} under the additive model can be calculated as

and, finally, λ_{s} under the multiplicative model can be calculated as
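The calculation described in this appendix can be sketched as follows. The sketch assumes the standard single-locus variance decomposition with penetrances α, αβ, and αγ for the *AA*, *Aa*, and *aa* genotypes, together with the relationship λ_{s} = 1 + (½*V*_{A} + ¼*V*_{D})/*K*_{P}^{2} (Risch 1990). The function name and the parameter values are hypothetical, and the restricted models are obtained simply by constraining β and γ.

```python
def lambda_s(q, alpha, beta, gamma):
    """Sibling recurrence-risk ratio for a single biallelic susceptibility locus.

    Assumes HWE in the general population and penetrances
    f_AA = alpha, f_Aa = alpha * beta, f_aa = alpha * gamma.
    """
    p = 1.0 - q
    f_AA, f_Aa, f_aa = alpha, alpha * beta, alpha * gamma
    # Population prevalence K_P.
    k_p = p * p * f_AA + 2.0 * p * q * f_Aa + q * q * f_aa
    # Additive variance: 2pq times the squared average effect of allele substitution.
    v_a = 2.0 * p * q * (q * (f_aa - f_Aa) + p * (f_Aa - f_AA)) ** 2
    # Dominance variance: (pq)^2 times the squared dominance deviation.
    v_d = (p * q) ** 2 * (f_aa - 2.0 * f_Aa + f_AA) ** 2
    return 1.0 + (0.5 * v_a + 0.25 * v_d) / k_p ** 2


if __name__ == "__main__":
    q, alpha = 0.2, 0.05  # hypothetical parameter values
    print("dominant       ", lambda_s(q, alpha, beta=4.0, gamma=4.0))
    print("recessive      ", lambda_s(q, alpha, beta=1.0, gamma=4.0))
    print("additive       ", lambda_s(q, alpha, beta=1.5, gamma=3.0))
    print("multiplicative ", lambda_s(q, alpha, beta=1.5, gamma=2.25))
```

Because *V*_{A}, *V*_{D}, and *K*_{P}^{2} are all proportional to α^{2}, the resulting λ_{s} does not depend on α.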

## Appendix D: Goodness-of-Fit Test

The expected genotypic proportions for patients under a genetic disease model are *g*_{AA-p}, *g*_{Aa-p}, and *g*_{aa-p}, as found in appendix A. The expected genotypic proportions for controls under a genetic disease model are *g*_{AA-c}, *g*_{Aa-c}, and *g*_{aa-c}, as found in appendix B.

For a particular set of disease model parameter values (α, β, γ, and *q*), where the total number of patients is *N*_{p}, the total number of controls is *N*_{c}, and the observed number of patients (or controls) with a specific genotype is *N*_{genotype-p} (or *N*_{genotype-c}), the goodness-of-fit test is as follows:
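A statistic consistent with this description is the Pearson form summed over the three genotypes in each sample. This is a sketch that assumes the usual observed-minus-expected construction; the symbol *X*^{2} for the statistic is a label introduced here.

$$
X^2 \;=\; \sum_{g \in \{AA,\,Aa,\,aa\}} \frac{\left(N_{g\text{-}p} - N_p\, g_{g\text{-}p}\right)^2}{N_p\, g_{g\text{-}p}}
\;+\; \sum_{g \in \{AA,\,Aa,\,aa\}} \frac{\left(N_{g\text{-}c} - N_c\, g_{g\text{-}c}\right)^2}{N_c\, g_{g\text{-}c}}
$$

Each sample contributes 2 independent genotype counts once its total is fixed. With *K*_{P} fixed, the general model has 3 free parameters and each restricted model has 2, which agrees with 1 df and 2 df, respectively.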

Minimizing the resulting test statistic over the entire parameter space, subject to appropriate constraints on α, β, γ, and *q* and where *K*_{P} is fixed, yields parameter estimates that are approximately maximum-likelihood estimates. The minimal value of the test statistic is approximately distributed as a χ^{2} with 1 df for a general model and 2 df for restricted models (i.e., dominant, recessive, additive, and multiplicative models) (Cramer 1946). We verify through simulations that applying this test to data showing DHW does not affect the resulting distribution of the test statistic. We generated 1,000 replicates of 1,000 patients and 1,000 controls under a dominant, recessive, or general disease model as specified by the penetrances, which offer an advantage in being bounded between 0 and 1. We kept for further analysis only replicates that showed DHW in patients, in controls, or in both patients and controls. We used the goodness-of-fit test to find the best-fit genetic model for each simulation and compared the resulting χ^{2} value to 1,000 simulated χ^{2} values, given the specified degrees of freedom (fig. 5).

*A,* 1,000 simulations of a disease locus constructed under a general model (*q*=0.20, α=0.12, β=2.67, and γ=4.33), a dominant model (*q*=0.20, α=0.11, and β=γ=3.27), **...**

If the goodness-of-fit test, with 1 df for a general model or 2 df for a restricted model, corresponds to a *P* value less than the specified threshold, the genetic disease model can be rejected as a poor fit to the observed data. Note that, for rare alleles, low expected genotype counts may cause the χ^{2} approximation for the goodness-of-fit test to perform poorly.
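The evaluation of the test for one candidate parameter set can be sketched as follows. The sketch assumes a Pearson-type statistic summed over patients and controls, penetrances α, αβ, and αγ, and a fixed *K*_{P} from which α is derived. All function names and numerical values are hypothetical and do not reproduce the DHW software. The minimization over (*q*, β, γ) is left to whatever optimizer the investigator prefers.

```python
import math


def alpha_from_prevalence(k_p, q, beta, gamma):
    """Baseline penetrance implied by a fixed prevalence K_P and (q, beta, gamma)."""
    p = 1.0 - q
    return k_p / (p * p + 2.0 * p * q * beta + q * q * gamma)


def expected_proportions(q, alpha, beta, gamma):
    """Expected (g_AA, g_Aa, g_aa) in patients and in controls."""
    p = 1.0 - q
    hwe = (p * p, 2.0 * p * q, q * q)
    pen = (alpha, alpha * beta, alpha * gamma)
    k_p = sum(h * f for h, f in zip(hwe, pen))
    patients = tuple(h * f / k_p for h, f in zip(hwe, pen))
    controls = tuple(h * (1.0 - f) / (1.0 - k_p) for h, f in zip(hwe, pen))
    return patients, controls


def gof_statistic(obs_patients, obs_controls, q, alpha, beta, gamma):
    """Pearson statistic comparing observed genotype counts with model expectations."""
    exp_p, exp_c = expected_proportions(q, alpha, beta, gamma)
    stat = 0.0
    for observed, proportions in ((obs_patients, exp_p), (obs_controls, exp_c)):
        n = sum(observed)
        for count, g in zip(observed, proportions):
            stat += (count - n * g) ** 2 / (n * g)
    return stat


def chi_square_p_value(x, df):
    """Upper-tail probability: df = 1 for a general model, df = 2 for restricted models."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is supported in this sketch")


if __name__ == "__main__":
    # Hypothetical observed counts, ordered (AA, Aa, aa).
    patients = (440, 450, 110)
    controls = (650, 310, 40)
    q, beta, gamma, k_p = 0.2, 2.0, 4.0, 0.1
    alpha = alpha_from_prevalence(k_p, q, beta, gamma)
    stat = gof_statistic(patients, controls, q, alpha, beta, gamma)
    print("statistic:", stat, "P (2 df):", chi_square_p_value(stat, 2))
```

The *P* value printed here applies to a single fixed parameter set; the reference distributions described above hold for the statistic after it has been minimized over the parameter space.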

## Electronic-Database Information

The URLs for data presented herein are as follows:

## References

*Gut* between 1998 and 2003: altered conclusions after recalculating the Hardy-Weinberg equilibrium. Gut 53:614–616.

*NOD2* gene and Crohn’s disease in German and British populations. Lancet 357:1925–1928. doi:10.1016/S0140-6736(00)05063-7

*a*) Examination of Hardy-Weinberg equilibrium in papers of *Kidney International*: an underused tool. Kidney Int 65:1956–1958. doi:10.1111/j.1523-1755.2004.00596.x

*b*) Reanalysis of genotype distributions published in *Neurology* between 1999 and 2002. Neurology 63:357–358.

*APOE* in Alzheimer disease. Am J Hum Genet 67:383–394.

*NOD2* associated with susceptibility to Crohn’s disease. Nature 411:603–606. doi:10.1038/35079114

*PSA*) gene, risk of prostate cancer, and serum PSA levels in Japanese population. Cancer Letters 202:53–59. doi:10.1016/j.canlet.2003.08.001

**American Society of Human Genetics**
