
# Contrasting Linkage-Disequilibrium Patterns between Cases and Controls as a Novel Association-Mapping Method

^{1}National Institute of Environmental Health Sciences, National Institutes of Health, and

^{2}Department of Genetics Research, GlaxoSmithKline, Research Triangle Park, NC; and

^{3}Department of Biostatistics and Programming, Sanofi-Aventis, Bridgewater, NJ

## Abstract

Identification and description of genetic variation underlying disease susceptibility, efficacy, and adverse reactions to drugs remains a difficult problem. One of the important steps in the analysis of variation in a candidate region is the characterization of linkage disequilibrium (LD). In a region of genetic association, the extent of LD varies between the case and the control groups. Separate plots of pairwise standardized measures of LD (e.g., *D*^{′}) for cases and controls are often presented for a candidate region, to graphically convey case-control differences in LD. However, the observed graphic differences lack statistical support. Therefore, we suggest the “LD contrast” test to compare whole matrices of disequilibrium between two samples. A common technique of assessing LD when the haplotype phase is unobserved is the expectation-maximization algorithm, with the likelihood incorporating the assumption of Hardy-Weinberg equilibrium (HWE). This approach presents a potential problem in that, in the region of genetic association, the HWE assumption may not hold when samples are selected on the basis of phenotypes. Here, we present a computationally feasible approach that does not assume HWE, along with graphic displays and a statistical comparison of pairwise matrices of LD between case and control samples. LD-contrast tests provide a useful addition to existing tools of finding and characterizing genetic associations. Although haplotype association tests are expected to provide superior power when susceptibilities are primarily determined by haplotypes, the LD-contrast tests demonstrate substantially higher power under certain haplotype-driven disease models.

There has been considerable progress in designing techniques that go beyond sequential testing of SNPs. These methods are particularly important for the analysis of multiple SNPs that jointly represent variation within common transcripts and other functional regions, such as promoters. Methods for detection of association between traits and interacting genetic polymorphisms are being rapidly developed. Many approaches have considered important situations in which haplotypes of consecutive markers can be defined and tested for association with the trait. Methods have been designed to incorporate various sampling designs as well as haplotype phase uncertainty.^{1–7}

It has been noted that the extent of linkage disequilibrium (LD) can differ between the case and the control groups in a region of genetic association, and case-control LD comparison can aid the analysis of a region of putative association.^{8} Contrasting pairwise LD matrices between cases and controls via graphic display provides a direct visual comparison.^{9} However, the observed graphic difference is subject to sampling variation and lacks statistical support; a statistical test is therefore desirable. In the context of association mapping, Nielsen et al. presented a direct LD comparison approach involving two diallelic loci and noted that, in certain situations, a test that directly compares the extent of LD between the case and the control groups can be a powerful alternative to either haplotype-based or single-marker approaches.^{10} For two diallelic loci, a test comparing the extent of LD involves a single LD parameter and therefore has 1 df, whereas a haplotype test involving four haplotypes has 3 df. Nielsen et al. considered the case of unambiguous haplotype phase. When the haplotype phase is unknown, the expectation-maximization (EM) algorithm is used to infer haplotype frequencies and, ultimately, to assess LD. The likelihood is constructed by assuming HWE at the haplotype level: with two diallelic markers, there are four haplotypes, and the usual assumption is that these two-locus haplotypes combine at random (haplotypic HWE). Checking that each SNP is in HWE is not sufficient to ensure HWE at the haplotypic level. Furthermore, in a region of association, HWE is generally expected to be distorted in case and control samples.^{11,12} Therefore, the EM computation, although a valuable tool for evaluating LD in a sample of population controls, is not strictly appropriate for comparing LD levels in samples of cases and controls or in samples otherwise selected on the basis of phenotype.

Recently, Schaid^{13} and Zaykin^{14} showed that LD estimation using the composite disequilibrium approach, discussed below, provides results similar to those of the EM-based method under HWE, is computationally simpler, and avoids the assumption of haplotypic HWE. Hamilton and Cole^{15} and Zaykin^{14} gave bounds and proposed a normalization for LD based on the composite definition, and Zaykin showed that this normalization is robust to departures from HWE. We therefore propose using the composite coefficient and its normalization to characterize LD in case-control samples. This leads to efficient methods of comparing and testing the difference between the pairwise LD matrices of the case and control samples. We show that certain disease models result in high power of the LD-contrast test relative to the haplotypic test, even in situations in which susceptibilities are largely determined by haplotypes. In such situations, the LD-contrast test outperforms both the haplotype-based test and multilocus tests based on comparison of SNP scores.

## Methods

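A test for comparing two single LD coefficients (the “LD-contrast” test) was described by Nielsen et al. for the case of known phase, that is, when the four haplotype classes are directly observed.^{10} A χ^{2} test statistic with 1 df has the form

$$X^{2}=\frac{\left(\hat D^{Y}_{AB}-\hat D^{N}_{AB}\right)^{2}}{\operatorname{Var}\left(\hat D^{Y}_{AB}\right)+\operatorname{Var}\left(\hat D^{N}_{AB}\right)},\qquad(1)$$

where *Y* and *N* denote the case and control samples,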
with the variances given by Weir.^{16}^{(p113)} The LD coefficient, *D*_{AB}, is equal to *P*_{AB}-*p*_{A}*p*_{B}, where *P*_{AB} is the frequency of the haplotype carrying alleles A and B and *p*_{A} and *p*_{B} are the corresponding allele frequencies. A log-linear framework for comparing disequilibrium coefficients at one and two loci has been provided by Huttley and Wilson.^{17} A single LD parameter describes the dependence among the four haplotypes, given the allele frequencies. Therefore, this LD-contrast test has the advantage of being a 1-df test, whereas a haplotypic test with all four haplotypes has 3 df. Nielsen et al. found empirically that this test can have higher power than either a haplotypic or a single-marker test when the pairwise LD between a functional site and the two tested markers is low. When the haplotype phase is unknown, the test can be extended to compare two composite LD coefficients with a statistic analogous to equation (1), with the variances provided by Weir.^{16}

Furthermore, comparison of standardized coefficients may be of interest when single-locus genotypic frequencies differ. One of the commonly used standardized measures of LD is the coefficient suggested by Lewontin,^{18}

$$D'_{AB}=\frac{D_{AB}}{\max(D_{AB})},$$

where max(*D*_{AB}) is the maximum possible absolute value of *D*_{AB}, given allele frequencies, also described in a nongenetic context by Yule,^{19} as

$$\max(D_{AB})=\begin{cases}\min\left(p_{A}p_{b},\;p_{a}p_{B}\right), & D_{AB}>0,\\ \min\left(p_{A}p_{B},\;p_{a}p_{b}\right), & D_{AB}<0,\end{cases}$$

with *p*_{a}=1-*p*_{A} and *p*_{b}=1-*p*_{B}. Weir^{20} discussed the correlation between alleles A and B, given as

$$r_{AB}=\frac{D_{AB}}{\sqrt{p_{A}(1-p_{A})\,p_{B}(1-p_{B})}},$$

which has a range that depends on allele frequencies. Weir^{20} and Peduzzi et al.^{21} gave the bounds for *r*_{AB}.
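The phase-known LD-contrast statistic and the *D*′ and *r* definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, haplotype counts are assumed to be ordered (AB, Ab, aB, ab), both loci are assumed polymorphic, and the variance is the usual large-sample variance of the estimate of *D*_{AB} from *n* phase-known haplotypes.

```python
import math


def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (p_A, p_B, D, D_prime, r, n) from phase-known haplotype counts.

    Assumes both loci are polymorphic in the sample.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    D = n_AB / n - p_A * p_B
    # Lewontin's D': divide by the largest |D| allowed by the allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    D_prime = D / d_max
    # Correlation between alleles A and B
    r = D / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
    return p_A, p_B, D, D_prime, r, n


def var_D(p_A, p_B, D, n):
    """Large-sample variance of D-hat from n phase-known haplotypes."""
    return (p_A * (1 - p_A) * p_B * (1 - p_B)
            + (1 - 2 * p_A) * (1 - 2 * p_B) * D - D * D) / n


def ld_contrast_known_phase(cases, controls):
    """1-df chi-square comparing D between cases and controls.

    `cases` and `controls` are 4-tuples of counts (AB, Ab, aB, ab).
    Returns (statistic, p_value).
    """
    pA_y, pB_y, D_y, _, _, n_y = ld_from_haplotype_counts(*cases)
    pA_n, pB_n, D_n, _, _, n_n = ld_from_haplotype_counts(*controls)
    denominator = var_D(pA_y, pB_y, D_y, n_y) + var_D(pA_n, pB_n, D_n, n_n)
    stat = (D_y - D_n) ** 2 / denominator
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `ld_contrast_known_phase((40, 10, 10, 40), (25, 25, 25, 25))` compares a sample with *D*=0.15 against one with *D*=0 and gives a statistic of about 21.95.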

Whereas allele counts are directly observed, haplotype phase is often ambiguous; therefore, *P*_{AB} cannot be estimated as the proportion of AB haplotypes among all 2*n* haplotypes in a sample. The maximum-likelihood estimate of *P*_{AB}, and correspondingly of *D*_{AB} and its standardized versions, can be obtained. However, this approach usually requires the assumption of HWE, that is, that the dilocus genotype frequencies are given by products of haplotype frequencies. Weir and Cockerham^{22} suggested estimating the composite LD coefficient instead, defined as Δ_{AB}=*P*_{AB}+*P*_{A/B}-2*p*_{A}*p*_{B}, with the composite correlation

$$r_{AB}=\frac{\Delta_{AB}}{\sqrt{\left[p_{A}(1-p_{A})+D_{A}\right]\left[p_{B}(1-p_{B})+D_{B}\right]}},\qquad(2)$$

where *D*_{A} and *D*_{B} are the Hardy-Weinberg disequilibrium coefficients at the two loci and *P*_{A/B} is the joint frequency of alleles A and B on two different gametes. This coefficient is estimated directly from dilocus genotype counts^{16} and, under HWE, corresponds to *D*_{AB}. Weir^{20} and Schaid^{13} investigated statistical properties of the composite LD estimator and compared the composite and the maximum-likelihood estimators. The composite estimator appears to perform well, since it is robust to departures from HWE.
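Because the composite coefficient uses only genotype counts, it can be computed without resolving phase. The following is a minimal Python sketch, assuming a 3×3 table of dilocus genotype counts indexed by the number of A and B alleles carried; the function name is hypothetical, and this plug-in version omits any small-sample correction that Weir's estimator may include.

```python
import math


def composite_ld(table):
    """Composite LD (Delta_AB) and composite correlation r_AB from a 3x3 table.

    table[i][j] = number of individuals carrying i copies of allele A
    (i = 0, 1, 2) and j copies of allele B (j = 0, 1, 2). Phase is not needed.
    """
    n = sum(sum(row) for row in table)
    p_A = sum(i * table[i][j] for i in range(3) for j in range(3)) / (2 * n)
    p_B = sum(j * table[i][j] for i in range(3) for j in range(3)) / (2 * n)
    # (2 n_AABB + n_AABb + n_AaBB + n_AaBb / 2) equals sum(i * j * count) / 2
    n_AB = sum(i * j * table[i][j] for i in range(3) for j in range(3)) / 2
    delta = n_AB / n - 2 * p_A * p_B
    # Hardy-Weinberg disequilibrium at each locus: D_A = P_AA - p_A^2
    D_A = sum(table[2]) / n - p_A ** 2
    D_B = sum(table[i][2] for i in range(3)) / n - p_B ** 2
    r = delta / math.sqrt((p_A * (1 - p_A) + D_A) * (p_B * (1 - p_B) + D_B))
    return delta, r
```

For a sample in which every individual is either AABB or aabb, in equal numbers, the sketch returns Δ=0.5 and *r*=1. The composite correlation computed this way equals the Pearson correlation between the two allele-count vectors.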

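The maximum and minimum possible values for Δ_{AB}, given genotypic frequencies at two loci, were reported by Hamilton and Cole^{15} and Zaykin.^{14} These values correspond to bounds on the covariance between two trinary variables that take the values −1, 0, and 1: if each genotype is coded as its number of A (or B) alleles minus 1, then Δ_{AB} is half the covariance of the two codes. Equation 4 in the work by Zaykin^{14} gives the bounds for *abs*(Δ_{AB}) succinctly. Writing *P*_{AA}, *P*_{aa}, *P*_{BB}, and *P*_{bb} for the homozygote frequencies and *x*^{+}=max(*x*,0), the same maximum and minimum values follow from pairing the two genotype codes in the same order (upper bound) or in opposite order (lower bound):

$$\Delta^{\max}_{AB}=\tfrac{1}{2}\left[\min(P_{AA},P_{BB})+\min(P_{aa},P_{bb})-(P_{AA}+P_{bb}-1)^{+}-(P_{aa}+P_{BB}-1)^{+}-(P_{AA}-P_{aa})(P_{BB}-P_{bb})\right],$$

$$\Delta^{\min}_{AB}=\tfrac{1}{2}\left[(P_{AA}+P_{BB}-1)^{+}+(P_{aa}+P_{bb}-1)^{+}-\min(P_{AA},P_{bb})-\min(P_{aa},P_{BB})-(P_{AA}-P_{aa})(P_{BB}-P_{bb})\right].\qquad(3)$$

The standardized composite measure of LD, with range −1 to 1, is computed as

$$\Delta'_{AB}=\begin{cases}\Delta_{AB}/\Delta^{\max}_{AB}, & \Delta_{AB}>0,\\ \Delta_{AB}/\left|\Delta^{\min}_{AB}\right|, & \Delta_{AB}<0.\end{cases}$$
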
The standardization using equation (3) takes into account the dependence of composite LD on genotype frequencies and holds promise for association-mapping applications. Cases and controls generally have different extents of gametic as well as nongametic disequilibria around a region of genetic association,^{10,14} and both are captured by the composite LD.
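The sharp bounds on Δ_{AB} and the resulting standardized coefficient can be sketched in Python. This sketch rests on our own assumptions: genotype frequencies are passed in the order (homozygote for the second allele, heterozygote, homozygote for the first allele), genotypes are coded −1, 0, 1, and the bounds are obtained from the same-order and opposite-order pairings of the two genotype distributions (the Fréchet-Hoeffding construction). It is not a transcription of Zaykin's equation 4, although both describe the maximum and minimum attainable values; the function names are hypothetical.

```python
def _positive_part(value):
    return max(value, 0.0)


def composite_ld_bounds(geno_a, geno_b):
    """Return (lower, upper) bounds on Delta_AB given single-locus genotype frequencies.

    geno_a = (P_aa, P_Aa, P_AA) and geno_b = (P_bb, P_Bb, P_BB). With genotypes
    coded -1, 0, 1, Delta_AB is half the covariance of the codes; the extreme
    covariances come from pairing the two code distributions in the same order
    (upper bound) or in opposite order (lower bound).
    """
    x_low, _, x_high = geno_a
    y_low, _, y_high = geno_b
    mean_product = (x_high - x_low) * (y_high - y_low)  # E[X] * E[Y]
    e_max = (min(x_low, y_low) + min(x_high, y_high)
             - _positive_part(x_high + y_low - 1) - _positive_part(x_low + y_high - 1))
    e_min = (_positive_part(x_high + y_high - 1) + _positive_part(x_low + y_low - 1)
             - min(x_high, y_low) - min(x_low, y_high))
    return (e_min - mean_product) / 2, (e_max - mean_product) / 2


def standardized_composite_ld(delta, geno_a, geno_b):
    """Delta'_AB in [-1, 1]: divide by the bound on the side of delta's sign."""
    lower, upper = composite_ld_bounds(geno_a, geno_b)
    if delta > 0:
        return delta / upper
    if delta < 0:
        return delta / abs(lower)
    return 0.0
```

As a sanity check, under HWE with allele frequency 0.5 at both loci (genotype frequencies 0.25, 0.5, 0.25), the upper bound is 0.25, which matches the largest possible *D*_{AB} for those allele frequencies.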

The LD-contrast test is extended here to the comparison of whole matrices of standardized coefficients between the case and the control groups, to aid in identification of effects due to interactions among SNPs. The matrix of nonstandardized composite coefficients is nonnegative definite, by virtue of being a variance-covariance matrix (Δ_{AB} is half the covariance between the allele counts at the two loci). Therefore, one can compute statistics on the basis of the eigenvalue-eigenvector (spectral) decomposition of the LD matrix. Elsewhere, we proposed the use of spectral decomposition of the composite-LD matrix for selection of a subset of markers that optimize the information retained in a genomic region, using samples of population controls.^{23} The matrix of standardized composite LD is not necessarily nonnegative definite, which limits the application of the spectral decomposition-based statistic (*Z*_{1}) to that matrix. The other statistic used in this study (*Z*_{2}) is based on the overall LD difference.
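Because Δ_{AB} is half the covariance of allele counts, the whole composite-LD matrix and the composite correlation matrix can be obtained with standard covariance routines. A minimal NumPy sketch, assuming an *n*×*L* array of allele counts with no missing values and no monomorphic SNPs; the function name is hypothetical.

```python
import numpy as np


def composite_ld_matrices(genotypes):
    """Composite-LD covariance matrix and composite correlation matrix.

    `genotypes` is an (n individuals x L SNPs) array of allele counts (0, 1, 2).
    Off-diagonal entries of `delta` are plug-in estimates of
    Delta_AB = Cov(X_A, X_B) / 2; `r` holds the composite correlations r_AB.
    """
    g = np.asarray(genotypes, dtype=float)
    delta = np.cov(g, rowvar=False, bias=True) / 2.0  # divide by n, not n - 1
    r = np.corrcoef(g, rowvar=False)
    return delta, r
```

A standardized matrix Δ′ would be filled entrywise with the pairwise standardization; unlike the two matrices returned here, it need not be nonnegative definite.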

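We denote the standardized composite-LD matrices for the case and the control groups by Δ^{′}_{Y} and Δ^{′}_{N} and the matrices of the composite LD correlation (eq. 2) by *r*_{Y} and *r*_{N}. In all of these matrices, the diagonal entries are equal to 1. For *L* markers, the composite-LD correlation matrices have spectral decompositions

$$r_{Y}=\sum_{i=1}^{L}\hat\lambda^{Y}_{i}\,\hat e^{Y}_{i}\left(\hat e^{Y}_{i}\right)^{T}$$

and

$$r_{N}=\sum_{i=1}^{L}\hat\lambda^{N}_{i}\,\hat e^{N}_{i}\left(\hat e^{N}_{i}\right)^{T},$$

where $\hat\lambda_{i}$ and $\hat e_{i}$ are the sample composite LD eigenvalues, in decreasing order, and the corresponding eigenvectors (for the control, *N,* or the case, *Y,* group), and *T* denotes the matrix transpose. Spectral decompositions based on the composite-LD covariance matrices Δ_{Y} and Δ_{N} are defined similarly.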
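The spectral decompositions can be computed with a symmetric eigensolver. In this sketch (the function name is hypothetical), `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted in decreasing order so that the first *k* columns are the leading eigenvectors.

```python
import numpy as np


def spectral_decomposition(matrix):
    """Eigenvalues in decreasing order and matching unit eigenvectors (as columns)."""
    values, vectors = np.linalg.eigh(matrix)  # ascending order for symmetric input
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]
```

The original matrix is recovered as `vectors @ np.diag(values) @ vectors.T`, which is the sum of rank-one terms written above.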

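We denote the matrices whose columns are the first *k* case and control eigenvectors by *E*_{Y} and *E*_{N}, respectively. The two statistics are

$$Z_{1}=\operatorname{tr}\left(E_{Y}^{T}E_{N}E_{N}^{T}E_{Y}\right)$$

and

$$Z_{2}=\sum_{i\neq j}\left(r_{Y,ij}-r_{N,ij}\right)^{2}.$$

We suggest that the *Z*_{2} statistic take a slightly different form when computed using the standardized LD:

$$Z_{2}(\Delta')=\frac{\sum_{i\neq j}\left(\Delta'_{Y,ij}-\Delta'_{N,ij}\right)^{2}}{4L(L-1)}.$$

In these equations, *L* is the number of markers and *k*≤*L* is the number of principal components. The denominator, 4*L*(*L*-1), is the upper bound for the numerator of *Z*_{2}: each of the *L*(*L*-1) off-diagonal differences is at most 2 in absolute value. The denominator does not affect the *P* value because it is invariant under permutations.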
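The *Z*_{1} and *Z*_{2} statistics can be sketched as follows, assuming the case and control LD matrices are symmetric *L*×*L* NumPy arrays with unit diagonals. The function names are hypothetical; *Z*_{1} is computed via the identity that the trace above equals the squared Frobenius norm of *E*_{Y}^{T}*E*_{N}.

```python
import numpy as np


def leading_eigenvectors(matrix, k):
    """Columns are the k eigenvectors with the largest eigenvalues."""
    values, vectors = np.linalg.eigh(matrix)
    order = np.argsort(values)[::-1][:k]
    return vectors[:, order]


def z1_statistic(m_cases, m_controls, k):
    """Krzanowski-type similarity: k for identical subspaces, 0 for orthogonal ones."""
    e_y = leading_eigenvectors(m_cases, k)
    e_n = leading_eigenvectors(m_controls, k)
    cross = e_y.T @ e_n
    # tr(E_Y^T E_N E_N^T E_Y) equals the sum of squared entries of E_Y^T E_N
    return float(np.sum(cross ** 2))


def z2_statistic(m_cases, m_controls, standardized=False):
    """Sum of squared off-diagonal differences; divided by 4L(L-1) for Delta'."""
    diff = np.asarray(m_cases, dtype=float) - np.asarray(m_controls, dtype=float)
    n_markers = diff.shape[0]
    off_diagonal = ~np.eye(n_markers, dtype=bool)
    z2 = float(np.sum(diff[off_diagonal] ** 2))
    if standardized:
        return z2 / (4 * n_markers * (n_markers - 1))
    return z2
```

Identical matrices give *Z*_{1}=*k* and *Z*_{2}=0; smaller *Z*_{1} and larger *Z*_{2} both indicate a larger case-control difference.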

The statistic *Z*_{1} measures the similarity between the two subspaces spanned by the first *k* case and control eigenvectors: it is the sum of squared cosines of the angles between the case and control eigenvectors and ranges from *k* (identical subspaces) to 0 (orthogonal subspaces, the maximum difference). Krzanowski described this statistic and the corresponding permutation-based tests, in which the phenotype values are randomly shuffled among individuals, for comparison of two sets of principal components.^{24,25}

The value of *k* must be specified in advance. Krzanowski^{25} suggested using the largest integer smaller than *L*/2. This choice ensures that the “important” components are represented, whereas values *k*>*L*/2 force the subspaces defined by the two sets of eigenvectors to intersect in at least one dimension.
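Significance is assessed by permutation. The sketch below shuffles case-control labels among individuals and recomputes *Z*_{2} on composite correlation matrices. The add-one estimate (count+1)/(B+1) and all names are our assumptions, not taken from the study, and the default number of permutations is far smaller than the 50,000 used in the *CYP2D6* example.

```python
import numpy as np


def z2_corr(genotypes_cases, genotypes_controls):
    """Z2 on composite correlation matrices (sum over off-diagonal entries)."""
    r_y = np.corrcoef(genotypes_cases, rowvar=False)
    r_n = np.corrcoef(genotypes_controls, rowvar=False)
    off_diagonal = ~np.eye(r_y.shape[0], dtype=bool)
    return float(np.sum((r_y - r_n)[off_diagonal] ** 2))


def permutation_pvalue(genotypes, is_case, statistic=z2_corr, n_perm=999, seed=0):
    """Permutation P value obtained by shuffling case/control labels.

    `statistic(cases, controls)` must be larger for more extreme differences
    (for Z1, pass a function returning -Z1).
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(genotypes, dtype=float)
    labels = np.asarray(is_case, dtype=bool)
    observed = statistic(g[labels], g[~labels])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)  # same group sizes, labels reassigned
        if statistic(g[perm], g[~perm]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because the labels, not the genotypes, are shuffled, LD among SNPs within each individual is preserved under the null hypothesis.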

The sum-of-squared-differences statistic *Z*_{2} measures the overall difference in the corresponding pairwise LD. This statistic is also appropriate for comparing Δ^{′}_{Y} and Δ^{′}_{N}*.* Note that, for the standardized LD, 0≤*Z*_{2}≤1, with 1 giving the maximum difference.

Both *Z*_{1} and *Z*_{2} can be defined on the basis of covariance rather than correlation. However, results for the covariance-based tests are not reported here because these tests showed consistently lower power than the correlation-based tests. We also performed a preliminary examination of several statistics based specifically on comparison of the corresponding LD eigenvectors and eigenvalues (e.g., the sum of squared differences) between the case and the control groups. These tests did not show notable power, and the results are not reported here. Nevertheless, the utility of such tests may warrant further investigation.

The generalized *T*^{2} test was applied in the association-mapping context by Xiong et al.^{26} This test employs the composite-LD matrix as part of the test statistic. The *T*^{2} test compares mean vectors of SNP values in cases and controls, where SNP values are obtained by recoding genotypes as *AA*→1, *Aa*→0, and *aa*→-1. The variance part of the *T*^{2} test statistic is the pooled variance-covariance matrix of the recoded values. It follows that, under the hypothesis of no association, the off-diagonal elements of this matrix are estimates of twice the composite LD coefficients, and the diagonal entries are estimates of 2[*p*_{A}(1-*p*_{A})+*D*_{A}], twice the single-locus terms that appear in the denominator of the composite correlation. Therefore, the generalized *T*^{2} test indirectly uses the composite LD in the variance part of the statistic.
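How the *T*^{2} statistic uses the composite LD can be seen in a minimal two-sample Hotelling *T*^{2} sketch on the recoded genotypes. This is an illustration under our own assumptions (hypothetical function name; pooled covariance assumed invertible) and not the implementation of Xiong et al.; under the null hypothesis, *T*^{2} is asymptotically χ^{2} with *L* df, which `scipy.stats.chi2.sf` could convert to a *P* value.

```python
import numpy as np


def hotelling_t2(codes_cases, codes_controls):
    """Two-sample Hotelling T^2 on genotype codes (AA -> 1, Aa -> 0, aa -> -1)."""
    x = np.asarray(codes_cases, dtype=float)
    z = np.asarray(codes_controls, dtype=float)
    n1, n2 = len(x), len(z)
    diff = x.mean(axis=0) - z.mean(axis=0)
    # Pooled variance-covariance matrix; under H0 its off-diagonals estimate 2 * Delta_AB
    pooled = ((n1 - 1) * np.cov(x, rowvar=False)
              + (n2 - 1) * np.cov(z, rowvar=False)) / (n1 + n2 - 2)
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(pooled, diff))
```

The mean difference enters the numerator, while LD enters only through the pooled covariance; this is why the test is sensitive to shifts in SNP scores rather than to changes in LD itself.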

## Results

To evaluate the performance of the proposed tests, we compared them with methods designed to detect either single-SNP effects or SNP interactions when the effects are associated with entire haplotypes. The *T*^{2} test is expected to have good power in the presence of several SNPs contributing to the association. In contrast, the min(*P*) test^{27} is most sensitive to a single associated SNP while accounting for correlation between SNPs due to LD. This test evaluates the significance of the most extreme association test statistic (Armitage’s trend test in the present study). The significance is evaluated via permutations, preserving dependencies among SNPs. To detect haplotypic effects, we employed the “Haplotype Trend Regression” method of Zaykin et al.^{2} The methods used for power comparisons in this study merely provide a reference point for comparison under different models. It is unlikely that a single “best” method can be recommended for the discovery of genetic associations, because the power of the different methods will vary with the assumed disease model.

### Pharmacogenetic Association-Mapping Example: *CYP2D6*

Identification of individual genetic differences in response to medicine has the potential to reduce side effects and improve the efficacy of drugs. The cytochrome P450 gene *CYP2D6* is involved in the metabolism of ~20% of marketed drugs.^{27} Hosking et al.^{28} described the association of SNP and haplotype polymorphisms with the poor drug-metabolizer phenotype in a region around the *CYP2D6* gene. The data set consisted of 41 “poor metabolizer” cases and 977 controls. SNPs from the middle of the region show very high levels of association, which would be strongly supported by any of the tests discussed here. To illustrate an application of our technique, we selected six consecutive SNPs from the 5′-flanking region. Missing genotypes were imputed with the package MICE.^{29} Further details of the data set are given in the work of Hosking et al.

We found pronounced differences in LD between the case and the control groups. Figure 1 is a graphic presentation of the differences in LD and displays the LD matrices by use of ellipses whose shape reflects the magnitude of LD and whose direction reflects the sign of the disequilibrium: 45°-oriented ellipses reflect the positive sign of LD, whereas the more circular shape of an ellipse reflects a low degree of LD. Murdoch and Chow^{30} suggested the use of such graphs to display correlation matrices. Evidently, there are large observed differences, since some of the coefficients are reversed in sign. The values of *r* (left graph) and Δ^{′} (right graph) are similar to each other.

The difference in correlation is significant at the 5% level: for the *Z*_{2}(*r*) test, *P*=.033, although the *Z*_{2}(Δ^{′}) test *P* value of .061 does not reach significance (all tests except the asymptotic *T*^{2} are based on 50,000 permutations). The test comparing the first two correlation-based principal components, *Z*_{1}(*k*=2), gave a significant *P* value, .026. Statistics based on *k*=1,3 resulted in *P* values of .232 and .283, respectively. There is a multiple-testing issue involved with evaluating statistics that are based on the different numbers of principal components, *k*=1,2,3. Nevertheless, we note that the value *k*=2 corresponds to Krzanowski’s recommendation and could be set as the default value. The *T*^{2} test gave the *P* value equal to .337, reflecting the apparent absence of detectable effects associated with individual SNPs. Neither the allelic trend test nor the test comparing genotypic frequencies at individual SNPs was significant. The overall haplotypic test was not significant (*P*=.168). Thus, the application of the LD-contrast test to this particular data set shows that the method is successful in detecting the case-control LD difference, which supports visual differences in the LD patterns conveyed by figure 1.

### Simulation I: 5-SNP and 6-SNP Haplotypes

A more extensive evaluation of the tests based on the *Z*_{1} and *Z*_{2} statistics was performed using simulations. When susceptibilities are driven mainly by haplotypes (i.e., there are pronounced haplotype effects but no interaction between haplotypes), haplotypic tests are expected to have optimal power. Nevertheless, there are notable exceptions to this rule. For two markers, Nielsen et al.^{10} showed that there are scenarios in which a test comparing LD coefficients is more powerful than a single-locus or a haplotypic test. One such situation arises when multilocus susceptibilities induce an “orthogonal”-like distribution of dilocus haplotypes between the case and the control groups. By “orthogonal,” we mean the situation in which high-susceptibility haplotypes tend to be defined by different SNPs. Culverhouse et al.^{31} considered epistatic models of this type.

To mimic this scenario, we constructed a set of simulations under a haplotype-driven model common to all simulations. Haplotype frequencies were drawn from the Dirichlet(1,…,1) distribution. Effect sizes were drawn from the Gamma(1) distribution and were inspected to ensure that two large effect sizes (*h*_{i}) were allocated to the most distinct 6-SNP haplotypes, 111111 and 222222, corresponding to a situation of two independent mutations in high LD with two very distinct haplotypes. To form an individual, a pair of haplotypes was sampled from the population with the Dirichlet-derived haplotype frequencies. To obtain the binary outcome, the continuous phenotype values *Y*_{ijk}=*h*_{i}+*h*_{j}+*e*_{k}, where *e*_{k}~*N*(0,σ^{2}) with σ=7.5, were dichotomized at two different threshold values, determined by the 0.05 and 0.5 population quantiles of *Y.* The population values of Δ^{′} among the cases and among the controls are listed in table 1. The correlation LD values followed the same pattern and were similar to the values of Δ^{′}; the largest difference between corresponding *r* and Δ^{′} coefficients was 0.06. The population LD values were small, which may correspond to a situation in which a set of SNPs in a candidate gene is selected on the basis of redundancy reduction.^{23} The largest case-control LD difference (0.116) was between the (1,4) and (4,1) entries of the LD matrix, the minimum difference was 0.009, and the mean difference was 0.07. For this set of simulations, 250 cases and 250 controls were sampled for each of 10,000 simulation runs.
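A compact NumPy sketch of this simulation design follows. The details are our assumptions, not the study's code: alleles are coded 1 and 2, the two largest Gamma(1) draws are placed on 11…1 and 22…2 directly rather than by inspection, cases are taken as the upper tail of *Y* with the given prevalence, and the finite population size is arbitrary. The function name is hypothetical.

```python
import itertools

import numpy as np


def simulate_case_control(n_snps=6, sigma=7.5, prevalence=0.5, n_cases=250,
                          n_controls=250, pop_size=200_000, seed=1):
    """Return (case_genotypes, control_genotypes) as allele-1 counts (0, 1, 2)."""
    rng = np.random.default_rng(seed)
    # All 2^L haplotypes with alleles coded 1/2; row 0 is 11...1, the last row is 22...2
    haps = np.array(list(itertools.product([1, 2], repeat=n_snps)))
    n_haps = len(haps)
    freqs = rng.dirichlet(np.ones(n_haps))
    # Gamma(1) effects; the two largest are placed on 11...1 and 22...2 (assumption)
    draws = np.sort(rng.gamma(1.0, size=n_haps))[::-1]
    h = np.empty(n_haps)
    h[0], h[-1] = draws[0], draws[1]
    h[1:-1] = rng.permutation(draws[2:])
    # Each individual carries two haplotypes drawn from the population frequencies
    pairs = rng.choice(n_haps, size=(pop_size, 2), p=freqs)
    y = h[pairs[:, 0]] + h[pairs[:, 1]] + rng.normal(0.0, sigma, size=pop_size)
    # Assumption: cases are the upper `prevalence` fraction of Y
    threshold = np.quantile(y, 1.0 - prevalence)
    case_ids = rng.choice(np.flatnonzero(y > threshold), n_cases, replace=False)
    control_ids = rng.choice(np.flatnonzero(y <= threshold), n_controls, replace=False)

    def allele_counts(ids):
        # Number of copies of allele 1 at each SNP; phase information is discarded
        first = haps[pairs[ids, 0]] == 1
        second = haps[pairs[ids, 1]] == 1
        return first.astype(int) + second.astype(int)

    return allele_counts(case_ids), allele_counts(control_ids)
```

Discarding phase in `allele_counts` mirrors the setting in which only unphased genotypes are available to the tests.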

Tests based on *Z*_{1} and *Z*_{2} statistics (with use of both Δ^{′}- and *r*-based versions of *Z*_{2}) were performed, and *P* values were recorded. Results of these simulations for the two values of population prevalence are shown in table 2. The results show that the LD-contrast test that is based on the squared difference statistic (*Z*_{2}) has the largest power with use of both the Δ^{′} and the correlation-based definitions.

The power of the haplotype-based test^{2} was substantially lower, and the power of both *T*^{2} and min(*P*) (single-SNP permutation-based trend allelic test) was low. The power of the principal components-based test (*Z*_{1}) was lower than the power of the test based on the *Z*_{2} statistic. However, in this model, it was higher than the power of the *T*^{2}, min(*P*), and haplotypic tests (at *k*=2, the largest integer smaller than *L*/2). Dichotomization around the population mean to produce the binary outcome yielded results similar to those obtained with the quantile-defined thresholds just described (data not shown).

In addition to these results with fixed parameters, we conducted a set of 5-SNP simulations in which samples of haplotypes were obtained using the forward evolutionary model of drift with recombination.^{32} The simulations are “forward” to distinguish them from the popular coalescent approximation of this process, which operates “backward” in time. These forward simulations are a typical implementation of a genetic drift with admixture population-genetic model, with nonoverlapping generations and recombination modeled as a Poisson process. A very similar model was used by Zaykin et al.^{2} The effects that determine susceptibilities were sampled from a template that induces pairwise orthogonality, with added normal variability. In contrast to the simulation just described, all population parameters were sampled anew prior to each simulation, which allowed averaging across a variety of models. The power is not necessarily expected to be reduced in this setup: in general, larger variance associated with haplotype effects would result in higher power of the tests. In addition, the induced “marginal effects” at the level of SNPs and dilocus haplotypes depend on both the susceptibility values and the population frequencies.
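A minimal forward (Wright-Fisher) sketch of drift with recombination is shown below, assuming founders with independent random alleles, a constant number of haplotypes per generation, and a Poisson number of crossovers per transmitted haplotype placed uniformly among the intervals between adjacent SNPs. Admixture and the susceptibility template are not modeled, and all names and default values are hypothetical.

```python
import numpy as np


def forward_drift_recombination(n_haps=1000, n_snps=5, generations=200,
                                crossover_rate=0.5, seed=0):
    """Forward simulation of haplotypes (alleles 0/1) under drift and recombination.

    Requires n_snps >= 2. Founders carry independent random alleles (assumption).
    """
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(n_haps, n_snps))
    for _ in range(generations):  # nonoverlapping generations
        first = rng.integers(0, n_haps, n_haps)   # parental haplotype copied at the start
        second = rng.integers(0, n_haps, n_haps)  # haplotype switched to after a crossover
        n_cross = rng.poisson(crossover_rate, n_haps)
        offspring = pop[first].copy()
        for i in np.flatnonzero(n_cross):
            # Breakpoints fall in the n_snps - 1 intervals between adjacent SNPs
            source = np.zeros(n_snps, dtype=int)
            for point in rng.integers(1, n_snps, n_cross[i]):
                source[point:] ^= 1  # each crossover switches the parental source
            offspring[i] = np.where(source == 0, pop[first[i]], pop[second[i]])
        pop = offspring
    return pop
```

Drift creates LD among initially independent sites, while crossovers erode it; the balance between the two is governed by `n_haps` and `crossover_rate`.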

In these simulations, we observed that the 5×5 composite-LD matrix comparison tests (*Z*_{2}) still had higher power on average (88% power for the correlation and 75% for Δ^{′}) than either the generalized *T*^{2} (55% power) or the haplotype-specific test (60% power). Thus, these simulations confirmed that the power of the LD-contrast tests is still the highest, as was found to be the case for the fixed set of effects and frequencies.

### Simulation II: 15- and 30-SNP Haplotypes

An evaluation of the tests in which the trait variation is determined by the diploid pairs of haplotypes (diplotypes) was performed using simulations. For this model, we used much larger haplotypes (15 and 30 SNPs) sampled from a population generated by the forward evolutionary model of drift with recombination.^{32} The phenotype model was similar to the one described above. We considered a more general, diplotype-driven model, in which normally distributed diplotype, rather than haplotype, effects were added to the trait value, together with the common normal error. New diplotype effects were sampled prior to each simulation. In this set of simulations, the trait values were dichotomized around the mean to produce a binary trait. The LD-contrast tests were verified to have the correct type I error by setting the population genetic effects to zero and examining quantiles of the resulting *P* value distribution.
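The diplotype-driven trait model can be sketched as follows, assuming symmetric, normally distributed diplotype effects (so that the pair (*i*, *j*) has the same effect as (*j*, *i*)) and dichotomization at the sample mean; the function name and default standard deviations are hypothetical.

```python
import numpy as np


def diplotype_trait(hap_pairs, n_haps, effect_sd=1.0, error_sd=1.0, seed=0):
    """Binary trait from diplotype effects plus normal error, split at the mean.

    `hap_pairs` is an (n x 2) integer array of haplotype indices per individual.
    Returns (status, effects), where effects[i, j] is the diplotype effect.
    """
    rng = np.random.default_rng(seed)
    raw = rng.normal(0.0, effect_sd, size=(n_haps, n_haps))
    # Symmetrize so that diplotype (i, j) has the same effect as (j, i)
    effects = np.triu(raw) + np.triu(raw, 1).T
    pairs = np.asarray(hap_pairs)
    y = effects[pairs[:, 0], pairs[:, 1]] + rng.normal(0.0, error_sd, size=len(pairs))
    return (y > y.mean()).astype(int), effects
```

Setting `effect_sd=0` leaves only the error term, which is one way to generate null data for checking type I error.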

This set of simulations generated relatively high pairwise LD. The lower and upper quartiles of the population LD distribution (measured by *r*_{AB}) were estimated to be 0.413 and 0.975, with a median of 0.735. One of the 30-SNP samples from this simulation study was used to produce an illustrative plot of pairwise LD (fig. 2). The plot illustrates LD differences between the cases (above the diagonal) and the controls (below the diagonal). For example, there is a region of high LD around the marker pair (21,8) in the cases, whereas this region has relatively low LD in the controls. Nonetheless, statistical tests, as described here, are needed to assess the extent to which these LD differences can be attributed to sampling variation.

*Figure 2.* Pairwise LD in one simulated 30-SNP sample. Above the diagonal, cases; below the diagonal, controls. Δ^{′}-based LD difference, *P*<1×10^{-3}. The color scale runs from blue (lower values) to red (higher values).

As before, we assumed haplotypic phase to be unknown. Many published haplotype association–mapping algorithms would not be computationally feasible, given the large number of SNPs. The generalized *T*^{2} test^{26} was used for comparison, as was the single SNP–based “min(*P*)” shuffling test, in which the significance of the allelic trend test with the maximum value of χ^{2} is obtained via permutations.^{33,34}

It should be noted that the *T*^{2} test has high power when alleles of multiple SNPs independently contribute to the trait, because the test compares means of SNP scores between the case and the control groups. In both 15-SNP and 30-SNP settings, we observed similar power for the *T*^{2} and *Z*_{2} tests.

For the 15-SNP data, the power was 0.71 for both *T*^{2} and *Z*_{2}, when *Z*_{2} was based on the correlation LD matrix (table 3). The power of the *Z*_{2} test based on the standardized matrix was lower, 0.62. The power of the single-SNP permutation-based trend (allelic) test was 0.57. Thus, despite taking into account the correlation between SNPs, single-marker tests had relatively low power. We computed the eigenvector statistic *Z*_{1} for values of *k* required to account for various proportions of the variance (0.5, 0.75, and 0.9), as determined by the cumulative sum of eigenvalues (an approach employed elsewhere by Meng et al.^{23}). The maximum value of *k* was set to 7. This resulted in *k* ranging from 1 to 3 for 0.5 of the variance, from 1 to 6 for 0.75, and from 6 to 7 for 0.9. We also tried fixed values of *k* but could not achieve power comparable to that of the test based on *Z*_{2}. Because of the higher LD in this set of simulations, the best power was observed at intermediate values of *k.*
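The rule for choosing *k* from the cumulative sum of eigenvalues can be sketched as follows; the cap `k_max=7` mirrors the 15-SNP analysis, and the function name is hypothetical.

```python
import numpy as np


def choose_k(corr, proportion, k_max=7):
    """Smallest k whose leading eigenvalues explain at least `proportion` of the total."""
    values = np.sort(np.linalg.eigvalsh(corr))[::-1]
    cumulative = np.cumsum(values) / values.sum()
    # First index where the cumulative proportion reaches the target, converted to a count
    k = int(np.searchsorted(cumulative, proportion)) + 1
    return min(k, k_max, len(values))
```

Because *k* is recomputed for each sample, it varies across simulation replicates, which is why ranges of *k* are reported for each proportion.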

Similar relative power was observed for the 30-SNP data: the power was 0.82 for the *T*^{2} test, 0.85 for the *Z*_{2} test based on the correlation LD matrix, and 0.77 for the *Z*_{2} test based on the standardized LD (Δ^{′}). The eigenvector-based test (*Z*_{1}) had its highest power, 0.62, when *k* was chosen to account for 0.75 of the variance, which corresponded to *k* ranging from 2 to 8. The single-SNP permutation-based trend test had very low power, 0.17.

Although the LD-matrix comparison test was found to have power similar to that of the generalized *T*^{2}, our results suggest that these tests tend to identify essentially different attributes of genetic association in a region. The left graph of figure 3 plots *P* values obtained from the *T*^{2} test against the corresponding *P* values of *Z*_{2} (15-SNP simulation). The correlation between the two tests was quite low (0.36), and over half of the replicates with *T*^{2} *P* values >.05 had *P* values <.05 when evaluated with *Z*_{2}. In contrast, the right graph, for the correspondence between the *Z*_{1} and *Z*_{2} statistics, shows a very high correlation (0.91). The *Z*_{1} test for the difference between the case and control principal components had lower power than the test based on *Z*_{2}, which shifted points up from the diagonal in the second graph.

## Discussion

Genetic association studies typically report characterization of LD in candidate regions with LD plots, that is, graphic representations of LD matrices.^{9} These plots are usually given for population control samples, although LD plots for case samples are sometimes reported and compared visually with the LD pattern in control samples. Rubio et al.^{35} compared graphic plots of LD between multiple sclerosis case and control samples in the human leukocyte antigen region and concluded that *D*^{′} values appeared slightly higher in the case sample. Suarez et al.^{36} specifically compared composite LD coefficients between samples of alcoholics and nonalcoholics and concluded that there was similarity in the pattern of LD, “although there is the suggestion of less disequilibria in the alcoholic sample than among the controls” (p. 14). We suggest that such comparisons be complemented by a statistical procedure. Moreover, we found that the power of comparing LD patterns is comparable to that of traditional mapping techniques, or even superior in certain situations.

The standardized LD coefficient *D*^{′} remains a popular measure that accounts for the dependence of the LD range on allele frequencies. However, one of the problems with an EM-based estimator is the requirement of random union of haplotypes (haplotypic HWE). We address this problem by building on the results of Hamilton and Cole^{15} and Zaykin,^{14} and we suggest that LD plots can be based on the standardized composite coefficient. Straightforward definitions of the composite and standardized composite LD, as well as the efficiency of the computations, make it easy to compare LD plots between samples of cases and controls. Such comparisons can be done “by eye,” but a statistical procedure is desirable. An asymptotic test to compare two LD (i.e., covariance) matrices can be constructed; however, such tests are rather sensitive to distributional assumptions.^{37,38} In addition, the distribution theory and inference become much more complicated once normalizations of covariance (e.g., correlation) are considered.^{25} Because of these concerns, we adopted a permutation framework to compare LD matrices based on the composite coefficient and its standardized version.

Statistical approaches specifically tailored for identification of haplotype effects are being rapidly developed.^{39} There is strong biological evidence that entire haplotypes rather than single SNPs are important in determining trait variation. Therefore, identification and estimation of haplotype effects are important issues. Still, the multiplicity of haplotypes and phase uncertainty adversely affect statistical power. Multilocus “scoring” approaches that capitalize on marginal effects of individual SNPs are being developed as well. These approaches indirectly take into account the interaction between SNPs while adjusting for LD.^{26,40}

In particular, these approaches are expected to have good power under models that induce substantial marginal SNP effects and strong LD. Although haplotype-based approaches and scoring methods, such as the generalized *T*^{2} test, provide relatively high power in the respective situations, it has been noted that the extent of LD can be markedly different between the case and the control groups in a region of genetic association.^{8} Therefore, a case-control LD comparison appears to be a promising addition to existing methods of characterizing multilocus associations. Further, Nielsen et al.^{10} examined a two-SNP situation and found that a test comparing LD coefficients can be more powerful than a single-locus or a haplotypic test.

We extend these results to the case of multiple SNPs. The LD-contrast test, like any other method, would not be expected to have superior power across all susceptibility models. One virtue of the LD-contrast test is its reduced number of parameters (e.g., in the two-SNP case there is only a single LD coefficient, although there are four haplotypes), and, as we discuss here, certain models may give the LD-contrast test good power. A prominent example is when multilocus susceptibilities induce an orthogonal-like distribution of dilocus haplotypes in cases and controls. Heterogeneity models in which mutations are associated with haplotypes that differ at a large proportion of the alleles they carry may produce such orthogonality for some of the dilocus pairs. To illustrate why the LD-contrast test can work well under these scenarios, denote the two alleles at each of two loci by 1 and 2: $A \equiv 1$, $B \equiv 1$, $a \equiv 2$, $b \equiv 2$. The disequilibrium coefficient can then be written in terms of the haplotype frequencies as $D_{AB} \equiv D_{11} = P_{11}P_{22} - P_{12}P_{21}$. The disequilibrium in a particular sample is large if haplotypes 11 and 22 are overrepresented compared with the two other types. Therefore, the ratio $D^{\text{cases}}_{AB}/D^{\text{controls}}_{AB}$ tends to deviate from 1 when the orthogonal haplotypes 11 and 22 are overrepresented in one of the groups. A similar argument holds for the composite LD definition, because the sum $D_{AB} + D_{A/B}$ increases with $D_{AB}$. Moreover, we found that, although the *T*^{2} test and the LD-contrast test provide similar power under a general diplotype-driven model, the correlation between the *P* values of the two tests is low (left graph of fig. 3), which suggests that these approaches detect different aspects of association. In particular, the LD comparison tests are more sensitive to interactions that tend to induce only small marginal effects of individual SNPs.
The magnitude of marginal effects is largely unpredictable in practice, since it is determined by both multilocus susceptibility values and the corresponding population frequencies. Therefore, it would vary from population to population, given the same penetrance configuration.
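
The two-SNP argument can be checked numerically. The sketch below uses hypothetical haplotype frequencies (not data from this study) in which haplotypes 11 and 22 are overrepresented in cases; it computes $D_{AB} = P_{11}P_{22} - P_{12}P_{21}$ and cross-checks it against the equivalent form $D_{AB} = P_{11} - p_A p_B$. The function names are illustrative only.

```python
def haplotype_ld(p11: float, p12: float, p21: float, p22: float) -> float:
    """D_AB = P11*P22 - P12*P21 for two biallelic loci (alleles coded 1 and 2)."""
    if abs((p11 + p12 + p21 + p22) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    return p11 * p22 - p12 * p21


def ld_via_allele_freqs(p11: float, p12: float, p21: float, p22: float) -> float:
    """Equivalent form D_AB = P11 - p_A * p_B, used as a cross-check."""
    p_a = p11 + p12  # frequency of allele 1 at locus A
    p_b = p11 + p21  # frequency of allele 1 at locus B
    return p11 - p_a * p_b


# Hypothetical frequencies (P11, P12, P21, P22); cases overrepresent 11 and 22
controls = (0.30, 0.20, 0.20, 0.30)
cases = (0.40, 0.10, 0.10, 0.40)

d_controls = haplotype_ld(*controls)
d_cases = haplotype_ld(*cases)
print(round(d_controls, 6), round(d_cases, 6), round(d_cases / d_controls, 6))
```

With these frequencies, $D_{AB}$ is 0.05 in controls and 0.15 in cases, so the case/control ratio is 3, even though the allele frequency at each locus is 0.5 in both groups; the single-SNP allele frequencies carry no case-control difference here.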

The question remains: which measure, Δ^{′} or the correlation *r,* is more appropriate for comparing patterns of LD? Two samples can have equal LD correlation values even when their standardized LD coefficients are unequal, and vice versa,^{41} which makes an a priori choice of statistic somewhat difficult. An appealing feature of standardization by the LD bounds (“di-prime-ization”) is that it makes the measure independent of single-locus frequencies. This independence, however, holds only in the sense of the range of values that the coefficient can take. The allele frequencies remain very much a part of the definition,^{42}^{,}^{43} and it would be a mistake to interpret the standardized coefficient as free of dependence on the allele or genotype frequencies. On the other hand, the correlation coefficient has well-defined statistical and population-genetic properties and extends straightforwardly to principal-components–based inference. The simulations (Simulation II) show somewhat higher power for the tests based on the correlation, and the *CYP2D6* data set considered here provides an example in which the correlation-based test gives slightly stronger evidence of association (*P*=.033 vs. *P*=.061), although both results can be considered indicative of association. In the absence of a specific hypothesis, it seems reasonable to use the correlation-based analysis and to reserve Δ^{′}-based LD comparisons for more-detailed characterizations of LD. Nevertheless, the correlation-based and Δ^{′}-based comparisons address different hypotheses, and both tests have their value. Our particular simulation design might have favored greater deviations from the equality of correlations; the orthogonal model (Simulation I), in which the correlation and Δ^{′} values were almost identical, showed similar power for the two tests.
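
To make the difference between the two standardizations concrete, the sketch below uses the gametic coefficient $D$ as a simpler stand-in for the composite $\Delta$ (an assumption: the bounds used to standardize $\Delta$ are not reproduced here). It computes Lewontin's $D' = D/D_{\max}$ and $r = D/\sqrt{p_A(1-p_A)p_B(1-p_B)}$ for two hypothetical samples; the function name is illustrative.

```python
import math


def dprime_and_r(p11: float, p12: float, p21: float, p22: float) -> tuple[float, float]:
    """Lewontin's D' and the correlation r from two-locus haplotype frequencies."""
    p_a = p11 + p12  # frequency of allele 1 at the first locus
    p_b = p11 + p21  # frequency of allele 1 at the second locus
    d = p11 * p22 - p12 * p21
    # D' divides D by the largest value attainable given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    # r divides D by the product of the allelic standard deviations
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r


# Hypothetical samples (P11, P12, P21, P22) with equal D' but different p_B
sample_1 = (0.375, 0.125, 0.125, 0.375)  # p_A = 0.5, p_B = 0.5, D = 0.125
sample_2 = (0.15, 0.35, 0.05, 0.45)      # p_A = 0.5, p_B = 0.2, D = 0.05
for name, freqs in [("sample 1", sample_1), ("sample 2", sample_2)]:
    d_prime, r = dprime_and_r(*freqs)
    print(f"{name}: D' = {d_prime:.3f}, r = {r:.3f}")
```

Both samples have $D' = 0.5$, but $r$ is 0.5 in the first and 0.25 in the second, so a test of equal $D'$ and a test of equal $r$ would be probing different hypotheses about the same pair of samples.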

In our simulations, we found that the squared difference–based statistic has better power than the statistic based on the comparison of principal components, although the correlation between the *P* values obtained for these tests is high. The squared difference is more of an omnibus test. For example, if the amount of LD is proportionally higher among the cases for all pairs of markers, the principal-components test will lack power: for correlation matrices, scaling every off-diagonal entry by a common factor leaves the eigenvectors, and hence the principal components, unchanged. In addition, there is uncertainty about the number of components to use. Still, such tests can describe the multivariate structure of LD, and principal-components–based analysis seems most valuable at the descriptive stage, once an association has been established. The default number of principal components (*k*) can follow Krzanowski’s recommendation^{25} to use the largest integer *k* that is smaller than *L*/2.
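
The point about proportionally higher LD can be illustrated with a minimal sketch. It assumes that the principal-components comparison uses Krzanowski's similarity between the subspaces spanned by the first *k* eigenvectors of each LD correlation matrix (the sum of squared cosines between the two subspaces), which may differ in detail from the test used in this study; the matrices and function names are hypothetical.

```python
import numpy as np


def default_num_components(n_loci: int) -> int:
    """Largest integer k strictly smaller than n_loci / 2."""
    return (n_loci - 1) // 2


def leading_eigenvectors(matrix: np.ndarray, k: int) -> np.ndarray:
    """Return the k eigenvectors with the largest eigenvalues as columns."""
    values, vectors = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    return vectors[:, np.argsort(values)[::-1][:k]]


def subspace_similarity(m1: np.ndarray, m2: np.ndarray, k: int) -> float:
    """Krzanowski-style similarity: sum of squared cosines between PC subspaces.

    Equals k when the two k-dimensional subspaces coincide, 0 when orthogonal.
    """
    a = leading_eigenvectors(m1, k)
    b = leading_eigenvectors(m2, k)
    return float(np.trace(a.T @ b @ b.T @ a))


# Hypothetical LD correlation matrix for 6 SNPs in cases: decays with distance
n_loci = 6
pos = np.arange(n_loci)
cases_r = 0.8 ** np.abs(pos[:, None] - pos[None, :])
# Hypothetical controls: every off-diagonal correlation is half the case value
controls_r = 0.5 * cases_r + 0.5 * np.eye(n_loci)

k = default_num_components(n_loci)
upper = np.triu_indices(n_loci, 1)
squared_diff = float(np.sum((cases_r[upper] - controls_r[upper]) ** 2))
print(k, subspace_similarity(cases_r, controls_r, k), squared_diff)
```

Because the case LD is uniformly twice the control LD, the subspace similarity reaches its maximum value *k* = 2 (no detectable difference in principal components), whereas the squared-difference statistic is clearly positive.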

In summary, we suggest that statistical approaches to compare pairwise LD matrices between the case and the control samples are useful additions to already-available statistical-mapping tools. As with single-marker case-control analysis, population heterogeneity is an issue. Further research should emphasize extending these methods to accommodate family data and to provide methods robust to population stratification and admixture.

## Acknowledgments

This research was supported in part by the Intramural Research Program of the National Institutes of Health, National Institute of Environmental Health Sciences. Programs implementing the LD-contrast tests are available at D.V.Z.'s Web site and from D.V.Z. on request. Liling Warren, Norman Kaplan, and two reviewers provided useful comments that improved the manuscript.

## Web Resource

The URL for data presented herein is as follows:

## References

*T*^{2} test for genome association studies. Am J Hum Genet 70:1257–1268

**American Society of Human Genetics**


- Improving power in contrasting linkage-disequilibrium patterns between cases and controls. *Wang T, Zhu X, Elston RC.* Am J Hum Genet. 2007 May; 80(5):911-20. Epub 2007 Mar 28.
- Joint modeling of linkage and association: identifying SNPs responsible for a linkage signal. *Li M, Boehnke M, Abecasis GR.* Am J Hum Genet. 2005 Jun; 76(6):934-49. Epub 2005 Apr 5.
- QTL fine mapping by measuring and testing for Hardy-Weinberg and linkage disequilibrium at a series of linked marker loci in extreme samples of populations. *Deng HW, Chen WM, Recker RR.* Am J Hum Genet. 2000 Mar; 66(3):1027-45.
- Multipoint linkage disequilibrium mapping approach: incorporating evidence of linkage and linkage disequilibrium from unlinked region. *Hsu FC, Liang KY, Beaty TH.* Genet Epidemiol. 2003 Jul; 25(1):1-13.
- Linkage disequilibrium for different scales and applications. *Mueller JC.* Brief Bioinform. 2004 Dec; 5(4):355-64.

- A Common Cortactin Gene Variation Confers Differential Susceptibility to Severe Asthma. *Ma SF, Flores C, Dudek SM, Nicolae DL, Ober C, Garcia JG.* Genetic Epidemiology. 2008 Dec; 32(8):757-766.
- Power of Single- vs. Multi-Marker Tests of Association. *Wang X, Morris NJ, Schaid DJ, Elston RC.* Genetic Epidemiology. 2012 Jul; 36(5):480-487.
- A Simple and Fast Two-Locus Quality Control Test to Detect False Positives Due to Batch Effects in Genome-Wide Association Studies. *Lee SH, Nyholt DR, Macgregor S, Henders AK, Zondervan KT, Montgomery GW, Visscher PM.* Genetic Epidemiology. 2010 Dec; 34(8):854-862.
- Evaluation of seven common lipid associated loci in a large Indian sib pair study. *Rafiq S, Venkata KK, Gupta V, Vinay D, Spurgeon CJ, Parameshwaran S, Madana SN, Kinra S, Bowen L, Timpson NJ, Smith GD, Dudbridge F, Prabhakaran D, Ben-Shlomo Y, Reddy KS, Ebrahim S, Chandak GR.* Lipids in Health and Disease. 11:155.
- A New Association Test to Test Multiple-Marker Association. *Wang X, Zhang S, Sha Q.* Genetic Epidemiology. 2009 Feb; 33(2):164-171.

- Contrasting Linkage-Disequilibrium Patterns between Cases and Controls as a Novel Association-Mapping Method. American Journal of Human Genetics. May 2006; 78(5):737.
