
# A Powerful Strategy to Account for Multiple Testing in the Context of Haplotype Analysis

## Abstract

Haplotypes—that is, linear arrangements of alleles on the same chromosome that were inherited as a unit—are expected to carry important information in the context of association fine mapping of complex diseases. In consideration of a set of tightly linked markers, there is an enormous number of different marker combinations that can be analyzed. Therefore, a severe multiple-testing problem is introduced. One method to deal with this problem is Bonferroni correction by the number of combinations that are considered. Bonferroni correction is appropriate for independent tests but will result in a loss of power in the presence of linkage disequilibrium in the region. A second method is to perform simulations. It is unfortunate that most methods of haplotype analysis already require simulations to obtain an uncorrected *P* value for a specific marker combination. Thus, it seems that nested simulations are necessary to obtain *P* values that are corrected for multiple testing, which, apparently, limits the applicability of this approach because of computer running-time restrictions. Here, an algorithm is described that avoids such nested simulations. We check the validity of our approach under two disease models for haplotype analysis of family data. The true type I error rate of our algorithm corresponds to the nominal significance level. Furthermore, we observe a strong gain in power with our method to obtain the global *P* value, compared with the Bonferroni procedure to calculate the global *P* value. The method described here has been implemented in the latest update of our program FAMHAP.

## Introduction

Data on densely spaced markers within one gene or haplotype block have become a reality. Although the usefulness of haplotype analysis in such a situation is commonly accepted, there is still no consensus on how the analysis should be performed. Multiple testing is an important problem in this context. With *n* markers, there are 2^{n}-1 marker combinations for which a haplotype-based test can be performed. Hence, for 20 markers, more than 1,000,000 tests are possible, and it is clear that hardly any *P* value will withstand a Bonferroni correction by that number. A common approach to reducing the number of tests is to use a sliding window, that is, to test only combinations with a fixed number of neighboring markers. However, the size of such windows has to be chosen in advance, which leads to an important loss of flexibility when haplotypes of interest are shorter than or extend over the chosen window size. Furthermore, *cis*-acting effects on the disease of markers that are not neighbors are always missed. Finally, correction is still needed for the number of windows and for the number of markers that are analyzed as single loci.
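The counts discussed above can be made concrete with a few lines of Python. This is a minimal sketch; the function names are illustrative and do not come from FAMHAP.

```python
from math import comb


def all_combinations(n: int) -> int:
    """Number of nonempty subsets of n markers, i.e. 2^n - 1."""
    return 2 ** n - 1


def combinations_up_to(n: int, k: int) -> int:
    """Number of marker combinations that contain at most k of n markers."""
    return sum(comb(n, j) for j in range(1, k + 1))


def windows(n: int, w: int) -> int:
    """Number of sliding windows of exactly w neighboring markers."""
    return max(n - w + 1, 0)


print(all_combinations(20))       # 1048575
print(combinations_up_to(20, 3))  # 1350
print(windows(20, 3))             # 18
```

Restricting the analysis to small combinations or to windows shrinks the number of tests by orders of magnitude, at the price of the lost flexibility described above.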

Here, we describe a method that allows for the analysis of all marker combinations or a set of prespecified marker combinations—for example, all marker combinations with *k* markers. Suppose that, for each marker combination, we have one test the underlying statistic of which depends on the haplotype distribution with respect to these markers. We consider the global null hypothesis—that none of the marker combinations shows association with the disease. It is then natural to try to assess the global significance with the statistic *T*_{max}, which is the maximum of the statistics over all marker combinations. Since the distribution of *T*_{max} is generally unknown, *P* values are obtained through Monte Carlo simulations. The *T*_{max} approach has been used, for instance, by McIntyre et al. (2000) for the single-locus analysis of several markers by use of the transmission/disequilibrium test (TDT) by Spielman et al. (1993). In our situation, however, a maximum statistic will not suffice, since the test statistics for combinations with different numbers of markers are usually not comparable. Therefore, we replace *T*_{max} with *P*^{min}, the smallest raw *P* value found among the combinations. Since the distribution of *P*^{min} is not known, significance has to be assessed with Monte Carlo simulations. This approach was considered, for instance, by Lazzeroni and Lange (1998) and by Jannot et al. (2003). Although the null distribution of the test statistic for a single hypothesis is known in situations described by Lazzeroni and Lange (1998) and by Jannot et al. (2003), in the context of haplotype analysis, even these null distributions are often unknown. For example, a method to perform a haplotype-based TDT-like test for samples of families with a single affected child has been given by Zhao et al. (2000) and has been extended to general nuclear families by Knapp and Becker (2003). 
A transmission/nontransmission table for haplotypes is constructed, in which the possible transmission patterns of each family are weighted with a relative likelihood based on estimated haplotype frequencies. The distribution of the corresponding test statistic under the null hypothesis is unknown, and the raw *P* value has to be obtained through permutation replicates, in which the transmission/nontransmission status of the haplotypes is randomly permuted. Therefore, a nested simulation strategy seems to be necessary to obtain the distribution of *P*^{min}, since, in each of the permutation replicates, the raw *P* value for each marker combination is needed and has to be obtained through permutation replicates. This will often be computationally unfeasible.

Although the raw *P* value for each test has to be obtained through permutation replicates, it is possible to assess the overall significance for all marker combinations without nested simulations. Ge et al. (2003) have shown that a single set of permutations is sufficient to obtain raw *P* values for each marker combination and to estimate the distribution of *P*^{min}. They applied this idea to a multiple-testing problem that occurs with microarray data. We provide an adaption of their idea, taking into account sample size, running time, and computer-storage requirements that differ between microarray and haplotype data. We check the validity and power of our approach with a simulation study under two disease models. Finally, we demonstrate its applicability by reanalyzing a recently published association study.

## Methods

### Association Testing Using Multiple Tightly Linked Markers

Consider *n* tightly linked markers, and let *A*={1,…,*n*}. Any nonempty subset *B* of *A* is called a “marker combination.” Zhao et al. (2000) proposed a TDT-like association test for the hypothesis *H*^{B}_{0}—that there is no disease association with marker combination *B*—which can be applied to samples consisting of nuclear families with a single affected child. Their approach was subsequently extended to samples of nuclear families with an arbitrary number of children (Knapp and Becker 2003). In brief, this test proceeds with the following steps.

1. The maximum-likelihood frequency estimates for each haplotype of marker combination *B* are computed.
2. In the case that the observed single-marker genotypes of a family allow for more than one haplotype explanation, the frequency estimates of step 1 are used to calculate weights (which sum to 1 within a family) for each of these possible haplotype explanations.
3. With ambiguous families replaced by a set of weighted haplotype explanations, the test statistic of any extension of the TDT for a single-marker locus with >2 alleles can be used to calculate the value *T*^{B}_{0} for the observed data. Here, we consider the statistic proposed by Spielman and Ewens (1996), which requires calculation of a table of transmitted/nontransmitted haplotypes. Each affected child contributes 2 units to this table. Only families with complete genotype information for all loci of marker combination *B* contribute to this table.
4. The *P* value *P*^{B}_{0} is assessed by simulation. In each replicate of this simulation, a sample is constructed in which all marker genotypes of all children are either left unchanged or replaced by the two nontransmitted parental alleles, with equal chance. Since only fully genotyped families are used, this is equivalent to changing the transmission/nontransmission status of each possible haplotype explanation of a family. Let *T*^{B}_{i} denote the value of the test statistic obtained for the *i*th replicate. Then, *P*^{B}_{0} is the fraction of permutation replicates resulting in a test statistic greater than or equal to the test statistic of the real data; that is,

   $$P^{B}_{0} = \frac{\left|\{\, i \in \{1,\dots,t\} : T^{B}_{i} \ge T^{B}_{0} \,\}\right|}{t},$$

   where *t* denotes the number of permutation replicates and |*A*| denotes the number of elements of a set *A*.
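Step 4 can be illustrated with a minimal Python sketch. The function name and the toy numbers are assumptions made for illustration; the statistics themselves would come from steps 1 to 3.

```python
def raw_p_value(t_observed, t_replicates):
    """Fraction of replicate statistics that are >= the observed statistic."""
    if not t_replicates:
        raise ValueError("at least one permutation replicate is required")
    hits = sum(1 for value in t_replicates if value >= t_observed)
    return hits / len(t_replicates)


# Toy numbers, not real data: 2 of 8 replicates reach the observed value.
print(raw_p_value(6.1, [0.4, 7.3, 2.2, 6.1, 1.0, 3.8, 0.9, 5.5]))  # 0.25
```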

### Testing More Than One Hypothesis

Now suppose that not only a single marker combination *B* should be tested for its association with the disease, but that a set ℬ of marker combinations should be evaluated. Let *H*_{0}=⋂_{*B*∈ℬ}*H*^{B}_{0} be the global null hypothesis that none of the marker combinations *B*∈ℬ is associated with the disease. Our central goal is to construct a test for this global hypothesis. The preceding section described a method to obtain *P*^{B}_{0} for each hypothesis *H*^{B}_{0}. *P*^{B}_{0} is called the “raw *P* value” or “unadjusted *P* value” for hypothesis *H*^{B}_{0}. It seems intuitively reasonable that the testing procedure for the global hypothesis *H*_{0} should depend on *P*^{min}=min_{*B*∈ℬ}*P*^{B}_{0}; that is, the global null hypothesis is rejected in the case that the minimum of the raw *P* values is sufficiently small. Thus, it is necessary to determine the distribution of *P*^{min} in the case that *H*_{0} holds true. Again, this can be achieved by simulation. Since, in each replication of this simulation, it is necessary to obtain the *P* value for each *B* (which requires simulations itself), nested simulations seem to be necessary. The obvious drawback of such nested simulations is running time. However, we show here how these nested simulations can be avoided.

The basic idea is to use the same set of permutation replicates to determine the empirical distribution of *P*^{min} that has been used to determine *P*^{B}_{0}. For each hypothesis *B* and for each permutation replicate *i*=1,…,*t*, the uncorrected *P* value of the *i*th permutation replicate is calculated as

$$P^{B}_{i} = \frac{\left|\{\, j \in \{0,\dots,t\} \setminus \{i\} : T^{B}_{j} \ge T^{B}_{i} \,\}\right|}{t}. \tag{1}$$

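(Note that the real data serves as a “permutation replicate” for the calculation of *P*^{B}_{i}.) For *i*>0, let *P*^{min}_{i} be the minimum of the uncorrected *P* values *P*^{B}_{i} for all marker combinations in the *i*th permutation replicate. Then, the *P* value for the global hypothesis *H*_{0}, denoted here by *P*_{global}, is calculated as

$$P_{\mathrm{global}} = \frac{\left|\{\, i \in \{1,\dots,t\} : P^{\min}_{i} \le P^{\min} \,\}\right|}{t}. \tag{2}$$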

Table 1 provides an example that explains the strategy. (Of course, >10 permutation replicates are needed in practice.) For the real data, the smallest raw *P* value is obtained for marker combination *B*={1,2}, so that *P*^{min}=.10. We observe a smaller *P*^{min}_{i} for the permutation replicates *i*∈{2,3,4}. Thus, the *P* value for the global hypothesis *H*_{0} is estimated to be .30.
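The strategy can be sketched in Python as follows. The sketch assumes that the test statistics of the real data (index 0) and of all *t* permutation replicates (indices 1 to *t*) are already available for every marker combination. Each replicate is compared with all other data sets, including the real data but excluding itself, and replicates whose smallest *P* value is at most that of the real data are counted. Function names and toy statistics are illustrative only, and the quadratic-time counting is chosen for readability, not speed.

```python
def replicate_p_values(stats):
    """stats[0] is the real data, stats[1:] are the permutation replicates.

    Returns, for every index i, the fraction of the other t data sets whose
    statistic is >= stats[i]; index 0 yields the raw P value of the real data.
    """
    t = len(stats) - 1
    return [
        sum(1 for j, s in enumerate(stats) if j != i and s >= stats[i]) / t
        for i in range(t + 1)
    ]


def global_p_value(stat_table):
    """stat_table maps each marker combination to [T_0, T_1, ..., T_t]."""
    p_table = [replicate_p_values(stats) for stats in stat_table.values()]
    t = len(p_table[0]) - 1
    # Smallest P value over all combinations, per data set.
    p_min = [min(p[i] for p in p_table) for i in range(t + 1)]
    return sum(1 for i in range(1, t + 1) if p_min[i] <= p_min[0]) / t


# Toy statistics for two marker combinations and t = 4 replicates.
toy = {
    (1,): [5.0, 1.0, 2.0, 6.0, 0.5],
    (1, 2): [9.0, 3.0, 10.0, 2.0, 1.0],
}
print(global_p_value(toy))  # 0.5
```

In the toy data, the real data have a smallest raw *P* value of .25, and two of the four replicates reach a minimum of 0, which gives a global *P* value of 2/4.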

The idea to calculate the raw *P* values (*P*^{B}_{0}) and the empirical distribution of *P*^{min} from a single set of permutation replicates is the basis of a method, recently proposed by Ge et al. (2003), for obtaining so-called minP-adjusted *P* values (Westfall and Young 1993). Therefore, the following section summarizes the algorithm of Ge et al. (2003), and the subsequent section describes the details and modifications of our adaption of this basic idea for testing the global null hypothesis.

### Algorithm for Step-Down minP-Adjusted *P* Values, by Ge et al. (2003)

First, it should be noted that the goal of Ge et al. (2003) is slightly different from our goal to construct a test for the global null hypothesis *H*_{0}. Their algorithm aims to obtain so-called minP-adjusted *P* values for each hypothesis *H*^{B}_{0}. If the global null hypothesis *H*_{0} holds true, the probability is α that one or more of these adjusted *P* values is ≤α. Therefore, it is evident that the rule “reject *H*_{0} in the case that the smallest of the adjusted *P* values is ≤α” defines a level-α test for *H*_{0}. Since the smallest adjusted *P* value belongs to the same hypothesis for which the unadjusted *P* value is smallest, for the purpose of testing *H*_{0}, it would be sufficient to calculate only the adjusted *P* value for the hypothesis that possesses the smallest unadjusted *P* value, instead of calculating the adjusted *P* values for all marker combinations *B*. The initial step of the algorithm of Ge et al. (2003, cf. step 0 in box 4 on p. 21) requires obtaining raw *P* values for each hypothesis. However, this requirement is a direct consequence of their goal to obtain adjusted *P* values for all *B*, and it can be dropped in the case that only the adjusted *P* value for the hypothesis with the smallest unadjusted *P* value has to be calculated.

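Second, a minor technical difference between the algorithm of Ge et al. (2003) and our approach is that these authors use

$$P^{B}_{i} = \frac{\left|\{\, j \in \{1,\dots,t\} : T^{B}_{j} \ge T^{B}_{i} \,\}\right|}{t} \tag{3}$$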

instead of (1) for calculation of uncorrected *P* values of the *i*th permutation replicate. Recently, there has been debate on how to estimate *P* values by Monte Carlo methods. North et al. (2002, 2003) prefer the estimate (*r*+1)/(*n*+1), where *n* is the number of permutation replicates and *r* is the number of these replicates that produce a test statistic greater than or equal to that calculated for the actual data. Broman and Caffo (2003) and Ewens (2003) favor the traditional estimate *r*/*n*. Now, Ge et al. (2003) employ this traditional estimate in obtaining the raw *P* values *P*^{B}_{0}, whereas their application of equation (3) in obtaining raw *P* values of the *i*th permutation replicate makes use of the proposal by North et al. (2002, 2003). We agree with Broman and Caffo (2003) that neither *r*/*n* nor (*r*+1)/(*n*+1) should be considered incorrect, but we believe that it is reasonable to use the same approach for both purposes, that is, for calculation of raw *P* values for the real data and for the permutation replicates. Therefore, we employ equation (1) instead of equation (3). Note that, in the case that all possible permutation replicates are evaluated in obtaining *P*^{B}_{i}, equation (3) is appropriate. Indeed, when all replicates can be conducted, there is one replicate that is identical to the real data. Hence, it is reasonable to compare a replicate to itself, since the real data is compared to itself as well. In contrast to the microarray problem considered by Ge et al. (2003), the set of all permutation replicates is always much too large for our problem of haplotype-association testing.
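The two Monte Carlo estimates under debate differ only in whether the observed data are counted among the replicates. A minimal Python sketch with illustrative function names and toy counts shows how close they are once *r* is not very small:

```python
def estimate_r_over_n(r, n):
    """Traditional estimate: r of n replicates reach the observed statistic."""
    return r / n


def estimate_plus_one(r, n):
    """Estimate (r + 1)/(n + 1), which counts the observed data as a replicate."""
    return (r + 1) / (n + 1)


# Toy counts (not from the study): the estimates differ most when r is 0.
for r, n in [(0, 1000), (5, 1000), (50, 100000)]:
    print(r, n, estimate_r_over_n(r, n), estimate_plus_one(r, n))
```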

The main merit of the algorithm by Ge et al. (2003) is that it does not require storage of the whole table of test statistics *T*^{B}_{i} (for all permutation replicates *i*=1,…,*t* and all marker combinations *B*), but only requires storage of the test statistics for a single hypothesis at a time. The algorithm is presented in the context of microarray data, for which the number of hypotheses (i.e., the number of genes for which expression is measured) is usually large compared with the number of individuals considered (typically <20). For haplotype data, the number of marker combinations tested is low for <10 SNPs and can reasonably be limited by consideration of only marker combinations with <4 or 5 markers when more SNPs are typed. Here, a computer memory problem occurs when many permutation replications have to be conducted (e.g., in the case that very small global *P* values have to be estimated), whereas, for microarray data, even the number of all possible permutations of the real data (i.e., permutations of the disease status of the individuals) is small, because the total number of individuals is very small. Therefore, we have implemented an approach to solve the computer memory problem that is more appropriate for haplotype data.

### Algorithm for Testing the Global Null Hypothesis *H*_{0}

The algorithm for testing the global null hypothesis (that none of the marker combinations *B* is associated with the disease) has been implemented in the program FAMHAP (Becker and Knapp 2004), which originally was developed to obtain maximum-likelihood estimates of haplotype frequencies from samples consisting of arbitrary nuclear families. Two options are provided for describing the set ℬ of marker combinations: (1) with option “maxmarker = *k*,” only marker combinations *B* consisting of at most *k* (*k*≤*n*) loci are considered; (2) with option “window = yes,” only marker combinations of neighboring markers are considered. With a defined set ℬ of marker combinations, the program proceeds with the following steps.

1. Estimate frequencies for the full *n*-locus haplotypes from the data. For each marker combination *B*, obtain frequencies for each haplotype of *B* by summing all corresponding *n*-locus haplotype frequencies. Calculate and store the table of transmitted/nontransmitted haplotypes (cf. step 3 in association test) for each *B* and for each family with complete marker genotypes at all marker loci of *B*.
2. For all *B*, compute the test statistic *T*^{B}_{0}.
3. For each family, randomly transpose (cf. step 4 in association test) the table of transmitted/nontransmitted haplotypes (cf. step 1) for all *B*. Recompute and store *T*^{B}_{i} for these modified data.
4. Repeat step 3 *t* times.
5. For each *B* and each *i*=0,…,*t*, compute *P*^{B}_{i} from the stored test statistics according to equation (1).
6. For each *i*, compute *P*^{min}_{i} as the minimum of *P*^{B}_{i} over all *B*, and obtain the global *P* value according to equation (2).
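The permutation part of these steps can be sketched in Python. The sketch rests on several assumptions that are not taken from the FAMHAP source: each family is represented by one *k*×*k* table per marker combination (rows index transmitted haplotypes, columns index nontransmitted ones, entries are the weights), the statistic is written in the marginal-homogeneity form usually attributed to Spielman and Ewens (1996), and all names are hypothetical. The point it illustrates is that one coin flip per family is shared by all marker combinations, so every combination sees the same set of permutation replicates.

```python
import random


def transpose(table):
    """Swap the roles of transmitted (rows) and nontransmitted (columns)."""
    return [list(column) for column in zip(*table)]


def add_tables(tables, k):
    """Sum k x k family tables entry by entry."""
    total = [[0.0] * k for _ in range(k)]
    for table in tables:
        for a in range(k):
            for b in range(k):
                total[a][b] += table[a][b]
    return total


def marginal_statistic(table):
    """(k-1)/k * sum_a (row_a - col_a)^2 / (row_a + col_a - 2*n_aa)."""
    k = len(table)
    total = 0.0
    for a in range(k):
        row = sum(table[a])
        col = sum(table[b][a] for b in range(k))
        denominator = row + col - 2 * table[a][a]
        if denominator > 0:
            total += (row - col) ** 2 / denominator
    return (k - 1) / k * total


def permutation_statistics(families, t, seed=0):
    """families: one dict per family, mapping a marker combination to that
    family's k x k table; combinations with incomplete genotypes are absent.
    Returns {combination: [T_0, T_1, ..., T_t]}."""
    rng = random.Random(seed)
    combos = sorted({combo for family in families for combo in family})
    result = {combo: [] for combo in combos}
    for i in range(t + 1):
        # Replicate 0 is the real data; otherwise one coin flip per family,
        # shared by all marker combinations.
        flips = [i > 0 and rng.random() < 0.5 for _ in families]
        for combo in combos:
            tables = [transpose(f[combo]) if flip else f[combo]
                      for f, flip in zip(families, flips) if combo in f]
            k = len(tables[0])
            result[combo].append(marginal_statistic(add_tables(tables, k)))
    return result
```

For a 2×2 table the statistic reduces to the familiar TDT form (*b*−*c*)²/(*b*+*c*), which gives a quick sanity check of the formula.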

The memory requirement to store *T*^{B}_{i} for each permutation replicate *i*=1,…,*t* and for each marker combination *B* is of the order |ℬ| × *t*, where |ℬ| denotes the number of marker combinations considered. With 1,000 marker combinations and 100,000 permutation replicates, 10^{8} test statistics have to be stored, and it is obvious that the number of permutation replicates cannot be increased much further. However, a huge number of permutation replicates is needed only for estimating small global *P* values or for performing the test of the global null hypothesis at small type I error rate (North et al. 2002). For this, the program provides the option “alpha = <*value*>,” which enforces the condition that only the <*value*> × *t* permutation replicates with the highest test statistics are stored for each marker combination. This information is sufficient to decide if the *P* value corresponding to the global null hypothesis is ≤ <*value*>. Indeed, with α = <*value*>, *P*^{B}_{i}>α in the case that *T*^{B}_{i} does not belong to the <*value*> × *t* permutation replicates with the highest test statistic for marker combination *B*. It follows that either *P*^{B}_{i}>*P*^{min}_{i} (i.e., the concrete value of *P*^{B}_{i} is irrelevant for calculating *P*^{min}_{i}), or *P*^{min}_{i}>α (i.e., even the concrete value of *P*^{min}_{i} is irrelevant for deciding whether the global *P* value is ≤α). With option “alpha = <*value*>,” the storage requirement is reduced to <*value*> × |ℬ| × *t*. Typically, <*value*> × |ℬ| is <1 in situations in which the number of permutation replicates is too large to allow storage of the whole table of test statistics.
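The idea behind the truncated storage can be sketched in Python for a single marker combination. The names are hypothetical and this is not the FAMHAP implementation: only the largest replicate statistics are kept, and a *P* value is reported exactly whenever the kept values suffice to determine it.

```python
import heapq


def keep_top(replicate_stats, value):
    """Keep only the round(value * t) largest replicate statistics,
    sorted in descending order."""
    t = len(replicate_stats)
    n_keep = max(1, round(value * t))
    return heapq.nlargest(n_keep, replicate_stats)


def p_from_kept(stat, kept, t):
    """P value of `stat` against all t replicates, using only `kept`.

    If fewer than len(kept) kept values reach `stat`, every discarded value
    is smaller than `stat` and the count is exact. Otherwise the P value is
    at least len(kept)/t and None is returned.
    """
    r = sum(1 for x in kept if x >= stat)
    return r / t if r < len(kept) else None


stats = [1, 5, 3, 9, 7, 2, 8, 4, 6, 0]  # toy replicate statistics, t = 10
kept = keep_top(stats, 0.3)             # [9, 8, 7]
print(p_from_kept(8.5, kept, len(stats)))  # 0.1
print(p_from_kept(2.0, kept, len(stats)))  # None
```

With 1,000 marker combinations, 100,000 replicates, and a cut-off of 0.001, this stores 100 values per combination instead of 100,000.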

Because of the discreteness of the distribution of *P*^{min}, generally some of the permutation replicates will give *P*^{min}_{i}=*P*^{min}. In accordance with equation (2), all of these permutation replicates have to be counted for determination of the global *P* value for *H*_{0}. As an example, suppose that, in table 1, permutation replicate 2 was the real data and the real data were one of the permutation replicates, with *P*^{min}=.0. Since *P*^{min}_{i}=.0 for permutation replicates 3 and 4 as well, the global *P* value is .2, as calculated by equation (2). Now, it is reasonable to enforce an order for data sets with identical smallest raw *P* value by consideration of the second smallest raw *P* value, which will be denoted *P*^{min2} for the real data and *P*^{min2}_{i} for permutation replicate *i*. Then, the *P* value for the global null hypothesis, denoted here by *P*_{global}, can be calculated by use of

$$P_{\mathrm{global}} = \frac{\left|\{\, i \in \{1,\dots,t\} : P^{\min}_{i} < P^{\min} \ \text{or}\ \left(P^{\min}_{i} = P^{\min} \ \text{and}\ P^{\min 2}_{i} \le P^{\min 2}\right) \}\right|}{t} \tag{4}$$

instead of equation (2). For the example of table 1 (and under the assumption that permutation replicate 2 was the real data), raw *P* values are .0 for two further marker combinations (i.e., *P*^{min2}=.0), whereas the second smallest raw *P* values for permutation replicates 3 and 4 are .1. Calculated by use of equation (4), the global *P* is .0 instead of .2. Of course, it is possible that the second smallest *P* values of two permutation replicates are identical as well and that it would be necessary to consider the third smallest *P* values, and so on. However, the number of such instances (which is printed out by our program) will be small in practice, and our implementation takes into account only the improvement that can be obtained from the consideration of the second smallest raw *P* value.
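The effect of breaking ties with the second smallest raw *P* value can be sketched in Python. The sketch assumes a table of uncorrected *P* values per marker combination, with index 0 for the real data, and uses lexicographic tuple comparison to express the ordering described above. The toy values imitate the situation of the example (real data with two raw *P* values of .0, two replicates with a smallest value of .0 and a second smallest of .1) but are not the entries of table 1.

```python
def global_p_values(p_table):
    """p_table maps each marker combination to [P_0, P_1, ..., P_t].

    Returns (global P without tie-breaking, global P with tie-breaking by
    the second smallest raw P value).
    """
    t = len(next(iter(p_table.values()))) - 1

    def two_smallest(i):
        values = sorted(p[i] for p in p_table.values())
        second = values[1] if len(values) > 1 else values[0]
        return values[0], second

    reference = two_smallest(0)
    plain = sum(1 for i in range(1, t + 1)
                if two_smallest(i)[0] <= reference[0]) / t
    # Tuples compare lexicographically: smaller minimum first, then the
    # second smallest value decides among equal minima.
    tie_broken = sum(1 for i in range(1, t + 1)
                     if two_smallest(i) <= reference) / t
    return plain, tie_broken


toy = {
    "A": [0.0, 0.0, 0.1] + [0.5] * 8,
    "B": [0.0, 0.1, 0.0] + [0.6] * 8,
    "C": [0.3, 0.5, 0.4] + [0.7] * 8,
}
print(global_p_values(toy))  # (0.2, 0.0)
```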

Note that there is a potential conflict in using equation (4) instead of equation (2) with the option “alpha = <*value*>.” If the second smallest raw *P* value for the real data is > <*value*>, this value will not have been stored, and the global *P* value has to be determined by use of equation (2). However, if *P*^{min2} ≤ <*value*>, the improved determination of the global *P* value can be applied even in combination with option “alpha = <*value*>.” In addition, the discreteness of the distribution of *P*^{min} is most pronounced in the case that the number of permutation replicates is relatively small, whereas, for a large number of permutation replicates, the difference between equation (2) and equation (4) will be less important. On the other hand, the option “alpha = <*value*>” is not required if the number of permutation replicates is small.

### Simulations

We performed a simulation study for samples of 233 trios and samples of 175 nuclear families with two affected children. We tried to model the situation of many markers within a small region with just five SNP markers. We simulated three core SNPs, which can be viewed as tagging SNPs of a haplotype block, and one SNP on both sides of the block, each of which was only in moderate linkage disequilibrium (LD) with the core block. These SNPs model the borders of the analyzed region. The core region of the three SNPs comprised five haplotypes. We simulated a diallelic disease locus under a dominant and a recessive model, with a prespecified relative risk of 2 and an attributable fraction of 0.20. We assumed a recombination fraction of zero between the disease locus and the core block. The first of the five core haplotypes had the same frequency as the disease allele and was assumed to be in complete LD with the disease allele. The other haplotypes at the core block were chosen from a uniform distribution. We additionally simulated a neighboring SNP on the left and on the right of the core block, depending on the core haplotype. The extent of the LD between each core haplotype and the neighboring markers can be found in table 2. Under all scenarios, we used 3,000 simulated data sets and computed the global *P* value with 4,000 permutation replicates for each simulated data set. We also checked the size of our test by simulations under the null hypothesis. For this purpose, we enforced linkage equilibrium between the disease locus and the markers. Empirical significance levels and power were computed as the proportion of simulated data sets for which the global *P* value or the Bonferroni-corrected smallest raw *P* value, respectively, was ≤.05.
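The two summary quantities used in this evaluation can be sketched in Python. The function names and toy *P* values are illustrative: the Bonferroni correction multiplies the smallest raw *P* value by the number of tests (capped at 1), and the empirical level or power is the proportion of simulated data sets rejected at the chosen level.

```python
def bonferroni(p_min, n_tests):
    """Bonferroni-corrected smallest raw P value, capped at 1."""
    return min(1.0, p_min * n_tests)


def rejection_rate(p_values, level=0.05):
    """Proportion of simulated data sets with a P value <= level."""
    return sum(1 for p in p_values if p <= level) / len(p_values)


# Toy values only: four simulated data sets, 31 marker combinations each.
smallest_raw = [0.001, 0.004, 0.0005, 0.02]
corrected = [bonferroni(p, 31) for p in smallest_raw]
print(rejection_rate(corrected))
```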

## Results

We have integrated our method into our program FAMHAP, which was developed for haplotype frequency estimation in nuclear families. In particular, haplotype frequencies and lists of haplotype explanations that are necessary for the computation of the underlying test statistic can be computed internally. Thus, time-consuming communication between different software packages can be avoided. The simulation study was performed on a Pentium III PC with 512 megabytes of main memory. The evaluation of a single data set with 4,000 permutation replicates took 10 s, on average, to obtain the raw *P* values for all 31 marker combinations and the corresponding global *P* value. A total of 3,000 simulated data sets had to be evaluated for each of 56 different situations, since we conducted simulations under the null hypothesis and the alternative, under two disease models, for two different data structures and for seven different configurations of “maxmarker” and “window.” In total, our simulation study ran for ~56×3,000×10 s, that is, ~19.5 d. With nested simulations, 4,000×4,000 permutation replicates would have been necessary to obtain the global *P* for a single simulated data set. Thus, our simulation study with nested simulations would have taken 4,000×19.5 d, which is ~214 years and would not have been feasible.

In table 3, the empirical significance levels for the different data structures and disease models are shown. (The disease model is relevant only for the haplotype frequencies.) In addition, we varied the number of marker combinations tested, by allowing only marker combinations with at most *k* = 1, 2, 3, or 5 markers (“maxmarker = *k*”), or by allowing only combinations of neighboring markers (“window = yes”). Under all models and marker combinations, the true type I error rate for the test that rejects *H*_{0} in the case that the smallest raw *P* value is ≤.05 is markedly above .05. As expected, this effect becomes stronger as the number of tested combinations increases. These high true type I error rates show that even with a considerable amount of LD, multiple testing is still an important problem. On the other hand, Bonferroni correction leads to true type I error rates that are too small, because it ignores the LD of the region. Indeed, when many marker combinations are tested, the true level for the Bonferroni correction is ~0.02 rather than 0.05. With our global *P* values, however, under none of the scenarios did we observe a significant deviation from the nominal level α=0.05. Thus, our method is a valid testing procedure that adequately accounts both for the number of tests and for the LD of a region. The nominal level is also maintained with samples consisting of nuclear families with two children, which confirms that we obtain a valid test for association from this family structure. Table 4 shows the power of testing for the uncorrected *P* values of the marker combinations of the core region, whereas table 5 shows the results of the power studies with Bonferroni correction and our global *P* values.
In accordance with the simulation setting, the power of the uncorrected tests in table 4 is smallest for the single markers, increases when 2-marker haplotypes are considered, and is best for the 3-marker combination, irrespective of the data structure or disease model. Table 5 shows that the power improvement of the haplotype analysis is maintained even after the multiple testing is taken into account. In general, the power under our global *P* values is considerably higher than under Bonferroni-corrected *P* values and lies about halfway between the power obtained by Bonferroni-corrected and uncorrected *P* values of the best combination, which represents the upper limit of what can be reached (for instance, if one knew the best combination in advance from an independent source). Both for Bonferroni-corrected and for global *P* values, the power increases with the number of considered markers and reaches its optimum when only combinations of 3 neighboring markers are considered (“maxmarker = 3”; “window = yes”). This is consistent with our simulation setup, which modeled a 3-marker disease haplotype of neighboring markers. However, whereas the power of the Bonferroni correction drops when too many combinations are tested (“maxmarker = 5”; “window = no”), tests of too many combinations have only a slight impact on the power when our global *P* values are computed. This shows that our method is able to capture the dependence of the tests and that it is not negatively affected by the consideration of effects of a high order or *cis*-acting effects if there are no such effects. Besides the comparison of the Bonferroni-corrected and our global *P* values, we consistently observe a higher power for the nuclear families with two affected children than for the trio sample, although the sample size for both data structures is equal with respect to genotyping effort. 
The better power of data from nuclear families with two affected children may partially be explained by the fact that families with multiple affected siblings are genetically more loaded and therefore lead to an increased power (Risch and Teng 1998; Fingerlin et al. 2004).

## Application to a Real Data Set

After an initial finding by Straub et al. (2002), Schwab et al. (2003) have recently reported supportive evidence for association of schizophrenia with multilocus haplotypes in the 6p22.3 gene, *dysbindin*. They typed markers rs3213207 (SNP 1), rs1011313 (SNP 2), rs2619528 (SNP 3), rs760761 (SNP 4), rs2619522 (SNP 5), and rs1018381 (SNP 6) in a sib-pair sample (78 families) and in an independently ascertained trio sample (125 families). Schwab et al. (2003) reported the results of association analyses performed with the methods of Zhao et al. (2000) and Knapp and Becker (2003) for all marker combinations of 5 markers and for the sib-pair sample, the trio sample, and the combined sample. The greatest evidence for association was found in the combined sample (without six Israeli families) for the combination of SNPs 2 and 4. However, with the software that was used at that time, it was not possible to conduct >200,000 permutation replicates. None of these replicates resulted in a test statistic greater than or equal to the statistic of the real data for this marker combination. Schwab et al. (2003) concluded that the raw *P*^{{2,4}}_{0} was almost certainly smaller than 2×10^{-5} for this 2-marker combination and applied a Bonferroni correction. With our new implementation, we were able to conduct 10^{8} permutation replicates, and we obtained a raw *P* value of *P*^{{2,4}}_{0}=2×10^{-6} for this marker combination. For very small *P*, the coefficient of variation (i.e., SD divided by the mean) of a raw *P* value estimated by *t* permutation replicates is approximately 1/√(*t*×*P*). Therefore, *t*=10^{8} permutation replicates are adequate to ensure that the coefficient of variation of the raw *P* value is ~10% in the case that the true raw *P* value for a specific marker combination is ~10^{−6}. Our program (with option “alpha = 0.001”) required 52 h (with a Pentium III PC) to obtain the results summarized in table 6.
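The coefficient of variation quoted above can be checked with a short derivation. It assumes that the number *r* of replicates reaching the observed statistic is binomially distributed with parameters *t* and *P*; the symbols *r* and $\hat{P}$ are introduced here only for illustration.

$$\hat{P} = \frac{r}{t}, \quad r \sim \mathrm{Bin}(t, P) \;\Longrightarrow\; \mathrm{E}[\hat{P}] = P, \quad \mathrm{Var}[\hat{P}] = \frac{P(1-P)}{t}$$

$$\mathrm{CV}(\hat{P}) = \frac{\sqrt{P(1-P)/t}}{P} = \sqrt{\frac{1-P}{tP}} \approx \frac{1}{\sqrt{tP}} \quad \text{for small } P$$

$$t = 10^{8},\; P = 10^{-6}: \quad \mathrm{CV} \approx \frac{1}{\sqrt{10^{8} \times 10^{-6}}} = \frac{1}{\sqrt{100}} = 0.1$$

Under the same assumption, halving the coefficient of variation requires four times as many permutation replicates.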

Table 6. *P* Values Based on 10^{8} Permutation Replicates for the *Dysbindin* Combined-Family Sample (without Israeli Families)

Note that the full table of test statistics for all marker combinations and all permutation replicates consists of 63×10^{8} entries, which would require 24 gigabytes of memory. The algorithm of Ge et al. (2003) reduces this requirement to 380 megabytes by storing only test statistics for the permutation replicates of a single marker combination at a time. The disadvantage of their approach for this example is related to running time. Ge et al. (2003) discussed two ways to guarantee that the same ordered set of permutation replicates is used for each marker combination. The first way is to reset the random number generator at the same fixed value for each hypothesis. Since the *dysbindin* combined sample consists of 203 families and each permutation replicate requires permutation of transmitted/nontransmitted haplotypes in each family, 203×10^{8} calls of the random number generator are necessary for generating 10^{8} permutation replicates. The time required for these calls is ~1.5 h with our hardware. In the case that these calls have to be repeated for each marker combination, the additional running time is 62×1.5 h=93 h. The second way discussed by Ge et al. (2003) is to recode and store each permutation. However, 203 bits (~26 bytes) of storage are required to store a single permutation, resulting in a requirement of 2.4 gigabytes to store all 10^{8} permutations. In addition, we expect that the coding and recoding of permutations would substantially increase the running time.
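The storage figures above can be reproduced with a few lines of Python. The sketch assumes 4 bytes per stored test statistic and binary units (1 gigabyte = 2^30 bytes); neither assumption is stated in the text, but both are consistent with the quoted numbers. The last line estimates the storage under option “alpha = 0.001” on the same assumptions.

```python
BYTES_PER_STATISTIC = 4   # assumption: single-precision storage
N_COMBINATIONS = 63       # all marker combinations of 6 SNPs
N_REPLICATES = 10 ** 8
N_FAMILIES = 203


def gib(n_bytes):
    """Bytes to gigabytes (binary units)."""
    return n_bytes / 2 ** 30


def mib(n_bytes):
    """Bytes to megabytes (binary units)."""
    return n_bytes / 2 ** 20


full_table = N_COMBINATIONS * N_REPLICATES * BYTES_PER_STATISTIC
one_combination = N_REPLICATES * BYTES_PER_STATISTIC
bytes_per_permutation = -(-N_FAMILIES // 8)  # 203 bits rounded up to 26 bytes
stored_permutations = bytes_per_permutation * N_REPLICATES
top_fraction = round(0.001 * N_REPLICATES) * N_COMBINATIONS * BYTES_PER_STATISTIC

print(round(gib(full_table), 1))           # 23.5
print(round(mib(one_combination), 1))      # 381.5
print(round(gib(stored_permutations), 1))  # 2.4
print(round(mib(top_fraction), 1))         # 24.0
```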

The results presented in table 6 for the *dysbindin* data suggest an advantage of haplotype analysis as compared with single-marker analysis, even after correction for the increased number of tests, since the global *P* value for “maxmarker = 1” is higher by a factor >6 than the global *P* value obtained by consideration of all marker combinations (i.e., “maxmarker = 6”). Also, in accordance with the results of our simulation study, it can be seen from table 6 that the global *P* value does not increase much when too many marker combinations are tested. Since the smallest raw *P* value corresponds to *B*={2,4}, the option “maxmarker = 2” results in the smallest global *P* value. However, consideration of all marker combinations increases the global *P* value by less than a factor of 1.5, although the number of single hypotheses increases by a factor of 3. Comparison of the *P* value obtained by Bonferroni correction with the global *P* value obtained by our method reveals that the ratio of these values increases from ~1.5 (for the set of single-marker combinations) up to 4 (for the set of all marker combinations).

## Discussion

We implemented a method that adequately accounts for the multiple-testing problem that occurs in the context of haplotype analysis. With the help of Ge et al.'s (2003) idea to reduce a nested permutation algorithm to a single permutation algorithm, computer running time is reduced drastically, and a power study becomes feasible. Our implementation is optimized for the application to haplotype analysis. Computer storage can be reduced by consideration, for each marker combination, of only the replicates with a raw *P* value ≤α. With this feature, the analysis of the *dysbindin* data could be performed with the necessary number of permutation replicates. Furthermore, we obtained a considerable gain in speed. Since it was not necessary to proceed with one hypothesis at a time, we did not have to do repeated calls of the random number generator for each marker combination.
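The single-permutation idea can be illustrated with a small Python sketch. It is not the FAMHAP implementation: it keeps the full table of test statistics in memory, omits the α-based storage reduction, and assumes that larger statistics indicate stronger association and that a raw *P* value is the share of permutation replicates with a statistic greater than or equal to the one considered. The function name `min_p_global` is hypothetical.

```python
from bisect import bisect_left


def min_p_global(observed, permuted):
    """Return (smallest raw P, Bonferroni-corrected P, global P).

    observed: one test statistic per marker combination (real data).
    permuted: list of replicates, each a list of statistics in the same order.
    """
    m, t = len(observed), len(permuted)
    rows = [observed] + permuted      # row 0 holds the real data
    min_p = [1.0] * len(rows)         # smallest raw P found so far, per row
    for b in range(m):
        null = sorted(rep[b] for rep in permuted)
        for j, row in enumerate(rows):
            # Raw P: share of permuted statistics >= this row's statistic.
            raw_p = (t - bisect_left(null, row[b])) / t
            min_p[j] = min(min_p[j], raw_p)
    # Global P: share of replicates whose smallest raw P is at most the
    # smallest raw P of the real data. The same replicates serve both
    # purposes, so no nested simulation is needed.
    global_p = sum(p <= min_p[0] for p in min_p[1:]) / t
    bonferroni = min(1.0, m * min_p[0])
    return min_p[0], bonferroni, global_p


# Toy input: 2 marker combinations, 4 permutation replicates.
toy_permuted = [[1.0, 2.0], [2.0, 0.5], [3.0, 3.0], [0.5, 1.5]]
toy_result = min_p_global([2.5, 0.1], toy_permuted)
print(toy_result)
```

In the toy input, the two columns rise and fall together in some replicates, so the replicates' smallest raw *P* values are larger than they would be for independent tests, and the global *P* value comes out below the Bonferroni-corrected one.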

In the simulation study, we have shown that our method adequately accounts both for the LD of a set of tightly linked markers and for the induced multiple-testing problem. In contrast to Bonferroni correction of the smallest *P* value, our method avoids being overly conservative. Consistently, power increases substantially when global *P* values are considered instead of Bonferroni-corrected *P* values. Our method allows for the simultaneous consideration of different marker combinations and comprises, in particular, the analysis of single markers. In this way, our method can be a step toward a unified strategy to judge the significance of the association of a phenotype with a genomic region as a whole. The need for this is highlighted by the strongly increased true significance levels for uncorrected *P* values shown in table 3.

However, some flexibility remains with respect to the set of marker combinations that will be considered. With few markers, it is possible to consider all combinations, and our simulations show that, in general, not much power is lost, even when the marker combination with the strongest association does not consist of all markers. On the other hand, we can hope to detect high-order interactions or *cis*-acting effects if they are present. For a densely spaced marker system of 20–50 markers, it is computationally impossible to consider all combinations, even with our approach. In such a situation, it is useful to consider only marker combinations with ≤3 markers but also to allow combinations of markers that are not neighbors. In this way, the number of tests is restricted to a level that is computationally feasible, but all single-, two-, and three-locus haplotype effects can be detected, including *cis*-acting effects and haplotype effects that are blurred by markers that arose in the region after the mutation event(s).
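The effect of capping the number of markers per combination can be made concrete with a short Python sketch. The helper names are hypothetical, and `max_markers` only mirrors the role that the “maxmarker” option plays in the text.

```python
from itertools import combinations
from math import comb


def n_combinations(n_markers, max_markers):
    """Number of marker combinations with at most max_markers markers."""
    return sum(comb(n_markers, k) for k in range(1, max_markers + 1))


def marker_combinations(markers, max_markers):
    """Yield every combination of up to max_markers markers,
    including combinations of markers that are not neighbors."""
    for k in range(1, max_markers + 1):
        yield from combinations(markers, k)


# Six markers, as in the dysbindin example.
print(n_combinations(6, 1), n_combinations(6, 2), n_combinations(6, 6))

# Dense marker systems: all combinations versus at most three markers.
print(n_combinations(20, 20), n_combinations(20, 3), n_combinations(50, 3))

# Enumerate the combinations for a cap of two markers.
pairs_and_singles = list(marker_combinations(range(1, 7), 2))
print(len(pairs_and_singles))
```

Under this count, restricting 20 markers to combinations of at most three reduces the number of tests from 1,048,575 to 1,350, and 50 markers yield 20,875 such combinations.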

Our simulations for family data show that haplotype analysis can be more powerful than single-locus analysis when the background LD is taken into account adequately, as by our method, and also after correction for the multiple testing. Studies for case-control data under a coalescent model with a simple Bonferroni correction did not detect a gain in terms of power for haplotype analysis (Kaplan and Morris 2001).

Of course, our simulation study covers only a limited range of possible disease scenarios, but we do not think that it favors haplotype analysis inadequately. On the contrary, we modeled only a single disease haplotype, but the benefit of haplotype analysis versus single-locus analysis is stronger with multiple disease alleles (Morris and Kaplan 2002). Thus, on average, the power gain is likely to be even stronger than the gain found in our study. In addition, we modeled markers that were in moderate LD, and we would expect our method to perform even better under strong LD, both in comparison with single-marker analysis and in comparison with Bonferroni-corrected *P* values.

Besides these issues, our simulation study suggests that haplotype analysis is particularly favorable for nuclear families with multiple affected siblings. At least as far as single-marker analysis is concerned, case-control data have higher power than TDT analysis of trio data, under many disease scenarios, since the latter pays a price for being robust against population stratification. However, we observed that nuclear families with two affected children were much more powerful than trios. Hence, a comparison of nuclear families with multiple affected siblings and case-control data, on the basis of an equal genotyping effort, would be highly interesting. In general, nuclear families allow for very precise haplotype reconstruction (Becker and Knapp 2002; Schaid 2002), such that the haplotypic information can be fully exploited. Furthermore, as mentioned above, families with multiple affected siblings are genetically more loaded and therefore lead to an increased power (Risch and Teng 1998; Fingerlin et al. 2004). Thus, the comparison can be refined to a comparison with case-control data in which the cases are index cases from affected siblings. We are currently working on an implementation for case-control data, which will make it possible to compare the performance of case-control and nuclear-family data. (Note that, for case-control data also, the underlying distribution is often unknown, since asymptotic theory is difficult to apply when the number of different haplotypes is high.) Besides that, the idea of Ge et al. (2003) is quite general and can be applied to obtain global *P* values for different kinds of test statistics, including tests for specific genetic models and models of interaction. For family data, the principle has been implemented in the latest update of our program FAMHAP (Becker and Knapp 2004).

## Acknowledgments

We are very grateful to Professors Sibylle G. Schwab and Dieter B. Wildenauer for providing the *dysbindin* family data. Our work was supported by grant Kn 378/1 (Project D1 of FOR 423) from the Deutsche Forschungsgemeinschaft.

## Electronic-Database Information

The URL for data presented herein is as follows:

## References

*P*values: response to North et al. Am J Hum Genet 72:496 [PMC free article] [PubMed]

*P*values by Monte Carlo methods. Am J Hum Genet 72:496–498 [PMC free article] [PubMed]

*P*values from Monte Carlo procedures. Am J Hum Genet 71:439–441 [PMC free article] [PubMed]

*P*values from Monte Carlo procedures. Am J Hum Genet 72:498–499 [PMC free article] [PubMed]

*DTNBP1*, the human ortholog of the mouse dysbindin gene, is associated with schizophrenia. Am J Hum Genet 71:337–348 [PMC free article] [PubMed]

*P*-value adjustment. John Wiley and Sons, New York

**American Society of Human Genetics**


A Powerful Strategy to Account for Multiple Testing in the Context of Haplotype Analysis. American Journal of Human Genetics. Oct 2004; 75(4):561.
