- PMC3117224

# Correction for Multiplicity in Genetic Association Studies of Triads: The Permutational TDT

^{1}Biostatistics and Bioinformatics Branch of the Division of Epidemiology, Statistics, and Prevention Research of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH/DHHS, Bld 6100, Bethesda, MD 20892, USA

^{2}Epidemiology Branch of the Division of Epidemiology, Statistics, and Prevention Research of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH/DHHS, Bld 6100, Bethesda, MD 20892, USA

^{*}Corresponding author: James F. Troendle, Office of Biostatistics Research of the National Heart, Lung, and Blood Institute, NIH/DHHS, Bld Rockledge II, Room 9195, Bethesda, MD 20892, USA. Tel: 301-435-0421; Fax: 301-480-1862; Email: vog.hin@t3tj

## Summary

New technology for large-scale genotyping has created new challenges for statistical analysis. Correcting for multiple comparison without discarding true positive results and extending methods to triad studies are among the important problems facing statisticians. We present a one-sample permutation test for testing transmission disequilibrium hypotheses in triad studies, and show how this test can be used for multiple single nucleotide polymorphism (SNP) testing. The resulting multiple comparison procedure is shown in the case of the transmission disequilibrium test to control the familywise error. Furthermore, this procedure can handle multiple possible modes of risk inheritance per SNP. The resulting permutational procedure is shown through simulation of SNP data to be more powerful than the Bonferroni procedure when the SNPs are in linkage disequilibrium. Moreover, permutations implicitly avoid any multiple comparison correction penalties when the SNP has a rare allele. The method is illustrated by analyzing a large candidate gene study of neural tube defects and an independent study of oral clefts, where the smallest adjusted p-values using the permutation procedure are approximately half those of the Bonferroni procedure. We conclude that permutation tests are more powerful for identifying disease-associated SNPs in candidate gene studies and are useful for analysis of triad studies.

**Keywords:** Exchangeable, familywise error rate, linkage disequilibrium, power

## Introduction

Advances in technology have led to an increase in large genetic association studies of disease. Along with the ability to look at large numbers of single nucleotide polymorphisms (SNPs) has come the need for improved methods of statistical correction for multiplicity. The Bonferroni procedure is the simplest and most often used method of correction. However, the Bonferroni procedure is well known to be overly conservative in the presence of correlation (Han et al., 2009).

Permutation tests implicitly account for correlation through the use of the data vectors, thereby improving power over Bonferroni-type methods. Permutation tests are also the only multiple comparison procedures capable of exact error control for small or moderate sample sizes. Multiple comparison procedures based on permutations have several other attractive features. One is that they implicitly reduce the penalty for comparisons when the events are rare (Westfall & Troendle, 2008), such as when a SNP is too uncommon to produce a small enough p-value to affect the permutational correction for multiplicity. The permutational procedure will essentially consider that SNP not tested, effectively reducing the correction factor. This is an important advantage when some of the SNPs under study have rare alleles. Another advantage of permutation tests is their ability to handle multiple tests of the same hypothesis easily. It is quite common in genetic association studies for several different modes of inheritance to be considered, typically dominant, recessive, and multiplicative. Each of these modes of inheritance leads to different tests of the null hypothesis. Permutational procedures need only consider the minimum p-value across all tests and SNPs to produce adjusted p-values that account for both the multiple SNPs and multiple tests applied to each SNP.

Another statistical challenge is providing improved methods for family-based studies. Some diseases, like birth defects, are well suited to collection of genetic information on triads (case child, mother, father). The use of triads avoids two problems: (1) ascertainment bias inherent in control selection (Schlesselman, 1982) and (2) population stratification where the case groups may contain different proportions of an ethnic group than the control group. Both of these lead to excess type I errors in tests of association using case–control designs (Lee & Wang, 2008). Triads allow methodology conditioned on the parental genotypes that is robust to population stratification. A very common genetic association test for a single bi-allelic locus (e.g., SNP), based only on triad data, is the transmission disequilibrium test (TDT) (Spielman et al., 1993). This test can be obtained as a likelihood ratio test for the child's genotype in a multiplicative model, conditioned on the parental genotype.

In this report, we describe a one-sample permutational approach for the inheritance-association hypothesis, and show how it can be used when correcting for multiplicity. The resulting multiple comparison procedure permits testing multiple SNPs and multiple tests of each SNP to allow for different inheritance models. We show that the method strongly controls the familywise error rate (FWE), regardless of sample size. Simulations show the permutational procedure to have more power than the Bonferroni procedure under varying inheritance modes, risk allele proportions, and SNP correlations. We analyze a study of candidate genes in neural tube defect (NTD [MIM #182940]) triads as well as a study of oral cleft (OFC1 [MIM #119530]) triads in Ireland, showing that the smallest adjusted p-values from the permutational procedure can be approximately half that of the Bonferroni procedure.

## Materials and Methods

### One-Sample Permutation Test for Pre-Test/Post-Test Design

We start with the classical one-sample permutation test that arises from a pre-test post-test design. In this design a single sample of subjects is observed before and after some intervention. Let (*Z*_{i}, *W*_{i}) be the measured variable on subject *i* (pre-intervention, post-intervention), *i* = 1, …, *n*. A test to see if the intervention led to different values of the measured variable is based on the values of the differences, *D*_{i} = *W*_{i} − *Z*_{i}. A permutation test of the hypothesis that the distributions of pre-intervention and post-intervention values are identical is motivated by the realization that under the null hypothesis the random variable (*Z*_{i}, *W*_{i}), conditioned on the two values {*z*_{i} and *w*_{i}}, takes value (*z*_{i}, *w*_{i}) with probability 1/2 and value (*w*_{i}, *z*_{i}) with probability 1/2. In other words, to get the distribution of *D*_{i} under the null hypothesis, the labels that tell you the value *z*_{i} represents the pre-intervention value and *w*_{i} represents the post-intervention value can be permuted. A full permutation test considers each of the possible datasets (*Z*_{1}, *W*_{1}), … (*Z*_{n}, *W*_{n}) that could have been obtained given the observed *z* and *w* values. For each such dataset, a test statistic, *TS*, is computed and compared to the observed test statistic, *TS*^{obs}. Let *TS*^{r} be the test statistic computed from the *r*th permuted dataset, *r* = 1, …, *M*. A p-value is then obtained as the proportion of the *M* permuted datasets for which *TS*^{r} ≥ *TS*^{obs}.
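The full permutation test for paired data can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the statistic *TS* = |Σ *D*_{i}| is an assumption (the text leaves *TS* generic), and the function name and the small dataset are hypothetical.

```python
from itertools import product

def paired_permutation_pvalue(z, w):
    """Full one-sample permutation test for paired (pre, post) values.

    Assumed test statistic: TS = |sum of the differences D_i = w_i - z_i|.
    """
    diffs = [wi - zi for zi, wi in zip(z, w)]
    ts_obs = abs(sum(diffs))
    hits = 0
    total = 0
    # Swapping the labels of (z_i, w_i) flips the sign of D_i, so the 2^n
    # possible datasets correspond to the 2^n sign patterns of the differences.
    for signs in product((1, -1), repeat=len(diffs)):
        ts_r = abs(sum(s * d for s, d in zip(signs, diffs)))
        if ts_r >= ts_obs:
            hits += 1
        total += 1
    return hits / total

# Hypothetical pre/post measurements on four subjects.
pre = [10, 12, 9, 11]
post = [13, 15, 10, 14]
p = paired_permutation_pvalue(pre, post)
```

With only four subjects there are 16 relabelings, and just the two with all differences of the same sign reach the observed statistic, which shows how coarse the exact p-value is for very small samples.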

### One-Sample Permutations for Triads

Consider the case of testing for genetic association using genotype data on triads. Designate the allele of interest *A*, and let the other allele be denoted *G* (it is not important that the alleles actually be A and G). The data from *n* triads regarding the transmission of these alleles is shown in Table 1. The TDT is based on whether or not the designated allele is transmitted by each heterozygous parent to the case child (Spielman et al., 1993). The null hypothesis is that there is no association between disease and transmission of the designated allele. The standard *χ*^{2} test for this hypothesis has test statistic *TS* = (*b − c* )^{2}/(*b* + *c*), which under the null hypothesis has a ${\chi}_{\left(1\right)}^{2}$ distribution.

The data for the *i*th triad can be represented as (*Z*_{i}, *W*_{i}), where ${Z}_{i}^{t}=\left({z}_{i1}\;{z}_{i2}\right)$, *z*_{i1} is an indicator that the first heterozygous parent transmits an *A* allele to the case child (if there is no heterozygous parent then *z*_{i1} = 0), *z*_{i2} is an indicator that the second heterozygous parent transmits an *A* allele to the case child (if there is no second heterozygous parent then *z*_{i2} = 0), and the superscript *t* stands for matrix transpose. The vector *W*_{i} is defined analogously to *Z*_{i}, but indicating transmission of *G* alleles. The null hypothesis is that transmission of the designated allele has no effect on the risk of being a case. This implies that the alleles are exchangeable, or that conditioned on the two values {*z*_{i} and *w*_{i}}, (*Z*_{i}, *W*_{i}) takes value (*z*_{i}, *w*_{i}) with probability 1/2 and value (*w*_{i}, *z*_{i}) with probability 1/2. In other words, to get the distribution of *TS* under the null hypothesis, *z*_{i} and *w*_{i} can be permuted. Equivalently, an exact version of the TDT can be obtained by using the Binomial distribution with *p* = 0.5 as null distribution for the number of transmitted *A* alleles from heterozygous parents.
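As a small illustration of the two single-SNP tests just described, the sketch below computes the χ² TDT statistic and a two-sided exact binomial p-value. It assumes *b* counts *A* transmissions and *c* counts *G* transmissions from heterozygous parents; the function names and the two-sided tail definition (outcomes at least as far from *n*/2 as the observed count) are choices made here, not taken from the paper.

```python
from math import comb

def tdt_statistic(b, c):
    """Chi-square TDT statistic (b - c)^2 / (b + c)."""
    return (b - c) ** 2 / (b + c)

def exact_tdt_pvalue(b, c):
    """Two-sided exact TDT p-value under Binomial(b + c, 0.5).

    Sums the null probabilities of every count x of A transmissions whose
    imbalance |x - (n - x)| is at least the observed imbalance |b - c|.
    """
    n = b + c
    observed_gap = abs(b - c)
    tail = sum(comb(n, x) for x in range(n + 1) if abs(2 * x - n) >= observed_gap)
    return tail / 2 ** n

# Hypothetical counts: 8 transmissions of A and 2 of G.
ts = tdt_statistic(8, 2)
p_exact = exact_tdt_pvalue(8, 2)
```

For these hypothetical counts the exact tail contains the outcomes 0, 1, 2, 8, 9, and 10 transmissions of *A* out of 10.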

A full permutation test of the null hypothesis considers each of the possible permuted datasets (*Z*_{1}, *W*_{1}), … (*Z*_{n}, *W*_{n}), where *n* is the number of triads. For each such dataset, the test statistic, *TS*, is computed and compared to the observed test statistic, *TS*^{obs}. Let *TS*^{r} be the test statistic computed from the *r*th permuted dataset, *r* = 1, …, *M*. A p-value is then obtained as the proportion of the *M* permuted datasets for which *TS*^{r} ≥ *TS*^{obs}. A level *α* test rule would then be to reject the null hypothesis if the p-value was ≤ *α*. A full permutation test using this rule has type I error ≤ *α*, regardless of the sample size (Pesarin, 2001). This is in contrast to the ordinary TDT using the ${\chi}_{\left(1\right)}^{2}$ null distribution, which only controls the type I error asymptotically. This is the primary advantage of permutation tests in general, although not necessarily in this case. In our experience the ${\chi}_{\left(1\right)}^{2}$ TDT has reasonable control of the type I error for moderate or larger sampled studies, and so using a permutational (or exact) TDT is not a significant improvement. However, there are other advantages of the permutational version that will be described in the following sections.

In most applications, a full permutation test is not computationally feasible. For example, there are 2^{n} different permutation datasets for a study with *n* triads. In these cases a random sample of permuted datasets is selected. For a random permutation test, let *TS*^{r} be the test statistic computed from the *r*th randomly permuted dataset, *r* = 1, …, *M*. A p-value is then obtained as the proportion of the *M*+1 datasets for which *TS*^{r} ≥ *TS*^{obs}, *r* = 0, 1, …, *M*, where *TS*^{0} = *TS*^{obs}. The p-value from a random permutation test is an approximation to the p-value from a full permutation test, where we can control the likely error of the approximation by adjusting *M*, the number of random permutations used.
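A random permutation TDT for a single SNP might look like the following sketch. It assumes each triad is stored as two 2-element lists of transmission indicators (`Z[i]` for *A*, `W[i]` for *G*); the function names, the seed handling, and the convention that a dataset with no heterozygous parents has *TS* = 0 are illustrative choices rather than details from the paper.

```python
import random

def tdt_statistic(b, c):
    """(b - c)^2 / (b + c), defined here as 0 when no parent is heterozygous."""
    return (b - c) ** 2 / (b + c) if b + c > 0 else 0.0

def random_permutation_tdt(Z, W, M=999, seed=0):
    """Random-permutation TDT p-value for one SNP.

    Z[i] and W[i] hold the A- and G-transmission indicators of triad i.
    Each permuted dataset swaps Z[i] and W[i] with probability 1/2,
    independently across triads.
    """
    rng = random.Random(seed)

    def stat(Zs, Ws):
        b = sum(sum(z) for z in Zs)
        c = sum(sum(w) for w in Ws)
        return tdt_statistic(b, c)

    ts_obs = stat(Z, W)
    hits = 1  # r = 0 is the observed dataset, which always counts
    for _ in range(M):
        Zp, Wp = [], []
        for z, w in zip(Z, W):
            if rng.random() < 0.5:
                z, w = w, z
            Zp.append(z)
            Wp.append(w)
        if stat(Zp, Wp) >= ts_obs:
            hits += 1
    return hits / (M + 1)
```

Because the observed dataset is included as *r* = 0, the smallest attainable p-value is 1/(*M*+1), so *M* should be chosen large enough for the significance level of interest.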

### Multivariate Permutations for Triads

Suppose now that there are data from *n* triads on *k* SNPs. One might want to test the null hypothesis that there is no genetic association of disease with any of the *k* SNPs. This leads quite naturally to a multivariate permutation test. Suppose for each SNP we have a designated allele, which we denote *A*, and the other allele is denoted *G*. The data for the *i*th triad on the *j*th SNP can now be represented as (*Z*_{ij}, *W*_{ij}), where ${Z}_{\mathit{ij}}^{t}=\left({z}_{\mathit{ij}1}\;{z}_{\mathit{ij}2}\right)$, *z*_{ij1} is the indicator of *A* allele transmission from the first heterozygous parent to the case child (if there is no heterozygous parent then *z*_{ij1} = 0), *z*_{ij2} is the indicator of *A* allele transmission from the second heterozygous parent to the case child (if there is no second heterozygous parent then *z*_{ij2} = 0). The vector *W*_{ij} is defined analogously to *Z*_{ij}, but indicating *G* allele transmission. Let

${Z}_{i}={\left({Z}_{i1}^{t},\dots ,{Z}_{ik}^{t}\right)}^{t}$

and

${W}_{i}={\left({W}_{i1}^{t},\dots ,{W}_{ik}^{t}\right)}^{t}.$

The null hypothesis is that transmission of the designated allele on any SNP has no effect on the risk of being a case. Many test statistics could be chosen to test this composite or overall null hypothesis. However, a simple and very effective choice is the maximum of the TDT test statistics from each SNP individually. Therefore, denote *TS* = max{*TS*_{j} : *j* = 1, …, *k*}, where *TS*_{j} is the TDT test statistic on SNP *j*. The null hypothesis implies that the alleles are exchangeable, or that conditioned on the two values {*z*_{i} and *w*_{i}}, (*Z*_{i}, *W*_{i}) takes value (*z*_{i}, *w*_{i}) with probability 1/2 and value (*w*_{i}, *z*_{i}) with probability 1/2. In other words, to get the distribution of *TS* under the null hypothesis, the labels that tell you the value *z*_{i} represents the *A* allele transmission indicators and *w*_{i} represents the *G* allele transmission indicators can be permuted. Note that the names A and G are arbitrary so that the method does not depend on the alleles actually being A and G for each SNP, or that the same two alleles are being noticed by different SNPs. Multivariate permutation tests work exactly like univariate permutation tests except that one is permuting a vector (in the case of the TDT even in the single SNP case we were permuting a vector of two indicators, in the multi-SNP case we are permuting a vector of two indicators for each of *k* SNPs).

The multi-SNP case illustrates one of the advantages of permutations. Permutations estimate the exact conditional distribution of the *TS* under the null hypothesis. In contrast, a parametric approach leaves one trying to estimate the distribution of the maximum of *k* correlated ${\chi}_{\left(1\right)}^{2}$ variables. This is not a problem that can be solved without resorting to asymptotics or approximations. Neither asymptotics nor approximations work well as *k* increases.

#### An Example

Suppose there are *n* triads on two SNPs. Here we will show in detail what the permutations might look like and how they are obtained. Figure 1 shows the first three triads from a hypothetical dataset. According to the notation given in the previous section, the data vectors for the first three triads are

and

The first two rows of the *Z* and *W* vectors correspond to the information from the parents about SNP1 transmission, whereas the latter two rows correspond to SNP2 transmission. Notice that rows of the *Z* and *W* vectors for which both the *Z* and *W* component is 0 correspond to nonheterozygous parents. Permutations are made to the corresponding *Z* and *W* pairs. Thus, if we let ${Z}_{1}^{\ast}$ be the permuted *Z*_{1} vector, ${Z}_{1}^{\ast}$ will either be $\left(\begin{array}{c}1\\ 1\\ 1\\ 0\end{array}\right)$ or $\left(\begin{array}{c}0\\ 0\\ 0\\ 0\end{array}\right)$ with equal probability. After each *Z* and *W* pair is independently permuted, one has a new dataset ${Z}_{1}^{\ast},{W}_{1}^{\ast},\dots ,{Z}_{n}^{\ast},{W}_{n}^{\ast}$, which is then used to obtain *TS*^{1}. The test statistic for the TDT, given previously and expressed in terms of the *Z* and *W* vectors, is (*b* − *c*)^{2}/(*b* + *c*), where *b* is the sum of the *Z* components (over all of the triads) and *c* is the sum of the *W* components. This process is repeated *M* times to obtain *TS*^{r} for *r* = 1, …, *M* (Figure 2).
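The swap-and-recompute step of this example can be sketched as follows. The three triads below are hypothetical: the first uses the two vectors named in the text (which of them is *Z*_{1} is an assumption), and the other two are invented for illustration. The function names are likewise not from the paper.

```python
import random

def per_snp_tdt(Z, W, k):
    """TDT statistic for each of k SNPs.

    Rows 2j and 2j+1 of every 2k-element vector belong to SNP j
    (one row per parent). A SNP with no heterozygous parents gets 0.
    """
    stats = []
    for j in range(k):
        b = sum(z[2 * j] + z[2 * j + 1] for z in Z)
        c = sum(w[2 * j] + w[2 * j + 1] for w in W)
        stats.append((b - c) ** 2 / (b + c) if b + c else 0.0)
    return stats

def permute_triads(Z, W, rng):
    """Swap the whole vectors Z_i and W_i of each triad with probability 1/2."""
    Zp, Wp = [], []
    for z, w in zip(Z, W):
        if rng.random() < 0.5:
            z, w = w, z
        Zp.append(z)
        Wp.append(w)
    return Zp, Wp

# Hypothetical data for three triads on two SNPs.
Z = [[1, 1, 1, 0], [0, 1, 0, 0], [1, 0, 0, 1]]
W = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]

observed = per_snp_tdt(Z, W, k=2)           # one statistic per SNP
Z_star, W_star = permute_triads(Z, W, random.Random(0))
ts_1 = max(per_snp_tdt(Z_star, W_star, k=2))  # max statistic for one permutation
```

Swapping the full 2*k*-element vectors, rather than each SNP separately, is what keeps the correlation between SNPs intact in every permuted dataset.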

### Multiple SNPs

Consider again the situation described in the previous section where we have data from *n* triads on *k* SNPs. Let *p*_{j} be the exact binomial TDT p-value for SNP *j*, *j* = 1, …, *k*. Suppose now that we want to test each of the null hypotheses that the *j*th SNP has no genetic association with disease for *j* = 1, …, *k*. We wish to control the FWE. First we will get adjusted p-values for each hypothesis. A multiple comparison procedure test rule would then be to reject a null hypothesis if the adjusted p-value was ≤ *α*. The simplest and most commonly used correction for multiple comparison is the Bonferroni procedure. In this case the adjusted p-value from the Bonferroni procedure on the *j*th SNP is ${p}_{j}^{B}=k\cdot {p}_{j}$ if the result is less than 1 and equals 1 otherwise. The Bonferroni procedure controls the FWE strongly.

Sequential versions of the Bonferroni procedure, like the Holm procedure (Holm, 1976), provide improvements. However, in large genetic association studies there are typically few SNPs that survive correction and the improvement over the Bonferroni is small unless the fraction of rejected null hypotheses is substantial. In this paper, we only consider single step procedures that treat all hypotheses without regard to rejection of any other hypotheses.
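The Bonferroni adjustment defined above is a one-line computation. The sketch below is illustrative only; the function name and the optional `k` argument are assumptions, and the numbers in the usage line are the NTD figures reported in the Results (1339 SNPs, smallest unadjusted p-value 0.002233).

```python
def bonferroni_adjust(p_values, k=None):
    """Single-step Bonferroni adjusted p-values: min(1, k * p_j).

    k defaults to the number of p-values supplied; pass it explicitly when
    only a subset of the k tested hypotheses is being adjusted.
    """
    if k is None:
        k = len(p_values)
    return [min(1.0, k * p) for p in p_values]

# Smallest unadjusted p-value from the NTD analysis, adjusted for 1339 SNPs.
adjusted = bonferroni_adjust([0.002233], k=1339)
```

Here the product 1339 × 0.002233 exceeds 1, so the adjusted p-value is truncated to 1.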

A permutational procedure is easily applied to this problem by slightly modifying the multivariate permutation version of the TDT described in the previous section. Let *TS*^{r} = max{*TS*_{j} : *j* = 1, …, *k*} be the max test statistic computed from the *r*th randomly permuted dataset, *r* = 0, …, *M* (*r* = 0 is the observed). In this case one obtains adjusted p-values for the *j*th SNP hypothesis as ${p}_{j}^{P}$ = the proportion of the *M*+1 permuted datasets for which *TS*^{r} ≥ *TS*_{j}, *r* = 0, 1, …, *M*, where *TS*_{j} is the observed test statistic for SNP *j*. For this procedure to control the FWE strongly, we need a joint distributional condition (JDC) (Westfall & Troendle, 2008).
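A minimal sketch of these permutational adjusted p-values is given below. It assumes the same data layout as the worked example (each triad a pair of 2*k*-element indicator vectors, two rows per SNP) and uses the χ² TDT statistic per SNP; the function names, the default *M*, and the treatment of SNPs with no heterozygous parents are illustrative assumptions.

```python
import random

def snp_stats(Z, W, k):
    """TDT statistic (b - c)^2 / (b + c) for each SNP; 0 if b + c == 0."""
    stats = []
    for j in range(k):
        b = sum(z[2 * j] + z[2 * j + 1] for z in Z)
        c = sum(w[2 * j] + w[2 * j + 1] for w in W)
        stats.append((b - c) ** 2 / (b + c) if b + c else 0.0)
    return stats

def maxt_adjusted_pvalues(Z, W, k, M=999, seed=0):
    """Single-step permutational adjusted p-values based on the max statistic."""
    rng = random.Random(seed)
    observed = snp_stats(Z, W, k)
    # r = 0 is the observed dataset; its max is >= every observed TS_j.
    counts = [1] * k
    for _ in range(M):
        Zp, Wp = [], []
        for z, w in zip(Z, W):
            if rng.random() < 0.5:  # swap the whole triad vectors
                z, w = w, z
            Zp.append(z)
            Wp.append(w)
        ts_max = max(snp_stats(Zp, Wp, k))
        for j in range(k):
            if ts_max >= observed[j]:
                counts[j] += 1
    return [count / (M + 1) for count in counts]
```

Every SNP is compared against the same permutation maxima, so a SNP with a larger observed statistic never receives a larger adjusted p-value than one with a smaller statistic.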

#### Joint Distributional Condition

Let *H*_{j} be the null hypothesis for SNP *j*. Suppose *H*_{r1}, …, *H*_{rt} are true null hypotheses for *r*_{1}, …, *r*_{t} ∈ {1, 2, …, *k*}. Then under *H*_{r1} ∩ … ∩ *H*_{rt} the joint distribution of {*TS*_{r1}, …, *TS*_{rt}} is obtained by the multivariate permutational distribution.

The JDC here says essentially that if the null hypothesis holds for individual SNPs, then it holds in a multivariate sense for those same SNPs taken as a collection. The advantage of the JDC is that with the condition, multivariate permutation gives the exact joint distribution of test statistics under the null hypothesis. Without the JDC, the multivariate distribution of test statistics would be unknown even though each one would have a known marginal distribution.

When the JDC holds, it is easy to see that the procedure controls the FWE regardless of the true hypotheses and regardless of sample size. For the genetic association tests of multiple SNPs we are considering, we show now that the JDC does hold. The reasoning is that because each *H*_{j}, *j* ∈ {*r*_{1}, …, *r*_{t}} is true we have that the marginal distribution of each *TS*_{j}, *j* ∈ {*r*_{1}, …, *r*_{t}} is given by the permutational distribution. For ease of notation *Z*_{i} will now represent the subvector consisting of only those components corresponding to SNPs with indices *r*_{1}, …, *r*_{t} (*W*_{i} is analogously defined). The only way that the joint distribution of the test statistics would then differ from the multivariate permutational distribution is if the correlation of the subvectors *Z*_{i} and *W*_{i} were different. However, *Z*_{i} consists of Bernoulli components with probability of success 0.5 (along with some components that are constants equal to 0 when there is no corresponding heterozygous parent for that particular SNP; these components are ignored because they have no correlation with any other component). Moreover, *W*_{i} = 1 − *Z*_{i}. One can then see that *cov*(*Z*_{i}) = *cov*(*W*_{i}), and thus *Z*_{i} and *W*_{i} are exchangeable. This implies that the multivariate permutation distribution is the joint distribution of the test statistics, and so the JDC holds.

The JDC does not always hold. In fact, in the usual two-sample case of case–control comparisons on multiple SNPs, it would not be expected to hold in general. In that case, correction for multiplicity based on multivariate permutation of the cases and controls does not strongly control the familywise error without assuming the JDC. If for any reason the covariance structure of the SNPs was different for cases than controls, the JDC would not hold. One way in which such a differential correlation might arise would be in a particular type of interaction between SNPs. However, in the case of interaction it might be seen as a benefit rather than a drawback that the method might lead to rejection of the null hypothesis for certain SNPs that are part of an interaction, although this would technically be a familywise error for the multiple comparison procedure.
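For triads, by contrast, the covariance identity used in the JDC argument follows in one step from *W*_{i} = 1 − *Z*_{i}. The short derivation below spells this out for the components with a heterozygous parent; it is a worked restatement of the argument above, with $\mathbf{1}$ denoting a vector of ones (notation introduced here).

```latex
\begin{align*}
\operatorname{cov}(W_i)
  &= \operatorname{cov}(\mathbf{1} - Z_i) \\
  &= E\!\left[\{(\mathbf{1} - Z_i) - E(\mathbf{1} - Z_i)\}
              \{(\mathbf{1} - Z_i) - E(\mathbf{1} - Z_i)\}^{t}\right] \\
  &= E\!\left[\{-(Z_i - E(Z_i))\}\{-(Z_i - E(Z_i))\}^{t}\right] \\
  &= E\!\left[\{Z_i - E(Z_i)\}\{Z_i - E(Z_i)\}^{t}\right]
   = \operatorname{cov}(Z_i).
\end{align*}
```

The two sign changes cancel, so subtracting from a constant leaves the covariance matrix unchanged. No comparable identity links the genotype vectors of cases and controls, which is why the two-sample setting needs the JDC as an extra assumption.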

### Multiple Tests per Hypothesis

Often in genetic association studies one would like to use several tests of the same null hypothesis. Typically, dominant, recessive, and multiplicative inheritance models for a given SNP are assumed, leading to different tests of the no association null hypothesis. Regardless of what inheritance models one might decide to use in testing the hypothesis of no association for the *j*th SNP with disease, the only necessary modification of the permutational procedure is that the max in the definition of *TS*^{r} extends also over the different tests applied to the *j*th SNP. Then the procedure will adjust for all tests on all SNPs, where one can reject any hypothesis for which any test has adjusted p-value ≤ *α*.

## Results

Monte Carlo simulations were used to assess the FWE and power of the permutational multiple testing procedure, and compare it with use of the Bonferroni procedure. A genotype relative risk model was assumed at each SNP *j*, where *ψ*_{1} represents the risk of disease with one copy of the allele of interest divided by the risk of disease with no copies. Similarly, *ψ*_{2} represents the risk of disease with two copies of the allele of interest divided by the risk of disease with no copies. Correlated multivariate genotype data for triads was generated by first obtaining haplotype data in linkage disequilibrium (LD) for the parents and then applying an inheritance model.

To generate haplotype data in LD for the parents of cases, SNPs on the same strand were assumed to be in linked blocks of length *n*_{b}, with SNPs between blocks independent. For the purpose of the simulations the proportion of the allele of interest in the population, *f*, is assumed to be the same for each SNP. Based on the probabilities of each mating type given a diseased case (given in Table 1 of Schaid & Sommer, 1993), the proportion of the allele of interest for parents of cases, *p*, is calculated. A multivariate haplotype is formed one SNP at a time, starting by determining that the first SNP contains the allele of interest with probability *p*. Consecutive SNPs in the same block are assumed to have LD parameter, *D*_{00} = *p*_{00} − *p*_{0·}*p*_{·0}, where *p*_{ij} represents (for *i*, *j* = 0, 1, or ·) the proportion of haplotypes that have *i* alleles of interest for the first SNP and *j* alleles of interest for the second SNP and where if *i* or *j* equals · then the proportion is evaluated for the corresponding SNP to have either allele. Subsequent SNPs in the block are then evaluated conditionally on the previous SNP, based on the probabilities

and

Each parental chromosome is generated independently.

Once haplotypes for the parents are generated, genotypes for the children are generated using a random crossover model. In this model, the child inherits from the same strand until a random crossover event occurs with probability *p*_{c}, where inheritance is then from the other strand.
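One way to read the random crossover model is sketched below for a single parent. Two details are assumptions not stated in the text: the starting strand is chosen with probability 1/2, and a crossover is possible between each pair of adjacent SNPs. The function name is hypothetical.

```python
import random

def transmit_haplotype(strand_a, strand_b, p_c, rng):
    """Haplotype a parent passes to the child under a random crossover model.

    Assumptions: the starting strand is picked with probability 1/2, and
    between each pair of adjacent SNPs a crossover occurs with probability
    p_c, after which alleles are copied from the other strand.
    """
    strands = (strand_a, strand_b)
    current = rng.randrange(2)
    transmitted = []
    for pos in range(len(strand_a)):
        if pos > 0 and rng.random() < p_c:
            current = 1 - current  # crossover: switch to the other strand
        transmitted.append(strands[current][pos])
    return transmitted

# Hypothetical parental strands over four SNPs (1 = allele of interest).
rng = random.Random(0)
from_parent = transmit_haplotype([1, 1, 0, 1], [0, 0, 1, 0], p_c=0.1, rng=rng)
```

Under this reading, a small *p*_{c} means long stretches are copied from one strand, so the parental LD carries over to the case children largely intact.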

A total of 500 triads were simulated for each replication of the simulation experiment, with 100 SNPs in blocks of size *n*_{b} = 5. The LD parameter was *D*_{00} = 0.5 for the parental haplotypes within a block. The LD for the case children was controlled by the value of *p*_{c}.

Table 2 shows the results of the null simulations for testing at level 0.05. Each simulation consisted of 100,000 replications. Both procedures control the FWE at the desired level of 5%. However, we note that the Bonferroni procedure becomes overly conservative when the correlation between SNPs within blocks is increased (correlation increases as *p*_{c} decreases).

Table 3 shows the results of simulations under three different nonnull models. The nonnull models correspond to dominant, recessive, and multiplicative disease inheritance patterns. Each simulation consisted of 10,000 replications. In each simulation, there were 5 nonnull SNPs out of 100. The average power over the nonnull SNPs is reported. In the cases of low correlation, there is very little difference in power between the methods. However, in the cases of high correlation, the power is substantially higher in the permutational method. This is in agreement with similar comparisons between the Bonferroni and two-sample permutational procedures for controlling the FWE.

### Neural Tube Defects and Clefts in Ireland

As part of a candidate gene study, we analyzed 1339 SNPs from 93 genes on 277 complete NTD triads from the Republic of Ireland. Birth defects are ideal candidates for triad studies because the parents are usually easily identified and likely to be willing to agree to participate. Using multiplicity correction is extremely harsh and the Bonferroni corrected p-values are all 1.0 after truncation, despite the smallest unadjusted p-value being 0.002233. The Bonferroni multiplier is 1339, so the Bonferroni adjusted p-value for the most significant SNP is not even close to being below 1.0 as 1339 × 0.002233 = 3.0. In contrast the permutational procedure gives adjusted p-values below 1.0 (smallest permutational adjusted p-value was 0.74), much smaller than the Bonferroni. Thus, although none of the adjusted p-values are close to being significant, it is clear that the adjusted p-values from the Bonferroni procedure are extremely conservative when compared to those from the permutational procedure.

As an example to see how the permutational adjustment compares to the Bonferroni when some of the adjusted p-values are relatively small, we present a subanalysis of the above experiment. This is presented as an example to examine the relative size of the adjusted p-values, and not to represent what we consider appropriate control for multiplicity. We consider 18 SNPs from a single gene on 277 complete NTD triads. The p-values adjusted only for the 18 SNPs are presented in Table 4. One sees again that the permutational procedure gives smaller adjusted p-values than the Bonferroni. Moreover, the improvement is quite large when considered as a proportion of the Bonferroni adjusted p-value. For the smallest unadjusted p-value, the permutational procedure yields an adjusted p-value more than 40% smaller than the corresponding Bonferroni adjusted p-value.

A final example is given from an independent study. As part of an analysis of oral clefts in Ireland (Carter et al., 2010), 31 SNPs on 250 complete cleft palate only case triads were analyzed. Again, this is presented as an example to compare the adjusted p-values of the procedures, and not to represent what we consider appropriate control for multiplicity, or to represent a complete analysis of cleft cases on these SNPs. The p-values corresponding to the 10 most significant SNPs, adjusted for all 31 SNPs are presented in Table 5. In this case, the smallest adjusted p-value of the permutational procedure is almost 50% smaller than the corresponding Bonferroni adjusted p-value.

## Discussion

We have shown that permutations can be used to approximate the null distribution under the TDT null hypothesis, and that this leads to a one-sample permutational test. This extends to tests of multiple SNPs by permuting vectors of genotypes. Furthermore, we have shown how an FWE-controlling multiple comparison procedure can be obtained quite simply and have proven strong control of the FWE. The methodology extended easily to allow for multiple tests per hypothesis, which accommodates testing via multiple inheritance models in the same study. The permutational approach given here may also be used in more complex family-based designs that include either affected or unaffected siblings. Simulations show that the permutational procedure has the desired FWE level, and that it has a substantial power advantage over the Bonferroni procedure when the SNPs are in LD. This is important when a segment of a gene with multiple SNPs in LD is being examined. Although the power advantages of our approach compared to the Bonferroni procedure were most notable in cases of LD, there is an additional, perhaps more important, reason why this approach may be valuable in genome-wide association studies (GWAS). Our simulations did not contain rare SNPs, a situation where permutational adjustments are more powerful compared to the Bonferroni.

In the future, technology will doubtless become available to examine more genetic variants than can be studied currently. This advance will create more problems for statisticians dealing with multiple comparisons. The permutation procedure described here will aid in dealing with these problems because of its ability to handle rare alleles and LD more efficiently than currently used methods (e.g., Bonferroni). We conclude that using a permutational version of the TDT is feasible, and leads to more powerful detection of associated SNPs in candidate gene studies of triads.

## Acknowledgements

This research was supported in part by the Intramural Research Program of the NIH, NICHD. We thank Dr. Lawrence Brody for his insightful advice.

## References

- Carter TC, Molloy AM, Pangilinan F, Troendle JF, Kirke PN, Conley MR, Orr DJA, Earley M, McKiernan E, Lynn EC, Doyle A, Scott JM, Brody LC, Mills JL. Testing reported associations of genetic risk factors for oral clefts in a large Irish study population. Birth Defects Res A. 2010;88:84–93.
- Han B, Kang HM, Eskin E. Rapid and accurate multiple testing correction and power estimation for millions of correlated markers. PLoS Genet. 2009;5:e1000456. 1–13.
- Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat. 1976;6:65–70.
- Lee W-C, Wang L-Y. Simple formulas for gauging the potential impact of population stratification bias. Am J Epi. 2008;167:86–89.
- Pesarin F. Multivariate Permutation Tests: With Applications in Biostatistics. Wiley; Chichester: 2001.
- Schaid DJ, Sommer SS. Genotype relative risks: Methods for design and analysis of candidate-gene association studies. Am J Hum Genet. 1993;53:1114–1126.
- Schlesselman JJ. Case-Control Studies: Design, Conduct, Analysis. Oxford University Press; New York: 1982.
- Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:506–516.
- Westfall PH, Troendle JF. Multiple testing with minimal assumptions. Biometr J. 2008;50:745–755.
