
*Hum Hered*, PMC3322627

# A Robust Method for Testing Association in Genome-Wide Association Studies

^{a}Biostatistics Epidemiology Research Design Core, Center for Clinical and Translational Sciences (CCTS), The University of Texas Health Science Center at Houston, Houston, Tex., USA

^{b}Department of Statistical Science, Southern Methodist University, Dallas, Tex., USA

## Abstract

In genetic association studies, because the underlying genetic models vary, no single statistical test is the most powerful in all situations. Previous studies have shown that if the underlying genetic model is known, trend-based tests that outperform the classical Pearson χ^{2} test can be constructed. However, when the underlying genetic model is unknown, the χ^{2} test is usually more robust than trend-based tests. In this paper, we propose a new association test based on a generalized genetic model, namely the generalized order-restricted relative risks model. Through a Monte Carlo simulation study, we show that the proposed association test is generally more powerful than the χ^{2} test and more robust than the trend-based tests. The proposed methodologies are also illustrated with real SNP datasets.

**Key Words:** Genetic association, Robust test, Trend test, SNP

## 1. Introduction

In case-control genome-wide association (GWA) studies, hundreds of thousands of single nucleotide polymorphisms (SNPs) are tested to determine whether they are associated with a common disease of interest. If a SNP is in linkage disequilibrium with a disease locus, its genotype will not be independent of disease status. Although there may be gene-gene or gene-environment interactions, the first and crucial step in GWA studies is to identify single SNPs that are associated with the disease.

For a SNP with two alleles, *A* and *a*, where *a* is assumed to be the at-risk allele, there are three genotypes: *AA*, *Aa*, and *aa*. Suppose that there are *r* cases and *s* controls in the study. Among the *r* cases, *r*_{0}, *r*_{1}, and *r*_{2} have genotypes *AA*, *Aa*, and *aa*, respectively; among the *s* unaffected controls, *s*_{0}, *s*_{1}, and *s*_{2} have genotypes *AA*, *Aa*, and *aa*, respectively.

Testing whether there is an association between genotype and disease status is equivalent to testing for association in the 2 × 3 contingency table. Pearson's χ^{2} test with 2 degrees of freedom (df) is one of the most commonly used methods for testing association in a contingency table. For SNP data, it is reasonable to assume that the risk associated with *Aa* lies between the risks associated with *AA* and *aa*. However, Pearson's χ^{2} test does not use this order restriction on the risks. In contrast, the Cochran-Armitage trend test (CATT) incorporates the trend to increase power. The CATT with appropriate scores can be more powerful than Pearson's χ^{2} test if the underlying genetic model is known [1, 2, 3, 4]. In general, the scores used in the CATT for SNP data are (0, *x*, 1), where *x* is a number between 0 and 1 whose optimal value depends on the true underlying genetic model. For instance, for the recessive, additive/multiplicative (log-additive), and dominant models, the optimal values of *x* are 0, 0.5, and 1, respectively. CATTs with optimal scores have been shown to be more powerful than Pearson's χ^{2} test, provided that the underlying genetic model is known [4].
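To make the 2-df baseline concrete, here is a minimal Python sketch of Pearson's χ^{2} test for the 2 × 3 genotype table, with counts ordered as (*AA*, *Aa*, *aa*). It uses the fact that the χ^{2} survival function with 2 df is exactly exp(−x/2), so no statistics library is needed. The function name is our own, and the code assumes every genotype column is non-empty.

```python
import math


def pearson_chi2_2x3(cases, controls):
    """Pearson chi-square test for a 2 x 3 case-control genotype table.

    cases, controls: genotype counts ordered as (AA, Aa, aa).
    Returns (statistic, p_value). Assumes every genotype column is non-empty.
    """
    r, s = sum(cases), sum(controls)
    n = r + s
    stat = 0.0
    for r_i, s_i in zip(cases, controls):
        n_i = r_i + s_i  # column total for this genotype
        for observed, row_total in ((r_i, r), (s_i, s)):
            expected = row_total * n_i / n
            stat += (observed - expected) ** 2 / expected
    # A chi-square variable with 2 df has survival function exp(-x / 2).
    return stat, math.exp(-stat / 2.0)


stat, p = pearson_chi2_2x3((10, 20, 30), (30, 20, 10))
print(stat, p)  # statistic 20.0, p = exp(-10), about 4.54e-05
```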

However, the genetic model is usually unknown in practice, and the CATT with a non-optimal score may perform poorly. In other words, the CATT is not as robust as the χ^{2} test: it is sensitive to departures from the assumed genetic model. To increase the robustness of the CATT, several trend-based methods have been proposed for situations where the underlying genetic model is unknown [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. For instance, the maxmin efficiency robust test (MERT) of Gastwirth [18, 19] and the maximum of the three optimal CATTs under the recessive, additive, and dominant models (MAX3) have been studied [7]. Zheng and Ng [16] also proposed a two-phase procedure (GMS) that selects the genetic model from the data in the first stage and then applies the CATT with the optimal score for the selected model in the second stage. Although these CATT-based methods have been shown to be more robust than the CATT, their analytic null distributions are either unavailable or too complicated, so Monte Carlo or numerical methods are required to compute their p values.

This paper is organized as follows. In Section 2.1, we propose a generalized genetic model for SNP data, namely a generalized *order-restricted relative risks* (ORRR) model, in which the two relative risks are assumed to be monotonically increasing or decreasing. The ORRR model covers a wide range of ideal models; for instance, the recessive, additive, multiplicative, and dominant models are special cases. In Section 2.2, we propose a statistical test based on the ORRR model and also consider a restricted likelihood ratio test under the ORRR model. Because the new test uses the order restriction on the relative risks, it is expected to be more powerful than the χ^{2} test in many situations. On the other hand, unlike the CATT, the proposed test does not assume a specific genetic model, so it is not sensitive to misspecification of the underlying genetic model and is therefore more robust than the CATT. In Section 3, a Monte Carlo simulation study is used to evaluate the performance of the proposed procedure; we show that it is more robust than existing methods and has good power. The proposed methodologies are illustrated using real SNP data in Section 4, and conclusions are given in Section 5.

## 2. Proposed Test Procedure

### 2.1. A Generalized Genetic Model and Existing Methods

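Table 1 gives the data structure of a case-control GWA study, with column totals *n*_{i} = *r*_{i} + *s*_{i} (*i* = 0, 1, 2) and *n* = *r* + *s*.

**Table 1.** Genotype counts in a case-control study

| | *AA* | *Aa* | *aa* | Total |
|---|---|---|---|---|
| Cases | *r*_{0} | *r*_{1} | *r*_{2} | *r* |
| Controls | *s*_{0} | *s*_{1} | *s*_{2} | *s* |
| Total | *n*_{0} | *n*_{1} | *n*_{2} | *n* |

The relative risks of genotypes *Aa* and *aa* to *AA* are defined as

$$\lambda_1 = \frac{\Pr(\text{case} \mid Aa)}{\Pr(\text{case} \mid AA)}, \qquad \lambda_2 = \frac{\Pr(\text{case} \mid aa)}{\Pr(\text{case} \mid AA)}. \tag{1}$$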

For many genetic models, we can reasonably assume that Pr(case|*Aa*), the disease risk associated with genotype *Aa*, lies between Pr(case|*AA*) and Pr(case|*aa*). Specifically, if *a* is the at-risk allele, the relative risks satisfy λ_{1} ≥ 1 and λ_{2} ≥ λ_{1}, with at least one of the inequalities strict. If *A* is the at-risk allele, they satisfy 1 ≥ λ_{1} and λ_{1} ≥ λ_{2}, again with at least one inequality strict. This monotonicity is known as order-restricted relative risks, and a genetic model with order-restricted relative risks is called a generalized ORRR model. The ideal models mentioned above (with *a* as the at-risk allele), namely recessive (λ_{1} = 1, λ_{2} > λ_{1}), additive (λ_{1} = (1 + λ_{2})/2), multiplicative (λ_{2} = λ_{1}^{2}), and dominant (λ_{1} = λ_{2} > 1), are all special cases of the generalized ORRR model.
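These definitions can be checked mechanically. The sketch below (hypothetical helper names, *a* taken as the at-risk allele, and a small numerical tolerance for the equalities) tests whether a pair (λ_{1}, λ_{2}) lies in the generalized ORRR model and which of the ideal models it matches.

```python
import math


def is_orrr(lam1, lam2, tol=1e-12):
    """Generalized ORRR model with a at risk: lam1 >= 1 and lam2 >= lam1,
    with at least one of the two inequalities strict."""
    non_strict = lam1 >= 1.0 - tol and lam2 >= lam1 - tol
    strict = lam1 > 1.0 + tol or lam2 > lam1 + tol
    return non_strict and strict


def ideal_models(lam1, lam2, tol=1e-9):
    """List the ideal models (a at risk) that the pair (lam1, lam2) matches."""
    def close(x, y):
        return math.isclose(x, y, rel_tol=0.0, abs_tol=tol)

    labels = []
    if close(lam1, 1.0) and lam2 > lam1:
        labels.append("recessive")
    if close(lam1, (1.0 + lam2) / 2.0) and lam2 > 1.0:
        labels.append("additive")
    if close(lam2, lam1 ** 2) and lam1 > 1.0:
        labels.append("multiplicative")
    if close(lam1, lam2) and lam1 > 1.0:
        labels.append("dominant")
    return labels


for pair in [(1.0, 1.4), (1.2, 1.4), (1.2, 1.44), (1.4, 1.4), (1.1, 1.4), (1.3, 1.2)]:
    print(pair, is_orrr(*pair), ideal_models(*pair))
```

Note that (1.1, 1.4) matches none of the ideal models yet still belongs to the generalized ORRR model, which is the kind of setting the proposed test is designed for.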

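As mentioned in Section 1, in addition to the χ^{2} test, we also consider existing procedures, including CATT, MAX3, GMS, and MERT, for testing the null hypothesis of no association between disease and genotype. With *n*_{i} = *r*_{i} + *s*_{i} and *n* = *r* + *s*, the CATT statistic can be written as [10]

$$Z_x = \frac{n^{1/2} \sum_{i=0}^{2} x_i \,(s\, r_i - r\, s_i)}{\left\{ r s \left[ n \sum_{i=0}^{2} x_i^2 n_i - \left( \sum_{i=0}^{2} x_i n_i \right)^2 \right] \right\}^{1/2}},$$

where $(x_0, x_1, x_2) = (0, x, 1)$.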
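A minimal Python sketch of the CATT statistic, together with the two CATT-based robust statistics from Section 1. MAX3 is taken as max(|Z_{0}|, |Z_{1/2}|, |Z_{1}|), and MERT is written in the form commonly used for the recessive/dominant pair, (Z_{0} + Z_{1})/√(2(1 + ρ_{01})), where ρ_{01} is the null correlation between Z_{0} and Z_{1}. Treat the MERT form and all function names as our assumptions, not as the implementation in 'Rassoc'; only the statistics are computed here, not their p values, and the code assumes no genotype column is empty.

```python
import math


def catt(cases, controls, x):
    """CATT statistic Z_x with scores (x0, x1, x2) = (0, x, 1).

    cases, controls: genotype counts ordered (AA, Aa, aa). A positive value
    means genotypes with higher scores are over-represented among cases.
    """
    scores = (0.0, x, 1.0)
    r, s = sum(cases), sum(controls)
    n = r + s
    cols = [r_i + s_i for r_i, s_i in zip(cases, controls)]  # n_i
    num = sum(x_i * (s * r_i - r * s_i)
              for x_i, r_i, s_i in zip(scores, cases, controls))
    sum_x = sum(x_i * n_i for x_i, n_i in zip(scores, cols))
    sum_xx = sum(x_i * x_i * n_i for x_i, n_i in zip(scores, cols))
    return math.sqrt(n) * num / math.sqrt(r * s * (n * sum_xx - sum_x ** 2))


def max3(cases, controls):
    """MAX3: largest |Z_x| over the recessive, additive and dominant scores."""
    return max(abs(catt(cases, controls, x)) for x in (0.0, 0.5, 1.0))


def mert(cases, controls):
    """MERT in the assumed form (Z_0 + Z_1) / sqrt(2 * (1 + rho01))."""
    n0, n1, n2 = (r_i + s_i for r_i, s_i in zip(cases, controls))
    n = n0 + n1 + n2
    # Null correlation between Z_0 and Z_1, estimated from column totals.
    rho01 = math.sqrt(n0 * n2 / ((n - n0) * (n - n2)))
    z0, z1 = catt(cases, controls, 0.0), catt(cases, controls, 1.0)
    return (z0 + z1) / math.sqrt(2.0 * (1.0 + rho01))


cases, controls = (10, 20, 30), (30, 20, 10)
for x in (0.0, 0.5, 1.0):
    print(x, catt(cases, controls, x))  # Z_0 = Z_1 = sqrt(15), Z_0.5 = sqrt(20)
print(max3(cases, controls), mert(cases, controls))
```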

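The statistic for MAX3 is [7]

$$\mathrm{MAX3} = \max\left\{ |Z_0|,\ |Z_{1/2}|,\ |Z_1| \right\},$$

where *Z*_{0}, *Z*_{1/2}, and *Z*_{1} are the CATT statistics with *x* = 0, 0.5, and 1, the optimal scores under the recessive, additive, and dominant models, respectively.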

The statistic for GMS is [16, 17]:

GMS =

where *I* is the indicator function, and the Hardy-Weinberg disequilibrium trend test (HWDTT) statistic is given by [20]:

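where $\hat{\Delta}_P = r_2/r - \left(r_2/r + r_1/(2r)\right)^2$ and $\hat{\Delta}_Q = s_2/s - \left(s_2/s + s_1/(2s)\right)^2$ estimate the Hardy-Weinberg disequilibrium coefficients in cases and controls, respectively, and *c* is a constant, usually chosen as 1.645.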
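The quantities $\hat{\Delta}_P$ and $\hat{\Delta}_Q$ used by the HWDTT are simple plug-in estimates from the case and control rows of Table 1. The sketch below computes only these two estimates; it does not implement the HWDTT statistic or the GMS selection rule, and the function name is hypothetical.

```python
def hwd_coefficient(counts):
    """Estimated HWD coefficient: freq(aa) - (frequency of allele a)^2.

    counts: genotype counts ordered as (AA, Aa, aa).
    """
    total = sum(counts)
    freq_aa = counts[2] / total
    freq_a = freq_aa + counts[1] / (2 * total)  # allele a frequency
    return freq_aa - freq_a ** 2


cases, controls = (10, 20, 30), (30, 20, 10)
delta_p = hwd_coefficient(cases)     # Delta_P, estimated from the case row
delta_q = hwd_coefficient(controls)  # Delta_Q, estimated from the control row
print(delta_p, delta_q)
```

Counts in exact Hardy-Weinberg proportions, such as (49, 42, 9), give an estimate of 0, while an excess of homozygotes gives a positive value.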

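The statistic for MERT is [19]

$$\mathrm{MERT} = \frac{Z_0 + Z_1}{\sqrt{2\,(1 + \hat{\rho}_{01})}}, \qquad \hat{\rho}_{01} = \sqrt{\frac{n_0 n_2}{(n - n_0)(n - n_2)}},$$

where $\hat{\rho}_{01}$ is the estimated null correlation between *Z*_{0} and *Z*_{1}.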

### 2.2. The Proposed Test

The proposed test is designed to detect the alternative hypothesis that the underlying genetic model belongs to the generalized ORRR model. Suppose the genotype frequencies of *AA*, *Aa*, and *aa* are *p*_{0}, *p*_{1}, and *p*_{2} among cases and *q*_{0}, *q*_{1}, and *q*_{2} among controls, respectively (e.g., *p*_{0} = Pr(*AA*|case) and *q*_{0} = Pr(*AA*|control)).

Under the null hypothesis that there is no association between disease and the genotype, we have *p*_{0} = *q*_{0}, *p*_{1} = *q*_{1}, and *p*_{2} = *q*_{2}.

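By Bayes' theorem, Pr(case|*G*) = Pr(*G*|case)Pr(case)/Pr(*G*) for each genotype *G*, so equation 1 can be expressed as

$$\lambda_1 = \frac{p_1 \Pr(AA)}{p_0 \Pr(Aa)}, \qquad \lambda_2 = \frac{p_2 \Pr(AA)}{p_0 \Pr(aa)}, \tag{2}$$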

where Pr(*AA*) = Pr(*AA*|case)Pr(case) + Pr(*AA*|control)Pr(control) = *kp*_{0} + (1 − *k*)*q*_{0} and *k* is the disease prevalence. Similarly, Pr(*Aa*) = *kp*_{1} + (1 − *k*)*q*_{1} and Pr(*aa*) = *kp*_{2} + (1 − *k*)*q*_{2}.

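Then, equation 1 can be written as

$$\lambda_1 = \frac{p_1 \left[k p_0 + (1 - k) q_0\right]}{p_0 \left[k p_1 + (1 - k) q_1\right]}, \qquad \lambda_2 = \frac{p_2 \left[k p_0 + (1 - k) q_0\right]}{p_0 \left[k p_2 + (1 - k) q_2\right]}. \tag{3}$$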

Assuming the at-risk allele is a, the alternative hypothesis we want to test is based on the generalized ORRR genetic model for which the relative risks satisfy λ_{1} ≥ 1 and λ_{2} ≥ λ_{1}, with at least one of the inequalities being strict.
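Equation 3 can be checked numerically. The sketch below builds a hypothetical population (genotype frequencies (0.49, 0.42, 0.09) and penetrances 0.01, 0.015, 0.02, chosen only for illustration), derives the case and control genotype frequencies *p*_{i} and *q*_{i} and the prevalence *k*, and confirms that equation 3 recovers the relative risks 1.5 and 2.0, a pair that satisfies the ORRR alternative with *a* at risk.

```python
def relative_risks(p, q, k):
    """(lambda1, lambda2) from equation 3; p, q ordered (AA, Aa, aa), k = prevalence."""
    pop = [k * p_i + (1.0 - k) * q_i for p_i, q_i in zip(p, q)]  # Pr(genotype)
    lam1 = (p[1] / pop[1]) / (p[0] / pop[0])
    lam2 = (p[2] / pop[2]) / (p[0] / pop[0])
    return lam1, lam2


# Hypothetical population, used only to check the formula.
geno = (0.49, 0.42, 0.09)          # Pr(AA), Pr(Aa), Pr(aa)
penetrance = (0.01, 0.015, 0.02)   # Pr(case | genotype)
k = sum(g * f for g, f in zip(geno, penetrance))                   # prevalence
p = [g * f / k for g, f in zip(geno, penetrance)]                  # Pr(genotype | case)
q = [g * (1.0 - f) / (1.0 - k) for g, f in zip(geno, penetrance)]  # Pr(genotype | control)

print(relative_risks(p, q, k))  # close to (1.5, 2.0)
```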

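From equation 3, and because 0 < *k* < 1, the conditions λ_{1} ≥ 1 and λ_{2} ≥ λ_{1} can be written as

$$p_1 q_0 - p_0 q_1 \ge 0 \qquad \text{and} \qquad p_2 q_1 - p_1 q_2 \ge 0.$$

For example, λ_{1} ≥ 1 reduces to (1 − *k*)*p*_{1}*q*_{0} ≥ (1 − *k*)*p*_{0}*q*_{1} once the common term *kp*_{0}*p*_{1} cancels from both sides. Let

$$T_1 = p_1 q_0 - p_0 q_1, \qquad T_2 = p_2 q_1 - p_1 q_2;$$

then detecting the alternative hypothesis is equivalent to detecting that both *T*_{1} and *T*_{2} are non-negative and at least one of them is strictly greater than 0. Therefore, we propose a statistical test procedure based on *T*_{1} and *T*_{2}.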
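Assuming *T*_{1} and *T*_{2} are the cross-product differences *p*_{1}*q*_{0} − *p*_{0}*q*_{1} and *p*_{2}*q*_{1} − *p*_{1}*q*_{2} (what λ_{1} ≥ 1 and λ_{2} ≥ λ_{1} reduce to after expanding equation 3), the sketch below evaluates them for given frequencies and for plug-in estimates *p*_{i} = *r*_{i}/*r* and *q*_{i} = *s*_{i}/*s* from a table of counts. The helper names are hypothetical.

```python
def t_stats(p, q):
    """T1 = p1*q0 - p0*q1 and T2 = p2*q1 - p1*q2, genotypes ordered (AA, Aa, aa)."""
    return p[1] * q[0] - p[0] * q[1], p[2] * q[1] - p[1] * q[2]


def t_hat(cases, controls):
    """Plug-in estimates using p_i = r_i / r and q_i = s_i / s."""
    r, s = sum(cases), sum(controls)
    return t_stats([c / r for c in cases], [c / s for c in controls])


print(t_hat((10, 20, 30), (30, 20, 10)))  # both approximately 1/9: Aa and aa enriched in cases
print(t_hat((5, 10, 5), (10, 20, 10)))    # (0.0, 0.0): identical genotype proportions
```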

Given the observed data in Table 1, the sample estimates of *T*_{1} and *T*_{2} are

$$\widehat{T}_1 = \frac{r_1 s_0 - r_0 s_1}{r s}, \qquad \widehat{T}_2 = \frac{r_2 s_1 - r_1 s_2}{r s},$$

respectively. It can be shown that under the null hypothesis of no association (i.e., *p*_{i} = *q*_{i}, *i* = 0, 1, 2), the expected values of $\widehat{T}_1$ and $\widehat{T}_2$ are zero (i.e., $E(\widehat{T}_1) = E(\widehat{T}_2) = 0$) and the variance-covariance matrix of $\widehat{T}_1$ and $\widehat{T}_2$ is

where

Since the variance-covariance matrix depends on the unknown *p*_{i}s, it can be estimated by replacing each *p*_{i} with its consistent estimator $\widehat{P}_i = n_i/n$, *i* = 0, 1, 2. The estimated variance-covariance matrix can be expressed as

Because $\widehat{\Sigma}$ is a symmetric positive definite matrix, its eigen-decomposition gives $\widehat{\Sigma} = PDP'$, where $D = \mathrm{diag}(d_1, d_2)$ is a diagonal matrix, *d*_{1} and *d*_{2} are the eigenvalues of $\widehat{\Sigma}$, and the columns of *P* are the corresponding orthonormal eigenvectors, so that *PP′* = *I*.

Under the null hypothesis, *Z*_{1} and *Z*_{2} are asymptotically independent standard normal random variables, where

$$\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = \widehat{\Sigma}^{-1/2} \begin{pmatrix} \widehat{T}_1 \\ \widehat{T}_2 \end{pmatrix}, \qquad \widehat{\Sigma}^{-1/2} = P D^{-1/2} P'.$$

It can be shown that all elements of $\widehat{\Sigma}^{-1/2}$ are non-negative. Therefore, under the alternative hypothesis, we would expect E(*Z*_{1}) and E(*Z*_{2}) to be non-negative, with at least one of them strictly positive. Note that if the at-risk allele is *A* instead of *a* in Table 1 (i.e., columns 1 and 3 are swapped), the statistics *Z*_{1} and *Z*_{2} remain valid, with E(*Z*_{1}) and E(*Z*_{2}) non-positive and at least one of them strictly negative under the alternative hypothesis.
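A minimal NumPy sketch of the standardization step. It assumes *T*_{1} = *p*_{1}*q*_{0} − *p*_{0}*q*_{1} and *T*_{2} = *p*_{2}*q*_{1} − *p*_{1}*q*_{2}, and it assumes $\widehat{\Sigma}$ is the first-order (delta-method) null covariance of $(\widehat{T}_1, \widehat{T}_2)$ evaluated at the pooled frequencies *f*_{i} = *n*_{i}/*n*, namely (1/*r* + 1/*s*) times the matrix with diagonal entries *f*_{0}*f*_{1}(*f*_{0} + *f*_{1}) and *f*_{1}*f*_{2}(*f*_{1} + *f*_{2}) and off-diagonal entry −*f*_{0}*f*_{1}*f*_{2}. That covariance form is our own derivation, offered as a sketch. `np.linalg.eigh` supplies the decomposition *PDP′*.

```python
import numpy as np


def z_statistics(cases, controls):
    """Return (Z, Sigma_hat^{-1/2}) for a 2 x 3 table ordered (AA, Aa, aa).

    ASSUMPTION: Sigma_hat is the delta-method null covariance of
    (T1_hat, T2_hat), evaluated at the pooled frequencies n_i / n.
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    r, s = cases.sum(), controls.sum()
    p, q = cases / r, controls / s
    t = np.array([p[1] * q[0] - p[0] * q[1], p[2] * q[1] - p[1] * q[2]])

    f = (cases + controls) / (r + s)  # pooled estimates n_i / n
    cov12 = -f[0] * f[1] * f[2]
    sigma = (1.0 / r + 1.0 / s) * np.array(
        [[f[0] * f[1] * (f[0] + f[1]), cov12],
         [cov12, f[1] * f[2] * (f[1] + f[2])]])

    d, vecs = np.linalg.eigh(sigma)                   # sigma = P diag(d) P'
    inv_sqrt = vecs @ np.diag(d ** -0.5) @ vecs.T     # Sigma_hat^{-1/2}
    return inv_sqrt @ t, inv_sqrt


z, m = z_statistics((10, 20, 30), (30, 20, 10))
print(z)  # both entries equal sqrt(10), about 3.162
print(m)  # all entries non-negative
```

For this example table, $(\widehat{T}_1, \widehat{T}_2)$ lies along an eigenvector of $\widehat{\Sigma}$, which is why both *Z*s come out equal.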

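By taking the order-restricted property of the SNP data into account, we consider the statistics

$$C_1 = -2\left[\ln \Phi(Z_1) + \ln \Phi(Z_2)\right], \qquad C_2 = -2\left[\ln\{1 - \Phi(Z_1)\} + \ln\{1 - \Phi(Z_2)\}\right],$$

where Φ(·) is the cumulative distribution function of the standard normal distribution. Thus *C*_{1} is large when both *Z*s are negative and *C*_{2} is large when both are positive. The asymptotic distributions of *C*_{1} and *C*_{2} are as follows.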

Theorem 1

Under the null hypothesis, *C*_{1} and *C*_{2} are asymptotically χ^{2} distributed with 4 df.

The proposed test statistic is the maximum of *C*_{1} and *C*_{2}, denoted as

$$W = \max(C_1, C_2).$$

Note that *C*_{1} and *C*_{2} are not independent. However, the following asymptotic property yields an approximation of the p value associated with the statistic *W*.

Theorem 2

Under the null hypothesis of no association, the survival function of *W* is asymptotically bounded by

$$2\gamma - \gamma^2 \le \Pr(W > w) \le 2\gamma,$$

where γ = 1 − χ^{2}_{4}(*w*) and χ^{2}_{4}(·) is the cumulative distribution function of the χ^{2} distribution with 4 df.

Theorem 2 suggests that we can estimate the p value of the test procedure based on *W* by 2γ, and with small γ, the approximation is very accurate.
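The sketch below turns (*Z*_{1}, *Z*_{2}) into *C*_{1}, *C*_{2}, *W*, and the 2γ p value approximation. It assumes the Fisher-type forms *C*_{1} = −2[ln Φ(*Z*_{1}) + ln Φ(*Z*_{2})] and *C*_{2} = −2[ln{1 − Φ(*Z*_{1})} + ln{1 − Φ(*Z*_{2})}], which match the uniform-p-value argument in the Appendix; which one is called *C*_{1} does not affect *W*. The χ^{2}_{4} survival function has the closed form e^{−w/2}(1 + w/2), and Φ is evaluated through `math.erfc` to stay accurate in the tails. Function names are hypothetical.

```python
import math


def log_phi(z):
    """ln Phi(z); erfc keeps the lower tail accurate."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def log_sf(z):
    """ln(1 - Phi(z)); erfc keeps the upper tail accurate."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def w_test(z1, z2):
    """Return (C1, C2, W, approximate p value min(1, 2 * gamma)).

    ASSUMED forms: C1 = -2[ln Phi(Z1) + ln Phi(Z2)] (large when both Z < 0),
    C2 = -2[ln(1 - Phi(Z1)) + ln(1 - Phi(Z2))] (large when both Z > 0).
    """
    c1 = -2.0 * (log_phi(z1) + log_phi(z2))
    c2 = -2.0 * (log_sf(z1) + log_sf(z2))
    w = max(c1, c2)
    gamma = math.exp(-w / 2.0) * (1.0 + w / 2.0)  # chi-square(4) survival at w
    return c1, c2, w, min(1.0, 2.0 * gamma)


print(w_test(-4.04, -3.11))  # Z1, Z2 reported for rs380390 in Table 4
```

The p value printed here comes from this sketch, not from the paper's table. For extremely large |*Z*|, `math.erfc` underflows to 0 and the logarithm fails, so a production version would need a log-tail approximation.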

Under the ORRR genetic models, other statistical tests can also be applied. For instance, the restricted likelihood ratio test (RLRT) has been proposed to detect association in the 2 × *k* contingency table [21, 22]. For SNP data with *k* = 3, the RLRT statistic is

$$\mathrm{RLRT} = 2 \sum_{i=0}^{2} \left[ r_i \ln \frac{\widehat{\pi}_i}{\widehat{P}} + (n_i - r_i) \ln \frac{1 - \widehat{\pi}_i}{1 - \widehat{P}} \right],$$

where $\widehat{P} = r/n$, and $\widehat{\pi}_i$, *i* = 0, 1, 2, are the order-restricted MLEs of the probability that a subject with genotype *i* is a case, obtained from the proportions *r*_{i}/*n*_{i} and satisfying $\widehat{\pi}_0 \le \widehat{\pi}_1 \le \widehat{\pi}_2$ or $\widehat{\pi}_0 \ge \widehat{\pi}_1 \ge \widehat{\pi}_2$.

Usually $\widehat{\pi}_i$ is estimated using the pool-adjacent-violators algorithm (PAVA) [23], and under the null hypothesis the above statistic follows a weighted χ^{2} (chi-bar-squared, $\bar{\chi}^2$) distribution [24]. Its p value is $\Pr(\mathrm{RLRT} > c) = w_1 \Pr(\chi_1^2 > c) + w_2 \Pr(\chi_2^2 > c)$. For SNP data, *k* = 3, and the weights can be estimated using [22] $w_1 = 0.5$ and $w_2 = 0.5 - \cos^{-1}\left(-\sqrt{n_0 n_2/[(n - n_0)(n - n_2)]}\right)/(2\pi)$. In association studies, the direction of the order (increasing or decreasing) is usually unknown before the data are observed. Following Barlow et al. [24], we can first compute the p values for both the increasing and the decreasing alternatives and then take the overall p value as twice the smaller one.
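A minimal standard-library sketch of the RLRT: weighted PAVA (weights *n*_{i}) for the order-restricted MLEs, the chi-bar-squared tail with *w*_{1} = 0.5 and the arccos weight *w*_{2} written with the column totals *n*_{0} and *n*_{2} (our reading of the weight formula), and the twice-the-smaller rule for an unknown direction. The decreasing alternative is handled by reversing the genotype order. Function names are hypothetical, and 0 < *r* < *n* is assumed.

```python
import math


def pava_increasing(values, weights):
    """Weighted pool-adjacent-violators fit under a non-decreasing order."""
    blocks = []  # each block: [pooled mean, pooled weight, number of cells]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool backwards while the last two blocks violate the order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(y) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def rlrt_increasing(cases, controls):
    """RLRT against pi_0 <= pi_1 <= pi_2, where pi_i = Pr(case | genotype i)."""
    n_i = [r_i + s_i for r_i, s_i in zip(cases, controls)]
    pi_hat = pava_increasing([r_i / m for r_i, m in zip(cases, n_i)], n_i)
    p_bar = sum(cases) / sum(n_i)  # P_hat = r / n
    stat = 0.0
    for r_i, m, pi in zip(cases, n_i, pi_hat):
        stat += xlogy(r_i, pi / p_bar) + xlogy(m - r_i, (1.0 - pi) / (1.0 - p_bar))
    return 2.0 * stat


def rlrt_pvalue(cases, controls):
    """Overall p value: twice the smaller of the increasing/decreasing p values."""
    n0, n1, n2 = (r_i + s_i for r_i, s_i in zip(cases, controls))
    n = n0 + n1 + n2
    rho = -math.sqrt(n0 * n2 / ((n - n0) * (n - n2)))
    w1, w2 = 0.5, 0.5 - math.acos(rho) / (2.0 * math.pi)

    def chibar_sf(c):
        c = max(c, 0.0)  # guard against tiny negative rounding error
        # w1 * Pr(chi2_1 > c) + w2 * Pr(chi2_2 > c)
        return w1 * math.erfc(math.sqrt(c / 2.0)) + w2 * math.exp(-c / 2.0)

    p_inc = chibar_sf(rlrt_increasing(cases, controls))
    p_dec = chibar_sf(rlrt_increasing(cases[::-1], controls[::-1]))
    return min(1.0, 2.0 * min(p_inc, p_dec))


cases, controls = (10, 20, 30), (30, 20, 10)
print(rlrt_increasing(cases, controls), rlrt_pvalue(cases, controls))
```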

## 3. Monte Carlo Simulation Study

A Monte Carlo simulation study is used to evaluate the performance and power of the proposed procedure as well as several existing procedures in the literature. We assume that the case and control rows in Table 1 follow multinomial distributions with probabilities *p* = (*p*_{0}, *p*_{1}, *p*_{2}) and *q* = (*q*_{0}, *q*_{1}, *q*_{2}), respectively.

Let

where λ_{1} and λ_{2} are the relative risks. Since *p*_{0} + *p*_{1} + *p*_{2} = 1, we have

Therefore, for given *q*_{i}s and λ_{1}, λ_{2}, the corresponding *p*_{i}s can be obtained from the above formulas.

For the controls, we first assume that Hardy-Weinberg equilibrium (HWE) holds, with minor allele frequencies (MAF) of 0.3 and 0.5. The numbers of cases (*r*) and controls (*s*) both equal 2,500 in our simulations. We use different values of λ_{1} and λ_{2} to compare the performance of our proposed method with that of GMS, MERT, MAX3, Pearson's χ^{2} test, the CATT with *x* = 0.5, and the RLRT. We use the significance level α = 10^{−5} to reflect GWA studies, in which the total number of SNPs is large and the pointwise significance levels are usually very small. For each setting, we used 1,000,000 realizations to estimate the type I error rates (sizes) and power values of the test procedures. To estimate the p values of MAX3, GMS, and MERT, we used the R package ‘Rassoc’ with option ‘asy’, which uses the asymptotic null distribution [17].
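As a hedged illustration of one simulation replicate, the sketch below uses a simple parametrization that may differ from the one used in this study: *p*_{i} ∝ λ_{i}*q*_{i} with λ_{0} = 1, i.e. λ_{1} and λ_{2} are treated as genotype odds ratios, which approximate relative risks for a rare disease. Control frequencies follow HWE for a given MAF (with *a* the minor allele), and each row of Table 1 is drawn as a multinomial sample with `random.Random.choices`. The parametrization, seed, and names are assumptions for illustration.

```python
import random


def hwe_freqs(maf):
    """Genotype frequencies (AA, Aa, aa) under HWE, with a the minor allele."""
    return ((1.0 - maf) ** 2, 2.0 * maf * (1.0 - maf), maf ** 2)


def case_freqs(q, lam1, lam2):
    """ASSUMPTION: p_i proportional to lambda_i * q_i with lambda_0 = 1."""
    weights = (q[0], lam1 * q[1], lam2 * q[2])
    total = sum(weights)
    return tuple(w / total for w in weights)


def draw_counts(freqs, size, rng):
    """One multinomial draw of genotype counts (AA, Aa, aa)."""
    counts = [0, 0, 0]
    for g in rng.choices(range(3), weights=freqs, k=size):
        counts[g] += 1
    return tuple(counts)


rng = random.Random(2012)              # arbitrary seed
q = hwe_freqs(0.3)                     # controls: HWE, MAF = 0.3
p = case_freqs(q, 1.2, 1.4)            # additive setting, lambda = (1.2, 1.4)
cases = draw_counts(p, 2500, rng)      # r = 2,500 cases
controls = draw_counts(q, 2500, rng)   # s = 2,500 controls
print(p, cases, controls)
```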

Note that under the null hypothesis λ_{1} = λ_{2} = 1, the estimated rejection rates are the estimated type I error rates. These estimated type I error rates of the different test procedures are presented in Table 2. Figures 1 and 2 plot the estimated rejection rates of the different test procedures when HWE holds. We also considered situations where HWE does not hold for controls; specifically, we assume the probabilities for genotypes (*AA, Aa, aa*) in controls are (0.1, 0.3, 0.6) or (0.6, 0.3, 0.1). Figures 3 and 4 plot the estimated rejection rates for these two settings, respectively.

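**Fig. 1.** Estimated rejection rates when HWE holds in controls, with λ_{1} = 1.0, 1.1, 1.2, 1.3, 1.4 and λ_{2} = 1.4.

**Fig. 2.** Estimated rejection rates when HWE holds in controls, with λ_{1} = 1.0, 1.1, 1.2, 1.3, 1.4 and λ_{2} = 1.4.

**Fig. 3.** Estimated rejection rates when the probabilities for genotypes (*AA, Aa, aa*) are (0.1, 0.3, 0.6) in controls, with λ_{1} = 1.0, 1.1, 1.2, 1.3, 1.4 and λ_{2} = 1.4.

**Fig. 4.** Estimated rejection rates when the probabilities for genotypes (*AA, Aa, aa*) are (0.6, 0.3, 0.1) in controls, with λ_{1} = 1.0, 1.1, 1.2, 1.3, 1.4 and λ_{2} = 1.4.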

From the simulation results, all methods except RLRT control the type I error rates quite well. In general, RLRT and our proposed method have similar power. However, our simulation study shows that RLRT sometimes has inflated type I error rates. For example, when HWE holds and MAF = 0.3, the estimated size of RLRT was 2.3 × 10^{−5}, which is statistically significantly different from the nominal level 1 × 10^{−5} at significance level 0.05. The situation can be even worse when the MAF is smaller: assuming HWE holds with MAF = 0.2 and 0.1, the estimated sizes of RLRT were 2.5 × 10^{−5} and 2.8 × 10^{−5}, respectively. We also see that MAX3 and GMS perform similarly, while the CATT with score *x* = 0.5 and MERT have power values close to each other.
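The significance claim for the RLRT size can be checked with an exact one-sided binomial tail: a size of 2.3 × 10^{−5} over 1,000,000 realizations corresponds to 23 rejections, against an expected 10 under the nominal level 10^{−5}. The sketch uses only the standard library; treating the 23 rejections as a binomial count is our reading of the reported size.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    def log_pmf(j):
        return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                + j * math.log(p) + (n - j) * math.log1p(-p))
    return 1.0 - sum(math.exp(log_pmf(j)) for j in range(k))


n_sim = 1_000_000
alpha = 1e-5
rejections = round(2.3e-5 * n_sim)  # 23 rejections implied by the reported size
print(rejections, binom_sf(rejections, n_sim, alpha))
```

The resulting tail probability is far below 0.05, consistent with the statement above.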

For the additive model (λ_{1} = 1.2, λ_{2} = 1.4), MERT and CATT are usually more powerful than the other methods, as expected. However, when the true genetic model is dominant (λ_{1} = λ_{2} = 1.4) or recessive (λ_{1} = 1, λ_{2} = 1.4), MERT and CATT perform much worse than the other methods, indicating that they are sensitive to the underlying genetic model and therefore not robust. In contrast, MAX3 and GMS perform much better than MERT and CATT under the dominant and recessive models, while both have low power under the additive model. Under models other than recessive, additive, and dominant (i.e., λ_{1} = 1.1, λ_{2} = 1.4 and λ_{1} = 1.3, λ_{2} = 1.4), the proposed test and RLRT are among the three most powerful tests. Furthermore, when our proposed method is not the most powerful test, its power is always close to the largest (usually the second or third largest). This indicates that the proposed method is robust in the sense that it has comparable power in all the situations considered in the simulation study. This robustness is one of the merits of the proposed method, because the underlying genetic model is seldom known in practice.

Figures 1–4 show that the proposed method has the highest or close to the highest power in all the situations considered in the simulations. Note that in Figures 1 and 2, the estimated power values of MAX3 and GMS are so close that their two curves are hard to distinguish. Our simulation study also shows that the power of some methods depends not only on the genotype frequencies but also on the genetic model. For example, in Figure 2, except for CATT and MERT, the power of all other methods decreases as λ_{1} increases for λ_{1} < 1.2 (λ_{1} = 1.2 corresponds to the additive model) and increases as λ_{1} increases for λ_{1} > 1.2.

## 4. Numerical Illustrations

In this section, we apply our proposed method and the other methods to real SNPs reported by four GWA studies with 100,000–500,000 SNPs: a study of age-related macular degeneration (AMD) [25], two cancer studies [26, 27], and a hypertension study [28]. The datasets, summarized in Table 3, were taken from [17].

Table 4 reports the p values for each SNP from the different methods. When the genetic model lies between recessive and dominant (0 < *Z*_{1} < *Z*_{2}, or *Z*_{1} < *Z*_{2} < 0), the proposed method has p values similar to those of the CATT, which is usually more powerful than the other methods in this case. However, when the genetic model is close to recessive or dominant, the CATT performs poorly, whereas GMS, MAX3, and our proposed method have similar p values that are smaller than those of the CATT. For SNPs whose *Z*_{1} and *Z*_{2} have large absolute values but different signs, the genetic models do not belong to the generalized ORRR model. In those situations, Pearson's χ^{2} test has the smallest p value, as expected, since it is the most robust of these methods when the model falls outside the ORRR family, while CATT and MERT have large p values; the p values of the proposed method are similar to those of GMS and MAX3. Note that the p values from RLRT are usually similar to or smaller than those from our proposed test. However, ten of the seventeen estimated MAF values from controls are less than 0.3, so these small p values may reflect the liberal behavior of RLRT for highly unbalanced data noted in Section 3.

**Table 4.** p values from the different methods for each SNP, with the observed *Z*_{1} and *Z*_{2} from the proposed method

These illustrations also show that the two observed statistics *Z*_{1} and *Z*_{2} can be used to infer the genetic model and the at-risk allele. For example, for SNP rs380390 in Table 4, *Z*_{1} = −4.04 and *Z*_{2} = −3.11. Since both are negative, the at-risk allele is *A* rather than *a*; and since the absolute values of both *Z*_{1} and *Z*_{2} are much larger than 1, the genetic model should be neither recessive nor dominant but somewhere between the two. Table 4 also shows that some SNPs have *Z*_{1} and *Z*_{2} with different signs; three of the six SNPs from the breast cancer data fall into this category. This situation deserves special attention. The underlying genetic model may be over- or under-dominant, but the pattern may also have arisen by chance or from other factors, such as population substructure, which needs further investigation. The breast cancer SNP data in Table 3 were taken from the Nurses’ Health Study (NHS), and the authors of [26] conducted three additional studies. Using the χ^{2} test, the p values for SNP rs17157903 in those three studies are 0.72, 0.49, and 0.92, respectively. Therefore, the association between SNP rs17157903 and breast cancer needs to be validated by future studies.
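The sign rule used for rs380390 can be written as a small helper. This is a sketch: the threshold `large = 1.0` is a hypothetical stand-in for "much larger than 1", and the function does not try to decide between the recessive and dominant models.

```python
def interpret_z(z1, z2, large=1.0):
    """Read the at-risk allele off the signs of (Z1, Z2).

    Returns (allele, both_large): allele is "a" if both Z are positive,
    "A" if both are negative, and None if the signs differ (a pattern
    outside the generalized ORRR model). both_large flags |Z1| and |Z2|
    both exceeding the hypothetical threshold `large`.
    """
    if z1 > 0 and z2 > 0:
        allele = "a"
    elif z1 < 0 and z2 < 0:
        allele = "A"
    else:
        allele = None
    return allele, abs(z1) > large and abs(z2) > large


print(interpret_z(-4.04, -3.11))  # rs380390: ('A', True)
```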

## 5. Conclusions

In GWA studies, the underlying genetic model is usually unknown, so choosing a test that is both powerful and robust is important. No single test performs uniformly better than its competitors, and most existing methods may suffer a serious loss of power under some models. Through Monte Carlo simulations and the analysis of real SNP data, we have shown that our proposed method is more robust and powerful than existing methods in many situations.

The robustness of the proposed test is expected since, unlike the CATT, it does not require a complete specification of the genetic model; it assumes only that the model belongs to the generalized ORRR family. Therefore, it is not very sensitive to model misspecification. Meanwhile, the proposed test incorporates the monotonicity of the relative risks for SNP data in GWA studies, which yields power gains. Moreover, the simulation results show that when the genetic model is not one of the ideal models (i.e., recessive, additive, or dominant), the proposed test usually has the highest or second highest power. In real applications, the ideal models may be rare if they occur at all; hence the proposed method is preferable in practice. Finally, simulation (data not shown) and the real data (see Table 4) show that when the genetic model is not ORRR, e.g., over- or under-dominant, the proposed method still has reasonable power. Besides its robustness, another advantage of the proposed method is that its p value can be approximated easily and with very high accuracy. Although RLRT can also be applied to generalized ORRR models, it should be used with caution, as it can inflate type I error rates when the data are highly unbalanced (e.g., under HWE with small MAF).

Because some SNPs are highly correlated due to linkage disequilibrium, the p values obtained from individual SNPs are also correlated, so traditional multiple-testing corrections such as the Bonferroni procedure are not appropriate. One may instead use the recently proposed method based on the concept of the effective number of independent tests [29].

## Appendix

Proof of Theorem 1

For large sample sizes, which are usually available for GWA studies, we can assume *Z*_{1} and *Z*_{2} are independently and identically distributed as standard normal, so that Φ(*Z*_{1}) and Φ(*Z*_{2}) are independently and identically distributed uniformly between 0 and 1. Therefore, according to Fisher [30], *C*_{1} is χ^{2} distributed with df = 4. Similarly, *C*_{2} is χ^{2} distributed with df = 4.
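Theorems 1 and 2 can be checked by simulation. Assuming the Fisher-type forms *C*_{1} = −2 Σ ln Φ(*Z*_{i}) and *C*_{2} = −2 Σ ln{1 − Φ(*Z*_{i})} with *Z*_{1}, *Z*_{2} independent standard normal, the sketch below estimates the mean of *C*_{1} (which should be near 4, the mean of χ^{2}_{4}) and the tail probability Pr(*W* > *w*) at *w* ≈ 11.14, where γ ≈ 0.025; the estimate should fall close to 2γ. The number of replicates and the seed are arbitrary choices.

```python
import math
import random


def simulate(w, n_rep=100_000, seed=1):
    """Return (mean of C1, empirical Pr(W > w)) for Z1, Z2 iid N(0, 1).

    ASSUMED forms: C1 = -2 * sum(ln Phi(Z_i)), C2 = -2 * sum(ln(1 - Phi(Z_i))).
    """
    rng = random.Random(seed)
    root2 = math.sqrt(2.0)
    c1_sum, exceed = 0.0, 0
    for _ in range(n_rep):
        z = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        c1 = -2.0 * sum(math.log(0.5 * math.erfc(-zi / root2)) for zi in z)
        c2 = -2.0 * sum(math.log(0.5 * math.erfc(zi / root2)) for zi in z)
        c1_sum += c1
        exceed += max(c1, c2) > w
    return c1_sum / n_rep, exceed / n_rep


w = 11.143                                    # about the 0.975 quantile of chi2(4)
gamma = math.exp(-w / 2.0) * (1.0 + w / 2.0)  # Pr(chi2_4 > w), about 0.025
mean_c1, tail = simulate(w)
print(mean_c1, tail, 2 * gamma - gamma ** 2, 2 * gamma)
```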

Proof of Theorem 2

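For a large sample size, from Theorem 1, *C*_{1} and *C*_{2} are both χ^{2} distributed with df = 4, so Pr(*C*_{1} > *w*) = Pr(*C*_{2} > *w*) = γ. Using the concept of associated random variables by Esary et al. [31] and Theorem 2 by Owen [32], we have

$$\Pr(W > w) = 2\gamma - \Pr(C_1 > w,\ C_2 > w), \qquad 0 \le \Pr(C_1 > w,\ C_2 > w) \le \gamma^2,$$

because *C*_{1} and *C*_{2} are monotone in opposite directions in the independent variables *Z*_{1} and *Z*_{2}. The bounds in Theorem 2 follow.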

## Acknowledgements

The first author acknowledges support from the NIH grant (UL1 RR024148) awarded to the University of Texas Health Science Center at Houston. The authors also thank Professor Wayne Woodward for his suggestions, which resulted in a much improved version of the manuscript.

## References

**Karger Publishers**
