PLoS One. 2010; 5(8): e11384.
Published online Aug 18, 2010. doi:  10.1371/journal.pone.0011384
PMCID: PMC2909900

A General Model for Multilocus Epistatic Interactions in Case-Control Studies

Zvia Agur, Editor

Abstract

Background

Epistasis, i.e., the interaction of alleles at different loci, is thought to play a central role in the formation and progression of complex diseases. The complexity of disease expression is likely to arise from a complex network of epistatic interactions involving multiple genes.

Methodology

We develop a general model for testing high-order epistatic interactions for a complex disease in a case-control study. We incorporate the quantitative genetic theory of high-order epistasis into the setting of cases and controls sampled from a natural population. The new model allows the identification and testing of epistasis and its various genetic components.

Conclusions

Simulation studies were used to examine the power and false positive rates of the model under different sampling strategies. The model was then used to detect epistasis in a case-control study of inflammatory bowel disease in which five SNPs in a candidate gene were genotyped, leading to the identification of significant three-locus epistasis.

Introduction

The complexity of biological systems arises from the highly interactive relationships among their components [1], [2]. Thus, the metabolic pathways underlying a phenotypic trait or disease are likely to involve multiple interacting gene products and regulatory loci that generate a complex network of genetic actions and interactions [3], [4]. Current genome-wide linkage and association studies have detected the genetic actions of individual genes involved in the phenotypic diversity of complex traits [5]–[12]. Given its ubiquity in controlling complex traits and diseases, epistasis, which results from interactions between alleles at different genes, has received increasing attention in genetic studies [13], [14]. However, many of these studies focus on identifying low-order pairwise epistasis, leaving high-order epistatic interactions, including their frequency and their impact on genetic variation, unexplored.

More recently, Stich et al. [15] developed a linkage mapping approach to uncover three-way interactions among quantitative trait loci (QTLs) using a mating design. Beerenwinkel et al. [16] proposed a mathematical approach for describing multi-way genetic interactions and used it to study the genetic structure of fitness landscapes in Escherichia coli. Based on the analysis of pathway fragments, Imielinski and Belta [17] used a genome-scale knockout design to detect high-order epistatic relationships among the components of large metabolic networks. Hansen and Wagner [18] showed that higher-order genetic interactions are potentially important if the total genomic mutation rate is large and the density of interactions among loci is not too low. With the widespread availability of high-throughput genotyping technology, there is a pressing need to estimate higher-order epistasis involving any number of genes and to assess the role of epistasis in the creation and maintenance of genetic variation in complex traits.

The motivation of this study is to develop a general model for estimating epistasis of any order from multilocus single nucleotide polymorphism (SNP) data in case-control studies. In particular, the model allows high-order epistasis to be estimated and tested. Because samples are easy to collect, the population-based case-control design has been widely used in candidate gene and genome-wide association studies [19]–[21]. By comparing the genotype frequencies of a gene between unrelated individuals with the disease and healthy controls, this design can test the significance of the association between the gene and the disease. However, only a few studies have used a case-control design to characterize epistasis [19], and the epistasis they defined on the basis of logistic regression models is computationally complex. The new model described in this article embeds, for the first time, quantitative genetic principles into a chi-square test framework, allowing overall multilocus genetic effects to be dissected into various components, including high-order epistatic interactions. The model was validated through simulation studies and a real data analysis.

Quantitative Genetic Models for Epistasis

Epistasis was originally defined as the masking of the expression of an allele at one locus by an allele at another locus [22]. Fisher [23] later reformulated this concept statistically as the deviation of genetic action from additivity in a linear model. Fisher's definition allows epistasis to be quantified in different forms that correspond to the biological meaning given by Bateson [22]. For two loci, the possible forms of epistasis are the interactions between the additive effects at the two loci, between the additive effect at the first locus and the dominant effect at the second, between the dominant effect at the first locus and the additive effect at the second, and between the dominant effects at the two loci. Each of these epistatic forms contributes differently to the overall genetic value of a two-locus genotype. We use Mather and Jinks' formulation [24] to partition a genotypic value into its components, including epistasis.

Two-locus Epistasis

Suppose there are two loci, A with alleles $A$ and $a$ and B with alleles $B$ and $b$, which form nine two-locus genotypes. Let $\mu_{j_1j_2}$ denote the genetic value of an arbitrary genotype $j_1j_2$ ($j_1 = 2, 1, 0$ for genotypes $AA$, $Aa$, and $aa$; $j_2 = 2, 1, 0$ for genotypes $BB$, $Bb$, and $bb$, respectively). We dissect $\mu_{j_1j_2}$ into different components as shown in table 1.

Table 1
The genetic effect components of two-locus genotypes.

In table 1, each genotypic value is expressed in terms of the overall mean; the additive effects at genes A and B; the dominant effects at genes A and B; and the additive × additive, additive × dominant, dominant × additive, and dominant × dominant epistatic interactions between the two genes.
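As a sketch of the kind of decomposition summarised in table 1, assume the usual Mather and Jinks F∞ coding, in which a locus with genotype code $j = 2, 1, 0$ (e.g. $AA$, $Aa$, $aa$) has additive coefficient $x_j = 1, 0, -1$ and dominance coefficient $z_j = 0, 1, 0$. Writing $\mu_{j_1j_2}$ for the value of genotype $j_1j_2$ and using the illustrative symbols $m$ (overall mean), $a_A, a_B$ (additive), $d_A, d_B$ (dominant), and $I_{aa}, I_{ad}, I_{da}, I_{dd}$ (epistasis, first subscript for locus A), which need not match the notation of table 1, each genotypic value would be

```latex
\begin{aligned}
\mu_{j_1j_2} = {}& m + x_{j_1}a_A + z_{j_1}d_A + x_{j_2}a_B + z_{j_2}d_B \\
  & + x_{j_1}x_{j_2}I_{aa} + x_{j_1}z_{j_2}I_{ad} + z_{j_1}x_{j_2}I_{da} + z_{j_1}z_{j_2}I_{dd}
\end{aligned}
```

For example, this gives $\mu_{22} = m + a_A + a_B + I_{aa}$ for the double homozygote and $\mu_{11} = m + d_A + d_B + I_{dd}$ for the double heterozygote.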

The dissection of genotypic values is expressed, in matrix form, as

equation image
(1)

The genetic effect parameters can be solved using

equation image
(2)
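The matrices behind equations (1) and (2) can be sketched under the same assumed Mather and Jinks F∞ coding (additive coefficients $1, 0, -1$ and dominance coefficients $0, 1, 0$ for genotype codes $2, 1, 0$). Collecting these in a single-locus matrix $\mathbf{W}_1$ (rows: genotypes 2, 1, 0; columns: mean, additive, dominant), the two-locus system is a Kronecker product, and its inverse factorises because $(\mathbf{W}_1 \otimes \mathbf{W}_1)^{-1} = \mathbf{W}_1^{-1} \otimes \mathbf{W}_1^{-1}$. The symbols ($m$ for the mean, $a_A, a_B, d_A, d_B$ for main effects, $I_{aa}, I_{ad}, I_{da}, I_{dd}$ for epistasis with the first subscript referring to locus A) and the vector orderings are choices made for this sketch:

```latex
\begin{gathered}
\mathbf{W}_1 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix},
\qquad
\mathbf{W}_1^{-1} = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & -1 \\ -1 & 2 & -1 \end{pmatrix},
\\[6pt]
\boldsymbol{\mu} = (\mathbf{W}_1 \otimes \mathbf{W}_1)\,\mathbf{E},
\qquad
\mathbf{E} = (\mathbf{W}_1^{-1} \otimes \mathbf{W}_1^{-1})\,\boldsymbol{\mu},
\\[6pt]
\boldsymbol{\mu} = (\mu_{22}, \mu_{21}, \mu_{20}, \mu_{12}, \mu_{11}, \mu_{10}, \mu_{02}, \mu_{01}, \mu_{00})^{\mathsf{T}},
\quad
\mathbf{E} = (m, a_B, d_B, a_A, I_{aa}, I_{ad}, d_A, I_{da}, I_{dd})^{\mathsf{T}}.
\end{gathered}
```

For example, this gives $I_{aa} = \tfrac14(\mu_{22} - \mu_{20} - \mu_{02} + \mu_{00})$ and, at a single locus, $d = \mu_1 - \tfrac12(\mu_2 + \mu_0)$.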

Three-locus Epistasis

Adding a locus C, with alleles $C$ and $c$, to the two-locus model generates 27 three-locus genotypes, denoted $j_1j_2j_3$ ($j_3 = 2, 1, 0$ for genotypes $CC$, $Cc$, and $cc$, respectively, with $j_1$ and $j_2$ coded in the same way for loci A and B). A three-locus genotypic value, $\mu_{j_1j_2j_3}$, is dissected into the following components:

  1. the overall mean;
  2. the main genetic effects, including the three additive effects at genes A, B, and C and the three dominant effects at genes A, B, and C;
  3. the two-way interaction effects, including the additive × additive, additive × dominant, dominant × additive, and dominant × dominant epistasis between genes A and B, between genes A and C, and between genes B and C;
  4. the three-way interaction effects, including the additive × additive × additive, additive × additive × dominant, additive × dominant × additive, dominant × additive × additive, additive × dominant × dominant, dominant × additive × dominant, dominant × dominant × additive, and dominant × dominant × dominant epistasis among genes A, B, and C.
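A quick count shows that the components listed above match the number of three-locus genotypes (assuming, as the list implies, that each effect takes either the additive or the dominant effect at every locus it involves):

```latex
\underbrace{1}_{\text{mean}}
+ \underbrace{3 \times 2}_{\text{main}}
+ \underbrace{3 \times 4}_{\text{two-way}}
+ \underbrace{1 \times 8}_{\text{three-way}}
= 27 = 3^3
```

so equation (3) is a square system of 27 equations in 27 unknown effect parameters.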

Mather and Jinks' theory is used to formulate the relationships between genotypic values and genetic effects, expressed as

equation image
(3)

The genetic effect parameters are then solved from the genotypic values:

equation image
(4)
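Continuing the same hedged sketch (Mather and Jinks F∞ coding, with $\mathbf{W}_1$ the 3 × 3 matrix whose row for genotype code $j = 2, 1, 0$ is $(1, x_j, z_j)$, where $x_j = 1, 0, -1$ and $z_j = 0, 1, 0$), equations (3) and (4) become threefold Kronecker products, and each three-way parameter is a signed contrast of the 27 genotypic values. Two examples, with the illustrative symbols $I_{aaa}$ and $I_{daa}$ (the latter for dominant at A × additive at B × additive at C):

```latex
\begin{gathered}
\boldsymbol{\mu} = (\mathbf{W}_1 \otimes \mathbf{W}_1 \otimes \mathbf{W}_1)\,\mathbf{E},
\qquad
\mathbf{E} = (\mathbf{W}_1^{-1} \otimes \mathbf{W}_1^{-1} \otimes \mathbf{W}_1^{-1})\,\boldsymbol{\mu},
\\[6pt]
I_{aaa} = \tfrac{1}{8}\left(\mu_{222} - \mu_{220} - \mu_{202} + \mu_{200}
          - \mu_{022} + \mu_{020} + \mu_{002} - \mu_{000}\right),
\\[6pt]
I_{daa} = \tfrac{1}{4}\left(\mu_{122} - \mu_{120} - \mu_{102} + \mu_{100}\right)
          - \tfrac{1}{8}\sum_{j_1 \in \{2, 0\}}\left(\mu_{j_1 22} - \mu_{j_1 20} - \mu_{j_1 02} + \mu_{j_1 00}\right).
\end{gathered}
```

Only the eight genotypes homozygous at all three loci enter $I_{aaa}$, each with weight $\pm\tfrac18$, whereas contrasts involving dominance combine weights of different sizes.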

N-locus Epistasis

We propose a general model for describing the genetic components of a genotype composed of any number of loci. Consider $n$ loci, which form $3^n$ genotypes. The value of an $n$-locus genotype is composed of the overall mean, the additive and dominant effects at each locus, and epistasis of different kinds and orders among these loci. Let the space of genetic effects at each individual locus (gene 1, gene 2, …, gene $n$) consist of that locus's additive and dominant effects. Thus, we can define all possible genetic effects as follows:

  • If an effect involves gene 1 only, it is the additive or the dominant effect of gene 1;
  • If an effect involves gene 2 only, it is the additive or the dominant effect of gene 2;
  • …;
  • If an effect involves gene $n$ only, it is the additive or the dominant effect of gene $n$;
  • …;
  • If an effect involves genes $k_1, k_2, \ldots, k_r$, it is one of the $2^r$ forms of $r$-way epistasis among these genes, with either the additive or the dominant effect taken at each of them.
  • …;
  • If an effect involves genes $1, 2, \ldots, n$, it is one of the $2^n$ forms of $n$-way epistasis among all $n$ genes.
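The enumeration above implies a simple count, sketched here in Python as an illustration (the function name `effect_counts` is ours, not from the original): choosing $r$ of the $n$ loci and the additive or dominant effect at each gives $\binom{n}{r}2^r$ effects of order $r$, and by the binomial theorem these sum to $3^n$, the number of $n$-locus genotypes.

```python
from math import comb


def effect_counts(n_loci: int) -> dict[int, int]:
    """Number of genetic effects of each order r for n_loci biallelic loci.

    Order 0 is the overall mean. An effect of order r >= 1 picks r of the loci
    (comb(n_loci, r) ways) and the additive or dominant effect at each (2**r ways).
    """
    return {r: comb(n_loci, r) * 2**r for r in range(n_loci + 1)}


for n in (2, 3, 5):
    counts = effect_counts(n)
    # The total matches the number of n-locus genotypes, 3**n (binomial theorem).
    print(n, counts, sum(counts.values()) == 3**n)
```

For five loci, for instance, the counts are 40 two-way and 80 three-way effects, out of $3^5 = 243$ parameters in total.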

By defining, for each locus, indices for its genotype and for the genetic effect under consideration, we express the value of a general multilocus genotype as

equation image
(5)

where

equation image
(6)

with

equation image
(7)

$I(\cdot)$ is a logical (indicator) function that returns 1 if its condition is true and 0 otherwise.

The genetic effect parameters can be estimated by solving the linear equations using

equation image
(8)

where

equation image
(9)

with

equation image
(10)

Equation (8) gives a general form for main and interaction genetic effects among an arbitrary number of loci. Mathematical algorithms for solving epistatic equations are given in Text S1.
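The following Python/NumPy sketch illustrates the idea behind equation (8) under an assumption: that each locus uses the Mather and Jinks F∞ coding (additive coefficients 1, 0, −1 and dominance coefficients 0, 1, 0 for genotype codes 2, 1, 0), so that the $n$-locus coefficient matrix is the $n$-fold Kronecker power of the single-locus matrix. The function names are hypothetical, and this is not the algorithm of Text S1, which describes the authors' own procedures.

```python
import itertools

import numpy as np

# Assumed single-locus coding (Mather and Jinks F-infinity scale):
# rows are genotypes coded 2, 1, 0 (e.g. AA, Aa, aa);
# columns are overall mean, additive effect, dominant effect.
W1 = np.array([[1.0, 1.0, 0.0],
               [1.0, 0.0, 1.0],
               [1.0, -1.0, 0.0]])
GENOTYPE_CODES = (2, 1, 0)
EFFECT_CODES = ("m", "a", "d")  # "m" marks a locus that the effect does not involve


def design_matrix(n_loci: int) -> np.ndarray:
    """n-fold Kronecker power of W1: maps 3**n effects to 3**n genotypic values."""
    W = np.ones((1, 1))
    for _ in range(n_loci):
        W = np.kron(W, W1)
    return W


def effect_labels(n_loci: int) -> list[str]:
    """Column labels, e.g. 'daa' = dominant at locus 1 x additive at loci 2 and 3."""
    return ["".join(c) for c in itertools.product(EFFECT_CODES, repeat=n_loci)]


def genotype_labels(n_loci: int) -> list[tuple[int, ...]]:
    """Row labels: genotype codes (j1, ..., jn) in the same order as the rows."""
    return list(itertools.product(GENOTYPE_CODES, repeat=n_loci))


def solve_effects(genotypic_values: np.ndarray, n_loci: int) -> dict[str, float]:
    """Solve mu = W E for the 3**n effect parameters E.

    genotypic_values must be ordered like genotype_labels(n_loci).
    """
    E = np.linalg.solve(design_matrix(n_loci), genotypic_values)
    return {label: float(value) for label, value in zip(effect_labels(n_loci), E)}


if __name__ == "__main__":
    # Round trip with arbitrary effect values for three loci.
    rng = np.random.default_rng(1)
    true_effects = rng.normal(size=3**3)
    mu = design_matrix(3) @ true_effects
    estimates = solve_effects(mu, 3)
    print(np.allclose(list(estimates.values()), true_effects))
```

Building the matrix by Kronecker products keeps the effect labels and genotype rows in a predictable order; for many loci one would instead apply the single-locus inverse one locus at a time rather than forming a $3^n \times 3^n$ matrix.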

Testing Epistasis

Based on these definitions, we now provide a procedure for testing epistasis of different kinds and orders with multilocus genetic data. Consider a case-control study in which $n_1$ cases (individuals with the disease) and $n_2$ controls (individuals without the disease) are sampled randomly from a natural population. The case and control groups are matched for demographic factors such as age, race, gender, lifestyle, and body mass. All subjects in both groups are genotyped genome-wide or for particular chromosomal regions of interest, depending on the purpose of the study. Let $n_{1j_1j_2j_3}$ and $n_{2j_1j_2j_3}$ denote the observed numbers of individuals with a general genotype $j_1j_2j_3$ ($j_1, j_2, j_3 = 2, 1, 0$) at three markers A, B, and C among cases and controls, respectively. Based on Mather and Jinks' partition of genotypic values [24], we calculate genetic effect parameters from genotypic values using equation (4). For both cases and controls, the genotypic values used to calculate each effect parameter are divided into two groups, plus and minus, which forms a 2 (cases and controls) × 2 (plus and minus) contingency table. For example, the contingency table for testing the additive × additive × additive epistatic effect is given in table 2.
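A minimal Python sketch of building such a table for the additive × additive × additive effect. It assumes (i) the Mather and Jinks F∞ coding, under which this parameter is a contrast of the eight genotypes homozygous at all three loci with equal weights and signs given by the product of the additive coefficients, and (ii) that each plus or minus cell is the simple sum of the genotype counts in that group. Heterozygotes receive a zero coefficient and do not enter the table. The function names and the toy counts are hypothetical, not data from this study.

```python
from itertools import product

# Assumed additive coefficients (Mather and Jinks F-infinity coding)
# for genotypes coded 2, 1, 0 at each locus.
ADDITIVE = {2: 1, 1: 0, 0: -1}


def aaa_sign(genotype: tuple[int, int, int]) -> int:
    """Sign of a genotype's coefficient in the a x a x a contrast:
    +1 = plus group, -1 = minus group, 0 = not used (heterozygous somewhere)."""
    j1, j2, j3 = genotype
    return ADDITIVE[j1] * ADDITIVE[j2] * ADDITIVE[j3]


def plus_minus_table(case_counts: dict, control_counts: dict) -> list[list[int]]:
    """2 x 2 table of summed genotype counts.
    Rows: cases, controls. Columns: plus group, minus group."""
    table = []
    for counts in (case_counts, control_counts):
        plus = sum(n for g, n in counts.items() if aaa_sign(g) > 0)
        minus = sum(n for g, n in counts.items() if aaa_sign(g) < 0)
        table.append([plus, minus])
    return table


def pearson_chi2(table: list[list[int]]) -> float:
    """Pearson chi-square for a 2 x 2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    # Toy counts for illustration only (not data from this study).
    cases = {g: 4 for g in product((2, 1, 0), repeat=3)}
    cases[(2, 2, 2)] = 12
    controls = {g: 4 for g in product((2, 1, 0), repeat=3)}
    table = plus_minus_table(cases, controls)
    print(table, round(pearson_chi2(table), 2))  # [[24, 16], [16, 16]] 0.72
```

With these toy counts the statistic (0.72) is well below 3.84, so the example shows no additive × additive × additive epistasis.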

Table 2
The $\chi^2$ test statistics for the additive × additive × additive epistatic effect.

From the table, the $\chi^2$ test statistic is calculated and compared with a critical threshold; for this additive × additive × additive effect, the threshold is that of a $\chi^2$ distribution with one degree of freedom (3.84). We proved [25] that, under the null hypothesis, the test statistics for effects involving dominance follow a $\chi^2$ distribution with less than one degree of freedom, which is why their thresholds below are smaller than 3.84.
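For reference, if the statistic is the ordinary Pearson $\chi^2$ for a 2 × 2 table without continuity correction (an assumption; the text does not give the formula), with $O_{11}, O_{12}$ the plus and minus totals for cases and $O_{21}, O_{22}$ those for controls, it is

```latex
\chi^2 = \frac{N\,(O_{11}O_{22} - O_{12}O_{21})^2}
              {(O_{11}+O_{12})(O_{21}+O_{22})(O_{11}+O_{21})(O_{12}+O_{22})},
\qquad N = O_{11}+O_{12}+O_{21}+O_{22}.
```

Its 5% critical value with one degree of freedom is 3.84, which matches the largest of the thresholds listed below.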

The contingency tables for testing the other parameters can be constructed similarly. For a particular group $k$ ($k = 1$ for cases, 2 for controls), the genotypic values used to calculate the three-way epistatic effect parameters are tabulated in table 3.

Table 3
The $\chi^2$ test statistics for the three-way epistatic effect parameters.

The derived thresholds for testing these eight forms of three-locus epistasis are $\chi^2$ = 3.84, 3.20, 3.20, 3.20, 2.60, 2.60, 2.60, and 2.14, respectively. The genotypic values used to calculate the two-way epistatic effect parameters are tabulated in table 4.

Table 4
The $\chi^2$ test statistics for the two-way epistatic effect parameters.

The derived thresholds for testing these two-locus epistatic effects are $\chi^2$ = 3.84, 3.84, 3.84, 2.50, 2.50, 2.50, 3.20, 3.20, 3.20, 3.20, 3.20, and 3.20, respectively. The genotypic values used to calculate the main genetic effect parameters are tabulated in table 5.

Table 5
The $\chi^2$ test statistics for the main genetic effect parameters.

The derived thresholds for testing these main effects are $\chi^2$ = 3.84, 3.84, 3.84, 2.60, 2.60, and 2.60, respectively. For an arbitrary number of markers, the genotypic values used to calculate the main and epistatic (of different orders) genetic effect parameters can similarly be divided into plus and minus groups, from which the $\chi^2$ test statistics are calculated.
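The thresholds above can be organised as a lookup table. In this Python sketch, assigning each value to an effect form by its number of dominant components is an assumption inferred from how often each value occurs in the lists (for example, one 3.84, three 3.20, three 2.60, and one 2.14 among the eight three-way forms); the function names are hypothetical.

```python
# Thresholds as listed in the text, keyed by effect order and then by the
# number of dominant ('d') components. The keying is an assumption inferred
# from the counts of each value in the published lists.
THRESHOLDS = {
    1: {0: 3.84, 1: 2.60},                    # main effects: a; d
    2: {0: 3.84, 1: 3.20, 2: 2.50},           # two-way: aa; ad or da; dd
    3: {0: 3.84, 1: 3.20, 2: 2.60, 3: 2.14},  # three-way: aaa; one d; two d; ddd
}


def threshold(form: str) -> float:
    """Critical value for an effect form written as e.g. 'a', 'ad', 'daa'."""
    if not form or len(form) > 3 or set(form) - {"a", "d"}:
        raise ValueError(f"form must be 1-3 characters of 'a'/'d': {form!r}")
    return THRESHOLDS[len(form)][form.count("d")]


def is_significant(chi2: float, form: str) -> bool:
    """True if the statistic exceeds the threshold for this effect form."""
    return chi2 > threshold(form)
```

Under this assumed keying, `is_significant(3.0, "daa")` is false (3.0 < 3.20), while the same statistic would exceed the 2.60 threshold for a form with two dominant components.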

Results

The model was used to analyze a case-control study aimed at detecting genetic variants for inflammatory bowel disease (IBD) with a candidate gene approach [26]. As a member of the membrane-associated guanylate kinase family, Discs large homolog 5 (DLG5) plays a central role in maintaining cell junctions and cell shape and in clustering channel proteins at the cell surface [27]. Five single nucleotide polymorphisms (SNPs), Arg30Gln, Glu514Gln, Pro979Leu, Gly1066Gly, and Pro1371Gln, were genotyped at DLG5 in both cases and controls to test their association with IBD. The cases were 115 sporadic IBD patients, aged 22 to 66 years, from the Milton S. Hershey Medical Center, whereas the controls were 172 unrelated healthy individuals, aged 15 to 81 years, from the Milton S. Hershey Medical Center and the Philadelphia Gift of Life Donor Program. The use of all human tissues for pathological studies and genetic analysis was approved by the Human Subjects Protection Offices of The Pennsylvania State University College of Medicine and was undertaken with the understanding and written consent of each subject.

Because of the modest sample size, our analysis focuses on three-SNP combinations, although the model can handle any number of SNPs. None of the five SNPs displays a significant additive effect, but Arg30Gln, Pro979Leu, and Gly1066Gly each show a significant dominant effect on the disease (table 6). The five SNPs form 10 pairs, each of which was subjected to a two-locus epistatic analysis. The number and distribution of two-locus epistatic effects are given in table 7. Interestingly, significant two-locus epistasis was observed only between Arg30Gln and three other SNPs: Pro979Leu, which has a significant dominant main effect, and Glu514Gln and Pro1371Gln, which have no significant main effects. The significant epistasis is limited to interactions between the dominant effect at Arg30Gln and the additive or dominant effects at the other SNPs.

Table 6
The $\chi^2$ test statistics calculated to test the additive and dominant effects at each SNP genotyped in DLG5.
Table 7
The $\chi^2$ test statistics calculated to test the two-SNP epistasis between each pair of SNPs genotyped in DLG5.

The five SNPs form 10 three-locus combinations, each of which was analyzed with the three-locus epistasis model. Each combination has eight forms of three-SNP epistasis. Table 8 lists the test statistics for all combinations and forms of epistasis, with significant epistasis highlighted in boldface. The interactions among the additive effects at any three of the five SNPs were not significant, and neither were the three-way dominant interactions. Every significant three-locus epistatic effect therefore combines additive and dominant effects across the three SNPs. In general, Arg30Gln is involved in more significant three-locus interactions, and at higher significance levels, than the other SNPs. The combination of Arg30Gln, Glu514Gln, and Pro979Leu produces the largest number of significant forms of epistasis (3), followed by the combinations of Arg30Gln, Gly1066Gly, and Pro1371Gln (2); Glu514Gln, Gly1066Gly, and Pro1371Gln (2); Arg30Gln, Glu514Gln, and Pro1371Gln (1); Arg30Gln, Pro979Leu, and Pro1371Gln (1); and Pro979Leu, Gly1066Gly, and Pro1371Gln (1). The three SNPs with significant main effects (Arg30Gln, Pro979Leu, and Gly1066Gly) do not produce significant three-locus epistasis. The two SNPs without significant main effects (Glu514Gln and Pro1371Gln) generate significant three-locus interactions with Arg30Gln and with Gly1066Gly, but not with Pro979Leu (table 8).
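The size of this search follows from simple enumeration. The Python snippet below just lists the SNP combinations named in the text, assuming four forms of epistasis per pair and eight per triplet, as in the model.

```python
from itertools import combinations

SNPS = ["Arg30Gln", "Glu514Gln", "Pro979Leu", "Gly1066Gly", "Pro1371Gln"]

pairs = list(combinations(SNPS, 2))
triplets = list(combinations(SNPS, 3))

# Each pair has 2**2 = 4 forms of epistasis; each triplet has 2**3 = 8 forms.
n_two_way_tests = len(pairs) * 2**2
n_three_way_tests = len(triplets) * 2**3
print(len(pairs), len(triplets), n_two_way_tests, n_three_way_tests)  # 10 10 40 80
```

Table 8 therefore reports 80 three-SNP test statistics, of which the ten significant ones are those counted in the paragraph above.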

Table 8
The $\chi^2$ test statistics calculated to test the three-SNP epistasis among each triplet of SNPs genotyped in DLG5.

After significant high-order epistasis is detected, the next step is to make a biological interpretation of such epistasis. To interpret it, we will use the dominant (An external file that holds a picture, illustration, etc.
Object name is pone.0011384.e354.jpg)An external file that holds a picture, illustration, etc.
Object name is pone.0011384.e355.jpgadditive (An external file that holds a picture, illustration, etc.
Object name is pone.0011384.e356.jpg)An external file that holds a picture, illustration, etc.
× additive epistasis among Arg30Gln, Glu514Gln, and Pro979Leu as an example. Table 9 gives the structure of genetic effects for each three-locus genotypic value in terms of the additive, dominant, and epistatic effects of different orders. The dominant × additive × additive epistasis contributes only to the genotypic values of four of the 27 three-locus genotypes (table 9). For each of these four genotypes, the genotypic value is partitioned into its effect components for both cases and controls (table 10). As can be seen, the dominant × additive × additive epistasis increases the incidence of IBD by 9 cases for carriers of two of these genotypes, but decreases the IBD incidence of carriers of the other two genotypes to the same extent.
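The reason a single three-locus epistatic term touches only four genotypes can be checked by enumeration. The sketch below lists all 3^3 = 27 genotypes at three SNPs. It computes the coefficient of an epistatic term as the product of per-locus codes.

It assumes a common coding that is not taken from Table 9:
- The additive code is +1 and −1 for the two homozygotes and 0 for the heterozygote.
- The dominance code is 1 for the heterozygote and 0 for either homozygote.

The function names are hypothetical.

```python
from itertools import product

# Hypothetical per-locus coding (an assumption; the paper's own coding is given
# in Table 9 and Text S1 and may differ). Genotypes are coded by the number of
# copies (2, 1, 0) of a reference allele.
ADDITIVE = {2: 1, 1: 0, 0: -1}  # +1 / -1 for the two homozygotes, 0 for the heterozygote
DOMINANT = {2: 0, 1: 1, 0: 0}   # 1 for the heterozygote, 0 for either homozygote
CODES = {"a": ADDITIVE, "d": DOMINANT}


def epistatic_coefficient(form, genotype):
    """Coefficient of a three-locus epistatic term (e.g. form='daa' for
    dominant x additive x additive) in the value of one three-locus genotype."""
    coef = 1
    for kind, g in zip(form, genotype):
        coef *= CODES[kind][g]
    return coef


def genotypes_affected(form):
    """Return {genotype: coefficient} for every genotype the term contributes to."""
    affected = {}
    for genotype in product((2, 1, 0), repeat=3):
        coef = epistatic_coefficient(form, genotype)
        if coef != 0:
            affected[genotype] = coef
    return affected


if __name__ == "__main__":
    for genotype, coef in sorted(genotypes_affected("daa").items()):
        print(genotype, coef)
    # (1, 0, 0) 1
    # (1, 0, 2) -1
    # (1, 2, 0) -1
    # (1, 2, 2) 1
```

Under this assumed coding, the term is nonzero only when the first SNP is heterozygous and the other two are homozygous. Two genotypes receive +1 times the effect and two receive −1 times it. This mirrors the symmetric increase and decrease described for the IBD data.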

Table 9
Genetic effect components of different three-locus genotypic values.
Table 10
The genetic effect components of the four three-locus genotypes at SNPs Arg30Gln, Glu514Gln, and Pro979Leu that carry the dominant × additive × additive three-locus epistasis.

Computer Simulation

Simulation studies were undertaken to examine the statistical behavior of the new model, focusing on the power and false positive rate (FPR) for the detection of three-locus epistasis. Three simulation schemes were used, with 200 cases vs. 200 controls, 400 vs. 400, and 1000 vs. 1000. The eight possible forms of three-locus epistasis can be sorted into four representative classes according to the number of dominant components:
(1) additive × additive × additive (no dominant effect);
(2) additive × additive × dominant, additive × dominant × additive, and dominant × additive × additive (one dominant effect);
(3) additive × dominant × dominant, dominant × additive × dominant, and dominant × dominant × additive (two dominant effects);
(4) dominant × dominant × dominant (three dominant effects).
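The four classes follow from a simple count. Each of the three loci contributes either an additive or a dominant component, which gives 2^3 = 8 forms. These fall into classes of size 1, 3, 3, and 1 by the number of dominant components. The short enumeration below is only an illustration of that count; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import product


def group_by_dominant_count():
    """Group the 2**3 = 8 three-locus epistasis forms by how many of the
    three loci contribute a dominant (rather than additive) component."""
    groups = defaultdict(list)
    for form in product(("additive", "dominant"), repeat=3):
        groups[form.count("dominant")].append(" x ".join(form))
    return dict(groups)


if __name__ == "__main__":
    for n_dominant, forms in sorted(group_by_dominant_count().items()):
        print(n_dominant, forms)
```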

In a real data set, SNPs may be either associated with or independent of each other, so we also investigated how SNP-SNP associations affect the behavior of the new model. In one data set, three SNPs with the same allele frequency were simulated with pairwise and three-locus linkage disequilibria. Among the three SNPs, only the additive × additive × additive, additive × additive × dominant, additive × dominant × dominant, and dominant × dominant × dominant epistatic effects were assumed to exist. This was done by simulating a contingency table constrained so that the test statistics for these four effects exceed their corresponding thresholds, while the test statistics for the other effects fall below the corresponding thresholds. The second data set, also containing three SNPs, was simulated with the same parameters but without linkage disequilibrium.
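The constrained contingency-table procedure used for the simulated data sets is not reproduced here. As a complementary illustration, the sketch below shows one standard way to generate three SNPs with given pairwise and three-locus linkage disequilibria.

Haplotype frequencies are built from allele frequencies p_k, pairwise disequilibria D_ij, and the three-locus disequilibrium D_123. The decomposition used is f = Π q_k + Σ s_i s_j D_ij q_k + s_1 s_2 s_3 D_123. Here q_k is p_k or 1 − p_k, and s_k = ±1 according to the allele carried.

Individuals are then formed by random union of two haplotypes. This assumes Hardy-Weinberg equilibrium, which is an assumption of this sketch. The function names are hypothetical.

```python
import random
from itertools import product


def haplotype_frequencies(p, d12, d13, d23, d123):
    """Frequencies of the 8 haplotypes at three biallelic SNPs.

    p: (p1, p2, p3), frequency of allele 1 at each SNP.
    d12, d13, d23: pairwise linkage disequilibria; d123: three-locus LD.
    """
    freqs = {}
    for hap in product((1, 0), repeat=3):
        q = [pk if x == 1 else 1 - pk for pk, x in zip(p, hap)]
        s = [1 if x == 1 else -1 for x in hap]
        f = (q[0] * q[1] * q[2]
             + s[0] * s[1] * d12 * q[2]
             + s[0] * s[2] * d13 * q[1]
             + s[1] * s[2] * d23 * q[0]
             + s[0] * s[1] * s[2] * d123)
        if f < 0:
            raise ValueError(f"LD values give a negative frequency for {hap}")
        freqs[hap] = f
    return freqs


def sample_genotypes(freqs, n, seed=None):
    """Draw n individuals as two independent haplotypes (random union, an
    assumption); each genotype is the count of allele 1 at each SNP."""
    rng = random.Random(seed)
    haps = list(freqs)
    weights = [freqs[h] for h in haps]
    genotypes = []
    for _ in range(n):
        h1, h2 = rng.choices(haps, weights=weights, k=2)
        genotypes.append(tuple(a + b for a, b in zip(h1, h2)))
    return genotypes


if __name__ == "__main__":
    f = haplotype_frequencies((0.3, 0.3, 0.3), 0.05, 0.05, 0.05, 0.01)
    print(sample_genotypes(f, 5, seed=1))
```

Summing f over the third locus recovers the two-locus frequencies p_i p_j + s_i s_j D_ij, because the D_13, D_23, and D_123 terms cancel. Setting every disequilibrium to zero gives the independent-SNP scenario.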

Table 11, table 12, and table 13 give the power and false positive rates (FPR) for three-locus interaction detection by the new epistatic model. The power to detect three-locus epistasis increases markedly with sample size in a case-control study. With 200 cases vs. 200 controls, the power is about 0.51–0.61. Additive × additive × additive epistasis is detected most easily, followed by additive × additive × dominant, additive × dominant × dominant, and dominant × dominant × dominant epistasis. When sample sizes increase to 400 vs. 400, the power for three-locus epistasis detection exceeds 0.75. With 1000 vs. 1000, the power reaches 0.99 or more. In general, whether the SNPs are associated or independent does not substantially affect the power, although in some cases the power is higher for associated SNPs than for independent SNPs.

Table 11
Power and false positive rates (FPR) for the detection of three-locus epistasis among associated and independent SNPs for 200 cases and 200 controls.
Table 12
Power and false positive rates (FPR) for the detection of three-locus epistasis among associated and independent SNPs for 400 cases and 400 controls.
Table 13
Power and false positive rates (FPR) for the detection of three-locus epistasis among associated and independent SNPs for 1000 cases and 1000 controls.

The model displays a small FPR (tables 11, 12, and 13). Even with small sample sizes (200 vs. 200), there is only a small chance that the model gives a false positive result for three-locus epistasis detection. The FPR was consistent regardless of sample size and the degree of SNP-SNP association.
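The power and FPR values in tables 11–13 are rejection rates over simulated replicates. A replicate simulated with epistasis present yields power; one simulated without it yields the FPR.

The sketch below shows only that bookkeeping. It uses a deliberately simple stand-in test: a pooled two-proportion z-test comparing the frequency of one genotype class between cases and controls. This is not the paper's three-locus model, which is described in Text S1. The frequencies, significance level, replicate counts, and function names are illustrative assumptions.

```python
import math
import random


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(freq_cases, freq_controls, n, reps=2000, alpha=0.05, seed=0):
    """Fraction of simulated studies (n cases, n controls) in which the stand-in
    test rejects. Unequal frequencies estimate power; equal ones estimate FPR."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x_cases = sum(rng.random() < freq_cases for _ in range(n))
        x_controls = sum(rng.random() < freq_controls for _ in range(n))
        if two_proportion_z(x_cases, n, x_controls, n) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for n in (200, 400, 1000):
        print(n, "power:", rejection_rate(0.15, 0.10, n, reps=500),
              "FPR:", rejection_rate(0.10, 0.10, n, reps=500))
```

In this toy setting, power rises with sample size while the FPR stays near the nominal level. This is qualitatively the same pattern the tables report for the epistatic model.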

Discussion

The phenotypic variation of a trait or disease is highly complex given its polygenic inheritance and environmental influence. Most of the original quantitative genetic models assume that allelic effects are additive, with the effect size increasing linearly with the number of alleles. These models were later extended to include interactions between alleles at the same locus (dominance). It is now recognized that interactions between different loci (epistasis) within gene networks may play an important role [14], [15]. More recent evidence shows that high-order epistasis among more than two genes may form a crucial component of genetic interaction networks [9], [11], [17], [18]. Indeed, quantitative genetic analyses have detected high-order epistatic effects in plants. For example, high-order epistasis may be correlated with the aggressiveness of Phytophthora capsici isolates through its influence on double crosses among different loci at meiosis [28]. Wu [29] used a mating design with clonal replicates to identify a significant contribution of high-order epistasis to genetic variation in stem wood growth traits of poplars.

The increasing availability of high-throughput SNP data has led to the development of various statistical approaches for analyzing epistasis among multiple polymorphisms, including logistic regression, multifactor dimensionality reduction (MDR), Bayesian analysis, and machine learning [15], [21], [30]–[32]. In this article, we developed a general model for detecting epistasis of any order in case-control genetic association studies by integrating traditional quantitative genetic principles. Even when present, high-order epistasis may be obscured by metabolic network redundancy [17]. The integration of quantitative genetic principles makes our approach capable of identifying high-order epistatic interactions with genetic relevance.

The model was tested by simulation studies. It displays adequate power for the detection of high-order epistasis with a modest sample size, for example 400 cases vs. 400 controls. When sample sizes increase to 1000 vs. 1000, which is currently feasible for most genetic association studies, the model has almost full power to detect three-locus epistasis of different forms. Even with small samples (say 200 vs. 200), the new model has a low false positive rate for epistasis detection.

The practical application of the model was demonstrated by analyzing a real data set from a genetic study of inflammatory bowel disease. The model detected significant three-locus epistatic interactions among SNPs genotyped in the candidate gene DLG5 [27].

Our model allows the characterization of epistasis of any order, but several issues remain.

First, implementing the model in genome-wide association studies is challenged by the combinatorial growth in the number of SNP combinations to be tested. To make this tractable, one may incorporate optimization techniques into the model to select the most important combinations.

Second, the critical threshold must be determined when multiple correlated SNPs are tested in genome-wide association studies. An empirical approach for determining a genome-wide threshold is non-parametric permutation testing (see [21], [30], [33]–[35]).

Third, the model was developed to detect multilocus epistasis at the SNP level. Given recent discoveries of the importance of haplotypes in trait control [36]–[39], it should be extended to consider high-order interactions expressed by different haplotypes.

Fourth, in the current model specification, we choose controls that are matched to cases in terms of biological, environmental, or demographic factors. When such matching is not possible, these factors need to be embedded in the model as covariates, which also allows interactions between genes and these factors to be tested.

Finally, the model can be extended to multiple diseases to consider the pleiotropic effects of a gene. Results of high-order epistasis detection using this model and its extensions could be used for iterative model building and functional annotation of genes. Future applications of these results include analysis of the metabolic networks of pathogenic organisms and generation of epistatic candidate models for genome-wide association studies.
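The permutation approach mentioned above can be sketched with a max-statistic scheme. Case-control labels are shuffled, and the largest test statistic over all k-SNP combinations is recorded for each permutation. The (1 − α) quantile of those maxima then serves as a family-wise threshold that automatically reflects correlation among SNPs.

The Pearson chi-square on the case-status × multilocus-genotype table is a stand-in statistic, not the paper's epistasis test. The function names and defaults are hypothetical.

```python
import math
import random
from itertools import combinations


def chi_square_stat(genotypes, labels, combo):
    """Pearson chi-square for the 2 x G table of case status (0 = control,
    1 = case) against the multilocus genotypes observed at the SNPs in combo."""
    counts = {}
    for row, y in zip(genotypes, labels):
        key = tuple(row[j] for j in combo)
        counts.setdefault(key, [0, 0])[y] += 1
    n = len(labels)
    n_cases = sum(labels)
    col_totals = (n - n_cases, n_cases)
    stat = 0.0
    for cell in counts.values():
        row_total = cell[0] + cell[1]
        for k in (0, 1):
            expected = row_total * col_totals[k] / n
            if expected > 0:
                stat += (cell[k] - expected) ** 2 / expected
    return stat


def permutation_threshold(genotypes, labels, k, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum statistic over all k-SNP
    combinations, computed under randomly permuted case-control labels."""
    rng = random.Random(seed)
    combos = list(combinations(range(len(genotypes[0])), k))
    perm = list(labels)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(perm)  # breaks any genotype-disease association
        maxima.append(max(chi_square_stat(genotypes, perm, c) for c in combos))
    maxima.sort()
    index = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    return maxima[index]


if __name__ == "__main__":
    rng = random.Random(1)
    G = [tuple(rng.choice((0, 1, 2)) for _ in range(5)) for _ in range(100)]
    y = [0] * 50 + [1] * 50
    print(permutation_threshold(G, y, k=2, n_perm=100))
```

The inner loop also illustrates the computational burden noted above. With m SNPs there are C(m, k) combinations per permutation. For m = 500,000 and k = 3, that is roughly 2 × 10^16 combinations.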

Supporting Information

Text S1

Mathematical algorithm.

(0.14 MB PDF)

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work is supported by National Science Foundation (NSF) grant DMS/NIGMS-0540745 and the Changjiang Scholars Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Weng G, Bhalla U, Iyengar R. Complexity in biological signaling systems. Science. 1999;284:92–96.
2. Hlavacek W, Faeder J. The complexity of cell signaling and the need for a new mechanics. Science's STKE. 2009;2.
3. Huang L, Sternberg P. Genetic dissection of developmental pathways. Methods in Cell Biology. 1995;48:97–122.
4. McMullen M, Byrne P, Snook M, Wiseman B, Lee E, et al. Quantitative trait loci and metabolic pathways. Proceedings of the National Academy of Sciences of the United States of America. 1998;95:1996.
5. Ritchie M, Hahn L, Roodi N, Bailey L, Dupont W, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. The American Journal of Human Genetics. 2001;69:138–147.
6. Martin M, Gao X, Lee J, Nelson G, Detels R, et al. Epistatic interaction between KIR3DS1 and HLA-B delays the progression to AIDS. Nature Genetics. 2002;31:429–434.
7. Gabutero E, Moore C, Mallal S, Stewart G, Williamson P. Interaction between allelic variation in IL12B and CCR5 affects the development of AIDS: IL12B/CCR5 interaction and HIV/AIDS. AIDS. 2007;21:65.
8. Hirschhorn J, Daly M. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics. 2005;6:95–108.
9. Marchini J, Donnelly P, Cardon L. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genetics. 2005;37:413–417.
10. Wang W, Barratt B, Clayton D, Todd J. Genome-wide association studies: theoretical and practical concerns. Nature Reviews Genetics. 2005;6:109–118.
11. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007;81:559–575.
12. Wan X, Yang C, Yang Q, Xue H, Tang N, et al. Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics. 2010;26:30.
13. Phillips P. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics. 2008;9:855–867.
14. Moore J, Williams S. Epistasis and its implications for personal genetics. The American Journal of Human Genetics. 2009;85:309–320.
15. Stich B, Yu J, Melchinger A, Piepho H, Utz H, et al. Power to detect higher-order epistatic interactions in a metabolic pathway using a new mapping strategy. Genetics. 2007;176:563.
16. Beerenwinkel N, Pachter L, Sturmfels B, Elena S, Lenski R. Analysis of epistatic interactions and fitness landscapes using a new geometric approach. BMC Evolutionary Biology. 2007;7:60.
17. Imielinski M, Belta C. Exploiting the pathway structure of metabolism to reveal high-order epistasis. BMC Systems Biology. 2008;2:40.
18. Hansen T, Wagner G. Epistasis and the mutation load: a measurement-theoretical approach. Genetics. 2001;158:477.
19. Zhang Y, Liu J. Bayesian inference of epistatic interactions in case-control studies. Nature Genetics. 2007;39:1167–1173.
20. Nunkesser R, Bernholt T, Schwender H, Ickstadt K, Wegener I. Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. Bioinformatics. 2007;23:3280.
21. Gayán J, González-Pérez A, Bermudo F, Sáez M, Royo J, et al. A method for detecting epistasis in genome-wide studies using case-control multi-locus association analysis. BMC Genomics. 2008;9:360.
22. Bateson W. Mendel's principles of heredity. Molecular and General Genetics MGG. 1910;3:108–109.
23. Kempthorne O. The correlation between relatives on the supposition of mendelian inheritance. American Journal of Human Genetics. 1968;20:402.
24. Workman P. Biometrical genetics. The study of continuous variation. American Journal of Human Genetics. 1973;25:461.
25. Liu T, Thalamuthu A, Liu J, Chen C, Wu R. A model for testing epistatic interactions of complex diseases in case-control studies. Biostatistics. 2010 (in press).
26. Lin Z, Poritz L, Franke A, Li T, Ruether A, et al. Genetic association of DLG5 R30Q with familial and sporadic inflammatory bowel disease in men. Disease Markers. 2009;27:193–201.
27. Stoll M, Corneliussen B, Costello C, Waetzig G, Mellgard B, et al. Genetic variation in DLG5 is associated with inflammatory bowel disease. Nature Genetics. 2004;36:476–480.
28. Bartual R, Lacasa A, Marsal J, Tello J. Epistasis in the resistance of pepper to phytophthora stem blight (Phytophthora capsici L.) and its significance in the prediction of double cross performances. Euphytica. 1993;72:149–152.
29. Wu R. Detecting epistatic genetic variance with a clonally replicated design: models for low- vs high-order nonallelic interaction. Theoretical and Applied Genetics. 1996;93:102–109.
30. Liang Y, Kelemen A. Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases. Statistics Surveys. 2008;2:43–60.
31. Kayano M, Takigawa I, Shiga M, Tsuda K, Mamitsuka H. Efficiently finding genome-wide three-way gene interactions from transcript- and genotype-data. Bioinformatics. 2009;25:2735.
32. Jiang R, Tang W, Wu X, Fu W. A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinformatics. 2009;10:S65.
33. Carlborg O, Andersson L. Use of randomization testing to detect multiple epistatic QTLs. Genetics Research. 2002;79:175–184.
34. Alison M. The effect of alternative permutation testing strategies on the performance of multifactor dimensionality reduction. BMC Research Notes. 2009;1.
35. Edwards T, Turner S, Torstenson E, Dudek S, Martin E, et al. A general framework for formal tests of interaction after exhaustive search methods with applications to MDR and MDR-PDT. 2010.
36. Judson R, Stephens J, Windemuth A. The predictive power of haplotypes in clinical response. Pharmacogenomics. 2000;1:15–26.
37. Bader J. The relative power of SNPs and haplotype as genetic markers for association tests. Pharmacogenomics. 2001;2:11–24.
38. Liu T, Johnson J, Casella G, Wu R. Sequencing complex diseases with HapMap. Genetics. 2004;168:503.
39. Rha S, Jeung H, Choi Y, Yang W, Yoo J, et al. An association between RRM1 haplotype and gemcitabine-induced neutropenia in breast cancer patients. The Oncologist. 2007;12:622.