Tradeoffs in Modeling Context Dependency in Complex Trait Genetics

Genetic effects on complex traits may depend on context, such as age, sex, environmental exposures or social settings. However, it is often unclear if the extent of context dependency, or Gene-by-Environment interaction (GxE), merits more involved models than the additive model typically used to analyze data from genome-wide association studies (GWAS). Here, we suggest considering the utility of GxE models in GWAS as a tradeoff between bias and variance parameters. In particular, We derive a decision rule for choosing between competing models for the estimation of allelic effects. The rule weighs the increased estimation noise when context is considered against the potential bias when context dependency is ignored. In the empirical example of GxSex in human physiology, the increased noise of context-specific estimation often outweighs the bias reduction, rendering GxE models less useful when variants are considered independently. However, we argue that for complex traits, considering patterns of context dependency across many variants jointly can improve both estimation and trait prediction. Finally, we exemplify (using GxDiet effects on longevity in fruit flies) how considering polygenic trends of GxE may also be important for interpretation, as analyses based on independently ascertained “top hits” alone can be misleading.


Validating and Applying the Decision Rule to Real Data
To validate the decision rule of Eqs.4-6 of the main text, we considered the example of sex as a context for physiological traits using UK Biobank data [1].For each of 27 physiological quantitative traits, we first split the sample by sex chromosome karyotypes into XX individuals (females) and XY individuals (famles).
For each sex, we randomly split the sample into a training set with 80% of the individuals and a test set with 20% of the individuals.We estimated within-sex GWAS in the training and test sets separately.We denote by β(t) Zi the marginal effect estimate of the i th SNP in sex Z in set t, and by ŝ(t) Zi the corresponding standard error.We denote by β(train) M ∪F i the estimated effect of the i th SNP in the union of training sets (with both sexes included).In our validation procedure, we treated β(test) Mi as the "ground-truth," i.e., as the true effect.While β(test) Mi is of course a very noisy estimate of β Mi , it is an unbiased estimate, so we expect that estimates that are closer to β(test) Mi in the aggregate will also be closer to β Mi .Using this assumption, we define two empirical quantities, error( β(train) Given V Mi and |β Mi − β Fi |, Eq. 4 allows us to calculate the expected difference between error( β(train)

Mi
) and error( β(train) M ∪F i ).However, we do not observe V Mi or |β Mi −β Fi |.Instead, we estimated these quantities and examined the agreement of the estimates in the two sets.We estimated V Mi as ŝ(train)
Estimating ) 2 + (ŝ (train) F1 ) 2 , . . ., (ŝ and took the absolute value of ash's output posterior mean estimates of β Mi − β Fi as our estimate of |β Mi − β Fi | for each site.With our estimates of |β Mi − β Fi | and V Mi in hand, we calculated the expected difference in squared errors using Eq. 4 (x-axis in Fig. 3A) and compared it with the actual difference between error( β(train)
For Fig. 3A,B, we used the same procedure as above to estimate the x and y axes, except we did not employ any data splitting.

Equivalence of Regression Coefficient Estimates for Explicit Interaction Term and Stratified Model
In the main text, we discuss the estimation of context-dependent effects through the estimation of two effects in subsamples stratified by context.The generative model defined in Eq. 1 is where β A and β B are fixed, context-specific effects of a reference allele at a biallelic, autosomal variant i, g i ∈ {0, 1, 2} is the observed reference allele count.α A and α B are the context-specific intercepts, corresponding σ 2 B are context-specific observation variances.Another common way to define a generative GxE model is with a linear genotype-by-context interaction term, namely, where α is the mean trait value for individuals with zero reference alleles in context A, β G is the genetic effect, β E is the contextual effect, β GxE in the genotype-context interaction term, e i is the contextual covariate for individual i, defined to be 0 if the individual is in context A and 1 if the individual is in context B, and σ 2 is the homoskedastic noise term.
Here, we show that least squares estimation of the two models above context-specific effect estimates.The ordinary least squares (OLS) estimate of the intercept and context-specific allelic effects under the model described in Eq.S1 are αA , βA = argmin The OLS estimates under the model described in Eq.S2 are α, βG , βE , βGxE = argmin Since e i = 1 if individual i is in context B and 0 otherwise, we can re-write Eq.S5 as α, βG , βE , βGxE = argmin By Eq.S3 we have that 2 is minimized by setting α = αA and β G = βA .In turn, by Eq. S4 we have that 2 is minimized by setting α + β E = αB and setting β G + β GxE = βB .Thus, both terms are simultaneously minimized by setting Then, under the explicit interaction model, for an individual in context A the estimated intercept is αA and the estimated genetic effect is βA .And, for an individual in context B, the estimated intercept is αA + αB − αA = αB and the estimated genetic effect is βA + βB − βA = βB .Thus, conditional on the context, both the explicit interaction model and the stratified model provide the same effect and intercept estimates.
We note that there is still an important difference between the stratified and explicit interaction models, as the stratified model allows for heteroskedasticity between contexts where the explicit interaction model does not.We also note that the proof of equivalence relies on the context variable being binary.

Continuous Context Variable
Here, we extend our analysis of single site estimation in a binary context to the case of a continuous context variable.Specifically, for individuals indexed by i = 1, . . ., n, we consider the generative model where y i ∈ R is the continuous trait value for individual i, g i ∈ {0, 1, 2} is the number of reference alleles carried by a diploid individual i at a focal biallelic variant of interest, e i ∈ R is the continuous context for individual i, β 0 is the the mean trait value for individuals with the context variable value of zero and zero reference alleles, β G is the main (additive) allelic effect, β E is the effect of the context, β G×E is the main interaction effect, and σ 2 is the residual variance.For simplicity, we will assume β 0 = 0.
For a binary context, we considered the differnce in mean squared error (MSE) of the "additive model" and "GxE model" in estimating the context-specific genetic effect.Analogously, for a continuous context we will consider the difference in MSE in estimating the total genetic effect conditional on a particular value of the context variable.
To perform this analysis, we will consider the estimation of two models.The first model, which we will call the "GxE" model, is described in Eq. ( S6) above (where we omit the intercept because we assume it is 0).For convenience, we will use the following matrix notation Then, we consider the standard least squares estimate of β = (X T X) −1 X T y, for which β ∼ N (β, σ 2 (X T X) −1 ). (S7) We will assume that the vector of genotypes ⃗ g and the vector of context covariates ⃗ e are orthogonal in our sample and are both mean centered.(In Section 3 below, we discuss the case of gene-context correlation, where these assumptions are not met.)Then, the matrix X T X will be approximately diagonal.Under this approximation, we can write Eq. (S7) as where Now, we are interested in the error of the GxE model in estimating the genetic effect conditional on a particular value of the context e.By Eq. (S6), we define the true genetic effect conditional on a particular value of the context e as β G +β G×E e.Thus, to evaluate the error of the GxE model, the conditional expected error is By the definition of conditional variance, we can write Eq. (S9) as e 2 (by Eq. ( S8)).
We will also consider the estimation error associated with the "additive model," which in this case estimates the model described in Eq. (S6) assuming that β G×E = 0.In effect, this assumes that the genetic effect is equal across all values of the context.Because X T X is approximately diagonal matrix, removing the G × E covariate and regression coefficient has a negligible effect on both the least squares estimates of β G and its sampling distribution.Thus, we can approximate the error of the additive model by calculating the conditional expectation under the model in Eq. (S6).Again, by the definition of the conditional expectation, we can write Eq.
(S10) as Thus, conditional on a particular value of e, we we prefer the G × E estimator for estimating the genetic effect if and only if This decision rule has a very intuitive interpretation: the G × E model is preferable when the estimation variance of the interaction term is smaller than the squared interaction effect size.While the exact difference in MSE between the additive and G × E models depends on the value of the context variable (see Eq. (S11)), the decision rule (Eq.(S13)) does not.

Gene-Context Correlation
The simple decision rule given in Eq. (S13) was derived using the assumption that the observed genotype vector and the observed context vector were orthogonal.In statistical terms, one may expect this condition to approximately hold when sampling from a population where r(G, E), the correlation between genotypes and the context, is 0. In general, this assumption may not be reasonable.
When this assumption is violated, the design matrix X will no longer form a diagonal matrix when leftmultiplied by its transpose.This results in both (a) larger standard errors for each of the marginal coefficients and (b) non-zero sampling covariance between the coefficients.In addition, |r(G, E)| > 0 may result in omitted variable bias in the additive model.As a result, the decision rule of Eq. (S13) will not hold exactly under conditions of |r(G, E)| > 0, and in fact may be far from correct when |r(G, E)| is large.
In general, creating a closed-form decision rule to choose between the GxE and additive models in this scenario is not possible.However, in this section we use a simulation study to gain intuition about the change in bias-variance tradeoff in the presence of gene-context correlation.
In our simulations, for a group of n individuals, we generate genotypes as Then, for some constant c, we generate context covariates as independently for i = 1, . . ., n.Finally, we generate trait values y 1 , . . ., y n using Eq. ( S6) for particular values of the remaining parameters.
To explore how the bias-variance tradeoff changes for different values of r(G, E), we ran a set of simulations with different values of c in the range of −3.5 to 3.5.For each of these simulations, we generated a dataset of n = 250 individuals with σ 2 E = 1, p = 0.1, β 0 = 0, β G = 0.25, β E = 0, β G×E = −.125, and σ 2 y = 7.
The results of the simulation are shown in Supplementary Fig. S6.Non-zero r(G, E) had a number of consequences.First, as the magnitude of r(G, E) increases, the magnitude of the bias of the genetic effect estimate of the additive model also increases (Supplementary Fig. S6A).In our simulation, we set the genetic effect and the interaction effect to have opposite signs.When r(G, E) is positive, it pulls the genetic effect estimate towards the interaction effect.In contrast, when r(G, E) is negative, it pulls the genetic effect estimate away from the interaction effect.Since the G × E model includes all causal variables, it is unbiased.
Second, as the magnitude of r(G, E) increases, the variance of the genetic effect estimates of both the additive and G × E models increase (Supplementary Fig. S6B).This is because when variables in a regression are (positively or negatively) correlated, it is more difficult to distinguish between their effects.The G × E model appears to be affected more strongly here, which likely results from the inclusion of an additional interaction term that is highly correlated with both the genotype covariate and the context covariate.Third, the interaction term in the G × E model remains unbiased under r(G, E), and increases in variance with larger magnitude of r(G, E) (Supplementary Fig. S6 C,D).

Classification-Based Inference in Pallares et al., 2022
In the main text, we discuss the characterization of GxE by Pallares et al., based on significance testing under each of the diets.The authors classified variants according to whether or not their associations with survivorship were significant under each diet as follows: 1. significant under neither diet → classify as no effect.
2. significant when fed the high-sugar diet, but not when fed the control diet → classify as high-sugar specific effect.
3. significant when fed the control diet, but not when fed the high-sugar diet → classify as control specific effect.
4. significant under both diets → classify as shared effect.
Approximately 31% were high-sugar specific, while the remaining 69% of the variants were shared.Fewer than 1% were labelled as having control specific effects.The authors concluded that high-sugar specific effects on longevity are pervasive, compatible with the hypothesis of widespread cryptic genetic variation for longevity.
This "top hits" approach places an emphasis on the context(s) in which trait associations are statistically significant, rather than on estimating how the context-specific allelic effects covary.In addition, this particular classification system also does not cover all possible ways in which context-specific effects may differ.A key example is the case where true effects are concordant in sign but differ in magnitude.The strong positive covariance of estimated effects observed genome-wide (Fig. 4A) suggests this case merits consideration.Such variants may fall into each of the existing categories, depending on the magnitude of effects and statistical power in each of the contexts.Three of the four possible classifications are clearly wrong, but what about the "shared" category?
The class "shared" may be interpreted as suggesting lack of context dependency (Fig. 1C in [3]).However, it will tend to include variants having strong effects under both diets, regardless of whether or not the diet-specific effects are similar.As Pallares et al. also note, there are marked differences in the magnitude of diet-specific estimated effects of variants in the "shared" category.Amongst the approximately 1, 500 variants labeled as shared, the estimated effect under the high-sugar diet is on average about 1.3× that of the estimated effect in the control diet.
Notably, the classification as diet-specific does not imply that a variant has an unusually large effect under this diet.On average, variants classified as shared actually have a slightly larger estimated effect under the high-sugar diet than variants classified as high-sugar specific (t-test p=1.6 × 10 −5 ).Thus, instead of suggesting little-to-no effect under the control diet and a large effect under the high-sugar diet (as predicted by the hypothesis that cryptic genetic variation is pervasive), the classification as high-sugar specific may commonly just point us to intermediate size effects-large enough to be significant in the systematically-larger effects context (where power is higher) yet too small to be significant in the systematically-smaller effects context (where power is lower).
6 Reanalysis of GxE in the Pallares et al.Experiment In the main text, we show that the classification of significantly-associated variants in Pallares et al. is consistent with pervasive amplification, despite the fact that this mode of covariance was not one of the generative modes considered by the authors.Here, we re-estimate the modes of covariance of genetic effects under the two diets, using an approach adapted from the one we have previously used [4].Specifically, we used the framework of "multivariate adaptive shrinkage" (mash) to model the covariance of effects between the high-sugar and control diets across all variants.
We fit a Multivariate Normal mixture model to the summary statistics of Pallares et al. [5].In short, mash takes in a set of multidimensional effect estimates and standard errors and estimates the true underlying distribution of effects as a mixture of zero-centered multivariate normal distributions and a point mass at 0.
Each mixture component in the model must be pre-specified before the fitting procedure.Following Zhu et al.
[4], we pre-specified a dense grid of covariance matrices to be input to mash.In particular, each covariance matrix is of the form where σ 2 c is the variance of the true effects in the control group, σ 2 hs is the variance of the true effects in the high sugar group, and ρ is the correlation between the effects in the high sugar and control groups.
To specify the set of covariance matrices, we formed a grid encompassing possible values of the correlation of allelic effects across diets and ratio of variances under each of the diets.Specifically, we varied ρ in the set of values (−1, − 3 4 , − 1 2 , − 1 4 , 0, 1 4 , 1 2 , 3 4 , 1), and we varied the ratio of the variances, hs on a logarithmic scale in the range 0.5 and 1.5.Taking the Cartesian product of this sets yielded a grid of covariance matrices.
Because the number of pre-specified covariance matrices was large, and to avoid overfitting, we used a forward step-wise selection procedure [6].We first started with a model with only the Null covariance matrix (i.e., no effect under either diet), and then added one covariance matrix to the mixture in a greedy manner in each step, by searching over the space of all covariance matrices for a matrix that maximally improves the likelihood of the mixture model.In each step, we either decide to include the new covariance matrix in the model and move to the next step, or instead stop the procedure and not include the new matrix, if the improvement in likeligood compared to the previous step was below a pre-specified threshold.This threshold was determined by conducting a level α = .05likelihood-ratio test.
To mitigate possible linkage disequilibrium (LD) among the input SNPs, we performed the procedure described above on a subset roughly 12K out of the roughly 270K variants in the data.These SNPs each came as a random sample from 12K roughly evenly spaced chromosomal blocks (in terms of physical distance).
Based on the output of this procedure-the mixture weights in the model ultimately chosen-92% of variants have non-zero effects under both diets but larger effects under the high-sugar diet (Fig. S4).This suggests that instead of affecting just a small subset of variants, a high-sugar diet amplifies the effects of the vast majority of variants on lifespan.Moreover, mash assigned zero weight to covariance matrices where an effect 285 is non-zero in one context but zero in another.

Fi|
is an upwardly biased estimator.Instead, we used an empirical shrinkage estimator (ash [2]) to estimate |β Mi − β Fi |.In essence, ash takes in a vector of effect estimates β and a vector of corresponding standard errors ŝ, estimates a prior distribution of the true effects p(β), and then uses this prior to obtain a posterior distribution of true effects.Specifically, for a large random sample of sites, i = 1, . . ., p, we ran ash on the βG + βG×E e)|e] (since βG and βG×E are unbiased estimates) = V ar[ βG + βG×E e|e] (since β G and β G×E are fixed parameters) in HS Supplementary Figure S4: Estimated mixture weights on covariance of true effects in Pallares et al.The xaxis shows (symmetric) variance-covariance matrices for true effects in the high-sugar and control diets.The variance-covariance matrices displayed are the only matrices to which mash assigned non-zero weight (from a much larger set of possible covariance matrices, following a variable selection procedure).Variance-covariance matrices are scaled by a constant for each of interpretability.Abbreviations: C-Control diet; HS-high sugar diet.: Model performance under correlation between genotype and context.(A) Bias of genetic effect estimate across different values of correlation.(B) Variance of genetic effect estimate across different values of correlation.(C) Bias of term estimating the interaction between genotype and context in an interaction model.(D) Variance of term estimating the interaction between genotype and context in an interaction model.