Leveraging family data to design Mendelian Randomization that is provably robust to population stratification

Mendelian Randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases due to weak instruments as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We demonstrate in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, while standard MR methods yield inflated false positive rates. We applied MR-Twin to 121 trait pairs in the UK Biobank dataset and found that MR-Twin identifies likely causal trait pairs and does not identify trait pairs that are unlikely to be causal. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, while MR-Twin is immune to this type of confounding.

MR-Twin is a method that uses family-based genetic data to construct a test for whether the 83 exposure has an effect on the outcome that is immune to confounding from population structure. It 6 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023.

3:
for k ≤ groups do 4: Generate genotype matrixX X X (k) of population k such that x k ij ∼ Bin(2, f k j ) for each individual i and SNP j.
Stack the rows of each genotype matrix 8: return Genotype matrix X X X 9: end procedure We compared the performance of MR-Twin to other MR methods via simulations consisting 119 of two populations with allele frequency differences modeled according to the standard Balding-Algorithm 2 Simulate population-stratified phenotypes 1: procedure getPheno(X X X, U, h 2 , α E , γ ue , γ uo ) X X X is the normalized genotype matrix, U is a vector with the fixed population label for each sample, h 2 is heritability of exposure E, α E is effect of E on outcome O, γ ue and γ uo are fixed confounding effects of U on E and O.

5:
Simulate return ((E,O)) 7: end procedure can then be easily sampled given the parental genotypes (Methods). For each sample, we retain 124 the population label, a binary variable indicating which population each sample belongs to. Unless 125 otherwise specified, each simulation had 50,000 (false positive rate simulations) or 100,000 (power 126 simulations) external samples and 1,000 trio samples evenly split between two populations with 127 fixation index (F ST )= 0.01, and 100 SNPs, 50 of which were causal for the exposure trait. Unless 128 otherwise specified, the heritability of the exposure trait was set to h 2 = 0.2.

129
Next, we simulate both the exposure and outcome phenotypes following a linear model, as 130 outlined in Algorithm 2. This model allows the population labels from the first step to have an 131 effect on the exposure and outcome phenotypes, which models population stratification that violates 132 the MR assumptions. We use this setting to assess the false positive rates (FPR) of methods under 133 population stratification, allowing the effects of the population labels on the exposure and outcome 134 phenotypes to range from 0 (no confounding) to 0.8 (substantial confounding). In a separate set of 135 simulations to assess power, we set the confounding effect to 0 and varied the causal effect. 136 We performed 1000 simulation replicates under these settings, each time simulating a set of 137 external and trio genotypes and phenotypes according to the chosen parameters, performing linear 138 regression between each SNP and the exposure and outcome phenotypes, and using the resulting 139 association statistics as input to each of the MR methods. We excluded SNPs with association (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. previous work such as [31]; here we focus on the trio-based method they describe [11], and simply 147 refer to that method as "Brumpton" below.

148
The trio mode of MR-Twin maintained a calibrated FPR irrespective of the strength of con-  Figure S1). We observed that increasing number 170 of trios increased power for all methods, as expected, suggesting that the family-based methods can 171 be expected to obtain increased power as more genetic family data are ascertained in the future.

172
The relative power of the methods remained roughly consistent across these experiments. 173 We performed simulations increasing the magnitude of population structure as measured by 174 the F ST (without necessarily increasing the confounding strength), and observed that increasing CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint the population structure leads to further FPR inflation for standard MR methods ( Figure S2).
We observed inflated FPR for standard MR methods even when there is no confounding for large  Figure S5. We interpreted these findings as indicating that 100 digital twins are likely 190 stable enough for simulations for which many replicates are run and speed is a priority, but 1000 191 or more digital twins is recommended for one-off real data analysis. Therefore we simulated 100 192 digital twins in our simulations and 1000 in our real data analysis. We note, however, that while the 193 MR-Twin runtime increases linearly with the number of digital twins simulated, the generation and 194 statistic computation for the digital twins can be done in parallel, so many twins can be simulated 195 efficiently given multiple compute cores or nodes. For clarity of results, we did not take advantage 196 of this in our runtime assessment.

197
Finally, MR-Twin also enables users to use parent-child duo or sibling datasets (Methods). We 198 assessed the performance of these modes versus the trio mode of MR-Twin ( Figure S4). We found 199 that the duo and sibling modes, while having lower FPR than most standard MR methods, did (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint In order to assess the results given by MR-Twin relative to other approaches in a real data context,   . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint to yield much stronger p-values than other methods and the family-based methods (Brumpton and 234 MR-Twin) tended to be conservative (Supplementary Table S2), in line with our simulation results. 235 In particular, of the 121 usable trait pairs, IVW identified 78 as significant, Egger identified 56 as 236 significant, Brumpton identified 20 as significant, and MR-Twin identified 19 as significant.   12 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. represents another valuable aspect of MR-Twin even if population stratification is believed to be 252 well-controlled in a particular study. In general, MR-Twin is immune to any confounder that is 253 independent of the genotypes of offspring given the genotypes of their parents.  in the data, with the inflation scaling with increased fixation index ( Figure S2). The reason for 268 this is that, even though the variants were simulated independently, they were correlated with one 269 another through the population labels. For example, suppose we have two variants, X1 and X2 270 and a population label U . The causal diagram for these three variables is X1 ← U → X2, so 271 X1 and X2 are correlated. This is both a point of caution for performing MR simulations and 272 a practical concern for users, for whom a SNP correlation matrix may not always be available or 273 computable if they are working with summary data.

274
MR-Twin avoids both of these issues, without requiring the user to specify a SNP correlation 275 matrix or apply various approaches to mitigate weak instrument bias. First, both MR-Twin and 276 Brumpton avoid the correlated-variant issue because they condition on parental genotypes, severing 277 the link between the offspring genotypes and the population structure. Second, MR-Twin would not 278 lose FPR calibration due to weak instrument bias, because this phenomenon has nothing to do with 279 the aspects of the MR-Twin test that guarantee immunity from confounding due to population and 280 13 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint it is possible that the bias in the MR effect estimate used in the MR-Twin statistic (Methods) could 282 lower power, but because the MR effect estimate equally affects both the digital twin statistics and 283 the true offspring statistics, a reduction in power seems unlikely and was not observed empirically 284 ( Figure S8).

285
There is extensive literature on family-based methods for avoiding confounding due to popu- where there are many replicates and speed is the paramount concern. Second, the populations 300 of the external and family datasets should be similar. This is natural for biobanks like the UK 301 Biobank, but can be more challenging when attempting to combine separate datasets. Third, care 302 should be taken to ensure that the normalization method used and covariates controlled-for are 303 similar in the external and trio datasets in order to avoid potential loss of power. to reduce its effect [9].
Suppose that, corresponding to each individual's genotypes X, we also observe the genotypes P1 366 and P2 of their parents (we later relax the trio assumption to allow for parent-child duo or sibling 367 data). Let A := (P1, P2). According to the graphical criteria for d-separation developed by Pearl

368
[44], A d-separates X from Z (Figure 1b): Twins" below). We refer to these samples as "digital twins". For each such sample b, we then

391
The MR-Twin test is related to the digital twin test [18] and likewise is a kind of conditional 392 randomization test [19]. Like the digital twin test, MR-Twin leverages the fact that offspring 393 genotypes are conditionally independent of "external" confounders such as population structure 394 given the parental genotypes and uses a conditional randomization test to test the weaker, but 395 equivalent, null hypothesis of no effect conditioned upon the parental genotypes.

396
Let X be a vector of offspring genotypes, and let A be the genotype vectors of the two parents 397 of the offspring. A may be directly observed, as in trio data, or inferred using parent-child duo 398 or sibling data (see "Generating Digital Twins"). Let Z be one or more "external" confounders, Thus, population structure is an external confounder, while horizontal pleiotropic traits are not. 401 We therefore have Assuming that all confounders are external and that X is significantly associated with E, O is 403 independent of X given A under the MR null hypothesis that E has no effect on O. This is because 404 18 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint hypothesis), and all paths not through E are blocked by conditioning on A as shown in Equation

406
6. We therefore want to test If this holds, then we cannot rule out that either X has no effect on E or E has no effect on O. 408 We test this null hypothesis via a conditional randomization test [19].

409
In testing this null hypothesis, it is helpful to be able to leverage SNP effect sizes estimated 410 from large, external datasets (such as publicly released summary statistics for resources like the 411 UK Biobank [1]), as this will often yield more statistically significant variants and better effect size 412 estimates than those generated using small genetic family datasets. We therefore note that the 413 following property also holds: where we use the shorthandβ to refer to the estimated effect sizes of each SNP on the exposure 415 and outcome traits. 416 We construct "digital twins"X sampled from the parental genotypes via Mendelian inheritance 19 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint under the null, whereT = t((X n ; O n ) N n=1 ;β). We can then use the procedure outline in Algo-423 rithm 3 to obtain a p-value for this test statistic [19].
4.4 Generating Digital Twins 436 We have assumed that trio data is available thus far for simplicity. However, the MR-Twin frame-437 work can also be used when parent-child duo data or sibling data are available. Here we discuss 438 the algorithms used to generate digital twins given trio, parent-child duo, or sibling data. 439 20 . CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint We assume that the SNPs used in the MR instrument are independent, a common assumption 441 when multi-SNP instruments are used in MR [27]. Therefore, we separately sample the genotype of 442 each SNP of the digital twin given the parent and/or offspring genotypes at that SNP. Let (D n ) be 443 the (N × M ) matrix of digital twin genotypes we will sample, corresponding to the true "offspring" 444 genotypes in (X n ). Further, let n index some family and j index some SNP, such that P1 nj (for 445 example) is the genotype for one parent in family n at SNP j. If we have both parents available, 446 sampling D nj is straightforward. Because the SNPs are considered independent, we do not need 447 to know the parental haplotypes. If a parental genotype P1 nj is 0 or 2, respectively, then a 0 or 448 1, respectively, is inherited by D nj . If the parent genotype is 1, then either 0 or 1 is inherited 449 with 50% probability each. D nj inherits alleles from the two parents independently. This can be 450 summarized as where Bern stands for the Bernoulli distribution, for each family n and SNP j.

21
. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint the parents have 2 genotypes than 1 genotypes. Most simply, if one sibling has a 2 genotype at a 465 SNP and the other sibling has a 0, then the parents must both be heterozygotes. In all other cases, 466 approximation is needed.

467
The first approach is straightforward and involves randomly drawing two haplotypes from the 468 observed sibling haplotypes to generate a digital twin. This shuffling approach gives a rough 469 approximation of the likelihood of digital twin genotypes given the information the observed siblings 470 provide. The second approach, described in the Supplementary Note, involves using the sibling data 471 to infer a distribution over the possible parents, then performing a weighted random draw of digital 472 siblings based on those parents. In practice, we found that the shuffling approach was faster and 473 yielded lower FPR than the probabilistic approach while achieving similar power, so we used the 474 shuffling approach for the results in this paper.  CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint domization in the presence of residual population stratification, batch effects and horizontal 532 pleiotropy," bioRxiv, 2020. CC-BY 4.0 International license available under a (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint this version posted January 6, 2023. ; https://doi.org/10.1101/2023.01.05.522936 doi: bioRxiv preprint