Fast and flexible joint fine-mapping of multiple traits via the Sum of Single Effects model

We introduce mvSuSiE, a multi-trait fine-mapping method for identifying putative causal variants from genetic association data (individual-level or summary data). mvSuSiE learns patterns of shared genetic effects from the data, and exploits these patterns to improve power to identify causal SNPs. Comparisons on simulated data show that mvSuSiE is competitive in speed, power and precision with existing multi-trait methods, and uniformly improves on single-trait fine-mapping (SuSiE) applied to each trait separately. We applied mvSuSiE to jointly fine-map 16 blood cell traits using data from the UK Biobank. By jointly analyzing the traits and modeling heterogeneous effect-sharing patterns, we discovered many more causal SNPs (>3,000) than with single-trait fine-mapping, and obtained narrower credible sets. mvSuSiE also more comprehensively characterized the ways in which genetic variants affect one or more blood cell traits; 68% of causal SNPs showed significant effects in more than one blood cell type.

• Allows correlated traits. Methods marked "no" assume measurement error is independent across traits. Assuming independent errors is appropriate if different, non-overlapping samples were used to measure the different traits.
• Models effect sharing. Methods with a "no" in this column (implicitly or explicitly) assume that the effects of a SNP on different traits are independent (conditional on the SNP having a nonzero effect on one or more traits). A "yes" means that the method can model correlations among the effects on different traits.
• Sample runtimes. Sample runtimes were obtained by running each method on a simulated data set with J = 5,000 SNPs, N = 250,000 individuals, and R = 2 or R = 20 traits. When a method accepted either full data or summary data, the summary-data version of the method was used. Note that the sample size, N, should only affect the running time of methods that only accept full data; it should not affect the runtime of summary-data methods. When a method limits the number of causal SNPs, the upper limit was set to 10, or to the largest acceptable value if this was less than 10. For PAINTOR, the upper limit was set to 2 because, in our tests, PAINTOR ran for a very long time when allowing 3 or more causal SNPs. See "Computing environment" for details about the computing environment used to obtain these runtimes.
• Software and version. The name and version number of the software used in our evaluations. For mvSuSiE, the git commit id is given instead of a version number.

Conventions for mathematical expressions
Here we summarize the notational conventions used in the main text and supplement. Matrices are written using bold, uppercase letters (e.g., A), column vectors are written as bold, lowercase letters (e.g., a), and scalars are written in plain font (e.g., a, A). For indexing, we usually use a capital letter to denote the total number of elements, and the corresponding lowercase symbol to denote the index; e.g., $j = 1, \ldots, J$. We use $\mathbb{R}$ to denote the real numbers and $\mathbb{R}^d$ for the set of real vectors of dimension $d$. We use $\Delta_d$ to denote the simplex on $\mathbb{R}^d$; that is, all $x \in \mathbb{R}^d$ such that $x_1 + \cdots + x_d = 1$ and $x_i \ge 0$, $i = 1, \ldots, d$. We use $\mathbb{R}^{m \times n}$ to denote the set of all $m \times n$ matrices with real entries, $S^n_{++}$ for the set of all $n \times n$ real, symmetric positive definite matrices, and $S^n_+$ for the set of all $n \times n$ real, symmetric positive semi-definite matrices (this set may include matrices that are singular, or not invertible). We write the matrix transpose of $A$ as $A^\top$. For a square matrix $A$, its inverse is $A^{-1}$, its determinant is $\det A$, also written $|A|$, its trace is $\mathrm{tr}(A)$, and $A^\dagger$ denotes the Moore-Penrose inverse ("pseudoinverse") of $A$. We use $I_n$ for the $n \times n$ identity matrix, and $1_n$ as shorthand for a column vector of ones of length $n$. We use $a^\top$ to denote a row vector. We denote the outer product of (column) vectors $a$ and $b$ as $a \otimes b := ab^\top$, and the elementwise product of matrices $A$ and $B$ as $A \circ B$. Finally, we typically denote ordered or unordered sets using calligraphic letters (e.g., $\mathcal{A}$).

The multivariate simple regression model
The mvSuSiE model is based on a simple multivariate regression model with one explanatory variable, which we refer to as the "multivariate simple regression model." This model is
$Y \sim \mathrm{MN}_{N \times R}(x b^\top, I_N, V)$, (1)
where $Y \in \mathbb{R}^{N \times R}$ is a matrix of $R$ observed responses in $N$ samples, $x$ is a vector of $N$ observations for a single explanatory variable, $b \in \mathbb{R}^R$ is the (unknown) vector of regression coefficients for the $R$ responses, $V \in S^R_{++}$ is an invertible covariance matrix, and $\mathrm{MN}_{n \times m}(M, U, V)$ is the matrix normal distribution [2,3] with mean $M \in \mathbb{R}^{n \times m}$, row covariance matrix $U \in S^n_+$ and column covariance matrix $V \in S^m_+$. For now, this model does not include an intercept.
In the following, we give the expression for the likelihood for (1) and relate it to the more familiar multivariate normal distribution. The likelihood for the multivariate regression model (1) is
$\ell(b; x, Y, V) := p(Y \mid x, b, V) = \mathrm{MN}_{N \times R}(Y; x b^\top, I_N, V)$. (2)
Given $V$, the least-squares estimate of $b$, denoted $\hat b$, and its variance-covariance matrix, $\hat S$, are
$\hat b = Y^\top x / (x^\top x)$, $\quad \hat S = V / (x^\top x)$.
Note that $\hat b$ is also the value of $b$ maximizing the likelihood (2). Using these quantities, the likelihood (2) can be rewritten so that the terms involving $b$ are multivariate normal up to a constant of proportionality; in particular, we have that
$\ell(b; x, Y, V) \propto N_R(\hat b; b, \hat S)$, (6)
where $N_n(\theta; \mu, \Sigma)$ denotes the multivariate normal density at $\theta \in \mathbb{R}^n$ with mean $\mu \in \mathbb{R}^n$ and covariance $\Sigma \in S^n_+$.
Remark 1. Calculation of $\hat b$ and $\hat S$ only requires the summary statistics $x^\top x$ and $Y^\top x$. Also, since the likelihood (6) only involves $\hat b$ and $\hat S$, the likelihood can be computed, up to a constant of proportionality, from these summary statistics alone.
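To make Remark 1 concrete, here is a minimal numpy sketch (our own illustration, not the mvSuSiE software) showing that $\hat b$ and $\hat S$ can be formed from the summary statistics $x^\top x$ and $Y^\top x$ alone; the function name and arguments are hypothetical.

```python
import numpy as np

def simple_regression_summary(xtx, Ytx, V):
    """Least-squares estimate and its covariance for the multivariate
    simple regression model, computed from summary statistics only.

    xtx : scalar, x'x
    Ytx : length-R vector, Y'x
    V   : R x R residual covariance matrix
    """
    bhat = Ytx / xtx   # least-squares / maximum-likelihood estimate of b
    Shat = V / xtx     # variance-covariance matrix of bhat
    return bhat, Shat
```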

The multivariate simple regression model with a normal prior
In the following proposition, we apply the above results for the multivariate simple regression model to a Bayesian multivariate simple regression model with a normal prior.
Proposition 1 (Bayesian multivariate simple regression with a normal prior). Consider the multivariate simple regression model (1) with a multivariate normal prior on the regression coefficients,
$b \mid S_0 \sim N_R(0, S_0)$, (7)
where $S_0 \in S^R_+$ is a (possibly singular) prior covariance matrix. The posterior distribution of $b$ is multivariate normal, $b \mid x, Y, V, S_0 \sim N_R(b_1, S_1)$, where
$b_1 = S_0 (S_0 + \hat S)^{-1} \hat b$, (9)
$S_1 = S_0 - S_0 (S_0 + \hat S)^{-1} S_0$; (10)
when $S_0$ is invertible, these expressions are equivalent to the more familiar forms $S_1 = (S_0^{-1} + \hat S^{-1})^{-1}$ and $b_1 = S_1 \hat S^{-1} \hat b$. The Bayes factor (BF) comparing this model against the null model ($b = 0$) is
$\mathrm{BF}(x, Y, V, S_0) := p(Y \mid x, V, S_0) \,/\, p(Y \mid x, V, b = 0)$.
The same BF can also be equivalently expressed as a ratio of two multivariate normal densities,
$\mathrm{BF}(x, Y, V, S_0) = N_R(\hat b; 0, S_0 + \hat S) \,/\, N_R(\hat b; 0, \hat S)$. (14)

Remark 2. Since the data $x, Y$ only enter the expressions for the posterior mean $b_1$ and posterior covariance $S_1$ through $\hat b$ and $\hat S$, it follows that calculation of the posterior mean and covariance only requires the summary statistics $x^\top x$ and $Y^\top x$. Similarly, the BF (14) can be computed with only $x^\top x$ and $Y^\top x$.
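The following hedged sketch illustrates Proposition 1: it computes the posterior moments using the forms (9, 10), which remain valid for singular $S_0$, and the log Bayes factor as the ratio of normal densities in (14). It assumes numpy/scipy and is illustrative only.

```python
import numpy as np
from scipy.stats import multivariate_normal

def posterior_and_bf(bhat, Shat, S0):
    """Posterior mean/covariance of b and log BF under the prior N(0, S0).

    Shat is assumed positive definite, so S0 + Shat is invertible even
    when S0 is singular.
    """
    M = S0 + Shat
    b1 = S0 @ np.linalg.solve(M, bhat)        # posterior mean, eq. (9)
    S1 = S0 - S0 @ np.linalg.solve(M, S0)     # posterior covariance, eq. (10)
    zero = np.zeros(len(bhat))
    log_bf = (multivariate_normal.logpdf(bhat, mean=zero, cov=M)
              - multivariate_normal.logpdf(bhat, mean=zero, cov=Shat))
    return b1, S1, log_bf
```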
Updating a scaling factor in the prior. Suppose the prior covariance is parameterized as $S_0 = \sigma_0^2 U$, in which $U \in S^R_+$ and $\sigma_0 \ge 0$. Here we assume $U$ is a fixed parameter and we would like to estimate the scaling factor $\sigma_0$ by maximizing the likelihood,
$\hat\sigma_0^2 := \operatorname{argmax}_{\sigma_0^2 \ge 0}\; p(Y \mid x, V, \sigma_0^2 U)$,
or, equivalently, by maximizing the Bayes factor, which may be more convenient to compute,
$\hat\sigma_0^2 = \operatorname{argmax}_{\sigma_0^2 \ge 0}\; \mathrm{BF}(x, Y, V, \sigma_0^2 U)$.
When $U$ is invertible, the maximum-likelihood estimate $\hat\sigma_0^2$ can be computed using a simple EM algorithm [4], in which the M-step update is
$\sigma_0^2 \leftarrow \mathrm{tr}\big(U^{-1}\,\mathbb{E}[bb^\top]\big)/R$. (17)
The E-step then consists of computing the posterior second moment,
$\mathbb{E}[bb^\top] = S_1 + b_1 b_1^\top$, (18)
in which the posterior mean $b_1$ and posterior covariance $S_1$ are given by (9, 10) with $S_0 = \sigma_0^2 U$ at the current setting of $\sigma_0^2$. The maximum-likelihood estimate $\hat\sigma_0^2$ is then recovered by iterating the E-step (18) and M-step (17) until convergence. The update (17) requires that $U$ be invertible. To allow for singular matrices, a more general M-step update is
$\sigma_0^2 \leftarrow \mathrm{tr}\big(U^\dagger\,\mathbb{E}[bb^\top]\big)/R_1$, (19)
in which $R_1 \le R$ is the rank of $U$. Note that (17) and (19) are equivalent when $U$ is invertible; that is, when $R_1 = R$.
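A minimal sketch of this EM iteration, assuming the hypothetical posterior_and_bf helper from the previous sketch; it uses the generalized M-step (19), and the scale should be initialized to a strictly positive value.

```python
import numpy as np

def update_sigma0_sq(bhat, Shat, U, sigma0_sq, n_iter=100, tol=1e-8):
    """EM estimate of sigma0^2 in the prior S0 = sigma0^2 * U.

    Initialize with sigma0_sq > 0, otherwise the iterates stay at zero.
    """
    Udag = np.linalg.pinv(U)            # Moore-Penrose inverse of U
    rank_U = np.linalg.matrix_rank(U)   # R_1 in the text
    for _ in range(n_iter):
        # E-step: posterior second moment E[b b'] under prior N(0, sigma0^2 U)
        b1, S1, _ = posterior_and_bf(bhat, Shat, sigma0_sq * U)
        Ebb = S1 + np.outer(b1, b1)
        # M-step: generalized update (19), valid for singular U
        new = np.trace(Udag @ Ebb) / rank_U
        if abs(new - sigma0_sq) < tol:
            return new
        sigma0_sq = new
    return sigma0_sq
```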

The multivariate simple regression model with an intercept
Now we extend the multivariate simple regression model (1) to include an intercept. We show that including an intercept in the model is equivalent to "centering" x and the columns of Y so that they all have means of zero. More precisely, centering is equivalent to integrating out the intercept with respect to an (improper) uniform prior on the intercept. This is the multivariate generalization of the result for univariate regression given in [5]. This result is summarized in Proposition 2.
The multivariate simple regression model with an intercept is
$Y \sim \mathrm{MN}_{N \times R}(1_N \mu^\top + x b^\top, I_N, V)$, (20)
in which $\mu \in \mathbb{R}^R$ is a vector of intercepts, one for each of the $R$ responses.

Proposition 2 (Multivariate simple regression with an intercept). Consider the multivariate simple regression model with an intercept (20). The least-squares estimate of $\mu$ for a given $b$—which is also the value of $\mu$ maximizing the likelihood (21)—and its covariance matrix are
$\hat\mu = \bar y - \bar x\, b$, $\quad \hat S_\mu = V / N$,
in which $\bar x := \frac{1}{N}\sum_{i=1}^N x_i$ is the sample mean of $x$, and $\bar y := \frac{1}{N} Y^\top 1_N$ is the vector containing the column means of $Y$.
The profile likelihood for $b$ is
$\ell_{\mathrm{prof}}(b; x, Y, V) := \max_{\mu}\, \ell(\mu, b; x, Y, V) = \ell(b; \tilde x, \tilde Y, V)$,
in which $\tilde x := x - \bar x 1_N$ and $\tilde Y := Y - 1_N \bar y^\top$ are the centered $x$ and $Y$. In other words, the profile likelihood for the multivariate simple regression with an intercept is the same as the likelihood for the multivariate simple regression without an intercept if we first center $x$ and $Y$. Centering $x$ and $Y$ is therefore equivalent to including an intercept in the multivariate regression and estimating the intercept by maximum likelihood.
Next, consider Bayesian calculations for $\mu$ with a multivariate normal prior, $\mu \mid S_{0\mu} \sim N_R(0, S_{0\mu})$, in which $S_{0\mu} \in S^R_+$ is a (possibly singular) covariance matrix. The posterior for $\mu$ conditioned on $b$ is
$\mu \mid x, Y, V, S_{0\mu}, b \sim N_R(\mu_1, S_{1\mu})$,
where
$S_{1\mu} = (S_{0\mu}^{-1} + \hat S_\mu^{-1})^{-1}$, $\quad \mu_1 = S_{1\mu}\, \hat S_\mu^{-1}\, \hat\mu$.
The marginal likelihood obtained by averaging over the intercept is
$p(Y \mid x, V, S_{0\mu}, b) = \int \ell(\mu, b; x, Y, V)\, N_R(\mu; 0, S_{0\mu})\, d\mu$. (28)
In the special case of an (improper) uniform prior on $\mu$, defined as $\mu \sim N_R(0, S_{0\mu})$ with $S_{0\mu}^{-1} \to 0$, the posterior mean reduces to the least-squares estimate, $\mu_1 = \hat\mu$, with covariance matrix $S_{1\mu} = \hat S_\mu$, and the marginal likelihood (28) simplifies to
$p(Y \mid x, V, S_{0\mu}, b) \propto \ell(b; \tilde x, \tilde Y, V)$,
in which $\tilde x := x - \bar x 1_N$ and $\tilde Y := Y - 1_N \bar y^\top$ are the centered $x$ and $Y$. In other words, the marginal likelihood for multivariate simple regression with an intercept (20), when we use an improper uniform prior for the intercept, is the same (up to a constant of proportionality) as the likelihood for multivariate simple regression without an intercept (1) after first centering $x$ and $Y$.
See below for a proof of this result.
Remark 3. To account for an intercept when computing posterior quantities and BFs for the multivariate simple regression model (1), $x$ and $Y$ should be centered before computing the summary statistics; that is, the summary statistics should be $\tilde x^\top \tilde x$ and $\tilde Y^\top \tilde x$. See [1] for how to center summary statistics if they are not centered.
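As a small illustration of Remark 3 (assuming individual-level data are available to center), the summary statistics can be formed from centered data as follows; the function name is ours.

```python
import numpy as np

def centered_summary_stats(x, Y):
    """Return (x'x, Y'x) computed from the centered x and columns of Y."""
    xc = x - x.mean()          # x tilde
    Yc = Y - Y.mean(axis=0)    # Y tilde (column-centered)
    return xc @ xc, Yc.T @ xc
```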
The Bayesian multivariate simple regression model with a mixture prior

Here we extend the Bayesian multivariate simple regression model with a normal prior to a model with a mixture-of-normals prior,
$b \mid S_0, \omega \sim \sum_{k=1}^K \omega_k\, N_R(0, S_{0k})$, (30)
in which $S_0 := \{S_{01}, \ldots, S_{0K}\}$, each $S_{0k} \in S^R_+$ is a (possibly singular) covariance matrix, and $\omega := (\omega_1, \ldots, \omega_K) \in \Delta_K$ are the mixture weights. The normal prior (7) is recovered as a special case of (30) when $K = 1$.
To facilitate derivation of the posterior computations, we introduce the following data augmentation, which recovers (30) after integrating over a latent random variable $\xi \in \{1, \ldots, K\}$:
$p(\xi = k \mid \omega) = \omega_k$, $\quad b \mid \xi = k, S_0 \sim N_R(0, S_{0k})$. (31)
This augmented model allows us to reuse the posterior computations from the simpler models; in particular, posterior computations conditioned on $\xi$ reduce to computations for the Bayesian multivariate regression model with a normal prior, which we state formally in the following proposition.
Proposition 3. Given $S_0$ and $\omega$, the Bayes factor comparing this model against the null model ($b = 0$) is
$\mathrm{BF}_{\mathrm{mix}}(x, Y, V, S_0, \omega) = \sum_{k=1}^K \omega_k\, \mathrm{BF}(x, Y, V, S_{0k})$, (32)
where the expressions for the individual BFs in the sum are given in Proposition 1. The posterior distribution of $b$ is a mixture of normals,
$b \mid x, Y, V, S_0, \omega \sim \sum_{k=1}^K \omega_{1k}\, N_R(b_{1k}, S_{1k})$, (33)
in which $b_{1k}$ and $S_{1k}$ are the posterior mean and covariance of $b$ conditioned on $\xi = k$, given by (9) and (10), respectively, after substituting $S_{0k}$ for $S_0$, and the posterior mixture assignment probabilities ("responsibilities") are
$\omega_{1k} = \dfrac{\omega_k\, \mathrm{BF}(x, Y, V, S_{0k})}{\sum_{k'=1}^K \omega_{k'}\, \mathrm{BF}(x, Y, V, S_{0k'})}$. (34)
The posterior mean and covariance of $b$ are
$b_1 = \sum_{k=1}^K \omega_{1k}\, b_{1k}$, (37)
$S_1 = \sum_{k=1}^K \omega_{1k}\,\big(S_{1k} + b_{1k} b_{1k}^\top\big) - b_1 b_1^\top$. (38)
From the above remarks, the posterior quantities and Bayes factors for this model can be computed using the summary statistics $x^\top x$, $Y^\top x$ instead of the full data $x, Y$. To formalize these computations with summary statistics, we introduce notation for Bayes factors and posteriors in terms of summary statistics: $\mathrm{BF}_{\mathrm{mix\text{-}ss}}(x^\top x, Y^\top x, V, S_0, \omega) := \mathrm{BF}_{\mathrm{mix}}(x, Y, V, S_0, \omega)$.
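The sketch below illustrates the computations in Proposition 3—the mixture BF (32), the responsibilities (34), and the posterior moments (37, 38)—reusing the hypothetical posterior_and_bf helper from above; it is a schematic, not the packaged implementation.

```python
import numpy as np

def mixture_posterior(bhat, Shat, S0_list, weights):
    """Mixture-of-normals posterior summaries and log Bayes factor."""
    comps = [posterior_and_bf(bhat, Shat, S0k) for S0k in S0_list]
    log_bfs = np.array([c[2] for c in comps])
    # overall BF (32): prior-weighted sum of the component BFs
    log_bf_mix = np.logaddexp.reduce(np.log(weights) + log_bfs)
    # responsibilities (34): posterior mixture-assignment probabilities
    resp = np.exp(np.log(weights) + log_bfs - log_bf_mix)
    # posterior mean (37) and covariance (38)
    b1 = sum(w * c[0] for w, c in zip(resp, comps))
    Ebb = sum(w * (c[1] + np.outer(c[0], c[0])) for w, c in zip(resp, comps))
    S1 = Ebb - np.outer(b1, b1)
    return log_bf_mix, resp, b1, S1
```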
Updating a scaling factor in the prior. Similar to above, here we consider a special case of the mixture-of-normals prior in which the prior covariances are parameterized as $S_{0k} = \sigma_0^2 U_k$, $U_k \in S^R_+$, $k = 1, \ldots, K$, $\sigma_0 \ge 0$. We assume $U_1, \ldots, U_K$ are fixed parameters and we would like to estimate the scaling factor $\sigma_0$ by maximizing the likelihood,
$\hat\sigma_0^2 := \operatorname{argmax}_{\sigma_0^2 \ge 0}\; p(Y \mid x, V, \{\sigma_0^2 U_k\}, \omega)$.
This is the same as maximizing the BF, since the denominator of the BF does not depend on $\sigma_0$, and the BF may be more convenient to compute:
$\hat\sigma_0^2 = \operatorname{argmax}_{\sigma_0^2 \ge 0}\; \mathrm{BF}_{\mathrm{mix}}(x, Y, V, \{\sigma_0^2 U_k\}, \omega)$.
Again taking a simple EM approach to computing the maximum-likelihood estimate, the M-step update allowing for singular matrices is
$\sigma_0^2 \leftarrow \dfrac{\sum_{k=1}^K \omega_{1k}\, \mathrm{tr}\big(U_k^\dagger\, \mathbb{E}[bb^\top \mid \xi = k]\big)}{\sum_{k=1}^K \omega_{1k}\, R_k}$,
in which the E-step involves computing the posterior probabilities $\omega_{1k}$ and the posterior second moments,
$\mathbb{E}[bb^\top \mid \xi = k] = S_{1k} + b_{1k} b_{1k}^\top$.
Here, $R_k \le R$ denotes the rank of $U_k$.
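A sketch of one EM iteration for the mixture case, combining the E-step (responsibilities and per-component second moments) with the generalized M-step above; it assumes the earlier hedged helpers, and like the single-component sketch it should be initialized with a strictly positive scale.

```python
import numpy as np

def update_sigma0_sq_mixture(bhat, Shat, U_list, weights, sigma0_sq):
    """One EM update of sigma0^2 for the prior S0k = sigma0^2 * U_k."""
    # E-step at the current sigma0^2: responsibilities and posterior moments
    comps = [posterior_and_bf(bhat, Shat, sigma0_sq * Uk) for Uk in U_list]
    log_bfs = np.array([c[2] for c in comps])
    logw = np.log(weights) + log_bfs
    resp = np.exp(logw - np.logaddexp.reduce(logw))
    # M-step: generalized update allowing singular U_k
    num, den = 0.0, 0.0
    for w1k, (b1k, S1k, _), Uk in zip(resp, comps, U_list):
        Ebb_k = S1k + np.outer(b1k, b1k)
        num += w1k * np.trace(np.linalg.pinv(Uk) @ Ebb_k)
        den += w1k * np.linalg.matrix_rank(Uk)
    return num / den
```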

The multivariate single effect regression model
The single effect regression (SER) model is a multiple regression model in which exactly one of the regression coefficients is non-zero [6]. Here we define the multivariate single effect regression (MSER) model, which extends the SER model to the multivariate setting and forms the basis for mvSuSiE.
The MSER model is
$Y \sim \mathrm{MN}_{N \times R}(XB, I_N, V)$, $\quad B = \gamma \otimes b$, $\quad \gamma \mid \pi \sim \mathrm{Multinom}(1, \pi)$, $\quad b \mid S_0, \omega \sim \sum_{k=1}^K \omega_k\, N_R(0, S_{0k})$, (47)
in which $Y \in \mathbb{R}^{N \times R}$ is a matrix storing $N$ observations of $R$ regression outcomes, $X \in \mathbb{R}^{N \times J}$ is a matrix storing $N$ observations of $J$ regression variables, $B \in \mathbb{R}^{J \times R}$ is a matrix of regression coefficients for the $J$ variables and $R$ outcomes, $b \in \mathbb{R}^R$ is a vector of "single effect" regression coefficients, $V \in S^R_{++}$ is an invertible covariance matrix, $S_{01}, \ldots, S_{0K} \in S^R_+$ are (possibly singular) prior covariance matrices, $\gamma \in \{0, 1\}^J$ is a random variable indicating which of the $J$ variables explains the multivariate outcome (exactly one element is one, and the remaining are zero), $\pi \in \Delta_J$ specifies the probabilities in the multinomial prior on $\gamma$, $\omega \in \Delta_K$ is a vector specifying the weights in the mixture-of-normals prior on $b$, and $\mathrm{Multinom}(m, \pi)$ denotes the multinomial distribution for $m$ random trials with probabilities $\pi$. Because $\gamma$ is drawn from the multinomial distribution with a single random trial, $\gamma$ has exactly one non-zero element; that is, $\gamma_j = 1$ for some $j$, and $\gamma_{j'} = 0$ for all other $j' \ne j$. From this property, it follows that the matrix $B$ has at most one row containing non-zero elements, and this row is equal to $b^\top$. The MSER model recovers the SER model when $R = 1$, $K = 1$, $\omega_1 = 1$ and $S_{01} = 1$.
The posterior distribution of B, γ is summarized by the following proposition.
Proposition 4. Let $S_0 = \{S_{01}, \ldots, S_{0K}\}$ denote the $K$ prior covariance matrices. Under the MSER model (47), the posterior distribution of $B = \gamma \otimes b$ given parameter settings $\Theta := \{V, S_0, \omega, \pi\}$ is given by
$\gamma \mid X, Y, \Theta \sim \mathrm{Multinom}(1, \alpha)$, (48)
where $\alpha = (\alpha_1, \ldots, \alpha_J)$ is the vector of posterior inclusion probabilities (PIPs), which can be written using the Bayes factors (32) for the Bayesian multivariate simple regression model with a mixture-of-normals prior,
$\alpha_j = \dfrac{\pi_j\, \mathrm{BF}_{\mathrm{mix}}(x_j, Y, V, S_0, \omega)}{\sum_{j'=1}^J \pi_{j'}\, \mathrm{BF}_{\mathrm{mix}}(x_{j'}, Y, V, S_0, \omega)}$, (49)
in which $x_j$ denotes the $j$th column of $X$. Conditioned on $\gamma_j = 1$, the posterior of $b$ is a mixture of normals whose means $b_{1jk}$ (50), covariances $S_{1jk}$ (51) and posterior mixture weights $\omega_{1jk}$ (52) are obtained by applying the definitions of $\omega_{1k}$, $b_{1k}$ and $S_{1k}$ in eqs. (34–36) to the data $x_j, Y$. The posterior mean $b_{1j}$ and covariance $S_{1j}$ of $b$ conditioned on $\gamma_j = 1$ are obtained using definitions (37, 38), and therefore the posterior mean of $B$ is the $J \times R$ matrix $\bar B$ whose $j$th row is $\alpha_j\, b_{1j}^\top$.

For describing the model fitting algorithms (below), we define a function, MSER, that returns the posterior distribution of $b, \gamma$ under the MSER model given the data $X, Y$ and the model parameters $\Theta$:
$(\alpha, \Omega_1, B_1, S_1) := \mathrm{MSER}(X, Y; \Theta)$, (56)
in which $\alpha = (\alpha_1, \ldots, \alpha_J)$ is the vector of PIPs (49), $\Omega_1$ is the $J \times K$ matrix of posterior mixture weights $\omega_{1jk}$ (52), $B_1$ is the set of posterior means $b_{1jk}$ (50) for all $j, k$, and $S_1$ is the set of posterior covariances $S_{1jk}$ (51) for all $j, k$.
The multivariate single effect regression model with summary statistics. From the above remarks, $X^\top X$ and $X^\top Y$ are sufficient to compute the posterior distribution of $b, \gamma$. (More precisely, only the diagonal elements of $X^\top X$ are necessary to compute the posterior distribution of $b, \gamma$.) To formalize these computations with summary statistics, we also define Bayes factors and posterior quantities in terms of summary statistics; for each variable $j$, the relevant summary statistics are $x_j^\top x_j = (X^\top X)_{jj}$ and $Y^\top x_j$, the $j$th row of $X^\top Y$ written as a column vector. Analogously to the quantities above, we write $b^{\mathrm{ss}}_{1jk}$ (57), $S^{\mathrm{ss}}_{1jk}$ (58) and $\omega^{\mathrm{ss}}_{1jk}$ (59) for the posterior means, covariances and mixture weights computed from these summary statistics, and the PIPs are
$\alpha^{\mathrm{ss}}_j = \dfrac{\pi_j\, \mathrm{BF}_{\mathrm{mix\text{-}ss}}\big((X^\top X)_{jj}, Y^\top x_j, V, S_0, \omega\big)}{\sum_{j'=1}^J \pi_{j'}\, \mathrm{BF}_{\mathrm{mix\text{-}ss}}\big((X^\top X)_{j'j'}, Y^\top x_{j'}, V, S_0, \omega\big)}$. (60)
To describe the model fitting algorithms that work with summary statistics, we define a function, MSER-ss, that returns the posterior distribution of $b, \gamma$ under the MSER model given the summary statistics $X^\top X$, $X^\top Y$ and the model parameters $\Theta$:
$(\alpha, \Omega_1, B_1, S_1) := \mathrm{MSER\text{-}ss}(X^\top X, X^\top Y; \Theta)$, (61)
in which $\alpha = (\alpha_1, \ldots, \alpha_J)$ is the vector of PIPs $\alpha^{\mathrm{ss}}_j$ (60), $\Omega_1$ is the $J \times K$ matrix of posterior mixture weights $\omega^{\mathrm{ss}}_{1jk}$ (59), $B_1$ is the set of posterior means $b^{\mathrm{ss}}_{1jk}$ (57) for all $j, k$, and $S_1$ is the set of posterior covariances $S^{\mathrm{ss}}_{1jk}$ (58) for all $j, k$.
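To illustrate how the summary-statistic computation uses only $\mathrm{diag}(X^\top X)$ and $X^\top Y$, here is a schematic sketch built on the hypothetical mixture_posterior helper above; it is not the mvSuSiE implementation.

```python
import numpy as np

def mser_ss(dXtX, XtY, V, S0_list, weights, pi):
    """PIPs and per-SNP mixture posteriors from summary statistics.

    dXtX : length-J vector, diagonal of X'X
    XtY  : J x R matrix, X'Y
    """
    J = len(dXtX)
    log_bfs = np.empty(J)
    per_snp = []
    for j in range(J):
        bhat = XtY[j] / dXtX[j]       # per-SNP least-squares estimate
        Shat = V / dXtX[j]
        out = mixture_posterior(bhat, Shat, S0_list, weights)
        log_bfs[j] = out[0]
        per_snp.append(out)
    # PIPs: alpha_j proportional to pi_j * BF_mix,j
    logw = np.log(pi) + log_bfs
    alpha = np.exp(logw - np.logaddexp.reduce(logw))
    return alpha, per_snp
```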
Updating a scaling factor in the prior. Here we extend the MSER model to allow for estimating a scaling parameter, $\sigma^2_0 \ge 0$, in which $S_{0k} = \sigma^2_0 U_k$, $k = 1, \ldots, K$. As above, we estimate $\sigma^2_0$ by maximum likelihood,
$\hat\sigma^2_0 := \operatorname{argmax}_{\sigma^2_0 \ge 0}\; p(Y \mid X, V, \{\sigma^2_0 U_k\}, \omega, \pi)$,
which is equivalent to maximizing a weighted sum of the Bayes factors,
$\hat\sigma^2_0 = \operatorname{argmax}_{\sigma^2_0 \ge 0}\; \sum_{j=1}^J \pi_j\, \mathrm{BF}_{\mathrm{mix}}(x_j, Y, V, \{\sigma^2_0 U_k\}, \omega)$.
As before, the maximum-likelihood estimate can be computed using a simple EM algorithm. The M-step update in the EM algorithm is given by
$\sigma^2_0 \leftarrow \dfrac{\sum_{j=1}^J \sum_{k=1}^K \alpha_j\, \omega_{1jk}\, \mathrm{tr}\big(U_k^\dagger\, \mathbb{E}[bb^\top \mid \gamma_j = 1, \xi = k]\big)}{\sum_{j=1}^J \sum_{k=1}^K \alpha_j\, \omega_{1jk}\, R_k}$,
in which $R_k \le R$ denotes the rank of $U_k$. The E-step in the EM algorithm consists of computing the posterior mixture weights $\omega_{1jk}$, the posterior inclusion probabilities $\alpha_j$, and the posterior second moments $\mathbb{E}[bb^\top \mid \gamma_j = 1, \xi = k] = S_{1jk} + b_{1jk} b_{1jk}^\top$ at the current setting of $\sigma^2_0$.

The mvSuSiE IBSS algorithm
The Iterative Bayesian Stepwise Selection (IBSS) algorithm for fitting mvSuSiE is derived by extending the ideas of [6] to the multivariate setting. Similar to IBSS for SuSiE, the IBSS algorithm for mvSuSiE is a coordinate ascent algorithm for optimizing a variational approximation [7,8] to the posterior distribution of $B^{(1)}, \ldots, B^{(L)}$ under the mvSuSiE model. The basic mvSuSiE IBSS algorithm is given in Algorithm 1. Lines 6–11 compute the posterior mean regression coefficients $b^{(l)}$ for the $l$th single effect conditioned on each of the variables $j = 1, \ldots, J$ being the "single effect variable." These posteriors are then stored in a $J \times R$ matrix (lines 10–11). Once these are computed and stored, the unconditional posterior means $\bar B^{(l)}$ are simply the conditional posterior means, $\mu_l$, weighted by the PIPs $\alpha_l$. The optional step of estimating the scaling factors $\sigma^2_{0l}$ (line 5) is mainly for pruning unneeded single effects (see "Choice of L"). This parameter estimation step can be viewed as an EM update in which the E-step is approximate [9,10].
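The coordinate ascent structure of Algorithm 1 can be summarized with the following schematic sketch (our own illustration, not a reproduction of Algorithm 1): each single effect is refit to the expected residuals that ignore that effect. Here fit_mser is a hypothetical stand-in for the MSER computations of Proposition 4, assumed to return the PIPs and the $J \times R$ matrix of posterior mean effects (rows already weighted by the PIPs).

```python
import numpy as np

def ibss(X, Y, L, fit_mser, n_iter=100):
    """Schematic IBSS loop: coordinate ascent over the L single effects."""
    N, J = X.shape
    R = Y.shape[1]
    Bbar = np.zeros((L, J, R))     # posterior means of B^(1), ..., B^(L)
    alphas = np.zeros((L, J))      # PIPs for each single effect
    for _ in range(n_iter):
        for l in range(L):
            # expected residuals ignoring the l-th single effect
            Rl = Y - X @ (Bbar.sum(axis=0) - Bbar[l])
            # refit the l-th single effect to these residuals
            alphas[l], Bbar[l] = fit_mser(X, Rl)
    return alphas, Bbar
```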
Derivation of the mvSuSiE IBSS algorithm. The IBSS algorithm fits an approximate posterior distribution for $B^{(1)}, \ldots, B^{(L)}$ by minimizing a Kullback-Leibler (KL) divergence from the approximate posterior to the exact posterior, subject to constraints on the approximate posterior. More precisely, denoting the exact posterior by $p_{\mathrm{post}}$ and the approximate posterior by $q$, we seek a $q$ that minimizes the KL-divergence $D_{KL}(q \,\|\, p_{\mathrm{post}})$ [11] subject to constraints on $q$. Since the KL-divergence itself is hard to compute, we instead maximize the "evidence lower bound" (ELBO), which is equivalent to minimizing the KL-divergence, but easier to compute. The ELBO for the mvSuSiE model is
$F(q, g, V; X, Y) := \mathbb{E}_q\big[\log p(Y \mid X, B^{(1)}, \ldots, B^{(L)}, V)\big] - D_{KL}\big(q(B^{(1)}, \ldots, B^{(L)}) \,\big\|\, p(B^{(1)}, \ldots, B^{(L)})\big)$, (66)
in which the prior $p(B^{(1)}, \ldots, B^{(L)}) = \prod_{l=1}^L p_l(B^{(l)})$ is defined when the mvSuSiE model is defined. Similar to [6], we constrain the approximate posterior so that it factorizes over the single effects:
$q(B^{(1)}, \ldots, B^{(L)}) = \prod_{l=1}^L q_l(B^{(l)})$. (67)

[Algorithm 1. Iterative Bayesian Stepwise Selection (IBSS) for mvSuSiE. Require: maximum number of non-zero effects, $L \in \{1, \ldots, J\}$.]
Fitting the MSER to the residuals in Algorithm 1 maximizes the ELBO for a single effect. The main result of this section is summarized by the following proposition.
Proposition 5. Let $F(q, g, V; X, Y)$ be the ELBO (66) under the mvSuSiE model, and suppose that $q$ is constrained to factorize over the single effects $l = 1, \ldots, L$ as in (67). The setting of $q_l(B^{(l)})$ that maximizes the ELBO (66) while $g$, $V$ and the other factors $q_{l'}(B^{(l')})$, for all $l' \ne l$, are held fixed has the following closed-form solution:
$\hat q_l := \operatorname{argmax}_{q_l}\, F(q, g, V; X, Y) = \mathrm{MSER}(X, \bar R_l; \Theta)$,
in which $\mathrm{MSER}(X, Y; \Theta)$, defined in (56), gives the posterior distribution of $B$ under the multivariate single effect regression (MSER) model given data $X, Y$ and model parameters $\Theta$, and we define
$\bar R_l := Y - X \sum_{l' \ne l} \bar B^{(l')}$ (70)
as the $N \times R$ matrix of expected residuals that ignore the $l$th single effect, with $\bar B^{(l)} := \mathbb{E}_q[B^{(l)}]$.
Corollary 1. A corollary of Proposition 5 is that the optimal $q(B)$ factorizing as (67) is a product of factors $q_l(B^{(l)})$ in which each factor is a posterior distribution under an MSER model (47). This proposition and corollary are multivariate generalizations of the results for the univariate regression setting given in [6] (Propositions 1 and A1 in that paper). The proof of Proposition 5 is sketched in the next two sections.
Special case when L = 1. The mvSuSiE model with $L = 1$ is an MSER model (47). For this special case, the ELBO (66) is
$F_{\mathrm{MSER}}(q, g, V; X, Y) = -\tfrac{N}{2}\log|2\pi V| - \tfrac{1}{2}\,\mathrm{tr}\big[V^{-1}\,\mathrm{ERSS}\big] - D_{KL}\big(q(B^{(1)}) \,\|\, g(B^{(1)})\big)$, (71)
in which "ERSS" is the multivariate expected residual sum of squares,
$\mathrm{ERSS} := \mathbb{E}_q\big[(Y - XB^{(1)})^\top (Y - XB^{(1)})\big]$. (72)
In these expressions, we have dropped the "$l$" subscripts wherever they appear since there is only one single effect. (We have kept the "$(l)$" superscript for $B$ to avoid confusing $B^{(1)}$ with the $B$ defined in the prior.) The variational distribution maximizing the ELBO (71), $\hat q := \operatorname{argmax}_q F_{\mathrm{MSER}}(q, g, V; X, Y)$, is the true posterior, $\hat q(B^{(1)}) = p_{\mathrm{post}}(B^{(1)}) := p(B^{(1)} \mid X, Y, V, g)$ [7]; that is, $\hat q$ is the posterior distribution under an MSER model. At $\hat q$, the ELBO (71) is equal to the marginal log-likelihood; that is, $F_{\mathrm{MSER}}(\hat q, g, V; X, Y) = \log p(Y \mid X, V, g)$.
Coordinate ascent update for the lth single effect. To keep the posterior computations tractable, we restricted $q$ to the set of posterior distributions that factorize over the single effects $l = 1, \ldots, L$ (67). With this approximation, we can now divide and conquer: we consider the problem of finding a $q_l(B^{(l)})$ that maximizes the ELBO (66) while the remaining factors are held fixed.
Expanding terms involving $q_l$, the ELBO (66) is
$F(q, g, V; X, Y) = -\tfrac{N}{2}\log|2\pi V| - \tfrac{1}{2}\,\mathrm{tr}\big[V^{-1}\,\mathrm{ERSS}\big] - D_{KL}\big(q_l(B^{(l)}) \,\|\, g_l(B^{(l)})\big) + \mathrm{const}$, (73)
where the "const" is a placeholder for terms in the ELBO not involving $q_l$, and the expected residual sum of squares (ERSS) in this expression is
$\mathrm{ERSS} := \mathbb{E}_q\big[(Y - XB)^\top (Y - XB)\big]$. (74)
As a reminder, B " ř L l"1 B plq .Expanding the ERSS further, and making use of the property from (67) that the covariances are zero between all B plq and B pl 1 q whenever l ‰ l 1 , the ELBO can be rewritten as F pq, g, V ; X, Y q " ´N 2 log |2πV | ´1 2 trrV ´1ERSSplqs ´DKL pq l pB plq q } g l pB plq qq `const, in which ERSSplq :" E q rp Rl ´XB plq q ⊺ p Rl ´XB plq qs, and Rl is defined in (70).Assuming l " 1 without loss of generality, (75) is of the exact same form as (71) after ignoring terms that do not involve q l , and after replacing Y with Rl .In summary, the ELBO for mvSuSiE with L ą 1 can be rearranged to exactly match the expression for the mvSuSiE ELBO with L " 1 if we ignore terms not involving q l .
Computing the ELBO. While computing the ELBO (68) is not strictly needed to implement the IBSS algorithm, in practice it is useful for monitoring progress of the IBSS algorithm and for comparing different mvSuSiE model fits. Here we explain how we compute the ELBO.
From the definition of the mvSuSiE model, for a given single effect $l$ only one row of $B^{(l)}$ contains nonzero values, so $b_{jr} b_{kr'} = 0$, and hence $\mathbb{E}_q[b_{jr} b_{kr'}] = 0$, for any $j \ne k$ and $r, r' \in \{1, \ldots, R\}$, where $b_{jr}$ denotes an entry of $B^{(l)}$. Therefore, the ERSS simplifies to
$\mathrm{ERSS} = (Y - X\bar B)^\top (Y - X\bar B) + \sum_{l=1}^L \Big\{ \sum_{j=1}^J d_{jj}\,\big(C^{(l)}_j + \bar b^{(l)}_j \bar b^{(l)\top}_j\big) - \bar B^{(l)\top} X^\top X\, \bar B^{(l)} \Big\}$,
where $\bar B := \sum_{l=1}^L \bar B^{(l)}$, $\bar b^{(l)}_j$ denotes the $j$th row of $\bar B^{(l)}$, $D$ is a $J \times J$ diagonal matrix with diagonal entries $d_{jj} := (X^\top X)_{jj}$, and $C^{(l)}_j$ is the $R \times R$ covariance matrix for the $j$th row of $B^{(l)}$ with respect to the approximate posterior $q_l$, which can be easily computed from the results of Proposition 4.
The remaining terms in the ELBO (68) are the KL-divergences, $D_{KL}\big(q_l(B^{(l)}) \,\|\, g_l(B^{(l)})\big)$, $l = 1, \ldots, L$. It is most convenient to compute these when $q_l(B^{(l)})$ is updated, as we show next.
Computing the KL-divergence when L = 1. From (71), the KL-divergence for the mvSuSiE model with $L = 1$ is
$D_{KL}\big(q(B^{(1)}) \,\|\, g(B^{(1)})\big) = -\tfrac{N}{2}\log|2\pi V| - \tfrac{1}{2}\,\mathrm{tr}\big[V^{-1}\,\mathrm{ERSS}\big] - F_{\mathrm{MSER}}(q, g, V; X, Y)$,
where the ERSS is given in (72). Recall, the optimal variational distribution is equal to the true posterior, $\hat q(B^{(1)}) = p(B^{(1)} \mid X, Y, V, g)$. And when the optimal variational distribution is attained, the ELBO is equal to the marginal log-likelihood. Therefore, we have
$D_{KL}\big(\hat q(B^{(1)}) \,\|\, g(B^{(1)})\big) = \tfrac{1}{2}\,\mathrm{tr}\big[V^{-1}\big(Y^\top Y - \mathrm{ERSS}\big)\big] - \log\Big(\sum_{j=1}^J \pi_j\, \mathrm{BF}_{\mathrm{mix}}(x_j, Y, V, S_0, \omega)\Big)$, (81)
in which the Bayes factors $\mathrm{BF}_{\mathrm{mix}}$ are those used in Proposition 4 to compute the PIPs. Finally, to arrive at the desired KL-divergence $D_{KL}(q_l \,\|\, p_l)$, we substitute $\bar R_l$ for $Y$ in (81). A similar approach to computing the ELBO was taken in [6].
The IBSS algorithm for mvSuSiE with sufficient statistics. Algorithm 2 outlines the IBSS algorithm for mvSuSiE with sufficient statistics, in which the computations are rearranged so that they only require the sufficient statistics $X^\top Y$ and $X^\top X$.
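The key rearrangement behind the sufficient-statistic version is that the expected residuals $\bar R_l$ never need to be formed explicitly; only $X^\top \bar R_l$ is needed, and it follows directly from (70) that $X^\top \bar R_l = X^\top Y - X^\top X \sum_{l' \ne l} \bar B^{(l')}$. A one-line sketch of this identity (hypothetical helper name):

```python
import numpy as np

def residual_suff_stat(XtY, XtX, Bbar_total, Bbar_l):
    """X' Rl, the summary-statistic analogue of the expected residuals.

    XtY        : J x R matrix, X'Y
    XtX        : J x J matrix, X'X
    Bbar_total : J x R matrix, sum of all posterior mean effects
    Bbar_l     : J x R matrix, posterior mean of the l-th single effect
    """
    return XtY - XtX @ (Bbar_total - Bbar_l)
```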

Proofs and extended derivations
Proof of Proposition 2. First we give the proof from the maximum-likelihood estimation perspective, in which we treat $\mu$ as a free parameter to be optimized.
The profile likelihood for $b$ is therefore
$\ell_{\mathrm{prof}}(b; x, Y, V) = \ell(\hat\mu, b; x, Y, V) \propto \exp\big\{-\tfrac{1}{2}\,\mathrm{tr}\big[V^{-1}(\tilde Y - \tilde x b^\top)^\top(\tilde Y - \tilde x b^\top)\big]\big\}$,
which is the likelihood for the model without an intercept evaluated at the centered data $\tilde x, \tilde Y$.

Next consider the Bayesian calculations. The conditional posterior for $\mu$ given $b$ is
$p(\mu \mid x, Y, V, S_{0\mu}, b) \propto \exp\big\{-\tfrac{1}{2}(\mu - \mu_1)^\top S_{1\mu}^{-1}(\mu - \mu_1)\big\}$,
which is the multivariate normal density with mean $\mu_1$ and covariance $S_{1\mu}$ given above. The marginal likelihood for $b$ obtained by integrating over $\mu$ is
$p(Y \mid x, V, S_{0\mu}, b) \propto \int \exp\big\{-\tfrac{1}{2}\,\mathrm{tr}\big[V^{-1}(Y - xb^\top - 1_N\mu^\top)^\top(Y - xb^\top - 1_N\mu^\top) + S_{0\mu}^{-1}\mu\mu^\top\big]\big\}\, d\mu$.
Expanding terms and evaluating the Gaussian integral over $\mu$, we find that when $S_{0\mu}^{-1} \to 0$ the marginal likelihood simplifies to
$p(Y \mid x, V, S_{0\mu}, b) \propto \exp\big\{-\tfrac{1}{2}\,\mathrm{tr}\big[V^{-1}(\tilde Y - \tilde x b^\top)^\top(\tilde Y - \tilde x b^\top)\big]\big\}$.
The marginal likelihood for $b$ in the model with an intercept (in which $x, Y$ are not centered) is therefore proportional to the likelihood for the model without an intercept when $x, Y$ are centered.

Supplementary Figure 6. Comparison of fine-mapping methods in simulations with two correlated traits and independent effects, Part B: detection of cross-trait causal SNPs (cross-trait CSs). The dotted horizontal lines show the target coverage (95%). Error bars show 2 times the empirical s.e. from the results across the n = 600 simulations. Note that flashfm does not provide cross-trait CSs and therefore was not included in these plots. See Supplementary Fig. 5 for Part A of these results, and for more details. Also note that the plots shown here for σ = 0 are the same as the plots in the top row of Supplementary Fig. 9.

[Caption fragment.] See Supplementary Fig. 5 for Part A of these results, and for more details. CAFEH does not provide trait-wise CSs so was not included in these plots. Also note that the plots shown here for σ = 0 are the same as the plots in the top row of Supplementary Fig.

Supplementary Figure 8. Comparison of fine-mapping methods in simulations with two independent traits and correlated effects, Part A: detection of cross-trait and trait-wise causal SNPs using SNP-wise measures. In these simulations, two independent traits were simulated with correlated effects, with correlation σ = 0, 0.5, 1. 600 data sets were simulated for each choice of σ. In a fourth set of simulations (bottom row), the effects were simulated from a mixture of multivariate normals with different covariances (see "Simulation Scenarios" in the Online Methods for details). The plots on the left-hand side show power vs. FDR in identifying cross-trait causal SNPs using PIPs (or max-PIP for SuSiE). FDR and power were calculated as the threshold was varied from 0 to 1 (n = 600 simulations). Open circles are drawn at a threshold of 0.95. Note that flashfm does not provide a cross-trait measure so it is not included in the left-hand plots. The plots on the right-hand side show power vs. FDR in identifying trait-wise causal SNPs. FDR and power were calculated from the 600 simulations as the marginal posterior probability (MPP) threshold (flashfm), PIP (SuSiE), or minimum lfsr (mvSuSiE) was varied from 0 to 1. Closed circles are drawn at a minimum lfsr threshold of 0.01 or a PIP/MPP threshold of 0.99. Also note that the results shown here for σ = 0 are the same as the top row of Supplementary Fig. 5 and (for some methods) Supplementary Fig. 3.

[Caption fragment.] Note that flashfm does not provide cross-trait CSs and therefore was not included in these plots. Also note that the plots in the top row are the same as the σ = 0 plots in Supplementary Fig. 6. See Supplementary Fig. 8 for Part A of these results, and for further explanations.

[Caption fragment.] Note that CAFEH does not provide trait-wise CSs so was not included in these plots. Also note that the plots in the top row are the same as the σ = 0 plots in Supplementary Fig. 7.

[Caption fragment.] In each scenario, SNPs from all simulations (n = 600) were grouped into bins according to their reported PIP (10 equally spaced bins from 0 to 1). The plots show the average PIP from each bin (X axis) against the proportion of SNPs in that bin that are causal (Y axis). For a given bin, the error bar depicts 2 times the empirical s.e. from all n = 600 simulations. A well-calibrated method should produce points near the diagonal. See Supplementary Figures 1 and 2 for details on the mvSuSiE variants compared.

Supplementary Figure 13. Prior on multivariate SNP effects estimated from the UK Biobank blood cell traits. Each plot shows a 16 × 16 scaled covariance matrix U_k and its corresponding estimated mixture weight ω_k. These covariance matrices and mixture weights specify the mixture-of-multivariate-normals prior used in the mvSuSiE analyses of the UK Biobank blood cell traits. For visualization purposes only, each plot shows the scaled covariance matrix U_k / s_k^2, where s_k^2 is the absolute value of the largest (in magnitude) entry of U_k, so that all of the plotted values lie between -1 and 1. Each covariance matrix is labeled by how the covariance estimate was initialized (see "Data-driven prior" in Online Methods).

[Caption fragment.] Here we compare the 6 blood cell traits that were included in both our mvSuSiE fine-mapping analyses and in the fine-mapping analyses of Vuckovic et al. [18]. Each plot shows the posterior z-scores (posterior means divided by posterior standard deviations) computed from the Vuckovic et al. enrichment results against the posterior z-scores from our enrichment analysis. The Vuckovic et al. z-scores were downloaded from https://github.com/bloodcellgwas/manuscript_code, then posterior z-scores were computed using adaptive shrinkage [19].

Supplementary Figure 1. Comparison of mvSuSiE variants with different priors. (Panel titles from the figure: detection of cross-trait causal SNPs, cross-trait PIPs; trait-specific + shared effects; detection of cross-trait causal SNPs, cross-trait CSs; detection of trait-wise causal SNPs, trait-wise CSs.) See the next page for details.

Supplementary Figure 9. Comparison of fine-mapping methods in simulations with two independent traits and correlated effects, Part B: detection of cross-trait causal SNPs (cross-trait CSs). See Supplementary Fig. 8 for Part A of these results, and for further explanations. The dotted horizontal lines show the target coverage (95%), and error bars show 2 times the empirical s.e. from the results in the n = 600 simulations.

Supplementary Figure 10. Comparison of fine-mapping methods in simulations with two independent traits and correlated effects, Part C: detection of trait-wise causal SNPs (trait-wise significant CSs).

Supplementary Figure 12. Covariance matrices used to simulate the effects of the causal SNPs in Scenario a. Each plot shows a 20 × 20 covariance matrix, U_k, and its corresponding mixture weight, ω_k, in the mixture-of-multivariate-normals distribution used to simulate the effects of the causal SNPs. Note that all covariance matrices contain elements spanning the range 0 to 1, and none of these matrices contain negative elements.

Supplementary Figure 19. Comparison of gchromVAR enrichments based on mvSuSiE vs. the Vuckovic et al. analysis.