Fast identification of biological pathways associated with a quantitative trait using group lasso with overlaps.
Weiner M, Aisen P, Weiner M, Aisen P, Petersen R, Jack CR Jr, Jagust W, Trojanowki JQ, Toga AW, Beckett L, Green RC, Saykin AJ, Morris J, Liu E, Green RC, Montine T, Petersen R, Aisen P, Gamst A, Thomas RG, Donohue M, Walter S, Gessert D, Sather T, Beckett L, Harvey D, Gamst A, Donohue M, Kornak J, Jack CR Jr, Dale A, Bernstein M, Felmlee J, Fox N, Thompson P, Schuff N, Alexander G, DeCarli C, Jagust W, Bandy D, Koeppe RA, Foster N, Reiman EM, Chen K, Mathis C, Morris J, Cairns NJ, Taylor-Reinwald L, Trojanowki JQ, Shaw L, Lee VM, Korecka M, Toga AW, Crawford K, Neu S, Saykin AJ, Foroud TM, Potkin S, Shen L, Kachaturian Z, Frank R, Snyder PJ, Molchan S, Kaye J, Quinn J, Lind B, Dolen S, Schneider LS, Pawluczyk S, Spann BM, Brewer J, Vanderswag H, Heidebrink JL, Lord JL, Petersen R, Johnson K, Doody RS, Villanueva-Meyer J, Chowdhury M, Stern Y, Honig LS, Bell KL, Morris JC, Ances B, Carroll M, Leon S, Mintun MA, Schneider S, Marson D, Griffith R, Clark D, Grossman H, Mitsis E, Romirowsky A, deToledo-Morrell L, Shah RC, Duara R, Varon D, Roberts P, Albert M, Onyike C, Kielb S, Rusinek H, de Leon MJ, Glodzik L, De Santi S, Doraiswamy P, Petrella JR, Coleman R, Arnold SE, Karlawish JH, Wolk D, Smith CD, Jicha G, Hardy P, Lopez OL, Oakley M, Simpson DM, Porsteinsson AP, Goldstein BS, Martin K, Makino KM, Ismail M, Brand C, Mulnard RA, Thai G, Mc-Adams-Ortiz C, Womack K, Mathews D, Quiceno M, Diaz-Arrastia R, King R, Weiner M, Martin-Cook K, DeVous M, Levey AI, Lah JJ, Cellar JS, Burns JM, Anderson HS, Swerdlow RH, Apostolova L, Lu PH, Bartzokis G, Silverman DH, Graff-Radford NR, Parfitt F, Johnson H, Farlow MR, Hake AM, Matthews BR, Herring S, van Dyck CH, Carson RE, MacAvoy MG, Chertkow H, Bergman H, Hosein C, Black S, Stefanovic B, Caldwell C, Hsiung GY, Feldman H, Mudge B, Assaly M, Kertesz A, Rogers J, Trost D, Bernick C, Munic D, Kerwin D, Mesulam MM, Lipowski K, Wu CK, Johnson N, Sadowsky C, Martinez W, Villena T, Turner RS, Johnson K, Reynolds B, Sperling RA, 
Johnson KA, Marshall G, Frey M, Yesavage J, Taylor JL, Lane B, Rosen A, Tinklenberg J, Sabbagh M, Belden C, Jacobson S, Kowall N, Killiany R, Budson AE, Norbash A, Johnson PL, Obisesan TO, Wolday S, Bwayo SK, Lerner A, Hudson L, Ogrocki P, Fletcher E, Carmichael O, Olichney J, DeCarli C, Kittur S, Borrie M, Lee TY, Bartha R, Johnson S, Asthana S, Carlsson CM, Potkin SG, Preda A, Nguyen D, Tariot P, Fleisher A, Reeder S, Bates V, Capote H, Rainka M, Scharre DW, Kataki M, Zimmerman EA, Celmins D, Brown AD, Pearlson GD, Blank K, Anderson K, Saykin AJ, Santulli RB, Schwartz ES, Sink KM, Williamson JD, Garg P, Watkins F, Ott BR, Querfurth H, Tremont G, Salloway S, Malloy P, Correia S, Rosen HJ, Miller BL, Mintzer J, Longmire CF, Spicer K, Finger E, Rachinsky I, Rogers J, Kertesz A, Drost D.
Source
Imperial College London, UK.
Abstract
Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our "pathways group lasso with adaptive weights" (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small.
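For orientation, the generic weighted group lasso objective that a model of this kind builds on can be written as below. This is the textbook form, given here as an assumption rather than as the exact P-GLAW objective: y is the quantitative trait, X the genotype matrix, β_l the coefficients of the SNPs in pathway l, w_l the pathway weight, L the number of pathways, and λ the regularisation parameter.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \sum_{l=1}^{L} w_l \,\lVert \beta_l \rVert_2
```

Because the penalty acts on the unsquared Euclidean norm of each β_l, whole pathways are either dropped (β_l = 0) or retained together.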
- PMID: 22499682 [PubMed - indexed for MEDLINE]
- PMCID: PMC3491888
Free PMC Article
Figure 1
The problem of overlapping pathways: here there are three pathways, two of which overlap. A: Standard formulation. Pathway parameter vectors β1 and β2 overlap, since they have SNPs in common (shaded dark grey). Where an overlapping SNP has a non-zero coefficient, only the third, non-overlapping pathway can be selected independently. B: Formulation with duplicated SNPs. An expanded parameter vector, β*, is created by duplicating overlapping SNPs (dotted line). The two overlapping pathways now enter the model separately, so that pathways can be selected independently.
Stat Appl Genet Mol Biol. 2012 January 6;11(1):Article-7.
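The SNP duplication described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `expand_overlapping_groups`, the toy genotype matrix, and the pathway index lists are all hypothetical.

```python
import numpy as np

def expand_overlapping_groups(X, pathways):
    """Give every pathway its own copy of its SNP columns.

    X        : (n_samples, n_snps) genotype matrix
    pathways : list of lists of column indices into X (lists may overlap)
    Returns the expanded matrix and, per pathway, the slice of columns it
    occupies in the expanded matrix.
    """
    blocks, slices, start = [], [], 0
    for snp_idx in pathways:
        blocks.append(X[:, snp_idx])            # copy this pathway's columns
        slices.append(slice(start, start + len(snp_idx)))
        start += len(snp_idx)
    return np.hstack(blocks), slices

# Toy data (hypothetical): 4 samples, 5 SNPs; pathways 0 and 1 share SNP 2.
X = np.arange(20, dtype=float).reshape(4, 5)
pathways = [[0, 1, 2], [2, 3], [4]]
X_star, slices = expand_overlapping_groups(X, pathways)
```

After expansion, each pathway owns a disjoint block of columns, so a group penalty can zero out one pathway without forcing the shared SNP out of the other.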
Figure 3
Frequency distribution of ADNI SNPs by number of pathways they map to. SNPs are mapped to genes within 10 kbp. The data set consists of 8,078 SNPs and 551 pathways.
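A frequency distribution of this kind can be tabulated from a pathway-to-SNP mapping. The sketch below uses a hypothetical toy mapping and made-up SNP identifiers; it does not reproduce the ADNI data.

```python
from collections import Counter

def pathway_count_distribution(pathways):
    """pathways: dict mapping pathway name -> iterable of SNP ids.

    Returns a Counter {k: number of SNPs that map to exactly k pathways}.
    """
    per_snp = Counter()
    for snps in pathways.values():
        per_snp.update(set(snps))   # count each SNP once per pathway
    return Counter(per_snp.values())

# Hypothetical toy mapping.
toy = {
    "P1": ["rs1", "rs2", "rs3"],
    "P2": ["rs3", "rs4"],
    "P3": ["rs3", "rs5", "rs1"],
}
dist = pathway_count_distribution(toy)
```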
Figure 5
Application of bias-adjusted weighting procedure to the data used in the simulation study. R = 40,000, with a different null response, drawn from N(0, 1), at each MC simulation. α = 0.98. (a) Empirical pathway selection frequency distribution, Π*, with standard pathway size weighting. D = 2.24. Dotted horizontal line shows the expected distribution, Π_l = 1/L ≃ 0.002. (b) Π* with bias-adjusted weights after 10 iterations. D = 0.12. (c) Variation of weighting adjustment factor w(τ)/w(τ–1) with d_l at a single iteration, with α = 0.98. Each point represents the adjustment to a single w_l, l = 1,…,L. (d) Decrease in K-L divergence, D, over 10 iterations.
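The K-L divergence D in this caption measures how far the empirical selection frequencies are from the uniform distribution 1/L expected under the null. A minimal sketch follows; the direction of the divergence (empirical relative to uniform) and the natural logarithm are assumptions, since the caption does not specify them, and the function name is hypothetical.

```python
import numpy as np

def kl_from_uniform(selection_freq):
    """K-L divergence of an empirical selection distribution from uniform.

    selection_freq : counts or frequencies, one entry per pathway.
    Assumes D = sum_l p_l * log(p_l / (1/L)), with 0 * log(0) taken as 0.
    """
    p = np.asarray(selection_freq, dtype=float)
    p = p / p.sum()                 # normalise to a probability vector
    L = p.size
    nz = p > 0                      # skip zero entries to avoid log(0)
    return float(np.sum(p[nz] * np.log(p[nz] * L)))

# With L = 551 pathways the uniform target is 1/551, about 0.002.
uniform_target = 1 / 551
```

A perfectly unbiased selection procedure gives D = 0; larger values indicate that some pathways are selected more often than others under the null.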
Figure 7
ROC curves illustrating proportion of simulations with rk1 ≤ z, for ranks z = 1,2,…,100. Power is averaged across 500 simulations. False positive rate = (z – 1)/L. Scenarios corresponding to the higher SNP effect size (δk = 0.005) are presented in the left-hand column, with the equivalent scenarios at the lower effect size (δk = 0.001) on the right.
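The curve construction in this caption can be sketched as below. It assumes that the rank compared against z is that of the best-ranked causal pathway in each simulation, which is an interpretation of the caption's notation; the function name and the toy ranks are hypothetical, and L = 551 is taken from the Figure 3 caption.

```python
import numpy as np

def rank_roc(best_causal_rank, L, max_rank=100):
    """Power and false positive rate at each rank threshold z.

    best_causal_rank : one rank per simulation (1 = top of the ranking)
    L                : total number of pathways
    Power at z is the share of simulations with rank <= z;
    the false positive rate at z is (z - 1) / L.
    """
    ranks = np.asarray(best_causal_rank)
    z = np.arange(1, max_rank + 1)
    power = (ranks[:, None] <= z[None, :]).mean(axis=0)
    fpr = (z - 1) / L
    return fpr, power

# Hypothetical ranks from four simulations, thresholds z = 1..5.
fpr, power = rank_roc([1, 3, 250, 2], L=551, max_rank=5)
```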
Figure 9
Distribution of the power-adjusted, normalised, weighted ranking score, R, across 500 simulations. The final ‘50+’ column includes simulations for which no causal pathway was ranked in the top 100; for these, R = 100.
Figure 2
Figure 4
Distributions of the number of causal pathways across 500 MC simulations for the 6 scenarios described in Table 1. Where SNPs are distributed within a single gene (scenarios (c) and (f)), the number of causal pathways tends to be larger, since a single gene can map to multiple pathways. Where SNPs are instead distributed randomly (scenarios (a), (b), (d), and (e)), this number tends to be smaller, particularly where the number of causal SNPs is large (scenarios (a) and (d)).
Figure 6
Comparison of ranking performance: adaptive weighting scheme (section 2.3) vs. standard pathway size weighting (13). S = 10; δk = 0.005; SNPs randomly distributed. (a) ROC curves illustrating power to identify at least one causal pathway in the top 100. Power is averaged across 500 simulations. (b) Distribution of ranking power, p100, across 500 simulations. This is the proportion of causal pathways that are ranked in the top 100 pathways. Notches indicate 95% confidence intervals for the true median. (c) Distribution of the power-adjusted, normalised, weighted ranking score, R, across 500 simulations. The final ‘50+’ column includes simulations for which no causal pathway was ranked in the top 100; for these, R = 100.
Figure 8
Box plots of distribution of ranking power, p100, across 500 simulations. This is the proportion of causal pathways that are ranked in the top 100 pathways. Notches indicate 95% confidence intervals for the true median.
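The ranking power p100 defined in this caption can be computed for one simulation as sketched below. The function name and the toy pathway labels are hypothetical.

```python
def ranking_power(ranked_pathways, causal_pathways, top=100):
    """Proportion of causal pathways that appear in the top `top` ranks.

    ranked_pathways : pathway ids ordered from most to least important
    causal_pathways : ids of the truly causal pathways (must be non-empty)
    """
    causal = set(causal_pathways)
    if not causal:
        raise ValueError("at least one causal pathway is required")
    top_set = set(ranked_pathways[:top])
    return len(causal & top_set) / len(causal)

# Hypothetical ranking of 551 pathways, labelled P1 (best) to P551.
ranked = ["P%d" % i for i in range(1, 552)]
p100 = ranking_power(ranked, ["P5", "P80", "P101", "P300"])
```

Here two of the four causal pathways fall inside the top 100, so p100 is 0.5.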