Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases

Polygenic risk scores (PRS) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. Validation and transferability of existing PRS across independent datasets and diverse ancestries are limited, which hinders their practical utility and exacerbates health disparities. We propose PRSmix, a framework that evaluates and leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture. We applied PRSmix to 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% CI: [1.10; 1.3]; P-value = 9.17 × 10^-5) and 1.19-fold (95% CI: [1.11; 1.27]; P-value = 1.92 × 10^-6), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI: [1.40; 2.04]; P-value = 7.58 × 10^-6) and 1.42-fold (95% CI: [1.25; 1.59]; P-value = 8.01 × 10^-7) in European and South Asian ancestries, respectively. Compared to the previously established cross-trait-combination method with scores from pre-defined correlated traits, we demonstrated that our method can improve prediction accuracy for coronary artery disease up to 3.27-fold (95% CI: [2.1; 4.44]; P-value after FDR correction = 2.6 × 10^-4). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.

INTRODUCTION

Thousands of polygenic risk scores (PRS) have been developed to predict an individual's genetic propensity to diverse phenotypes [1]. PRS are generated when risk alleles for distinct phenotypes are weighted by their effect size estimates and summed [2]. Risk alleles included in PRS have traditionally been identified from genome-wide association studies (GWAS) results conducted on a training dataset, which are weighted and aggregated to derive a PRS to predict distinct phenotypes. The association between PRS and the phenotype of interest is subsequently evaluated in a test dataset that is non-overlapping with the training dataset [3].

Most PRS have been developed in specific cohorts that may vary in terms of population demographics, admixture, environment, and SNP availability. Limited validation of many PRS outside of the training datasets and poor transferability of PRS to other populations may limit their clinical utility. However, pooling of data from individual PRS generated and validated in diverse cohorts has the potential to improve the predictive ability of PRS across diverse populations. The Polygenic Score Catalog (PGS Catalog) is a publicly available repository that archives SNP effect sizes for PRS estimation. The SNP effect sizes were developed from various methods (e.g. P+T [4], LDpred [5,6], PRS-CS [7], etc.) to obtain the highest prediction accuracy in the studied dataset. PRS metadata enables researchers to replicate PRS in independent cohorts and aggregate SNP effects to refine PRS and enhance the accuracy and generalizability in broader populations [8].
However, optimizing PRS performance requires methodological approaches that adjust GWAS effect size estimates to take into account correlated SNPs (i.e., linkage disequilibrium) and refine PRS for the target population [4,5,7,9-12]. Furthermore, numerous scores are often present for single traits with varied validation metrics in non-overlapping cohorts. There is a lack of standardized approaches for combining PRS from this growing corpus to enhance prediction accuracy and generalizability, while minimizing bias, for a target cohort [8,11,13].

To address these issues, we sought to: 1) validate previously developed PRS in two geographically and ancestrally distinct cohorts, the All of Us Research Program (AoU) and the Genes & Health cohort, and 2) present and evaluate new methods for combining previously calculated PRS to maximize performance beyond the best-performing published PRS. To better capture the genetic architecture of the outcome traits, we proposed PRSmix, a framework to combine PRS developed for the same trait as the outcome trait. Previous studies highlighted the effect of pleiotropic information on a trait's genetic architecture [14,15]. Therefore, we proposed PRSmix+ to additionally combine PRS from other genetically correlated traits to further improve the PRS for a given trait.

To assess the prediction improvement, we performed PRSmix and PRSmix+ for 47 traits in European ancestry and 32 traits in South Asian ancestry.
We evaluated 1) the relative improvement of the proposed framework over the best-performing pre-existing PRS for each trait, 2) the efficient training sample sizes required to improve the PRS, 3) the predictive improvement in 6 groups including anthropometrics, blood counts, cancer, cardiometabolic, biochemistry and other conditions as the prediction accuracies varied in each group, and 4) the clinical utility and pleiotropic effect of the newly built PRS for coronary artery disease.
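As a concrete illustration of the weighted-sum definition of a PRS given at the start of the Introduction, the following is a minimal sketch in Python. The function name `polygenic_risk_score`, the SNP identifiers, the effect sizes, and the dosages are all hypothetical and chosen only for illustration; real pipelines also need allele harmonization, which is not shown here.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages over SNPs present in both inputs.

    dosages: dict mapping SNP id -> effect-allele count (0, 1, or 2)
    weights: dict mapping SNP id -> effect size estimate
    """
    shared = dosages.keys() & weights.keys()  # SNPs missing from either side are skipped
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Hypothetical effect sizes for one score and one individual's genotype.
weights = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0, "rs4": 1}  # rs4 has no weight and is ignored

score = polygenic_risk_score(dosages, weights)
```

Skipping SNPs that are absent from either input mirrors the practical situation in which SNP availability differs between the cohort and the published score; whether that is an acceptable choice depends on how many variants are lost.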

medRxiv preprint doi: https://doi.org/10.1101/2023.02.21.23286110; this version posted March 23, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Overall, we show that PRSmix and PRSmix+ significantly improved prediction accuracy. An R package for preprocessing and harmonizing the SNP effects from the PGS Catalog, as well as assessing and combining the scores, was developed to facilitate the combining of pre-existing PRS for both ancestry-specific and cross-ancestry contexts using the totality of published PRS. The development of this framework has the potential to improve precision health by improving the generalizability in the application of PRS [16].

[...] as the alternative alleles in the independent cohorts. In each independent biobank (All of Us, Genes & Health), we estimated the PRS and split the data into training (80%) and testing (20%) datasets. In Phase 2, in the training dataset, we trained the Elastic Net model with high-power scores to estimate the mixing weights for the PRSs. [...]

[...] Table 1). In each cohort, we compared the improvement of our proposed framework with the single best score from the PGS Catalog. We estimated the averaged fold-ratio as a measure of the improvement of prediction accuracy by our approach, compared to the best single score from PGS Catalog. We also classified the traits into 6 categories as anthropometrics, blood counts, cancer, cardiometabolic, biochemistry, and other conditions (Supplementary Table 2 [...]

Simulations were used to evaluate the combination frameworks
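Because the simulations assess how well mixing weights can be trained, it may help to see what that training step could look like. The sketch below fits an Elastic Net by coordinate descent on a training split and applies the resulting weights to a testing split. It is an illustration under stated assumptions, not the authors' implementation: it uses a squared-error loss with a glmnet-style penalty, omits covariates and score standardization, and the names `elastic_net_weights`, `combine`, `lam`, and `alpha`, as well as the toy data, are hypothetical.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the lasso proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_weights(scores, y, lam=0.01, alpha=0.5, n_iter=200):
    """Coordinate descent for the objective
    (1/2n) * sum((y - X b)^2) + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).
    `scores` is a list of rows: one row per individual, one column per PRS.
    """
    n, p = len(y), len(scores[0])
    col_means = [sum(row[j] for row in scores) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering the columns and the outcome removes the intercept from the problem.
    cols = [[row[j] - col_means[j] for row in scores] for j in range(p)]
    resid = [v - y_mean for v in y]  # residual when all weights are zero
    beta = [0.0] * p
    for _ in range(n_iter):
        for j, xj in enumerate(cols):
            denom = sum(v * v for v in xj) / n + lam * (1.0 - alpha)
            if denom == 0.0:
                continue  # a constant column carries no information
            # Covariance of column j with the residual that excludes column j.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(xj, resid)) / n
            new_b = soft_threshold(rho, lam * alpha) / denom
            delta = new_b - beta[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(xj, resid)]
                beta[j] = new_b
    return beta


def combine(scores, weights):
    """Mixed PRS: weighted sum of each individual's component scores."""
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]


# Toy training split: two hypothetical PRS columns; the phenotype follows the first.
train_scores = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
train_y = [2.0, -2.0, 2.0, -2.0]
mixing_weights = elastic_net_weights(train_scores, train_y, lam=1.0, alpha=0.5)

# Apply the trained weights to a held-out (testing) split.
test_scores = [[0.5, -1.0], [-2.0, 0.3]]
mixed_prs = combine(test_scores, mixing_weights)
```

In this toy example the uninformative second score receives a weight of exactly zero, which is the behaviour that makes a sparse penalty attractive when many candidate scores are available. In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.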
To compare the performance of PRSmix and PRSmix+ against the best single PRS and evaluate the sample sizes needed for training the mixing weights, we performed simulations with real genotypes of European ancestry in the UK Biobank given the large sample sizes available (Fig. 2). Briefly, we randomly set aside 7,000 individuals as a testing data set, mimicking the testing size of 20% of real data. In the remaining dataset, we used 200,000 individuals for GWAS to estimate the SNP effect sizes for PRS calculations. Finally, with the rest of the data, we randomly selected different sample sizes as the training sample to evaluate the sample sizes needed to train the mixing weights. To assess the improvement of PRS performance, we computed the fold-ratio of prediction accuracy R² of PRSmix and PRSmix+ relative to the best-performing single simulated PRS.

Our results showed that the trait-specific combination, PRSmix, showed no improvement with a training sample smaller than 500 for most of the traits. Our simulations illustrated that traits with low heritability required a larger sample size to achieve an improvement compared to traits with high heritability (Fig. 2a and 2b [...] smaller improvement compared to traits with a smaller heritability (Fig. 2c).
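The fold-ratio of prediction accuracy used in these simulations can be sketched as follows. This is a minimal illustration that assumes R² is taken as the squared Pearson correlation between a score and the phenotype in the testing set; the function names and the toy numbers are hypothetical, and an incremental R² over covariates would need a different calculation.

```python
def r_squared(pred, y):
    """Squared Pearson correlation between a score and the phenotype."""
    n = len(y)
    mean_p = sum(pred) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_p) * (b - mean_y) for a, b in zip(pred, y))
    sxx = sum((a - mean_p) ** 2 for a in pred)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant score or phenotype explains no variance
    return sxy * sxy / (sxx * syy)


def fold_ratio(r2_combined, r2_best_single):
    """Values above 1 mean the combined score beats the best single score."""
    return r2_combined / r2_best_single


# Toy testing set: one phenotype, the best single PRS, and a combined PRS.
phenotype = [1, 2, 3, 4, 5]
best_single_prs = [2, 1, 4, 3, 5]
combined_prs = [1, 2, 4, 3, 5]

r2_single = r_squared(best_single_prs, phenotype)
r2_combined = r_squared(combined_prs, phenotype)
improvement = fold_ratio(r2_combined, r2_single)
```

Both R² values must come from the same testing individuals for the ratio to be meaningful, which is why the testing set is held out before any GWAS or weight training.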

[...] Table 1). Traits for which the best-performing trait-specific single PRS showed a lack of power were also removed. Overall, with PRSmix, the fold-ratio of prediction accuracy was significantly greater than 1 using a two-tailed paired t-test. PRSmix significantly improves the prediction accuracy compared to the best PRS estimated from the PGS Catalog. [...]

Consistent with our simulation results, a smaller improvement was observed for traits with a higher baseline prediction accuracy from PGS Catalog (Supplementary Fig. 4), noting that the baseline prediction accuracy depends on the heritability and genetic architecture (i.e. polygenicity). In contrast, more improvement was observed for traits with lower heritability, thus lower prediction accuracy, when comparing the single best PRS (Fig. 1c).

Prediction accuracy and predictive improvement across various types of traits

[...] The whiskers reflect the maximum and minimum values within the 1.5 × interquartile range [...]

[...] PRSmix+ equipped with both LDpred2-auto and PGS Catalog scores (Fig. 5). Employing pleiotropic effects provided only a small improvement for height (Supplementary Table 6). On the other hand, for T2D, all methods of cross-trait combination provided a significant improvement over the trait-specific combination (Fig. 5).

Clinical utility for coronary artery disease

[...] ancestries. The baseline model for risk prediction includes age, sex, total cholesterol, HDL-C, systolic blood pressure, BMI, type 2 diabetes, and current smoking status. We compared the integrative models with PGS Catalog, PRSmix, and PRSmix+ in addition to clinical risk [...]

[...] Asian ancestries, respectively (Supplementary Fig. 5 and Supplementary Fig. 6). Moreover, we observed that there is a plateau of improvement for PRSmix from the training size of 5000 in both European and South Asian ancestries (Supplementary Fig. 7), which aligned with our simulations (Fig. 2a and 2b). [...]

[...] Abraham et al. [8] we also observed that our method could identify more related risk factors to include compared to previous work conducted on stroke (Supplementary Fig. 8). Therefore, our method is more comprehensive in an unbiased way in terms of choosing the risk factors and traits to include with empirically improved performance.

Fifth, greater performance is observed even for non-European ancestry groups underrepresented in GWAS and PRS studies. We empirically demonstrate the value of training and incorporating pleiotropy with all available PRS to improve performance, including multiple metrics of clinical utility for CAD prediction in multiple ancestries. In South Asian ancestry, we observed that PRSmix and PRSmix+ demonstrated a significant improvement with the best improvement for CAD.
Of note for CAD, the relative improvements in South Asian ancestry were higher than in European ancestry for PRSmix and equivalent for PRSmix+. Transferability of PRS has been shown to improve the clinical utility of PRS in non-European ancestry [16,34]. Although the prediction accuracy for South Asian ancestry is still limited, our results highlighted the transferability of predictive improvement with PRSmix and PRSmix+ to South Asian ancestry. We anticipate that ongoing and future efforts to improve our understanding of the genetic architecture in non-European ancestries will further improve the transferability of PRS across ancestry.

Lastly, traits with low heritability or generally low-performing single PRS benefit the most from this approach, especially with PRSmix+, such as migraine in both European and South Asian ancestries. Additionally, our results showed that pleiotropic effects play an important

role in understanding and improving prediction accuracies of complex traits. However, anthropometric traits, which are highly polygenic [35] and have good predictive performance using the best PGS Catalog score, also showed improvement with the combination framework in both European and South Asian ancestries.

Given that PRSmix+ outperformed PRSmix, one might consider if there is a reason to use PRSmix instead of PRSmix+. We observed that in cases of highly heritable traits or high performance with a single PRS, there was only marginal improvement of PRSmix+ over PRSmix. In this scenario, PRSmix could provide similar predictive performance while being less time-consuming because only trait-specific PRS inputs are required. However, for traits with lower heritability PRSmix+ shows a marked improvement over PRSmix and would be preferred. Wang et al. [36] showed that the theoretical prediction accuracy of the target trait using the PRS from the correlated trait is a function of genetic correlation, heritability, number of genetic variants and sample size. Future directions could include defining the minimum parameters required for the performance of the PRSmix+ model to improve on single trait-specific PRS.

Our work has several limitations. First, the majority of scores from PGS Catalog were developed in European ancestry populations. Further non-European SNP effects will likely improve the single PRS power, which may, in turn, also improve the prediction accuracy of our proposed methods. Second, the Elastic Net makes a strong assumption that the outcome trait depends on a linear association with the PRS and covariates.
However, a recent study demonstrated there is no statistically significant difference between linear and non-linear combinations for neuropsychiatric disease [13]. Third, we did not validate the mixing weights in an independent cohort. We expect that in the future, there will be emerging large independent biobanks, but prior non-genetic work demonstrates the value of internal calibration for optimal risk prediction. Fourth, we took the mixing weight of each single SNP to be the mixing weight of the PRS containing it. Future studies could consider linkage disequilibrium between the SNPs and functional annotations of each SNP. Fifth, our frameworks were conducted on binary traits with a prevalence > 2%. Additional combination PRS models are emerging that seek to use preexisting genotypic data from genetically related but low-prevalence conditions to improve the prediction accuracy of rare conditions [13]. Sixth, the baseline demographic characteristics (i.e., age, sex, socioeconomic status) in the target cohort might limit the validation and transferability of PRS [37]. Although these factors were considered by using a subset of the target cohorts as training data, it is necessary to have PRS developed on similar baseline characteristics. Lastly, with the expansion of all biobanks, there might be no perfect distinction between the samples deriving PRS and the testing cohort; future studies may consider the potential intersection samples to train the linear combination.

In conclusion, our framework demonstrates that leveraging different PRS, either trait-specific or cross-trait, can substantially improve model stability and prediction accuracy beyond all existing PRS for a target population. Importantly, we provide software to achieve this goal in independent cohorts.

METHODS

Data

The All of Us Research Program

The All of Us Research Program is a longitudinal cohort continuously enrolling (starting May 2017) U.S. adults ages 18 years and older from across the United States, with an emphasis on promoting inclusion of diverse populations traditionally underrepresented in biomedical research, including gender and sexual minorities, racial and ethnic minorities, and participants with low levels of income and educational attainment [38]. Participants in the program can opt in to providing self-reported data, linking electronic health record data, and providing physical measurement and biospecimen data [39]. Details about the All of Us study goals and protocols, including survey instrument development [40], participant recruitment, data collection, and data linkage and curation were previously described in detail [39,41].

Data can be accessed through the secure All of Us Researcher Workbench platform, which is a cloud-based analytic platform that was built on the Terra platform [42]. Researchers gain access to the platform after they complete a 3-step process including registration, completion of ethics training, and attesting to a data use agreement [43]. [...]

[...] Therefore, we can analytically estimate the confidence interval of prediction accuracy for each score. We selected high-power scores defined as power > 0.95 with P-value <= 0.05 or P-value <= 1.9 × 10^-5 (0.05/2600) for the combination with Elastic Net.

To compare the improvement, for instance between PRSmix and the best PGS Catalog score, we estimated the mean fold-ratio of R² across different traits with its 95% confidence interval and evaluated whether it differed significantly from 1 using a two-tailed paired t-test.
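The comparison described above, a mean fold-ratio across traits with a confidence interval and a test against 1, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the test is set up as a t statistic of the per-trait ratios against 1, the interval uses the normal critical value 1.96 as an approximation (an exact interval and P-value would use the t distribution with n - 1 degrees of freedom, for example via `scipy.stats.t`), and the function name and example ratios are hypothetical.

```python
import math
import statistics


def summarize_fold_ratios(ratios, z=1.96):
    """Mean fold-ratio, approximate 95% CI, and t statistic against a ratio of 1."""
    n = len(ratios)
    mean = statistics.fmean(ratios)
    se = statistics.stdev(ratios) / math.sqrt(n)  # standard error of the mean
    return {
        "mean": mean,
        "ci": (mean - z * se, mean + z * se),  # normal approximation, see lead-in
        "t": (mean - 1.0) / se,                # compare to a t distribution with df below
        "df": n - 1,
    }


# Hypothetical per-trait fold-ratios of R^2 (combined score vs. best single score).
ratios = [1.1, 1.3, 1.2, 1.0, 1.4]
summary = summarize_fold_ratios(ratios)
```

With only a handful of traits the normal approximation understates the interval width, so the t-based critical value is the safer choice for small trait counts.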
We split the simulated cohort into 3 data sets for: 1) GWAS, 2) training set: training the mixing weights with a linear combination, and 3) testing set: testing the combined PRS. We incorporated PRS1, PRS2 and PRS3 to assess the trait-specific PRSmix framework. We [...]

[...] incorporates LD between markers to infer the posterior effect sizes. In our work, we implemented LDpred2-auto [51] since this method can infer heritability and the proportion of causal variants. LDpred2-auto was conducted with 800 burn-in iterations and 500 iterations. The proportion of causal variants was initialized between 10^-4 and 0.9. Furthermore, LDpred2-auto does not require a validation set; the SNP effect sizes were averaged between scores. We used 1,138,726 HapMap3 variants that overlapped with SNPs from whole-genome sequencing data in the All of Us cohort. The LD reference panel developed from European ancestry was provided by the LDpred2-tutorial.

wMT-SBLUP: wMT-SBLUP [19] calculates the mixing weights of PRS using sample sizes from GWAS summary statistics, SNP-heritability and genetic correlation. We compared wMT-SBLUP with our method using 5 traits that were originally assessed with wMT-SBLUP including CAD, T2D, depression, height, and BMI. We curated 26 publicly available GWAS summary statistics (Supplementary Table 18) and performed LDpred2-auto with quality controls suggested by Privé et al. [5,51]. We used LD score regression to estimate SNP-heritability and genetic correlation across 26 traits. For each of the 5 outcome traits, we selected correlated traits with P-value of genetic correlation less than 0.05.
Elastic Net for linear combination: we also implemented linear combination by Elastic Net with the LDpred2-auto-derived PRSs for contributing traits since this strategy was proposed by several works [8,13,20]. We selected scores that explained significant variance in the outcome trait (P-value < 0.05) and conducted Elastic Net using the glmnet R package [46].

Phenome-wide association study

We obtained the list of 1815 phecodes from the PheWAS website (last accessed December 2022) [52]. The phecodes were based on ICD-9 and ICD-10 to classify individuals. PheWAS was conducted on European ancestry only in AoU. For each phecode as the outcome, we conducted an association analysis using logistic regression on PRS and adjusted for age, sex, and first 10 PCs. The significance threshold for PheWAS was estimated as 2.75 × 10^-5 (0.05/1815) after Bonferroni correction.

Data availability

The PGS Catalog is freely available at https://www.pgscatalog.org/. Our new scores are deposited in the PGS Catalog. The All of Us and Genes & Health individual-level data are controlled-access datasets, and access may be granted at https://www.researchallofus.org/ and https://www.genesandhealth.org/, respectively.

The weights from the PRSmix and PRSmix+ scores in this manuscript have been returned to the PGS Catalog. The R package to implement PRSmix and PRSmix+ in independent datasets is at https://github.com/buutrg/PRSmix.

Software/analyses: [...]

[...] Most of all we thank all of the volunteers participating in the All of Us Research Program and Genes & Health.
[...] HeartFlow, Novartis, Genentech, and GV, scientific advisory board membership to Esperion Therapeutics, Preciseli, and TenSixteen Bio, is a scientific co-founder of TenSixteen Bio, and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. Others declare no conflict of interest.