Polygenic risk scores (PRSs) are powerful tools for predicting genetic susceptibility to complex diseases using effect estimates from genome-wide association studies (GWAS). However, because existing GWAS datasets are dominated by European-ancestry (EUR) cohorts, PRS performance declines substantially in non-European populations, exacerbating health disparities and limiting the equity of precision medicine (Martin et al., 2019; Peterson et al., 2019).
Recent studies have introduced transfer learning (TL) strategies to improve PRS accuracy by adapting models trained in EUR populations for application in underrepresented ancestries. These strategies include reweighting SNP effects, cross-ancestry joint modeling, and deep domain adaptation (Zhao et al., 2022; Liu et al., 2022).
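To make the idea of reweighting SNP effects concrete, the following is a minimal sketch, not the implementation of TL-PRS or any other published tool. It assumes a PRS is a weighted sum of allele dosages and fine-tunes EUR-derived weights by plain gradient descent on a small target-ancestry sample. The function names (`prs`, `fine_tune`, `mse`), the learning rate, the step count, and all numbers are hypothetical toy values chosen for illustration only.

```python
from typing import List, Sequence


def prs(genotypes: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Score each individual as the weighted sum of allele dosages (0, 1, or 2)."""
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def mse(genotypes, phenotypes, weights) -> float:
    """Mean squared error between PRS predictions and observed phenotypes."""
    preds = prs(genotypes, weights)
    return sum((p - y) ** 2 for p, y in zip(preds, phenotypes)) / len(phenotypes)


def fine_tune(genotypes, phenotypes, source_weights, lr=0.01, steps=200) -> List[float]:
    """Adapt source-ancestry weights to target data by gradient descent on MSE."""
    weights = list(source_weights)  # start from the source (e.g. EUR) estimates
    n = len(genotypes)
    for _ in range(steps):
        preds = prs(genotypes, weights)
        residuals = [p - y for p, y in zip(preds, phenotypes)]
        for j in range(len(weights)):
            # Gradient of MSE with respect to the weight of SNP j
            grad = 2.0 / n * sum(r * row[j] for r, row in zip(residuals, genotypes))
            weights[j] -= lr * grad
    return weights


# Toy target-ancestry sample: 6 individuals x 3 SNPs (made-up dosages).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2], [0, 0, 1], [2, 1, 1]]
# Hypothetical "true" target-ancestry effects, used only to simulate phenotypes.
target_weights = [0.5, -0.2, 0.3]
phenotypes = prs(genotypes, target_weights)
# Hypothetical source-ancestry effect estimates that differ from the target.
source_weights = [0.3, 0.1, 0.3]

tuned_weights = fine_tune(genotypes, phenotypes, source_weights)
print("MSE with source weights:", mse(genotypes, phenotypes, source_weights))
print("MSE after fine-tuning:  ", mse(genotypes, phenotypes, tuned_weights))
```

Starting from the source weights rather than from zero is what makes this a transfer step: the target data only has to correct the part of the effect sizes that does not carry over. Real methods additionally handle linkage disequilibrium, regularization, and validation-based early stopping, none of which this sketch attempts.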
This study addresses two key questions: (i) Can PRS models trained on EUR datasets be effectively adapted for non-EUR populations using TL methods? (ii) How do TL approaches compare with standard PRS models across diverse ancestries?
We propose to: (i) benchmark classical PRS tools (P+T, LDpred2, PRS-CS, Lassosum2, BayesR) against TL-based frameworks (TL-PRS, PRS-CSx, transPGS, deep learning models); (ii) focus on type 2 diabetes (T2D) and coronary artery disease (CAD), chosen for their global burden, polygenic nature, and the availability of summary statistics from diverse ancestries; and (iii) develop ancestry-aware PRS pipelines using UK Biobank’s multi-ancestry data and large external GWAS datasets.
For T2D, we will leverage DIAMANTE, PAGE, and BioBank Japan. For CAD, we will use CARDIoGRAMplusC4D and ancestry-stratified Japanese GWAS data. This comprehensive evaluation aims to improve the prediction accuracy, fairness, and clinical utility of PRS models across ancestries.
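The ancestry-stratified comparison described above could be organized along the following lines. This is a sketch under stated assumptions rather than the study's evaluation protocol: it assumes a binary disease outcome, uses the area under the ROC curve (AUC) as the accuracy metric, and computes it separately within each ancestry group. The helper names (`auc`, `auc_by_ancestry`), the group labels, and every score and label in the toy data are hypothetical and do not represent results.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def auc_by_ancestry(
    scores: Sequence[float], labels: Sequence[int], ancestries: Sequence[str]
) -> Dict[str, float]:
    """Compute AUC separately within each ancestry group."""
    grouped_scores: Dict[str, List[float]] = defaultdict(list)
    grouped_labels: Dict[str, List[int]] = defaultdict(list)
    for s, y, a in zip(scores, labels, ancestries):
        grouped_scores[a].append(s)
        grouped_labels[a].append(y)
    return {a: auc(grouped_scores[a], grouped_labels[a]) for a in grouped_scores}


# Made-up PRS values and case/control labels for two illustrative groups.
scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.4, 0.5, 0.3]
labels = [1, 1, 0, 0, 1, 0, 0, 1]
ancestries = ["EUR", "EUR", "EUR", "EUR", "EAS", "EAS", "EAS", "EAS"]

per_group = auc_by_ancestry(scores, labels, ancestries)
for group, value in per_group.items():
    print(group, value)
```

Reporting the metric per group, instead of pooling all individuals, keeps a large EUR test set from masking poor performance in smaller cohorts. The gap between groups can then serve as a simple fairness summary when comparing classical and TL-based models.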