Abstract
Rare variant polygenic scores (rvPRS) have been developed to improve phenotype prediction, yet no standardized protocol for constructing them is available. We aim to establish an optimal rvPRS protocol using whole exome sequencing (WES) and imputed genotype (IMP) data from 502,369 UK Biobank participants, and to evaluate its predictive performance against that of common variant PRS (cvPRS). rvPRS models are constructed for 13 binary and 5 quantitative traits using gene-burden and single-SNP associations, and are assessed by R², the per-standard-deviation (per-SD) odds ratio (OR) or beta, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Single-SNP-based rvPRS outperform gene-burden models, and IMP-derived rvPRS generally surpass WES-derived models. For 6 of 12 validated traits, the combined tPRS (cvPRS + rvPRS) improves prediction over cvPRS alone. IMP data also show a stronger correlation between heritability and rvPRS association strength. This study provides a practical rvPRS protocol applicable across traits and underscores the potential of rare variants to enhance phenotype prediction.
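
As an illustration of two of the reclassification metrics named above, the following minimal Python sketch computes IDI and a category-free NRI for a binary trait. It assumes the commonly used textbook definitions; the study's exact definitions (for example, risk categories used for NRI) may differ. The function names, the choice of the category-free NRI variant, and the toy risk values are hypothetical and are not taken from the study.

```python
from statistics import mean


def idi(y, p_old, p_new):
    """Integrated discrimination improvement (assumed standard definition):
    change in mean predicted risk among cases minus the change among controls."""
    case_gain = mean(pn - po for yi, po, pn in zip(y, p_old, p_new) if yi == 1)
    ctrl_gain = mean(pn - po for yi, po, pn in zip(y, p_old, p_new) if yi == 0)
    return case_gain - ctrl_gain


def continuous_nri(y, p_old, p_new):
    """Category-free NRI (assumed variant): net proportion of cases whose
    predicted risk rises plus net proportion of controls whose risk falls."""
    cases = [(po, pn) for yi, po, pn in zip(y, p_old, p_new) if yi == 1]
    ctrls = [(po, pn) for yi, po, pn in zip(y, p_old, p_new) if yi == 0]
    case_net = mean((pn > po) - (pn < po) for po, pn in cases)
    ctrl_net = mean((pn < po) - (pn > po) for po, pn in ctrls)
    return case_net + ctrl_net


# Toy data (hypothetical): two cases, two controls.
y = [1, 1, 0, 0]
risk_cv = [0.50, 0.50, 0.50, 0.50]  # stand-in for risks from a cvPRS-only model
risk_t = [0.75, 0.75, 0.25, 0.25]   # stand-in for risks from a cvPRS + rvPRS model

print(idi(y, risk_cv, risk_t))             # 0.5
print(continuous_nri(y, risk_cv, risk_t))  # 2
```

In this toy case every case moves to a higher predicted risk and every control to a lower one, so both metrics take large positive values; in real data the gains from adding a rare variant score would be expected to be far smaller.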