Our objective in this study is to build and validate an “Integrated Risk Score” for diseases and traits, defined as a single prediction function (such as a logistic regression) that combines a polygenic risk score, family history information, and the predicted effects of rare variants. We will validate these scores across ancestries and within families to test their robustness.
The research will analyze multiple components of the UK Biobank data: rare variants extracted from the whole genome sequencing data, family history information from the survey data, and genotypes from the imputed genotype data. The project will conclude by creating prediction models for diseases and traits in the form of an Integrated Risk Score, which combines all three sources.
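As a minimal sketch of what such a prediction function could look like, the snippet below writes the Integrated Risk Score as a logistic function of the three inputs. The function name, the argument names, and every coefficient value are hypothetical placeholders chosen for illustration only; in the proposed work the coefficients would be estimated from data, and the final functional form may differ.

```python
import math


def integrated_risk(prs, family_history, rare_variant_effect,
                    intercept=-2.0, b_prs=0.5, b_fh=0.8, b_rare=1.0):
    """Return a predicted disease probability from three risk components.

    prs                 -- standardized polygenic risk score (float)
    family_history      -- 1 if an affected first-degree relative is reported, else 0
    rare_variant_effect -- predicted effect of carried rare variants (float)

    All default coefficients are illustrative assumptions, not fitted estimates.
    """
    # Linear predictor on the log-odds scale
    log_odds = (intercept
                + b_prs * prs
                + b_fh * family_history
                + b_rare * rare_variant_effect)
    # Logistic link maps log-odds to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Example: compare a baseline individual with one who is elevated on all three components
baseline = integrated_risk(prs=0.0, family_history=0, rare_variant_effect=0.0)
elevated = integrated_risk(prs=1.5, family_history=1, rare_variant_effect=0.7)
```

Under this form each component contributes additively on the log-odds scale, so the incremental value of any one source can be assessed by comparing models fitted with and without it.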
Research questions:
1. Can the trans-ethnic portability of polygenic risk scores be improved by integrating data from diverse ancestries and using techniques such as X-Wing (Miao 2023) and SBayesRC (Zheng 2024)?
2. Are the resulting polygenic risk scores attenuated within families, for example due to assortative mating (Young 2023)?
3. Can rare variants from exome-wide association studies yield incremental predictive value? Are these rare-variant effects trans-ethnically portable?
4. Can we integrate the rare variants, polygenic scores, and family history information in a way that is grounded in mathematical theory?
Scientific Rationale:
To our knowledge, this Integrated Risk Score would be the first to combine all three sources of information, and would fill this gap in the literature. We are only aware of studies that incorporate two of the three. For example, rare and common variants have been analyzed together for type 1 diabetes (Dornbos 2023), and family history and polygenic scores have been integrated (e.g. Mars 2022).