Approved Research

Assessing the role of rare and common variation in human clinical conditions and risk prediction using improved methods for genotype imputation and genetic effect modeling across populations

SelfDecode

Lay summary

Many common human diseases are influenced by thousands of genetic variants, each with an individually small effect on the risk of developing the condition. Genome-wide association studies (GWAS) identify disease-associated variants by comparing the genomes of hundreds of thousands of affected and unaffected individuals. The identified variants are then aggregated into polygenic risk scores (PRS), which stratify individuals by their susceptibility to a particular disease as a function of the genetic variants they carry. However, two problems have limited the clinical utility of PRS: the lack of diversity in GWAS, which have historically included predominantly individuals of European descent, and the demonstrated low portability of PRS models built from variants identified in one population to other populations. Furthermore, rare variants can have larger individual effects on phenotypic variation, but because they are difficult to impute accurately, they are often not included in GWAS.
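The aggregation step described above is, in its simplest and most widely used form, a weighted sum: an individual's score is the sum, over variants, of each variant's GWAS effect size multiplied by the number of effect-allele copies (0 to 2, possibly fractional for imputed dosages) that the individual carries. The Python sketch below illustrates only that standard additive model under stated assumptions. It is not the novel PRS models proposed in this project, and the function names, toy effect sizes, genotypes and top-third cut-off are hypothetical values chosen for illustration.

```python
# Minimal additive polygenic risk score (PRS) sketch.
# Assumptions (hypothetical, for illustration only): per-variant effect sizes
# (e.g. log odds ratios from a GWAS) and genotype dosages (0-2 copies of the
# effect allele) are already aligned to the same variants in the same order.
from typing import Dict, List


def polygenic_risk_score(dosages: List[float], betas: List[float]) -> float:
    """Return the weighted sum of effect-allele dosages for one individual."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must cover the same variants")
    return sum(b * d for b, d in zip(betas, dosages))


def stratify(scores: Dict[str, float], top_fraction: float = 0.1) -> Dict[str, bool]:
    """Flag individuals whose score falls in the top `top_fraction` of the cohort.

    The cut-off is an arbitrary illustrative choice, not a clinical threshold.
    """
    ranked = sorted(scores, key=lambda pid: scores[pid], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    high_risk = set(ranked[:n_top])
    return {pid: pid in high_risk for pid in scores}


if __name__ == "__main__":
    # Toy data: three variants with made-up effect sizes and genotypes.
    betas = [0.12, -0.05, 0.30]
    cohort = {"A": [0, 1, 2], "B": [2, 2, 0], "C": [1, 0, 1]}
    scores = {pid: polygenic_risk_score(g, betas) for pid, g in cohort.items()}
    print(scores)
    print(stratify(scores, top_fraction=1 / 3))
```

In practice, effect sizes estimated in one ancestry group may not transfer well to others, which is the portability problem noted above; this sketch makes no attempt to address it.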

Over this three-year project, we aim to use data from the UK Biobank (UKBB) to improve the prediction accuracy of polygenic risk scores for use in personalized medicine. We plan to evaluate and refine novel statistical and computational methods that improve the imputation of rarer variants, thereby enabling the identification of additional disease-associated variants and improving disease risk estimation. Several challenges limit GWAS: the difficulty of interpreting non-coding variants, the need to repeat testing in each population, and the very large sample sizes required to establish whether a variant is disease-associated. To address these, we have developed new exploratory models that identify key disease variants, as well as novel PRS models that incorporate several levels and types of functional information together with phenotypic and genotypic data.

We will use the rich individual-level genetic and phenotypic UKBB data to study how lifestyle and environmental factors, in addition to aggregate polygenic risk scores, influence human conditions across ethnicities and other socio-demographic groups, and to evaluate the predictive performance of our novel absolute and relative disease risk models. These models also incorporate biomarkers and other demographic factors, and can be used as screening tools to guide health promotion. The results of this project should aid the early identification of individuals at higher risk of certain health conditions who would benefit from preventive interventions and personalized treatment, and will produce PRS models for many common diseases to help optimize clinical care.