Approved Research
Regression methods for phenome-wide association analysis on large-scale biobank data
Lay summary
With advances in genotyping technologies and electronic health records (EHRs), large biobanks have become valuable resources for identifying novel genetic associations and gene-environment interactions on a genome-wide and even a phenome-wide scale. Several phenome-wide association studies (PheWAS) have already been performed on biobank data, providing comprehensive insights into many aspects of human genetics and biology. Despite this progress, PheWAS on large-scale biobank data faces new challenges, including a heavy computational burden, unbalanced phenotype distributions, and genetic relatedness among participants. For quantitative and binary traits, state-of-the-art strategies such as matrix projection, saddlepoint approximation, and mixed-model approaches have been used to overcome these challenges. However, for complex phenotypes such as MRI-derived measures, suitable analysis approaches are still urgently needed. This application proposes to develop fast and accurate regression methods that can make full use of phenotypes with complex structure. Because whole-genome sequencing can accurately identify and genotype rare variants, this application will also propose scalable and powerful methods for evaluating rare-variant associations. In addition, emerging technologies are producing rich multi-omics data resources, so this application will also consider how to effectively incorporate this additional information to boost power and increase interpretability in phenome-wide studies. The project is expected to last 3 years.
The developed approaches will be an important supplement to existing analysis approaches and will be applied to UK Biobank data to identify novel genetic variants associated with phenotypes and environmental factors. These findings can contribute to translational and clinical research, including constructing risk prediction models for complex diseases and phenotypes, identifying the causal effects of exposures and drugs, and identifying drug targets and opportunities for drug repurposing.