Approved research

Augmenting data-driven discovery of disease risk genes using multivariate and integrative approaches to genome-wide association analysis

Principal Investigator: Professor Xin He
Approved Research ID: 27386
Approval date: July 17th 2018

Lay summary

The ultimate aim of this project is to draw new connections between genes and diseases, such as psychiatric disorders and diabetes. Our group develops methods and software for discovering gene-disease associations from large-scale data. We take two distinctive approaches to this problem: first, we develop multivariate methods that generate informative links between genes and multiple phenotypes (diseases, biomarkers and environmental exposures); second, we integrate external information on genetic variants, such as their associations with gene expression (eQTLs), to guide discovery of disease risk genes.

The UK Biobank offers a unique opportunity to apply these approaches and expand biological insights into disease risk, because it has collected health, medical and genetic data at an unprecedented scale and scope. We believe that helping researchers better understand the genetic basis of complex inherited diseases would be of great value to the UK Biobank initiative. The proposed research will employ highly innovative methods to analyze the genetic data. By developing, applying and evaluating these methods on the UK Biobank data, then disseminating them to the wider research community as standalone software packages, we believe we can help to realize the true potential of the UK Biobank project.

Our research efforts are roughly broken down into 5 stages: (1) we download the UK Biobank data and take quality-control steps to handle data that may compromise the robustness of our results; (2) we develop preliminary software implementations using programming languages such as R; (3) we test our software on smaller data sets to ensure accuracy and reproducibility; (4) we apply our methods to the Biobank data, then interpret and verify the results, often with the aid of other bioinformatics resources; and (5) we develop and disseminate user-friendly software toolkits. We have accumulated experience in all these research stages.

Full cohort.