
Approved research

Discovering disease mechanism through integration of genomic data with clinical data using 500,000 individuals in the UK Biobank

Principal Investigator: Professor Eleazar Eskin
Approved Research ID: 31211
Approval date: November 28th 2019

Lay summary

Large data collection efforts within health systems promise to advance our understanding of how genetic traits and environmental factors influence common disease and condition status. In this proposal, we address this big data challenge by developing a novel framework for integrating genomic data with clinical data in order to identify genetic traits related to specific disorders. Effective methodologies for the integrated analysis of large datasets of multiple data types require a strategy that takes advantage of the unique properties that emerge when these data are combined. These kinds of analyses benefit from multiple phenotypes. Our proposed research aims to improve the prevention, diagnosis and treatment of inherited disorders. Advancing our understanding of the relationship between genetic and environmental traits involved in human disease requires identification of unique properties that emerge upon combination of multiple data types. Specifically, we seek subtypes of a phenotype that are related to the genotype in some mathematical sense (e.g., correlation). Maximizing this relation will yield a more heritable subtype of the phenotype, which in turn will increase the power to detect associations and the accuracy of prediction for this subtype. We will utilize causal graphs to interrogate the relationships between traits. The size and scope of the UK Biobank data present a unique opportunity for development of a novel framework. Recent studies demonstrate that an individual's genetic background may profoundly affect the accuracy of diagnosis and efficacy of treatment.
Integration of genomic data with traditional treatment decision-making modalities promises to significantly improve diagnosis and treatment of an individual's disorder. We are requesting access to the currently available UK Biobank genetic dataset, a cohort comprising 150,000 individuals. In addition, we are requesting access to the newly released cohort of 500,000 individuals; we understand that this cohort will be available by the time this application is processed. This request includes all data-fields for calls and imputation (category 100315), confidences (category 100316), and intensities (category 100317). In addition, we are requesting access to datasets in the top-level categories of Population Characteristics, UK Biobank Assessment Centre, Online Follow-up, and Health-related Outcomes.