Approved Research

Complex Analysis and Creation of Extensible Datasets for UKBioBank

Principal Investigator: Dr Ben Busby
Approved Research ID: 82561
Approval date: June 7th 2022

Lay summary

Aims: We aim to make it easier to access different data types and to subset cohorts on them, so that researchers can reach meaningful results faster.

Scientific Rationale: For precision medicine to be realized, researchers must be able to subset cohorts of patients on imaging, medical test (e.g. electrocardiogram), activity, or metabolic data. Once these subsets are established, treatment options become more precise due to an understanding of the genetic contributions to these cohorts' disease.

Public health impact: This subsetting is important because it is often used to direct the treatment protocol for a disease. If the subsetting is done incorrectly and inappropriate drugs are used for a subset, patients may die sooner.

Project duration: We expect this project to last approximately two years.

Scope extension:

We would like to demonstrate the possibility of creating extensible data sets for analysis of: neurological diseases (particularly with imaging data), metabolic disease (particularly with accelerometer data), cardiovascular disease (particularly with electrocardiograms), and colorectal cancer (subsetting CMS types with WGS and proteomic data). We are also particularly interested in drug response data as well as in adverse event data. We will demonstrate how to effectively conduct longitudinal analysis of drug response and integrate the results back into a UKBRAP database for future analysis by clinicians and others. Specifically, we plan to set up a system to look at continuation or non-continuation of a drug given either resolution or continuation of a disease, as labeled by ICD-10 code or primary care notes. We also plan to look for common adverse event symptoms in this context.
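As an illustration only, the drug continuation idea above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual system: the record layout, the names `classify`, `drug_a`, and the participant IDs are hypothetical, the data are made up, the ICD-10 code `I10` is used purely as an example, and the rule "any record on or after a cutoff date counts as continued" is a simplification that would need clinical input.

```python
from collections import Counter
from datetime import date

# Hypothetical, made-up records; these are NOT real UK Biobank fields or data.
# Each prescription row: (participant_id, drug_name, issue_date)
prescriptions = [
    ("p1", "drug_a", date(2020, 1, 10)),
    ("p1", "drug_a", date(2021, 3, 5)),
    ("p2", "drug_a", date(2020, 2, 1)),
    ("p3", "drug_a", date(2020, 4, 20)),
    ("p3", "drug_a", date(2021, 6, 15)),
]
# Each diagnosis row: (participant_id, icd10_code, record_date)
diagnoses = [
    ("p1", "I10", date(2020, 1, 5)),
    ("p1", "I10", date(2021, 2, 1)),
    ("p2", "I10", date(2020, 1, 20)),
    ("p3", "I10", date(2020, 4, 1)),
]


def classify(prescriptions, diagnoses, drug, icd10, cutoff):
    """Label each participant by drug and disease status after `cutoff`.

    Assumption: a record on or after `cutoff` means "continued"; no such
    record means the drug was "stopped" or the disease "resolved".
    """
    # Cohort: participants with a pre-cutoff record of both the drug and the code.
    on_drug = {p for p, d, t in prescriptions if d == drug and t < cutoff}
    with_dx = {p for p, c, t in diagnoses if c == icd10 and t < cutoff}
    # Participants with any record on or after the cutoff.
    drug_after = {p for p, d, t in prescriptions if d == drug and t >= cutoff}
    dx_after = {p for p, c, t in diagnoses if c == icd10 and t >= cutoff}

    labels = {}
    for p in sorted(on_drug & with_dx):
        drug_status = "continued" if p in drug_after else "stopped"
        disease_status = "continued" if p in dx_after else "resolved"
        labels[p] = (drug_status, disease_status)
    return labels


labels = classify(prescriptions, diagnoses, "drug_a", "I10", date(2021, 1, 1))
# Cross-tabulate (drug status, disease status) pairs across the cohort.
table = Counter(labels.values())
```

Here `table` counts how many participants fall into each combination of drug status and disease status. A real analysis would likely replace the single cutoff with per-participant follow-up windows and draw on primary care notes as well as coded diagnoses.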

For the interpretable machine learning aspect, we aim to develop novel machine learning algorithms and/or identify existing machine learning algorithms that can be used to successfully predict an outcome (the value of a UK Biobank data field, e.g., any of the ICD codes). We will also investigate and develop computational frameworks and techniques for building interpretable results from those prediction methods.