Approved Research

Investigating human physiology and disease by integrating multimodal data in the UK Biobank

Principal Investigator: Dr Tarjinder Singh
Approved Research ID: 101287
Approval date: June 2nd 2023

Lay summary

Summary: We will simplify and extract new factors from complex and high-throughput data (imaging and medical records) using machine learning and deep learning methods. We will integrate these different data types to understand the genetic basis of human physiology and disease.

Scientific rationale: Increasingly complex data are now available in national biobanks. Developing methods to integrate and co-analyze these data into meaningful features will enable us to identify new relationships between complex data types (e.g., imaging and accelerometry) and human physiology and disease status.

Aims: First, we will identify data modalities to which machine learning can be applied to extract new features. For computer vision-related techniques, we will manually annotate a few hundred data points with interesting features in collaboration with clinical experts. We will develop and apply different models to extract these features automatically and validate the results. After extracting features, we will perform genetic analyses of these features across all participants. This will include adapting and applying methods for imputed data and for whole-exome and whole-genome sequencing data. Finally, we will study the relationship between these newly extracted phenotypes and traits, and investigate causal links with disease outcomes.

Public health impact: Many medical conditions, such as mental illnesses, are complex in how they present in patients. We will use machine learning methods on imaging and medical record data to identify hidden factors linked to specific conditions, providing new insights into disease biology and diagnosis.

Expected duration of the project: We propose a duration of five years from the start to the completion of this project. We aim to publish a methods paper within the next two years, with publications as the primary output.