
Approved Research

A framework to assess causal effects in observational data through deep digital phenotyping and creating digital twins

Principal Investigator: Dr Rohan Khera
Approved Research ID: 71033
Approval date: May 17th 2021

Lay summary

In real-world settings, our ability to infer whether a particular lifestyle or treatment leads to a good or bad health outcome for an individual has been limited, even though systems exist to measure these lifestyle features, treatments, and outcomes. A major limitation is that inferring such a cause-and-effect relationship requires individuals who are essentially identical and differ only in the lifestyle or treatment in question. However, our ability to identify individuals who resemble each other has been limited because we have been able to leverage only crude characteristics, even though electronic health records, electrocardiographic data, and imaging data may capture some of these distinct patient features.

The current proposal investigates a novel and efficient strategy to characterize each individual using the data captured across all of these high-quality sources. This virtual representation of each person (referred to as their "digital twin") will be used to identify other individuals who resemble that person on a set of measurable characteristics.

Our work will determine the minimum number of distinct data sources required to define digital twins. Pairs of digital twins will be followed over time for the development of adverse cardiovascular events, with a focus on identifying lifestyle or treatment differences that may underlie differences in their outcome trajectories.

Collectively, our investigations will leverage the uniquely powerful contributions of UK Biobank participants to make methodologic advances that allow us to gain deeper insights from observational studies, potentially expanding their role in scientific discovery.

Scope extension:

Current Scope

Our objective is to investigate the use of deep digital phenotyping to create virtual representations of individuals (digital twins) and to leverage these representations to match individuals on high-dimensional data, allowing evaluation of the causal effects of interventions. We propose the following aims to achieve this objective:

  1. To evaluate the feasibility of using detailed digital phenotypic characteristics, spanning baseline demographic, clinical, electrocardiographic, and imaging data in the UK Biobank, to create digital twins of participants. We will sequentially incorporate data of increasing complexity and decreasing ease of collection to efficiently create these virtual representations using novel tools, and we will identify phenotypically similar individuals based on the similarity between their digital twins.
  2. To describe the long-term trajectories of healthcare utilization and clinical outcomes among phenotypically similar individuals defined using progressively more complex digital phenotypic features. We will evaluate the degree of heterogeneity in both healthcare utilization and important clinical outcomes. To test the reliability of these digital twin pairs for assessing the causal effects of interventions, we will evaluate positive and negative control outcomes to determine whether the pairs reliably recover expected effects.

Extended Scope

In the Extended Scope, we aim to characterize in greater depth the digital phenotypes generated in the Current Scope. Our overall objective is to improve care and patient outcomes by adopting precision medicine strategies that use these phenotypes. Specifically, the extended study will describe the generated phenotypes, define their distribution, and assess their significance for disease diagnosis and prognosis. This will be achieved through the following three aims:

  1. To describe the digital phenotypes generated using data streams available from the UK Biobank and define their distribution.
  2. To develop personalized approaches for disease diagnosis and risk stratification by leveraging digital twins and phenomapping methods.
  3. To define and validate personalized treatment response algorithms derived from digital twins and phenomapping methods.