Last updated:
ID:
285049
Start date:
5 November 2024
Project status:
Current
Principal investigator:
Elena Nabieva
Lead institution:
Princeton University, United States of America

The nature vs. nurture debate is likely to be resolved with a “yes, both”. Genetics may predispose an individual to certain health outcomes, but whether that predisposition is realized may depend on environmental and behavioral factors. Analyzing this interplay may help us understand individual and group vulnerabilities and focus efforts on preventing them. The UK Biobank provides a unique cohort that has both whole-genome sequencing data and rich information about the participants’ medical conditions, lifestyle factors (such as diet and exercise) and environment.

We plan to apply advanced machine learning and data science methods to tackle both aspects of the problem using the UK Biobank data. In the genetic data, we will focus on an area that has long been something of a “blind spot” in genomics research: the “dark matter” of the non-coding genome. Far from being “junk DNA”, this part of the genome contains elements that regulate when and where genes are expressed in the body and, ultimately, determine cell identity. Over the years, our team has developed deep-learning computational methods that can interpret variation in these regulatory regions and predict its effects on what happens in the cell. We will use these and other tools to predict the effects of mutations both within and outside genes, in order to identify factors that contribute to disease.

In parallel, we will apply data-science methods to analyze the rich information collected by the UK Biobank through questionnaires, together with linked environmental data such as air pollution measurements. We will first look for confirmation of patterns that have been described in the literature, such as the influence of air pollution on kidney disease. We will then look for other factors that may be relevant to disease.

Having developed a list of candidate mutations and environmental exposures, we will examine the relationship between the two in the hope of identifying cases where the environment modulates genetic predisposition. If successful, this work will help pave the way to personalized approaches to health care.

Although our framework can be applied to any condition, we will initially focus on conditions in which we have extensive expertise, such as kidney disease.