Approved research

Quantitative genotype-phenotype prediction using deep probabilistic models to integrate standing human genetic variation and variation across all of evolution and clinical datatypes

Principal Investigator: Professor Debora Marks
Approved Research ID: 53995
Approval date: October 28th 2019

Lay summary

Many diseases have a significant hereditary component. In a few cases, we know exactly how a genetic variant causes a specific disease. In most cases, however, we do not know which variants are truly causal, even when multiple genetic sites have been linked to the disease. Furthermore, many diseases are likely to be caused by a complex interaction of multiple genetic and environmental factors. A deeper understanding of the genetics of these diseases can therefore help us better address them. Our project aims to discover the major genetic players in diseases and how they can be influenced by environmental factors. The UK Biobank, a large cohort whose health records include both sequencing information and environmental information, is a great resource for understanding how these disease processes arise.

We want to use data from the UK Biobank to better predict how a person's genetic mutations, individually and in combination, might affect their overall health in a given environment. To do this, we will build on our prior work, which uses statistical techniques to find patterns across large datasets and accurately predict mutation effects. Previously, we were limited to observing how a particular piece of a genomic sequence changes BETWEEN species, for example by comparing a protein family across all sequenced mammals. With access to the UK Biobank, we can combine this information with how sequences change WITHIN the human species, and use those results as a basis for predicting how mutations, both individually and in interaction with other genetic variations, can affect a person's health. We can then extend this work to incorporate environmental information, like diet or diagnoses, to further refine our predictions.

We will start by focusing on a few diseases, e.g. Alzheimer's and multiple sclerosis, but this work will be more widely applicable. The initial stages will take 2-3 years to complete, and we will expand to other diseases in subsequent years. We hope that this project will enable us to better understand how individual alterations in a person's genome can combine to make them more likely to develop different diseases over the course of their lifetime.