Approved Research
Development of machine learning methods for modeling higher-order genetic interactions in human health and disease traits
Lay summary
Many human traits (for example, height, weight and disease risk) are to a large extent determined by a person's DNA sequence. A major goal in genetics is therefore to predict a person's traits from their DNA sequence alone, and achieving this goal could transform public health. A major challenge, however, is that genes often interact to determine observable traits. As a result, a genetic variant (e.g., a mutation) with a particular effect in one patient may have a dramatically different effect in a new patient whose surrounding DNA sequence differs. In this project, we aim to use modern machine learning to extract, from the UK Biobank datasets, the complex rules that govern how genes interact to determine a range of human health and disease traits. We expect the project to run over three years and to produce several methods that can accurately predict a patient's disease risks, along with many other traits, from their DNA sequence alone. We believe these methods will greatly aid the early identification of high-risk individuals, so that effective preventive measures can be taken. We also believe they may prove to be valuable tools in many other areas of public health.