Deep Learning Human Genetics to Enhance Discovery of Genetic Disease Associations
Principal Investigator: Professor Rick Stevens
Approved Research ID: 55964
Approval date: April 21st 2020
This research applies recent advances in artificial intelligence, specifically deep learning, to understand how an individual's genetic code may relate to complex observable characteristics (referred to as phenotypes): for example, the shape and structure of their brain, or their vulnerability to diseases such as cancer. If successful, our research will lead to new computational tools that can recognize early warning signs of a disease or susceptibility to different diseases.

Deep learning (DL) has been particularly successful in biomedical applications. To name just a few, it has been used to recognize objects such as brain tumors in images and videos, to determine whether patients have Parkinson's or Alzheimer's disease, to extract phenotypes from electronic health records, and to control the movements of a surgical robot. However, DL applications need extremely large datasets to be successful. We are excited to use the UK Biobank data because of its large number of patients and its intrinsic diversity. We hypothesize that DL approaches can outperform traditional methods, but only when trained on a sufficient number of records. But what exactly is a sufficient sample size, and does it vary from disease to disease? Furthermore, what is the nature of the inherent biases and vulnerabilities at different sample sizes? We are interested in examining the relationship between sample size and prediction accuracy for each phenotype when using DL methods. We are also interested in understanding how the complexity of a phenotype affects prediction accuracy, variation, and error.

We will build tools incrementally, beginning with a tool that uses a patient's genetic information to make predictions about different phenotypes. We will start with phenotypes associated with the brain and with head and neck cancers: two quite distinct groups of phenotypes with different complexities. We expect this project to be completed in about two years.
Using DL approaches to characterize complex genotype-phenotype associations has been challenging, mainly because of the lack of large, well-annotated datasets. The UK Biobank datasets offer an unprecedented opportunity to evaluate these new techniques in the context of the questions posed above. We speculate that DL techniques could have a transformative impact by enabling patient-specific treatments based on a variety of prediction tasks, such as the early prediction of Alzheimer's and Parkinson's disease, the identification of the best drug treatments, and the prediction of patient outcomes.