Approved Research

Discovering disease-associated variants via deep learning with a biological knowledge graph

Principal Investigator: Professor Jure Leskovec
Approved Research ID: 79791
Approval date: May 19th 2022

Lay summary

Diseases often result from genetic changes in people. To understand a disease, therefore, it is useful to study the genetics of people who do and do not have the disease. Historically, the most common approach has been the genome-wide association study (GWAS). In a GWAS, every genetic change is analyzed to see if it happens more often in people with the disease than in people without it. Ultimately, a GWAS identifies genetic changes associated with the disease. These changes can then be investigated to understand how the disease progresses, how it varies between individuals, and how it may be treated.
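For readers who want to see the idea concretely, the comparison described above can be sketched as a simple per-variant test. This is a minimal illustration only, assuming each person is recorded as a carrier (1) or non-carrier (0) of each genetic change and using a Pearson chi-square test on a 2x2 table. The function names, variant names, and toy data are hypothetical, and a real GWAS typically uses regression models with covariates and corrects for testing many variants at once.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table
    [[a, b], [c, d]] = [[carrier & case, carrier & control],
                        [non-carrier & case, non-carrier & control]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the table carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def association_scan(genotypes, has_disease):
    """Test every variant for association with disease status.

    genotypes: dict mapping variant name -> list of 0/1 carrier flags,
               one per person (hypothetical toy encoding).
    has_disease: list of 0/1 flags, one per person.
    Returns a dict mapping variant name -> (chi2, p_value).
    """
    results = {}
    for variant, carriers in genotypes.items():
        pairs = list(zip(carriers, has_disease))
        a = sum(1 for g, y in pairs if g and y)
        b = sum(1 for g, y in pairs if g and not y)
        c = sum(1 for g, y in pairs if not g and y)
        d = sum(1 for g, y in pairs if not g and not y)
        chi2 = chi_square_2x2(a, b, c, d)
        results[variant] = (chi2, p_value_1df(chi2))
    return results


# Toy data: 4 people with the disease, 4 without (entirely made up).
has_disease = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = {
    "variant_A": [1, 1, 1, 1, 0, 0, 0, 0],  # seen only in people with the disease
    "variant_B": [1, 0, 1, 0, 1, 0, 1, 0],  # equally common in both groups
}
scan = association_scan(genotypes, has_disease)
```

In this toy example, `variant_A` receives a small p-value because it appears only in people with the disease, while `variant_B` shows no association because it is equally common in both groups.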

Here, we will improve GWAS by using graph machine learning. Graph machine learning is an artificial intelligence approach that operates on data structured as a network. It is helpful in this setting because it can analyze each patient's genetic changes in the context of prior knowledge about related diseases. Our first aim is to represent that prior knowledge as a large network that contains diseases, genes, and the known relationships between them. Our second aim is to add UK Biobank patients to this network by representing patients, their genetic changes, and their basic characteristics such as age and sex. Our final aim is to develop new machine learning approaches on this network that identify the genetic changes that lead to disease. The machine learning approach will learn from the hundreds of thousands of patients in the UK Biobank, analyzing patients both with and without a given disease and their corresponding genetic changes. Importantly, the machine learning approach will also learn from the prior knowledge present in our network. We hope this will lead to the discovery of new genetic changes associated with diseases, which may eventually lead to new therapies.
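To make the network described above more tangible, the following is a minimal sketch of how diseases, genes, genetic changes, and patients could be stored as typed nodes with labeled relationships. It is an illustration under stated assumptions, not the project's actual data model: the `HeteroGraph` class, the relation names, and every identifier such as `disease_X` or `patient_1` are hypothetical placeholders, and no real UK Biobank data is shown.

```python
from collections import defaultdict


class HeteroGraph:
    """A tiny network with typed nodes and labeled, directed edges."""

    def __init__(self):
        self.nodes = {}                # node_id -> {"type": ..., attributes}
        self.edges = defaultdict(set)  # (source_id, relation) -> target ids

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, src, relation, dst):
        self.edges[(src, relation)].add(dst)

    def neighbors(self, src, relation):
        # .get avoids creating empty entries in the defaultdict on lookup.
        return self.edges.get((src, relation), set())


def diseases_linked_to_patient(graph, patient_id):
    """Follow patient -> genetic change -> gene -> disease paths."""
    linked = set()
    for variant in graph.neighbors(patient_id, "carries"):
        for gene in graph.neighbors(variant, "located_in"):
            linked |= graph.neighbors(gene, "associated_with")
    return linked


g = HeteroGraph()

# Prior knowledge: diseases, genes, and known relationships between them.
g.add_node("disease_X", "disease")
g.add_node("gene_1", "gene")
g.add_edge("gene_1", "associated_with", "disease_X")

# Patient data: a patient, a genetic change, and basic characteristics.
g.add_node("variant_A", "variant")
g.add_node("patient_1", "patient", age=54, sex="F")
g.add_edge("variant_A", "located_in", "gene_1")
g.add_edge("patient_1", "carries", "variant_A")

linked = diseases_linked_to_patient(g, "patient_1")
```

Here the hypothetical `patient_1` is connected to `disease_X` through a genetic change in `gene_1`. A graph machine learning model would learn from many such paths rather than following them by hand as this sketch does.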

Our project is scoped to take three years to allow for the construction of our knowledge network, its integration with the patient data in the UK Biobank, and the development of new machine learning methods to identify genetic changes associated with disease.

Our project will impact public health in three ways. First, our method may lead to a deeper understanding of diseases by identifying new genetic changes associated with them. Second, our method may help us understand how diseases affect patients differently based on specific genetic changes. Finally, our method may identify genetic changes that serve as promising targets for new therapies.