Approved Research

Explainable machine learning for studying genomic association of Alzheimer's

University of Washington, Seattle

Lay summary

The goal of our project is to explore a new method to find genetic factors that are more common in people with Alzheimer's disease.

Alzheimer's disease is the most common neurodegenerative disease, affecting nearly 50 million people worldwide. Furthermore, Alzheimer's is regarded as complex and heterogeneous, meaning that it is both hard to predict and caused by many different factors. Finding the genetic factors that contribute to Alzheimer's can help doctors identify people with the disease and contribute to the development of better treatments.

The traditional approach to finding important genes is a genome-wide association study (GWAS), in which Alzheimer's disease is modeled as a linear function of a single genetic factor (y = mx + b), one factor at a time. Linear models with a single input were used because genomic data sets typically had very many inputs (~1 million genetic factors) but far fewer samples. However, more and more genomic data is becoming available. In particular, with the half million samples in the UK Biobank data set, we hope to use more complex machine learning models to better understand the genetic factors behind Alzheimer's disease.
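
To make the single-factor idea concrete, the following is a minimal sketch of such a scan on synthetic data, not the analysis pipeline used in this project. The sample sizes, the allele frequency of 0.3, the effect size of 0.5, the continuous trait, and the names `fit_line`, `genotypes`, and `CAUSAL` are all illustrative assumptions. Each genetic factor is fitted separately with y = mx + b, and the factors are then ranked by how strongly the slope m differs from zero.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y = m*x + b; returns (m, b, t-statistic of m)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    # Residual sum of squares and standard error of the slope.
    rss = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return m, b, m / se


random.seed(0)
# Hypothetical sizes; real studies have ~1 million factors.
N_SAMPLES, N_VARIANTS, CAUSAL = 1000, 20, 7

# genotypes[j][i] = number of copies (0, 1, or 2) of variant j in person i.
genotypes = [
    [sum(random.random() < 0.3 for _ in range(2)) for _ in range(N_SAMPLES)]
    for _ in range(N_VARIANTS)
]
# Assumed synthetic trait: only variant CAUSAL has a real effect.
phenotype = [0.5 * g + random.gauss(0, 1) for g in genotypes[CAUSAL]]

# One independent single-input linear model per genetic factor.
results = [fit_line(x, phenotype) for x in genotypes]
top_variant = max(range(N_VARIANTS), key=lambda j: abs(results[j][2]))
print("strongest association at variant", top_variant)
```

Because every factor is tested in isolation, a scan like this cannot capture effects that depend on combinations of factors, which is one motivation for the more complex models discussed next.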

The complex machine learning models we plan to use include neural networks and tree-based models, which many data scientists prefer to linear ones. Complex models often make better predictions than simpler models; the downside is that they are harder to understand. For genomic data, understanding which genes matter is often more important than having a model that makes better predictions. To bridge this gap, recent advances in explainable machine learning make it possible to interpret complex models and may enable us to discover new genetic factors behind Alzheimer's disease.

In this study, we plan to combine complex machine learning models with explainable machine learning methods, and to compare the genetic factors we discover with those found by previous work that used more traditional approaches (linear models).