Approved Research

Learning disease characteristics from multi-modal data for precision medicine

Hasso Plattner Institute for Digital Engineering gGmbH

Lay summary

Family studies have shown that most common diseases have a heritable component. Thanks to advances in genotyping and sequencing technologies, a number of diseases such as hypertension, chronic kidney disease, epilepsy, back pain, obesity, inflammatory bowel disease (IBD), Alzheimer's disease, and psychiatric disorders have been linked to human genetic variation. In other words, the diagnosis, prognosis and treatment of these diseases can be informed by a person's genetic make-up.

For most diseases, heritability cannot be explained by a single variant with a large effect in a given gene. Instead, it has been shown for most common diseases that a large fraction of heritability can be explained by the added small effects of many, often thousands of, genetic variants. These effects can be combined into a single number, often called a 'polygenic risk score'.
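To illustrate how many small effects can be combined into a single number, the following is a minimal sketch in Python. It assumes the simplest common form of such a score, a weighted sum in which each variant's effect size is multiplied by the number of risk alleles a person carries (0, 1 or 2). The function name, the effect sizes and the genotype values are hypothetical and chosen only for illustration; they do not come from this project.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Return the weighted sum of risk-allele counts.

    dosages      -- number of risk alleles carried at each variant (0, 1 or 2)
    effect_sizes -- per-variant weights, e.g. estimated in an earlier study
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical example with three variants (real scores often use thousands).
effect_sizes = [0.5, -0.25, 0.125]  # assumed weights, illustration only
dosages = [2, 1, 0]                 # one person's risk-allele counts

score = polygenic_risk_score(dosages, effect_sizes)
print(score)  # 2*0.5 + 1*(-0.25) + 0*0.125 = 0.75
```

A higher score indicates a higher estimated genetic predisposition relative to other people scored with the same weights; on its own, the number is not a diagnosis.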

Beyond genetic factors, clinical or behavioral risk factors such as smoking can influence disease manifestations. Therefore, the project's main goal is to combine clinical, genomic, and other data to improve risk prediction for complex diseases using machine learning. This work can also be used to refine how diseases are defined, with the goals of more personalized risk assessment and therapy planning and of gaining insight into disease mechanisms. UK Biobank is a unique project that makes it possible to integrate genomic and clinical data.

Validation is an important aspect of medical research: scientific discoveries made in a single patient cohort might not transfer well to other patient populations. We therefore also aim to validate the results obtained with UK Biobank in other comparable data resources, such as the BioMe cohort from the Mount Sinai Health System or the FinnGen study. If results are similar across cohorts, researchers can be more confident that they are genuine and do not arise from statistical noise in the original data.

Over the course of three years, we plan to develop and validate machine learning approaches that may eventually improve health, for example by suggesting measures to reduce the burden of disease before a more severe manifestation occurs.