Approved Research

Machine learning for using omics data to improve the classification and prediction of diseases in the UK Biobank

Principal Investigator: Professor David Clifton
Approved Research ID: 83801
Approval date: November 21st 2022

Lay summary

Genetic information is important for predicting the risk of many types of disease. "Polygenic risk scores" (PRS, measured using a simple blood sample) are a summary measure of a person's genetic susceptibility to certain diseases. We will use advanced machine learning (ML) methods to investigate different PRS at the genomic level (a person's DNA).

Other detailed biological information ("omics") provides biological insights into specific diseases. We will incorporate multiple levels of omics data, starting from genomics (e.g. PRS as above), followed by other types of "omics", namely transcriptomics (how the DNA is used by the body to make RNA), proteomics (how the RNA is used to make proteins), and finally metabolomics (how the body uses proteins in its cells), all of which have the potential to improve the classification and prediction of specific diseases.

An important step in the analysis of complex data is to visualise the results of the ML methods. We will use best-practice ML methods for this work, and will develop new models that combine the different types of data into a single model so that we can better understand diseases. For example, information extracted from the data will be modelled using ML methods such as deep neural networks.

We will combine the different levels of data described above (genomics, proteomics, etc.), which will further help us to understand the underlying pathways of specific diseases. This is of particular relevance to rare diseases and infectious diseases. For example, advanced ML methods can assist in addressing the challenge of having a low number of cases in rare diseases, and in providing timely insight into infectious diseases such as the fast-evolving variants of COVID-19. A key product of the work will be ML methods that other researchers can use to answer their own questions about the data in UK Biobank.