Approved Research

Knowledge-infused time-to-event models for predicting incidence and progression of disease

Principal Investigator: Dr David Selby
Approved Research ID: 109178
Approval date: December 5th 2023

Lay summary

The goal of this research project is to measure the extent to which different types of data help in predicting the likelihood of contracting certain diseases (or contracting them sooner) and the response to new therapy methods. Much existing research has used electronic health records for this task, applying a class of methods known as 'survival analysis' or 'time-to-event' analysis. Other studies use imaging data or models fitted to different types of -omics data, such as genomics, transcriptomics and proteomics, but do not take account of effects that change over time.

Many datasets lack information required to draw accurate or precise conclusions. By 'borrowing' information from other datasets, from knowledge already available about connections between variables, or even from other diseases, we can potentially overcome this issue using machine learning methods.

However, the complexity of the datasets, especially when they come from different sources, types or 'modes', means that many methods are computationally intensive to run and the resulting models are hard for clinicians, patients or other stakeholders to interpret. Here we aim to combine methods from different areas, as well as 'transferring' information learnt from one dataset to another, to quantify the extent to which such a mixture of data sources makes predictions more accurate or easier to understand in clinical terms. It is possible that such a combination of data could allow for simpler, more interpretable models.

Many existing machine learning methods can handle large datasets but are designed only for classification or regression problems (predicting a categorical variable or a continuous value), and not necessarily for survival analysis (predicting when an event is likely to occur). Adaptations of these tools, or certain transformations of the data that turn survival into a classification problem, can allow researchers to use machine learning methods to predict the likely time until an event, or the factors that influence this outcome.
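
As an illustration of one such transformation, the sketch below expands each person's follow-up into one row per time interval, with a yes/no label for whether the event happened in that interval (often called a 'person-period' or discrete-time format). It is a minimal example under stated assumptions, not the project's actual method: the function name to_person_period, the field names id, age, time and event, and the two example subjects are all hypothetical.

```python
from typing import Dict, List


def to_person_period(subjects: List[Dict], interval: float) -> List[Dict]:
    """Expand one row per subject into one row per interval at risk.

    Each subject needs hypothetical fields: "id", "age", "time"
    (follow-up time) and "event" (True if the event was observed,
    False if the subject was censored).
    """
    rows = []
    for s in subjects:
        k = 0
        while True:
            end = (k + 1) * interval
            # This is the subject's final interval if follow-up ends within it.
            last = s["time"] <= end
            rows.append({
                "id": s["id"],
                "interval": k,
                "age": s["age"],
                # Label is 1 only in the final interval of an observed event;
                # censored subjects contribute rows labelled 0 throughout.
                "label": int(last and s["event"]),
            })
            if last:
                break
            k += 1
    return rows


# Hypothetical data: A has the event at time 3, B is censored at time 2.
subjects = [
    {"id": "A", "age": 61, "time": 3, "event": True},
    {"id": "B", "age": 47, "time": 2, "event": False},
]
rows = to_person_period(subjects, interval=1)
for r in rows:
    print(r)
```

Any ordinary binary classifier could then be fitted to these rows, using the interval index and the covariates as inputs, to estimate the chance of the event in each interval given that it has not yet occurred.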

Based on the cumulative expertise of the research group and the track record of previous projects, we are convinced this line of enquiry will provide ready-to-use, transparent and explainable models for evaluating personal risks of disease progression and, where applicable, likely timelines. Such approaches can be made available to provide timely decision support for health professionals and the general public, allowing larger-scale screening of people in the population who may be at higher risk, and in turn reducing the nationwide number of late or missed diagnoses.