
Approved research

Statistical/Machine classification of disease

Principal Investigator: Dr Anthony Webster
Approved Research ID: 42583
Approval date: November 2nd 2018

Lay summary

Patterns of disease within a population can indicate common causes. These causes can either be avoided, or treatments can be developed for them. One example of such a pattern would be an increased risk of asthma in an area of high pollution. This project explores whether the time from a person's birth to the occurrence of a disease can be combined with other medical information to identify high-risk groups, or to better understand the processes by which diseases progress. This is possible through a combination of "big data" and emerging statistical methods such as "machine learning".

Our primary aim is to search for new links between diseases. We will develop a new classification of diseases, with the expectation of providing new insights and a deeper biological understanding of the links between them. Such insights have previously led to new lines of research, better treatment, and improved prevention. A secondary aim is to identify groups within the population that are especially susceptible to particular diseases. The existence of higher-risk groups would prompt further work to identify individuals at risk and to modify advice.

The research is made possible by a combination of large data sets ("Big Data") and emerging modern statistical methods. Older methods will be combined with new techniques to make full use of the benefits of big data. The majority of the time (roughly 60%) will be used to develop, implement, and optimise new methods, possibly with updated studies if new data become accessible. The rest will be used to explore the consequences of the results, and to report them. The project will run for an initial period of 36 months, and longer if results prompt further studies and funding permits.

The project's impact is likely to be felt through new lines of research to understand and tackle disease, and indirectly through subsequent use of new methods.
If high-risk groups within the population are identified, this may influence future medical advice and diagnosis.