Approved research

Machine learning for rare disease pattern recognition and analysis

Principal Investigator: Mr Giovanni Charles
Approved Research ID: 54218
Approval date: January 17th 2020

Lay summary

Undiagnosed disease places a significant economic burden on the health sector. An accurate diagnosis can bring clinical benefits for patients as well as financial benefits for hospitals. An independent study commissioned by Mendelian showed that rare disease (RD) patients, while undiagnosed, have cost the NHS in excess of £3.4 billion over the last 10 years [1].

We hope that our framework will give better insight into the large dataset provided by UK Biobank and offer a scalable way to analyze heterogeneous data in order to understand patterns in health data. We aim to develop pipelines to process the data, extract features of interest and perform classification for rare and undiagnosed diseases. We will therefore use supervised and unsupervised techniques that are widely used in machine learning, and we will also look for new frameworks suited to such datasets.

From a medical research point of view, our statistical models will give disease modellers insight through advanced data exploration techniques for rare disease prediction, statistical identification of good predictors of rare disease diagnosis, discovery of hierarchical structures that may exist in the clinical patterns of rare disease, and adjustment for confounding health factors and individual diversity.

We will compare the performance and impact of a variety of predictive systems on rare disease diagnosis. These systems are expected to differ in interpretability, accuracy, confidence, and coverage. We will also explore the impact on our health system of using a predictive system to suggest relevant referrals, suggest genetic testing and predict diagnosis. Evaluations will consider the health economic, clinical workflow and clinical pathway aspects that would be affected.

We would return any medical records suspected of rare disease, and any training/validation sets created by the study, to UK Biobank.

[1] -