Machine learning detection of novel disease correlates
Principal Investigator:
Dr Jeremy Rossman
Approved Research ID:
676
Approval date:
April 1st 2016
Lay summary
Modern techniques in computer science now allow large datasets to be analysed in ways that were not previously possible. Such analyses have uncovered novel correlations hidden within data, and these correlations can be used to predict a wide variety of factors. We will use cutting-edge machine learning algorithms to analyse the existing collection of BioBank data, identify novel correlates of disease, and create new screening algorithms. Experiments will search for correlates of many diseases, beginning with breast cancer, the third leading cause of death in the current BioBank population. Successful completion of this study will enable early detection and screening of at-risk patients for a variety of diseases, including those without existing early screening methods.

Initial experiments will focus on breast cancer. By training the algorithms on participants diagnosed with breast cancer (both prevalent and incident cases), it will be possible to identify health factors correlated with a breast cancer diagnosis. These factors will then be used to search the initial recruitment dataset for participants who were breast cancer negative at recruitment but are predicted to develop the disease. The records of these individuals will be examined to see whether any developed incident breast cancer after recruitment. This will validate our approach before we screen for correlates of many additional diseases.

The analysis will use the full cohort.
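
The validation step described above can be illustrated with a minimal sketch. It is not the study's actual method: the `Participant` record, its field names, the 0.5 risk threshold, and the toy cohort are all hypothetical, and the risk scores are assumed to come from a model that has already been trained. The sketch excludes participants who already had the disease at recruitment, flags the remaining participants whose predicted risk meets the threshold, and compares those flags with later incident diagnoses.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical, not actual BioBank data fields.
    participant_id: int
    prevalent_case: bool    # diagnosed at or before recruitment
    incident_case: bool     # diagnosed after recruitment
    predicted_risk: float   # score from an already trained model, in [0, 1]


def validate_predictions(cohort, threshold=0.5):
    """Compare predictions with incident diagnoses among
    participants who were disease negative at recruitment."""
    # Only participants without the disease at recruitment can become incident cases.
    baseline_negative = [p for p in cohort if not p.prevalent_case]
    # Flag those the model predicts will develop the disease.
    flagged = [p for p in baseline_negative if p.predicted_risk >= threshold]

    true_positives = sum(p.incident_case for p in flagged)
    incident_total = sum(p.incident_case for p in baseline_negative)

    return {
        "flagged": len(flagged),
        "true_positives": true_positives,
        # Share of flagged participants who went on to develop the disease.
        "precision": true_positives / len(flagged) if flagged else 0.0,
        # Share of all incident cases that the model flagged.
        "recall": true_positives / incident_total if incident_total else 0.0,
    }


# Toy cohort, invented purely for illustration.
cohort = [
    Participant(1, True, False, 0.95),   # prevalent case: excluded from validation
    Participant(2, False, True, 0.80),   # flagged, later diagnosed
    Participant(3, False, False, 0.70),  # flagged, not diagnosed
    Participant(4, False, True, 0.20),   # not flagged, later diagnosed
    Participant(5, False, False, 0.10),  # not flagged, not diagnosed
]

result = validate_predictions(cohort)
print(result)
```

In this toy cohort, two baseline-negative participants are flagged and one of them is later diagnosed, so precision and recall are both 0.5. In practice the threshold would be chosen to balance missed cases against unnecessary follow-up.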