High dimensional modelling of complex traits in the presence of corrupted predictor variables
Approved Research ID: 60755
Approval date: August 24th 2020
We are developing improved statistical methods for large datasets such as the UK Biobank. One goal of personalized medicine is to predict the likelihood that an individual will develop a disease, or to predict the value of a trait such as bone mineral density (BMD). The challenges include missing data for some participants, inaccurate measurements, and biases in self-reported questionnaires. In this study we aim to build better prediction models by explicitly accounting for missing data and for variables that are measured inaccurately. For example, we know that exercise and diet affect BMD, but these variables are often inaccurate because they are self-reported. To improve prediction, we will use specially designed equations and algorithms that adjust the results for inaccurately measured variables. In this project, we plan to develop and implement these methods and algorithms so that they work with datasets the size of the UK Biobank. The project has three aims. (1) We will develop a new method that copes with groups of variables whose errors are of different types, and write software to perform predictions with this method; this first aim is restricted to continuous traits such as BMD. (2) We will extend this work to binary traits, such as the presence or absence of a disease. (3) We will explore more complex models that include interactions. The project is expected to last 2 years. The public health impact is indirect: these methods will make it possible to improve predictions for many different traits and diseases.