Approved Research
Novel Statistical Methods for Complex Survival Data with Applications to Disability Development in the Elderly
Lay summary
Disability in the elderly is a significant health issue with a growing impact on healthcare costs. We aim to develop novel statistical methods to address several important challenges in analyzing disability data from the UK Biobank. Many traditional models (like the Cox model) ignore interval-censoring and delayed entry (left-truncation), which can produce severe estimation bias and lead to inaccurate scientific findings. To the best of our knowledge, few disability studies properly handle interval-censoring and left-truncation while also accommodating the large sample sizes and high-dimensional features of biobank data.
The proposed study has three parts. First, we will develop a novel semiparametric transformation model under complex censoring (STMCC), tailored for interval-censoring and left-truncation. Semiparametric transformation models form a broad class that includes several existing methods, such as the popular Cox proportional hazards model and the proportional odds model, as special cases. The method is therefore more flexible in practice than the conventional Cox model. Second, we will develop an STMCC-based feature selection method that can screen and select the features associated with disability development from high-dimensional data, including genetic data with millions of SNPs and imaging data with millions of voxels. We will also incorporate prior information about feature structures (like gene-pathway information) to make the feature selection results easier to interpret. These feature selection methods will help us reduce dimensionality and extract useful features from the genetic and imaging data. Third, we will develop a novel deep-learning method based on STMCC, which will use state-of-the-art neural networks to capture complex, non-linear structures among high-dimensional features. A computationally efficient algorithm will be designed to handle the large number of observations in the biobank. By applying the proposed method to the biobank data, we can construct an accurate, dynamic, and individualized prediction model that predicts each individual's risk of developing disability at any future time point.
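To make the first part more concrete, the following is a minimal sketch of how one observation might contribute to the likelihood under a transformation model with interval-censoring and left-truncation. It is an illustration under stated assumptions, not the STMCC method itself, which is yet to be developed. The logarithmic transformation family G_r, the baseline cumulative hazard Lambda(t) = t, and all function names are hypothetical choices made here for illustration. In a semiparametric model the baseline would be left unspecified and estimated from the data.

```python
import math


def transform(x, r):
    """Logarithmic transformation family G_r (an assumed, commonly used choice).

    r = 0 gives G(x) = x (Cox proportional hazards);
    r = 1 gives G(x) = log(1 + x) (proportional odds).
    """
    if r == 0:
        return x
    return math.log1p(r * x) / r


def survival(t, lin_pred, r, cum_baseline):
    """S(t | x) = exp(-G_r(Lambda(t) * exp(lin_pred)))."""
    if math.isinf(t):
        return 0.0  # the event eventually occurs, so S(infinity) = 0
    return math.exp(-transform(cum_baseline(t) * math.exp(lin_pred), r))


def log_lik_contribution(entry, left, right, lin_pred, r, cum_baseline):
    """Log-likelihood of one subject who entered the study at time `entry`
    (left-truncation) and whose event is only known to lie in (left, right]
    (interval-censoring). Use right = math.inf for right-censoring.
    """
    if not (entry <= left < right):
        raise ValueError("expected entry <= left < right")
    # Probability that the event falls in (left, right] ...
    numerator = (survival(left, lin_pred, r, cum_baseline)
                 - survival(right, lin_pred, r, cum_baseline))
    # ... conditional on being event-free at study entry.
    denominator = survival(entry, lin_pred, r, cum_baseline)
    return math.log(numerator) - math.log(denominator)


def unit_baseline(t):
    """Hypothetical baseline cumulative hazard Lambda(t) = t, for illustration."""
    return t


# Same subject under the Cox (r = 0) and proportional odds (r = 1) members.
cox_ll = log_lik_contribution(1.0, 1.0, 3.0, 0.0, 0, unit_baseline)
po_ll = log_lik_contribution(1.0, 1.0, 3.0, 0.0, 1, unit_baseline)
```

Dividing by the survival probability at study entry is what accounts for delayed entry, and taking the difference of two survival probabilities is what accounts for the event time being known only up to an interval. A standard Cox analysis that treats the event time as exactly observed and ignores the entry time omits both adjustments.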
Overall, we hope that our novel statistical methods will yield new and accurate insights into the genetic and imaging biomarkers of disability development. Furthermore, by providing individualized, dynamic, and interpretable risk predictions, our work will contribute to early prevention and effective management of disability in the elderly.