Integrative predictive modeling of multi-modal data in cancer, aging, and cardiovascular diseases

Last updated:: 17 October 2025

ID:: 922920
Start date:: 17 October 2025
Project status:: Current
Principal investigator:: Dr Sheng Luo
Lead institution:: Duke University, United States of America

Research questions
We are seeking access to UK Biobank data to address the following research questions:
a. Biomarker Identification
Which data modalities (e.g., clinical phenotypes, neuroimaging, multi-omics profiles) are significantly associated with the risk of symptom onset?
Among these modalities, which provides the highest predictive utility for symptom onset?
Which specific features within each modality (e.g., imaging voxels, genetic variants, protein expression levels) exhibit the strongest predictive associations with symptom onset?
b. Risk Stratification
Given the identified informative multi-modal features, can we stratify individuals into high- and low-risk groups with respect to future symptom onset?
Among individuals classified as high-risk, which therapeutic interventions are associated with improved clinical outcomes?
c. Dynamic Risk Prediction
Can the proposed model accommodate newly acquired longitudinal data to dynamically update individualized risk estimates for symptom onset?
How does the proposed integrative modeling framework compare to existing approaches, including conventional and advanced machine learning algorithms, in terms of predictive accuracy and model generalizability?
Objectives
In this proposal, we aim to develop an integrative modeling framework to leverage complex multi-modal data for disease modeling, biomarker identification, and personalized risk prediction in cancer, aging-related disorders, and cardiovascular diseases.
Scientific rationale for the research
This proposal leverages multi-modal data from the UK Biobank to address key research questions related to biomarker identification, risk stratification, and dynamic risk prediction. We aim to employ advanced statistical methodologies, including multivariate functional mixed models, multi-omics factor analysis, and longitudinal functional data analysis, to extract latent disease profiles, which will be associated with survival outcomes.
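
As a minimal illustration of the risk stratification question (b) above, the sketch below splits individuals into high- and low-risk groups using a composite score built from standardized features. It is not the proposed framework: the feature names, the equal weights, the median cutoff, and all values are hypothetical and synthetic, chosen only to show the general idea of combining modalities into one score.

```python
from statistics import mean, median, pstdev


def zscore(values):
    """Standardize values to mean 0 and unit population standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def stratify(features, weights):
    """Return (scores, labels) from a weighted sum of standardized features.

    Individuals whose score exceeds the median score are labeled "high",
    all others "low". The median split is an illustrative assumption.
    """
    names = list(weights)
    standardized = {name: zscore(features[name]) for name in names}
    n = len(features[names[0]])
    scores = [
        sum(weights[name] * standardized[name][i] for name in names)
        for i in range(n)
    ]
    cutoff = median(scores)
    labels = ["high" if s > cutoff else "low" for s in scores]
    return scores, labels


# Synthetic values for six hypothetical individuals (not UK Biobank data).
features = {
    "clinical_score": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "imaging_marker": [0.2, 0.1, 0.4, 0.3, 0.6, 0.5],
    "protein_level": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0],
}
# Equal weights are a placeholder; a fitted model would estimate these.
weights = {"clinical_score": 1.0, "imaging_marker": 1.0, "protein_level": 1.0}

scores, labels = stratify(features, weights)
print(labels)
```

In practice the weights would come from a fitted model (for example, coefficients of a survival model on the latent disease profiles), and the cutoff would be chosen and validated on held-out data rather than fixed at the median.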