This project aims to develop and evaluate domain-specific foundation models that learn generalizable biomedical representations from large-scale, multi-modal UK Biobank data, including genomics, imaging, electronic health records (EHR), and lifestyle data. These models are intended to support early diagnosis, risk stratification, matching of patients to clinical trials, and individualized therapy prediction in cardiovascular and oncology indications.
We will address the following research questions:
Can multi-modal foundation models improve risk prediction accuracy for diseases such as coronary artery disease, atrial fibrillation, breast cancer, lung cancer, oral cancer, and cervical cancer?
How do latent representations derived from UK Biobank data capture underlying disease biology across different populations?
Can these models help identify novel biomarkers and disease subtypes that are not evident through unimodal analysis?
The core objective is to build interpretable, privacy-preserving, and regulatory-compliant AI systems that align with the latest international guidelines on medical AI use. The scientific rationale stems from the need to move beyond task-specific models to foundation models that generalize across cohorts, diseases, and health systems, thereby reducing bias and improving robustness. UK Biobank is one of the richest resources globally for training such models because of its diversity, data completeness, and longitudinal depth.