ID: 986618
Start date: 28 September 2025
Project status: Current
Principal investigator: Professor He Yayi
Lead institution: Tongji University School of Medicine, China

Aim. Build a pan-cancer, pan-omics framework that integrates UK Biobank circulating proteomics, metabolomics (NMR biomarkers), genetics/exome, imaging-derived phenotypes (IDPs), environmental exposures and longitudinal health records to (i) identify aetiological determinants of cancer, (ii) distinguish early-detection signals from reverse causation, and (iii) deliver robust, fair and well-calibrated risk/prognosis models.
Research questions and objectives. Which proteins, metabolites, genetic signals, IDPs, behaviours and exposures associate with cancer incidence and prognosis across sites, and which are site-specific rather than shared? What incremental value does each modality add beyond clinical/lifestyle factors, and which multimodal combinations maximise discrimination and calibration while preserving subgroup fairness? Can genetic triangulation (cis-pQTL, PRS/exome scores, colocalisation, Mendelian randomisation) support causality for candidate biomarkers? Can interpretable ML/DL models provide clinically useful, reproducible tools?
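As an illustration of the Mendelian randomisation step named above, the following is a minimal sketch of a fixed-effect inverse-variance-weighted (IVW) estimate computed from per-variant summary statistics. The project summary does not state which estimator will be used; the function name `ivw_estimate`, its arguments and the numbers in the usage lines are hypothetical, and the weights ignore uncertainty in the variant-exposure effects (a common first-order simplification).

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    beta_exposure: variant effects on the exposure (e.g. a protein level)
    beta_outcome:  variant effects on the outcome (e.g. log odds of cancer)
    se_outcome:    standard errors of the variant-outcome effects
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")

    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome^2.
    numerator = sum(
        bx * by / se**2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome)
    )
    if denominator == 0:
        raise ValueError("no variant has a non-zero effect on the exposure")

    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error


# Hypothetical summary statistics for two independent variants.
estimate, std_error = ivw_estimate([0.1, 0.2], [0.02, 0.04], [0.01, 0.01])
print(f"IVW estimate = {estimate:.3f}, SE = {std_error:.4f}")
```

With a single cis-pQTL instrument the same formula reduces to the Wald ratio (variant-outcome effect divided by variant-exposure effect). Colocalisation and sensitivity analyses would still be needed before reading such an estimate causally.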
Scientific rationale. Cancer is multifactorial; complementary modalities capture biology from inherited risk to circulating pathways and organ structure/function. Joint modelling across time horizons and cancer types is necessary to separate causal drivers from markers of preclinical disease.
Design and methods. Models include penalised Cox regression, gradient boosting, deep survival models (e.g., DeepSurv/DeepHit) and sequence models (RNN/Transformer) for longitudinal records, combined through multimodal fusion (late or attention-based). Evaluation covers time-dependent AUC, C-index, Brier score and integrated Brier score (IBS), calibration, decision curves, and subgroup fairness (by sex, age, ethnicity and deprivation). Interpretability is assessed via SHAP and integrated gradients. Work is conducted on the UK Biobank Research Analysis Platform (RAP); only anonymised, aggregate outputs are exported in the public interest.
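To make one of the listed evaluation metrics concrete, the sketch below computes Harrell's concordance index (C-index) for right-censored survival data in plain Python. It is an illustrative implementation, not the project's evaluation code: the function name `harrell_c_index` and the toy data are hypothetical, pairs with tied event times are skipped for simplicity, and the quadratic loop is only suitable for small samples.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times:       follow-up time for each participant
    events:      True if the event (e.g. cancer diagnosis) was observed,
                 False if the participant was censored
    risk_scores: model output, where higher means higher predicted risk
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored participant cannot anchor a pair
        for j in range(n):
            # Pair is comparable when i had the event strictly before
            # j's follow-up ended (tied times are ignored here).
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half

    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Toy data: four participants, the third is censored.
toy_times = [5.0, 3.0, 9.0, 7.0]
toy_events = [True, True, False, True]
toy_risk = [0.4, 0.8, 0.1, 0.6]
print(round(harrell_c_index(toy_times, toy_events, toy_risk), 3))
```

On this toy input, 5 of the 6 comparable pairs are ranked correctly, so the C-index is about 0.833. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking; the C-index measures discrimination only, which is why calibration and Brier-type scores are evaluated alongside it.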