Research Outline: This project aims to apply machine learning and deep learning (ML/DL) techniques to multi-omics data from UK Biobank (UKB) to address unmet needs in hepatocellular carcinoma (HCC) and colorectal cancer (CRC) research.
Research Questions: 1) Does ML/DL-based integration of UKB multi-omics data (genomics, transcriptomics, proteomics, metabolomics) outperform single-omics approaches in identifying novel predictive markers and high-risk genes for HCC and CRC? 2) Do ML/DL models trained on integrated multi-omics data yield more robust prognostic and predictive models than conventional statistical methods? 3) Does the integrated analysis reveal shared or cancer-specific molecular signatures and genes that point to common or distinct pathogenic mechanisms?
Objectives: 1) Curate, preprocess, and integrate UKB multi-omics data (paired with clinical phenotypes) for HCC and CRC cohorts. 2) Develop and validate ML/DL models (CNNs, transformers, random forests) to prioritize markers and genes, comparing single-omics and multi-omics performance. 3) Build prognostic and predictive models that integrate multi-omics and clinical variables, assessing generalizability via validation. 4) Characterize the identified markers and genes to clarify their roles in HCC and CRC pathogenesis.
Scientific Rationale: HCC and CRC are among the leading causes of cancer death, yet both lack precise biomarkers for early detection, risk stratification, and personalized therapy. Single-omics analyses capture only part of the molecular heterogeneity of these cancers, whereas ML/DL methods are well suited to analyzing high-dimensional multi-omics data and uncovering hidden associations. UKB’s large, well-annotated dataset offers an unparalleled resource for validation. Integrating these data is expected to improve the accuracy of marker and gene discovery, refine risk prediction, and provide mechanistic insights, supporting clinical translation for early intervention, in line with UKB’s mission to advance population health and precision oncology.