Last updated:
ID:
1224312
Start date:
26 January 2026
Project status:
Current
Principal investigator:
Mr Naisarg Bhawna Patel
Lead institution:
Vellore Institute of Technology, Vellore, India

Early identification of women at elevated risk of hormone-related cancers remains a key challenge for population health and cancer prevention. Existing risk prediction models typically focus on single cancer types and rely on additive statistical frameworks that may not fully capture complex interactions between genetic, biological, and lifestyle factors.

This project aims to develop an explainable machine learning framework to predict 5- and 10-year incident risk of breast, ovarian, and endometrial cancers using longitudinal data from UK Biobank. The central research questions are:
(1) Can non-linear integration of polygenic risk scores, circulating blood biomarkers, reproductive history, and lifestyle factors improve risk prediction compared to traditional models?
(2) Which risk drivers are shared across female hormone-related cancers, and which biomarkers are cancer-specific and warrant further analysis?
(3) How do key risk contributors differ across menopausal status and follow-up time horizons?
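
As an illustration of how 5- and 10-year incident outcomes could be defined from follow-up data, the sketch below assigns a binary label per time horizon. It is not the project's pipeline: the function name `horizon_label` and its arguments are hypothetical, time is assumed to be measured in years from baseline, and participants censored before the horizon without a diagnosis are assumed to be excluded from that horizon's analysis.

```python
from typing import Optional


def horizon_label(event_years: Optional[float],
                  followup_years: float,
                  horizon: float) -> Optional[int]:
    """Label one participant for one prediction horizon.

    event_years    : years from baseline to diagnosis, or None if no diagnosis
    followup_years : years from baseline to end of observed follow-up
    horizon        : prediction horizon in years (e.g. 5 or 10)

    Returns 1 for a diagnosis within the horizon, 0 for event-free follow-up
    covering the whole horizon, and None when the participant was censored
    before the horizon (outcome unknown under this simple scheme).
    """
    if event_years is not None and event_years <= horizon:
        return 1
    if followup_years >= horizon:
        return 0
    return None


# A participant diagnosed 7 years after baseline is a non-case at 5 years
# and a case at 10 years.
labels = {h: horizon_label(7.0, 7.0, h) for h in (5.0, 10.0)}
```

Dropping early-censored participants is the simplest option; survival-based approaches that keep them are an alternative the summary does not rule out.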

We propose to train a machine learning model using time-aware case-control sampling, with hyperparameter optimisation performed using Optuna. Model performance will be evaluated using discrimination and calibration metrics and benchmarked against previously published models. To ensure transparency and clinical interpretability, SHapley Additive exPlanations (SHAP) will be used to identify and quantify individual-level and subgroup-level risk drivers.
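
The summary does not name the specific discrimination and calibration metrics. As a minimal sketch, assuming the area under the ROC curve (AUC) for discrimination and the Brier score plus an observed/expected ratio for calibration, the quantities could be computed as follows. The function names and the toy values are illustrative only and are not taken from the project.

```python
def auc(y_true, y_score):
    """Probability that a random case scores higher than a random control.

    Ties between a case and a control count as half a win. This pairwise
    form is quadratic in sample size and is meant for illustration only.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def observed_expected_ratio(y_true, y_prob):
    """Observed events divided by the sum of predicted risks (1.0 is ideal)."""
    return sum(y_true) / sum(y_prob)


# Toy values, made up for illustration.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
metrics = {
    "auc": auc(y, p),
    "brier": brier_score(y, p),
    "o_e": observed_expected_ratio(y, p),
}
```

Because case-control sampling changes the proportion of cases relative to the full cohort, predicted probabilities from a model trained on the sampled data generally need recalibrating to cohort-level incidence before calibration metrics are meaningful.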

External validation will be conducted through temporal splits and stratified analyses by menopausal status. The study will not involve re-contacting participants or returning individual results.
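
A temporal split followed by stratification can be sketched as below. This is an assumption-laden illustration rather than the study's protocol: the record fields `baseline_date` and `menopausal_status`, the cutoff date, and both helper functions are hypothetical, and the summary does not state which date defines the split.

```python
from collections import defaultdict
from datetime import date


def temporal_split(records, cutoff):
    """Train on participants recruited before `cutoff`, validate on the rest."""
    train = [r for r in records if r["baseline_date"] < cutoff]
    valid = [r for r in records if r["baseline_date"] >= cutoff]
    return train, valid


def stratify(records, key):
    """Group records by the value of `key`, e.g. menopausal status."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return dict(groups)


# Toy records, made up for illustration.
records = [
    {"id": 1, "baseline_date": date(2007, 5, 1), "menopausal_status": "pre"},
    {"id": 2, "baseline_date": date(2008, 3, 9), "menopausal_status": "post"},
    {"id": 3, "baseline_date": date(2009, 8, 2), "menopausal_status": "post"},
    {"id": 4, "baseline_date": date(2010, 1, 15), "menopausal_status": "pre"},
]

train, valid = temporal_split(records, cutoff=date(2009, 1, 1))
valid_by_status = stratify(valid, "menopausal_status")
```

Performance metrics would then be computed separately on each stratum of the later-recruited validation set.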

This research has the potential to provide a transparent, scalable framework for women’s cancer risk stratification, supporting evidence-based prevention strategies and contributing to the development of precision screening approaches in population cohorts.