Lung cancer remains the leading cause of cancer mortality worldwide, largely because most cases are detected at a late stage. Emerging evidence suggests that plasma protein levels change measurably years before clinical diagnosis, creating an opportunity for earlier detection through biomarker modeling. However, current approaches typically rely on cross-sectional comparisons and do not capture the dynamic biological processes that precede disease onset.
This project will develop a deep learning framework to model pseudo-temporal trajectories of plasma protein expression in the pre-diagnostic period of lung cancer. Using UK Biobank’s large-scale plasma proteomics data, linked pathological cancer stage, and participant characteristics (age, sex, ethnicity, and smoking history), we will reconstruct protein progression patterns that reflect the biological transition from health to preclinical lung cancer.
The research will address three key questions:
(1) Do plasma proteins exhibit systematic, time-dependent trajectories years before lung cancer diagnosis?
(2) Can deep learning models that integrate protein trajectories with demographic and lifestyle factors improve individual-level lung cancer risk prediction?
(3) Are trajectory-based biomarkers generalisable across population subgroups, including never-smokers?
The primary objective is to estimate protein pseudo-time trajectories with the deep learning framework and to build a multimodal risk prediction model that combines these trajectories with demographic and lifestyle factors. Rigorous validation and subgroup analyses will assess model performance and clinical relevance. By transforming plasma proteomics data into predictive trajectories, this research aims to inform early detection strategies and improve understanding of preclinical molecular changes in lung cancer.