The plasma proteome provides valuable physiological insights for early cancer detection and localization. In our recent study, we used the Olink® Explore-3072 platform on a discovery cohort of 599 plasma samples (123 controls and 476 treatment-naive lung cancer cases, of which 190 were early stage and 286 late stage) to identify biomarkers predictive of lung cancer. Applying machine learning (ML) models to these Olink data, we discovered 20 key biomarkers with high predictive accuracy, achieving an AUC of 0.98 with 93% sensitivity and 99% specificity in the unseen hold-out test set. These findings indicate that plasma proteomics, combined with our tailored ML approaches, holds significant promise as a tool for early lung cancer detection.
To validate these results and assess their broader applicability, we plan to apply our ML framework to the UK Biobank data, which contain proteomic profiles from over 54,000 participants, among them 392 lung cancer cases and a large set of matched controls. The availability of longitudinal data, comprehensive demographic information, and detailed health records makes the UK Biobank an ideal resource for external validation of our ML models and biomarkers.
Our primary objective is to apply our ML models to the UK Biobank Olink data to assess their performance and confirm the robustness of our 20 biomarkers in predicting future lung cancer cases. Integrating demographic data should also allow us to build more robust models that account for variations in risk across population groups, ultimately improving lung cancer detection in diverse cohorts.
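As an illustration of how model performance on an external cohort could be summarised, the following minimal sketch computes sensitivity, specificity, and AUC from predicted risk scores. It is not our actual pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, and it assumes that both cases and controls are present in the evaluation set.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) at a given score threshold.

    y_true holds 1 for a case and 0 for a control; the threshold is an
    illustrative assumption, not a value taken from the study.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, y_score):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control (ties count as one half)."""
    cases = [s for label, s in zip(y_true, y_score) if label == 1]
    controls = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data for illustration only; these are not study results.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(toy_labels, toy_scores)
auc = roc_auc(toy_labels, toy_scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.2f}")
```

The pairwise AUC calculation is quadratic in cohort size and is shown only for clarity; for a cohort on the scale of the UK Biobank, a rank-based implementation from an established statistics library would be the practical choice.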
Successful model refinement and validation using UK Biobank data would support the development of a rule-in blood test that could transform lung cancer screening by providing a non-invasive, cost-effective means of triaging patients for low-dose CT scans. This would not only optimize the use of healthcare resources but also reduce the financial burden on the NHS, while improving patient outcomes through earlier and more targeted detection.