Last updated:
ID:
1153973
Start date:
18 March 2026
Project status:
Current
Principal investigator:
Dr Justin Jee
Lead institution:
Memorial Sloan Kettering Cancer Center, United States of America

Early cancer detection is critical for improving patient survival and achieving cure, yet current screening approaches rely on limited eligibility criteria such as age, sex, and smoking history. While emerging technologies like multi-cancer early detection liquid biopsies hold promise, their utility is constrained by cost and imperfect accuracy when applied broadly. A scalable method to identify individuals at highest risk of undiagnosed cancer is urgently needed to optimize screening and prevention strategies. We propose to harness routinely collected, longitudinal health data, including comorbidities, medications, laboratory and vital sign trajectories, demographics, and lifestyle exposures, to improve individualized cancer diagnosis. These rich datasets remain underutilized in existing risk calculators, which largely ignore dynamic temporal patterns that may signal early cancer development. Advances in artificial intelligence, particularly transformer-based foundation models, enable integration of such complex, irregular longitudinal data at scale. We have developed a time-series foundation model, SPARC, trained on 57,510 patients and over 78 million health measurements at Memorial Sloan Kettering (MSK). SPARC has demonstrated superior performance in predicting mortality and treatment complications compared with static baseline models. In this project, we will extend SPARC to predict new cancer diagnoses across two high-impact cohorts: 1) MSK-CHORD study cohort (N>100,000; ~20% incidence of second cancers): we will fine-tune SPARC to identify cancer survivors at risk for second primary malignancies, enabling more targeted prevention. 2) UK Biobank: we will fine-tune SPARC to predict first cancer diagnoses in a large, diverse national population. This work aims to deliver a dynamic, generalizable, and data-driven cancer diagnosis tool that focuses screening on those most likely to benefit, improving outcomes and resource use.