Last updated:
ID:
1120616
Start date:
5 January 2026
Project status:
Current
Principal investigator:
Professor Gert Uves van Zyl
Lead institution:
Stellenbosch University, South Africa

Scientific Rationale: Four groups of non-communicable diseases (cancer, endocrine-metabolic diseases, neurodegenerative diseases, and cardiovascular diseases) have the greatest negative impact on healthspan, defined as the period of life during which individuals remain healthy despite advancing chronological age. Modifiable risk factors present an opportunity to reduce the adverse impact of these conditions on quality of health.
Longitudinal electronic medical data can supply both the predictor and the outcome variables that deep learning models use to assess an individual’s risk of various cancers. I shall therefore train a deep learning model, Delphi-2M, on UK Biobank data and validate it on South African individuals (cancer cases, and controls who reached a similar or older age without cancer). Identifying risk factors will facilitate both their modification and, where appropriate, the application of cancer screening tests.
The research question is whether an AI model built on longitudinal UK Biobank data (ICD-10 codes, age, gender, alcohol use, cigarette smoking, and BMI) can effectively predict cancer risk in South African individuals. The preliminary findings will inform a subsequent study of cancer screening in high-risk individuals.
The objectives are therefore:
1. Reproduce the published Delphi-2M model trained on UK Biobank data.
2. Validate the UK Biobank-trained model using a limited set of longitudinal health data from South African case-control participants, focusing on lymphoma and on breast, oesophageal, lung, liver, colorectal, gastric, ovarian, and pancreatic cancers.
3. In a subsequent study, collaborate with a team of researchers to identify individuals who require cancer screening based on the identified risk factors.