This project addresses a critical challenge in AI-enabled healthcare: the heterogeneous performance of predictive models, which can amplify existing health disparities across population subgroups. Models trained for aggregate accuracy often mask differences in performance that appear when results are disaggregated by socially sensitive criteria such as age, sex and ethnicity, perpetuating historically unfair pathways in disease diagnosis and treatment.
# Scientific rationale
This project aims to move beyond aggregate predictive accuracy to ensure that Machine Learning (ML) models for chronic disease do not exacerbate health inequities. Using the rich, multi-modal UK Biobank dataset, we will investigate risk prediction for chronic cardiovascular diseases.
# Research Questions
1. What performance disparities do state-of-the-art ML models exhibit when disaggregated across socially sensitive subgroups?
2. What methods can reveal the causal factors and pathways that lead to inequitable outcomes? For example, can counterfactual models be used to identify these pathways?
3. How can novel bias mitigation strategies be developed and validated to ensure equitable patient-level predictions across subgroups?
# Objectives
1. Develop accurate fairness-unaware baseline models, in line with state-of-the-art ML practices, to predict the risk of chronic cardiovascular diseases.
2. Evaluate performance disparities across subgroups defined by socially sensitive criteria.
3. Develop, benchmark, and validate bias mitigation strategies, including the exploration of counterfactual models.
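
As an illustration of the disaggregated evaluation described in Objective 2, the following is a minimal sketch rather than the project's actual pipeline. It assumes binary outcomes and binary predictions, uses true-positive and false-positive rates as the disparity measures (one possible choice among many), and runs on synthetic toy labels, not UK Biobank data. The function names `subgroup_rates` and `max_gap` are hypothetical.

```python
from collections import defaultdict


def subgroup_rates(y_true, y_pred, groups):
    """Return per-subgroup true-positive and false-positive rates."""
    # Confusion-matrix counts, kept separately for each subgroup label.
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            key = "tp" if p == 1 else "fn"
        else:
            key = "fp" if p == 1 else "tn"
        counts[g][key] += 1

    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        rates[g] = {
            # A rate is undefined (NaN) when the subgroup has no such cases.
            "tpr": c["tp"] / pos if pos else float("nan"),
            "fpr": c["fp"] / neg if neg else float("nan"),
            "n": pos + neg,
        }
    return rates


def max_gap(rates, metric):
    """Largest difference in a metric between any two subgroups."""
    # NaN is the only value not equal to itself, so this filter drops it.
    values = [r[metric] for r in rates.values() if r[metric] == r[metric]]
    return max(values) - min(values) if values else float("nan")


# Synthetic toy example (assumption): two subgroups, "A" and "B".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = subgroup_rates(y_true, y_pred, groups)
for name, r in sorted(rates.items()):
    print(name, r)
print("TPR gap:", max_gap(rates, "tpr"))
print("FPR gap:", max_gap(rates, "fpr"))
```

In this toy example the model is perfect on subgroup A but reaches only a 0.5 true-positive rate on subgroup B, a gap that an aggregate accuracy figure alone would not show. In practice, subgroup sample sizes and confidence intervals would also need to be reported alongside such gaps.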