Author(s):
Frederik Christensen, Deniz Kenan Kılıç, Alexander Djupnes Fuglkjær, Jesper Petersen, Tarec Christoffer El-Galaly, Andreas Glenthøj, Jens Helby, Izabela Ewa Nielsen
Publish date:
4 November 2025
Journal:
eJHaem
PubMed ID:
41195239

Abstract

Background: Haemoglobin S (HbS) and C (HbC) are the most important sickling variants on the African continent, imposing major health burdens. Early detection of carrier status is crucial but often hindered by resource limitations.

Objectives: To develop machine learning (ML) models to accurately classify HbS and HbC carriers using readily available routine blood tests, facilitating cost-effective mass screening.

Methods: We utilised demographic and routine blood parameters from 469,248 individuals from the UK general population, including 1635 individuals with HbS and/or HbC variants identified by whole exome sequencing, to develop ML models for carrier detection based on standard blood tests. Three ML models (Logistic Regression [LR], Random Forest [RF] and XGBoost [XGB]) were trained using 32 different standard blood test results.

Results: All models demonstrated high discriminatory ability (ROC-AUC: LR 0.951; RF 0.943; XGB 0.956) in the UK general population. At a sensitivity of 95%, specificities were 77% (LR), 76% (RF) and 78% (XGB). SHAP analysis revealed consistent key features across models. When use was restricted to black individuals, performance fell considerably.
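The two metrics reported above, ROC-AUC and specificity at a fixed sensitivity, can be illustrated with a minimal sketch. This is not the study's code: the function names `roc_auc` and `specificity_at_sensitivity` and the toy labels and scores are hypothetical and chosen only to show the calculation. Here 1 stands for a carrier and 0 for a non-carrier.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen carrier receives a
    higher score than a randomly chosen non-carrier (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(
    labels: Sequence[int], scores: Sequence[float], target_sensitivity: float
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the highest score
    threshold whose sensitivity reaches the target."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Walk candidate thresholds from strictest to most permissive.
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in neg) / len(neg)
            return threshold, sensitivity, specificity
    raise ValueError("target sensitivity cannot be reached")


# Toy data, invented for illustration only (not study data).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]

auc = roc_auc(labels, scores)
threshold, sensitivity, specificity = specificity_at_sensitivity(labels, scores, 0.95)
print(f"AUC={auc:.3f} threshold={threshold} "
      f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Fixing sensitivity first and then reading off specificity mirrors a screening setting, where missing a carrier is usually considered the more costly error; the resulting specificity indicates how many non-carriers would still be referred for confirmatory testing.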

Conclusions: ML models based on routine blood tests effectively identify HbS and HbC carriers in a mixed general population. This approach has the potential to enhance screening efficiency by reducing reliance on specialised techniques.

Institution:
Copenhagen University Hospital, University of Copenhagen, Denmark
