Approved Research

Derivation of a machine learning (ML) model to improve pre-test probability of identifying individuals with genetic variants linked to familial hypercholesterolemia (FH).

Principal Investigator: Mr Christophe Stevens
Approved Research ID: 67789
Approval date: January 27th 2021

Lay summary

Familial hypercholesterolaemia ("FH") is an inherited disease caused by variations in genes related to the clearance of LDL-cholesterol (the so-called "bad cholesterol"). As a result, FH raises LDL-cholesterol levels in the bloodstream from birth. Over time, this continuous exposure to high LDL-cholesterol results in fat deposits ("atherosclerotic plaques") within the arteries, a condition generally referred to as atherosclerosis. When these plaques become large and/or unstable, they can slow blood flow or generate blood clots that obstruct it, causing heart disease or an acute heart attack. Cholesterol-lowering medications decrease levels of LDL-cholesterol and, when administered early at effective doses, can prevent the development of atherosclerosis, heart disease and heart attacks.

FH affects approximately 1 in every 311 individuals but is underdiagnosed, with fewer than 7% of cases currently identified in the UK. Genetic testing is the most accurate tool for diagnosing FH, but it is expensive and not available everywhere due to a lack of resources. Diagnostic tools called clinical criteria are often used instead of genetic testing in daily clinical practice. These tools use patients' characteristics, including cholesterol levels, age at onset of heart disease and family history, to make a diagnosis. Unfortunately, these tools might not work well in different populations and often fail to accurately identify FH patients.

With our research, we aim to help find FH patients using a branch of Artificial Intelligence called Machine Learning (ML). ML consists of a set of techniques that replicate a specific human behaviour: reading large amounts of information and making predictions based on the data. ML models are computer software and mathematical models derived from the data that can differentiate between disease-free individuals and affected patients. We believe that ML models can identify FH patients better than the clinical diagnostic tools currently used in clinical practice.

The performance of ML models will be compared to the performance of traditional clinical diagnostic tools. This will be done by counting the number of patients misclassified by current diagnostic tools and by the newly derived ML models. If our ML models outperform clinical diagnostic tools, they will help identify more FH patients, on a national scale and earlier in life, ultimately allowing clinicians to treat more patients and help prevent heart disease. Over 3 years, the present proposal would be expected to substantially improve the current detection rates of <7% in the UK and <5% globally, in a cost-effective, scalable fashion.

Research Questions:

1) How well do clinical criteria (Dutch Lipid Clinic Network criteria, Simon Broome, MEDPED, NICE) identify individuals with genetic variants consistent with FH?

2) Can an ML model improve the likelihood of identifying carriers of genetic variants consistent with FH?


Clinical criteria for FH alone, in the absence of a genetically confirmed diagnosis, have limited sensitivity and specificity, with implications for therapy, testing of relatives and unnecessary referrals to specialists. We intend to derive ML models to improve the pre-test probability of identifying carriers of FH variants in a UK Biobank (UKBI) derivation dataset and compare the sensitivity and specificity of current clinical criteria to those of ML models on a UKBI evaluation dataset. Should ML models outperform clinical criteria, we would, as a next step beyond this proposal, apply ML models to routine electronic healthcare records (EHRs) and conduct a trial of current approaches to case finding versus applying ML models to improve pre-test probability. Our approach could lead to the development of cost-effective and internationally scalable decision support tools integrated within EHRs, potentially reducing the detection gap in FH, where <5% of the estimated 25 million patients are believed to have been identified.
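
The comparison described above could be sketched roughly as follows. This is a minimal illustration only, not the project's actual pipeline: the `Confusion` class, the `evaluate` function and the toy labels are all hypothetical, and the six made-up individuals stand in for an evaluation dataset in which genetic testing provides the reference label.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of agreement between a case-finding method and genetic testing."""
    tp: int = 0  # carriers correctly flagged
    fp: int = 0  # non-carriers wrongly flagged
    tn: int = 0  # non-carriers correctly not flagged
    fn: int = 0  # carriers missed

    @property
    def sensitivity(self) -> float:
        carriers = self.tp + self.fn
        return self.tp / carriers if carriers else float("nan")

    @property
    def specificity(self) -> float:
        non_carriers = self.tn + self.fp
        return self.tn / non_carriers if non_carriers else float("nan")

    @property
    def misclassified(self) -> int:
        return self.fp + self.fn


def evaluate(carrier: list[bool], flagged: list[bool]) -> Confusion:
    """Tally one method's flags against the genetic reference labels."""
    if len(carrier) != len(flagged):
        raise ValueError("label and prediction lists must have equal length")
    counts = Confusion()
    for truth, pred in zip(carrier, flagged):
        if truth and pred:
            counts.tp += 1
        elif truth and not pred:
            counts.fn += 1
        elif not truth and pred:
            counts.fp += 1
        else:
            counts.tn += 1
    return counts


# Toy, made-up labels for six individuals (NOT study data).
carrier = [True, True, False, False, False, True]         # genetic result
criteria_flags = [True, False, True, False, False, False]  # clinical criteria
model_flags = [True, True, False, False, True, False]      # hypothetical ML model

criteria = evaluate(carrier, criteria_flags)
model = evaluate(carrier, model_flags)
print("criteria:", criteria.sensitivity, criteria.specificity, criteria.misclassified)
print("model:   ", model.sensitivity, model.specificity, model.misclassified)
```

Reporting sensitivity and specificity alongside the raw misclassification count matters here because FH carriers are rare: a method that flags nobody would misclassify few people overall while missing every carrier.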


Extended Research Questions:

1) How effectively are modifiable risk factors managed (smoking and drinking habits, physical activity, and obesity) in treated FH patients?

2) Can causal models, derived using machine learning and the DoWhy causal framework, quantify the impact of modifiable risk factors on CVD and aid in personalized risk management?


We aim to conduct a causal machine learning analysis of individuals carrying FH variants identified in our initial step. The goal is to establish the potential causal effect of modifying risk factors on overall CVD risk. If these effects are substantial, there is an opportunity to develop a causal machine learning model. This model could assist clinicians treating FH in better understanding the potential benefits of lifestyle interventions.
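
To make the idea of a "causal effect of modifying a risk factor" concrete, the sketch below estimates an adjusted risk difference by stratifying on a single confounder (backdoor adjustment by standardisation). It is plain Python, not the DoWhy API, and it is not the study's method: the function name, the strata and the records are hypothetical toy values, and the estimate is only causal under the assumption that the stratifying variable captures all confounding.

```python
from collections import defaultdict


def adjusted_risk_difference(records):
    """Standardised risk difference (exposed minus unexposed).

    records: iterable of (stratum, exposed, outcome) tuples, where stratum
    is the value of the assumed confounder, exposed is a bool and outcome
    is 0 or 1. Each stratum's risk difference is weighted by its share of
    the whole sample.
    """
    cells = defaultdict(lambda: [0, 0])  # (stratum, exposed) -> [events, n]
    stratum_sizes = defaultdict(int)
    for stratum, exposed, outcome in records:
        cell = cells[(stratum, bool(exposed))]
        cell[0] += int(outcome)
        cell[1] += 1
        stratum_sizes[stratum] += 1

    total = sum(stratum_sizes.values())
    effect = 0.0
    for stratum, size in stratum_sizes.items():
        events_exposed, n_exposed = cells[(stratum, True)]
        events_unexposed, n_unexposed = cells[(stratum, False)]
        if n_exposed == 0 or n_unexposed == 0:
            # Without both groups in a stratum the contrast is undefined.
            raise ValueError(f"stratum {stratum!r} lacks exposed or unexposed records")
        risk_diff = events_exposed / n_exposed - events_unexposed / n_unexposed
        effect += (size / total) * risk_diff
    return effect


# Toy, made-up records (NOT study data): (age band, exposed to risk factor, CVD event).
records = (
    [("older", True, 1)] * 2 + [("older", True, 0)] * 2
    + [("older", False, 1)] * 1 + [("older", False, 0)] * 3
    + [("younger", True, 1)] * 1 + [("younger", True, 0)] * 3
    + [("younger", False, 0)] * 4
)

print(adjusted_risk_difference(records))
```

A framework such as DoWhy wraps the same logic in explicit steps (state the causal graph, identify the estimand, estimate it, then test the estimate's robustness), which makes the underlying assumptions easier to inspect than in this hand-rolled version.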