Stroke risk prediction using routine electronic health records enhanced by genetic data and machine learning

Principal Investigator: Dr Alexander Knight
Approved Research ID: 55513
Approval date: March 3rd 2020

Lay summary

Stroke is a preventable disorder, so why are stroke units in the UK full? The reasons are complex, but there are three key factors:


  1. We don't do a good job of controlling known risk factors like blood pressure, cholesterol, and diabetes. If we did this better, we could cut the number of strokes by more than half.
  2. We have a poor understanding of how the risk factors combine and how they change over time.
  3. The genetic factors are not well understood. In the future, genetics will become part of health records and could be used to forecast stroke risk.


Our vision is to use information from people's medical records and their genetic profiles to predict their personal stroke risk. By targeting treatments at the patients with the highest risk, we will be able to dramatically reduce the number of strokes occurring each year.


We will combine data from:

* 10,000 stroke patients who have given us permission to access their medical records and to determine their genetic profile

* The UK Biobank's ~500,000 participants


With this data we will:

  1. Develop a computer model that could be used to predict someone's personal stroke risk from their medical records;
  2. Assess the importance of genes when predicting the risk of stroke.


To develop the computer model, we will use advanced techniques in "machine learning" to identify patterns in patients' electronic health records that are linked to higher stroke risk. We need sophisticated methods to interpret these records because they are often incomplete and have long gaps in them. We will test how our model performs and, if it is successful, we will use it to develop a "Clinical Decision Support Tool" that will be able to predict stroke risk from a patient's medical records.


In the future, an important part of people's medical data will be their genetics. Genomics England has already sequenced 100,000 genomes, but to put this data to work, we need to understand how someone's DNA influences their risk of stroke and how it interacts with other risk factors.


The combination of genetic data with medical records will enable accurate risk predictions and greatly improve the targeting of treatments at those patients who need them the most; ultimately this will significantly reduce the number of strokes that occur each year.