Approved research
Methods to facilitate variant interpretation in genomic sequencing
Approved Research ID: 53953
Approval date: February 3rd 2020
Lay summary
Genetics plays a major role in the cause of, and predisposition to, disease. Approximately 3.5M people in the UK suffer from a rare disease at some point in their lives, and research shows that nearly 50% of rare disease sufferers wait more than 5 years for a diagnosis. Delays in diagnosis lead to inappropriate management and disease progression. 80% of rare diseases are influenced by a genetic change in the patient's genome. There is also a growing understanding of the role of genetics in cancer, which is estimated to affect approximately 50% of people in the UK during their lifetime. Characterization of the genetic changes that have occurred in these diseases can lead to a diagnosis and determine which treatment is likely to work. Sequencing the genomes of these patients is therefore essential to ensure they receive the best possible care. NHS England recently launched the NHS Genomic Medicine Service to enable this to become routine. Yet while DNA sequencing is becoming widespread, there is still a bottleneck in interpreting the data. Many hundreds of genetic changes can be identified in a person's genome, many of which are unrelated to the disease. Determining which of these is having an effect is a highly manual and time-consuming process. In addition, a particular genetic change may not cause disease in all cases and may be influenced by other factors. Highly trained clinical scientists must therefore search through large numbers of genetic changes and assess a wide range of data sources, including the academic literature, to determine which changes are most likely to be disease-causing. We aim to develop an automated system that supports clinical interpretation. The more we know about which variants are present in different patients and in which diseases, the better chance we have of predicting which variants will be disease-causing in future patients.
The availability of the high-quality UK Biobank sequencing dataset, along with the recent development of innovative statistical and machine-learning methods, means we can now seek to identify the factors that determine whether a genetic change is important in a given patient and use this information to develop tools that predict whether a variant is likely to be disease-causing. Automating this process will ease the pressure on clinical scientists, ultimately reducing waiting times for patients with genetic disease.
Scope extension:
Many medical conditions, including rare diseases, cancer and common diseases such as diabetes, are now known to have a genetic basis. Understanding which of the many genetic variants identified during genome sequencing are disease-causing (pathogenic) is vital in understanding and treating these conditions, yet interpreting this data continues to be a significant challenge. To address this, we aim to develop new statistical and machine-learning methods to predict which of these variants are likely to be pathogenic. Congenica produces a leading clinical decision platform used to facilitate variant interpretation in rare disease and has access to an initial pool of variant interpretation data. However, in order to predict which variants are pathogenic it is necessary to understand the contribution of variants across the whole population. Because UK Biobank is an extensive dataset encompassing individuals across the full spectrum of health conditions, we propose to use its data to investigate associations between variants and health outcomes using a case-control approach. We intend to use this analysis to develop tools that will improve and automate the identification of pathogenic variants. Ultimately, we aim to make variant interpretation more efficient, easing the pressure on clinical scientists and reducing waiting times for patients with genetic disease.
Following on from our initial analyses of variant pathogenicity in rare disease, we would like to apply similar statistical and machine-learning techniques to examine how variants identified in genomic sequencing affect a broader range of health outcomes. This includes expanding our analysis to additional disease areas and to how genetic variants affect the way people respond to drug treatments.