Approved Research

Development of methodology and computationally efficient software for the analysis of genetic studies of complex "time-to-event" outcomes

University of Liverpool

Lay summary

Identifying genetic information associated with individuals having certain outcomes, for example developing a disease, can be useful for predicting disease risk. The most common study design for identifying this genetic information is the genome-wide association study (GWAS), which involves testing patients at hundreds of thousands of genetic variants known as single-nucleotide polymorphisms (SNPs) to see which version of each genetic variant they hold. However, the statistical methods used to analyse GWAS are suitable only for investigating association with common SNPs, where the minor allele frequency (MAF) - the frequency of the least common version of a SNP - is greater than 5%.

Whole-exome sequencing is an alternative approach to GWAS with many advantages. In this approach, genetic information is collected from areas of the genome (exomic regions) having a significant proportion of genetic variants which are highly likely to be biologically plausible as predictors of disease risk. The approach also allows rare variants (MAF<5%) to be investigated. These can be grouped together into sets of variants believed to have similar molecular and biological effects, and analyses of such sets are generally better at identifying correlations with outcome than single-variant approaches.

Analysing whole-exome sequencing data requires different statistical methods to those used for analysing GWAS data, and whilst appropriate methods have been developed, these are only for studies with binary or continuous outcomes. Often our interest is in 'time-to-event' outcomes, for example age of disease onset, and therefore we are experiencing an analytical bottleneck in identifying genetic variants associated with these types of outcomes. In this project we aim to address this bottleneck by developing novel statistical methods and appropriate software for time-to-event outcomes that can cope with the scale and complexity of exome sequence data. We will apply them to real datasets to identify genetic predictors of age of onset for various diseases measured in UK Biobank. In turn, this will increase the likelihood of identifying genetic variants associated with health outcomes, improve prediction models of age of disease onset and inform drug development.