Whole Genome Imputation of UK Biobank participants in partnership with Genomics England
Principal Investigator:
Professor Mark Caulfield
Approved Research ID:
48031
Approval date:
July 4th 2019
Lay summary
The UK Biobank dataset has already proved of tremendous value in identifying genetic mutations underlying differences among people in their odds of getting different diseases, as well as differences in normal traits such as height. However, many mutations carried by the volunteers within UK Biobank have not been identified, especially rare mutations. Another dataset, the 100,000 Genomes Project dataset generated by Genomics England, identifies almost all mutations in a different collection of individuals. We will use this dataset to computationally identify such mutations, and their carriers, in the UK Biobank data. To do this, we will use the fact that when two individuals share stretches of DNA inherited from shared ancestors, they mostly also share the mutations that lie within those stretches. This enables us to 'impute' mutations from individuals in the 100,000 Genomes Project into those in UK Biobank based on patterns of DNA sharing. This approach will allow us to map patterns of variation at hundreds of millions of mutations in the UK Biobank data. We will release these data back to researchers studying UK Biobank, to enable them to understand how these mutations affect our health and risk of disease. This will be widely useful, yielding new insights for a large proportion of the traits that can be studied within these samples. A particular benefit might be enabling researchers to find rare mutations that have a larger impact on disease risk, or on disease-related traits, than those seen for more common variants. Mutations with such large impacts can provide powerful insights into the biological causes of disease.