Principal Investigator: Dr Daniel MacArthur
Broad Institute, USA
Tags: 48511, aggregate data, association testing, browser, gene burden, joint calling, reprocessing
The UK Biobank is generating exome data for its 500,000 participants: the sequence of all the genes of each participant. Over the next 3 years, we would like to process this data to make it consistent with other large exome datasets, so that these datasets can be combined and compared. We would then like to combine the processed UKBB exome data with the Genome Aggregation Database (gnomAD) dataset, one of the most widely used human genetic datasets in the world. We will then publicly release the combined dataset, but only in aggregate form, as gnomAD does now (gnomad.broadinstitute.org). People from around the world will be able to see how many people carry a particular genetic change, and how those people are distributed by sex, age and ancestry, but they won’t be able to tell which person any particular genetic change came from.
The gnomAD dataset is currently used by scientists and doctors around the world to help diagnose patients with rare diseases, to help develop new medications, and to build a basic scientific understanding of how human genes can change and how similar and different we all are. This project would add UK Biobank data to this effort, making diagnoses and research projects easier to perform, and making the UK Biobank exome data available to people without a lot of computational expertise in a way that still preserves the privacy of the participants.
We would also create a new web-based browser to display the results of analyses of UK Biobank data. These analyses would combine the exome data, the older genotype data, and the questionnaire answers and measurements the UK Biobank has collected, to look for associations between the information in people’s DNA and everything else the UK Biobank has been measuring. This browser would also be publicly available, making these important analyses accessible to scientists and the public.