Last updated Dec 12, 2018
Find out more about the possibilities for genetic research within UK Biobank: UK Biobank Scientific Conference 2018
View our Genetics Publications list for genetics-based papers being generated through use of the resource.
What genetic data are available?
Genome-wide genotyping data are available for all 500,000 participants in the UK Biobank cohort. An initial 50,000 participants were genotyped using the Affymetrix UK BiLEVE Axiom array; the remaining 450,000 participants were genotyped using the Affymetrix UK Biobank Axiom® array, which genotypes ~850,000 variants. The two arrays are very similar, sharing over 95% of their content. Further details can be found in the useful resources section at the bottom of this page.
Quality control and imputation (to over 90 million SNPs, indels and large structural variants) were performed by a collaborative group headed by the Wellcome Trust Centre for Human Genetics.
The following data are available:
- A clean set of QC’ed genotype calls
- Confidence values that a genotype call is correct
- Intensity data to generate cluster plots
- Extensive QC information regarding SNPs and samples including SNP metrics, batch effects, population structure and relatedness
- Imputed data
The following data are also available upon request:
- Un-QC’ed genotype calls and confidences
- CEL (image) files
- Spectrophotometric measurements taken during DNA isolation
Documentation detailing the quality control analyses and the imputation methodology can be found via the useful resources section at the bottom of this page.
Please see the Data Showcase for details of which data are available: http://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100314
If you wish to find information on specific genetic loci measured on the Axiom array, please use the Genomic Search facility.
Timelines of data availability
The first batch of genetic data, comprising genotyping and imputed data on approximately 150,000 participants, was made publicly available in May 2015. It covered the 50,000 participants genotyped using the UK BiLEVE array and 100,000 participants genotyped using the UK Biobank array. Genetic data for the full cohort were released in July 2017.
Other genetic sequencing projects underway
Exome sequencing: In 2017 GSK and Regeneron made an application for an exome sequence assay on 50,000 UK Biobank participants. Regeneron have entered into a further collaboration with AbbVie, Alnylam Pharmaceuticals, AstraZeneca, Biogen, Pfizer, Takeda and Bristol-Myers Squibb to undertake an exome sequence assay on the remaining 450,000 participants. They aim to complete this work over three years, with the data being made available through the Data Showcase by the end of 2020.
Whole genome sequencing: In spring 2018 the Medical Research Council (MRC) awarded UK Biobank a £30M grant to sequence the whole genomes of 50,000 UK Biobank participants. The sequencing is being undertaken by the Wellcome Sanger Institute, Cambridge; it began in 2018 and will continue through to the beginning of 2020.
Applications for genetic data
UK Biobank would like to make the following comments on the scope of applications for genetic data.
- All applications (including GWAS-PheWAS applications) require a clearly stated hypothesis.
- It is ultimately for researchers to select the associations they wish to study from the genotyped and imputed SNPs and the UK Biobank phenotypes. UK Biobank may have a view on whether such associations (in particular, the selected phenotypes) would be appropriate, but there is no restriction in principle that would limit the scope of the associations a researcher might choose to study. As a matter of principle, therefore, UK Biobank would consider approving a suitable GWAS/PheWAS study.
- Although some phenotypes are readily available, others (particularly some health outcomes) may not be well ascertained or may not yet be appropriately validated. By way of illustration, self-reported outcomes collected during the participant baseline visit are readily available. However, other phenotypes, such as validated outcomes for incident and prevalent disease, depend on the availability of the health record linkage data (over which UK Biobank inevitably has less direct control).
- UK Biobank will accept requests for extensions to existing applications, as long as the extended application still satisfies the broad purpose of the overall research question. If we think it falls outside the scope of the original application, we will ask you to submit a new application.
- UK Biobank intends, in due course, to make a set of all (or at least the great majority of) GWAS results available through the European Genome-phenome Archive. All other (i.e. non-GWAS) results should be returned to UK Biobank directly at the time of publication.
Use of a single genetic dataset
We have received a number of requests from institutions that would like to store a single central genetic dataset, which can be a) shared between collaborators and b) used for multiple applications from within the same institution. We support this proposal and, with permission, will approve the use of a single institutional genetic dataset for multiple applications. For the avoidance of doubt, each research application needs to have its own copy of the phenotypic dataset (as these datasets are relatively small in size and are encrypted with project-specific identifiers).
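One way to picture this arrangement is a single shared genotype store plus a small per-application table that translates project-specific identifiers into identifiers used by that store. The sketch below is illustrative only: the names `central_genotypes`, `id_bridge` and `link_phenotypes`, the example identifiers, and the bridging table itself are assumptions made for this example and are not taken from UK Biobank documentation or file formats.

```python
from typing import Dict, List, Tuple

# Hypothetical central genetic dataset, stored once per institution:
# maps a genetic-dataset sample ID to a short list of genotype calls (0/1/2).
central_genotypes: Dict[str, List[int]] = {
    "G001": [0, 1, 2],
    "G002": [1, 1, 0],
    "G003": [2, 0, 0],
}


def link_phenotypes(
    phenotypes: Dict[str, float],
    id_bridge: Dict[str, str],
    genotypes: Dict[str, List[int]],
) -> List[Tuple[str, float, List[int]]]:
    """Join one application's phenotypes to the shared genotype store.

    phenotypes: project-specific participant ID -> phenotype value
    id_bridge:  project-specific participant ID -> genetic-dataset sample ID
                (a hypothetical per-application bridging table)
    genotypes:  the shared central genetic dataset
    """
    linked = []
    for project_id, value in sorted(phenotypes.items()):
        genetic_id = id_bridge.get(project_id)
        if genetic_id is None or genetic_id not in genotypes:
            continue  # participant cannot be linked; leave them out
        linked.append((project_id, value, genotypes[genetic_id]))
    return linked


# Two hypothetical applications, each with its own phenotype copy and IDs.
app_a_phenotypes = {"A-17": 1.2, "A-42": 0.7}
app_a_bridge = {"A-17": "G001", "A-42": "G003"}

app_b_phenotypes = {"B-05": 3.4, "B-99": 2.2}
app_b_bridge = {"B-05": "G001"}  # "B-99" has no bridge entry

linked_a = link_phenotypes(app_a_phenotypes, app_a_bridge, central_genotypes)
linked_b = link_phenotypes(app_b_phenotypes, app_b_bridge, central_genotypes)
```

In this sketch both applications read from the same `central_genotypes` object, while their phenotype tables and identifiers stay separate, which mirrors the split between a shared genetic dataset and per-application phenotypic datasets.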
UK Biobank genotyping and imputation data release
- UKB Genotyping and Imputation Data Release – FAQ
- Decryption key & instructions on how to remove the UK Biobank encryption wrapper *
- UK Biobank genetic files data formats
*This document is now deprecated. If you downloaded v2 of the genotyping and imputed data from EGA (with both UKB and EGA encryption) you may still need it, but it will not be required for later downloads.
UK Biobank genetic quality control documentation:
- UK Biobank Quality control documentation
- Imputation Documentation
Details on the Affymetrix UK Biobank Axiom® array:
- View details on the UK Biobank Axiom Array
Genome-wide genetic data on ~500,000 UK Biobank participants, Clare Bycroft et al. bioRxiv July 2017
The impact of improved microarray coverage and larger sample sizes on future genome-wide association studies, Lindquist KJ et al. Genetic Epidemiology May 2013 37 (4): 383-392