Last updated Mar 6, 2018
Find out more about the possibilities for genetic research within UK Biobank: UK Biobank annual meeting 2017
Search our publications database for genetics-based papers.
UK Biobank genotyping and imputation data release
- UKB Genotyping and Imputation Data Release – FAQ
- Decryption key & instructions on how to remove the UK Biobank encryption wrapper *
- UK Biobank genetic files data formats
*This document is now deprecated and should no longer be required. If you downloaded v2 of the genotyping and imputed data from EGA (with both UKB and EGA encryption), you may still need it.
What genetic data are available?
Genotype data are available for all 500,000 participants in the UK Biobank cohort. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array on an initial 50,000 participants; the remaining 450,000 participants were genotyped using the Affymetrix UK Biobank Axiom® array. The two arrays are extremely similar (with over 95% common content). Further details can be found by clicking the links below:
Quality control and imputation (to over 90 million SNPs, indels and large structural variants) were performed by a collaborative group headed by the Wellcome Trust Centre for Human Genetics.
The following data are available:
- A clean set of QC’ed genotype calls
- Confidence values that a genotype call is correct
- Intensity data to generate cluster plots
- Extensive QC information regarding SNPs and samples including SNP metrics, batch effects, population structure and relatedness
- Imputed data
The following data are also available upon request:
- Un-QC’ed genotype calls and confidences
- CEL (image) files
- Spectrophotometric measurements taken during DNA isolation
Documentation detailing the quality control analyses and the imputation methodology can be found via the links below:
- UK Biobank Quality control documentation
- Imputation Documentation
- View details on the UK Biobank Axiom Array here
Please see the Data Showcase for details of which data will be made available: http://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100314
If you wish to find information on specific genetic loci measured on the Axiom array, please use the Genomic Search facility.
Timelines of data availability
The first batch of genetic data, which includes genotyping and imputed data on approximately 150,000 participants, was made publicly available at the end of May 2015. This batch includes the 50,000 participants genotyped using the UK BiLEVE array and about 100,000 participants genotyped on the UK Biobank array. The rest of the data are expected to be available very soon.
Applications for genetic data
UK Biobank would like to make the following comments on the scope of applications.
- All applications require a clearly stated hypothesis, so that UK Biobank can judge whether the application is from a bona fide researcher for health-related research that is in the public interest.
- It is ultimately for researchers to select the associations they wish to study from the genotyped and imputed SNPs and the UK Biobank phenotypes. UK Biobank may have a view on whether such associations (and in particular the selected phenotypes) would be appropriate, but there is no restriction in principle that would limit the scope of the associations a researcher might choose to study. As a matter of principle, therefore, UK Biobank would consider approving a suitable GWAS / PheWAS study.
- UK Biobank would highlight that although some UK Biobank phenotypes are readily available, others may not be well-ascertained or may not be appropriately validated (at this time). By way of illustration, self-reported outcomes collected during the participant baseline visit are readily available. However, other phenotypes, such as validated outcomes for incident and prevalent disease, depend on the availability of the health record linkage data (over which UK Biobank inevitably has less direct control).
- In order to provide a level playing field, UK Biobank advises that if an application is submitted prior to 30 November 2015, the relevant datasets will be made available as soon as reasonably possible thereafter. UK Biobank will also accept amendments to existing applications, as long as the application still satisfies the criteria set out in the first point above, and these will be dealt with in the same time frame.
- UK Biobank intends to make available in due course a set of all (or at least the great majority) of such associations through the Showcase database. This will be available to registered researchers, and a specific application to UK Biobank will not be required to access such summary data.
Use of a single genetic dataset
We have received a number of requests from institutions that would like to store a single central genetic dataset, which can be linked a) between collaborators and b) for use on multiple applications from within the same institution. We support this proposal and, going forward, will release suitable bridging files to enable such linkage to take place. It would help our administrative team if applicants stated as explicitly as possible the precise linkage required (in terms of the pre-existing genetic dataset and the identities of the collaborators). For the avoidance of doubt, the same approach to linking datasets between different applications still applies: http://www.ukbiobank.co.uk/wp-content/uploads/2013/10/UK-Biobank-data-linkage.pdf
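As an illustration only, the sketch below shows how a bridging file might be used to re-key application-specific phenotype records onto a central genetic dataset. The layout of the bridging file (two whitespace-separated columns: the application-specific participant ID, then the ID used in the central dataset), the function names, and all IDs and values are assumptions made for this example; they are not taken from UK Biobank documentation.

```python
import io


def load_bridge(handle):
    """Read an assumed two-column bridge file into a dict that maps
    application-specific IDs to central-dataset IDs."""
    bridge = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        app_id, central_id = fields
        bridge[app_id] = central_id
    return bridge


def link_phenotypes(phenotypes, bridge):
    """Re-key phenotype records by central-dataset ID.

    Returns the linked records and a list of application IDs that
    have no entry in the bridge."""
    linked = {}
    unmatched = []
    for app_id, record in phenotypes.items():
        central_id = bridge.get(app_id)
        if central_id is None:
            unmatched.append(app_id)
        else:
            linked[central_id] = record
    return linked, unmatched


# Hypothetical bridge contents and phenotype records (made-up IDs and values).
bridge_file = io.StringIO("1000011 5000042\n1000027 5000007\n")
phenotypes = {
    "1000011": {"height_cm": 172},
    "1000027": {"height_cm": 165},
    "1000099": {"height_cm": 180},  # no bridge entry for this ID
}

bridge = load_bridge(bridge_file)
linked, unmatched = link_phenotypes(phenotypes, bridge)
print(linked)
print(unmatched)
```

Reporting unmatched IDs separately, rather than silently dropping them, makes it easier to spot participants who are missing from the bridge before any analysis is run.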
The impact of improved microarray coverage and larger sample sizes on future genome-wide association studies, Lindquist KJ et al. Genetic Epidemiology May 2013 37 (4): 383-392
A timeline for future data availability can be found here.