
Genetic data

Detailed genetic data on half a million people

This page provides an overview of the different types of genetic data available in UK Biobank. For more detailed information on genotyping array coverage, laboratory processes and quality control checks, visit the genetics section of the Data Showcase.

Accessing genetic data

Please note that some information on data access and download in the FAQs below may be out of date. For instructions on accessing or downloading genetic data, please refer to our Data Access Guide.

Genome-wide genotyping

Genome-wide genotyping was performed on all UK Biobank participants using the UK Biobank Axiom Array. Approximately 850,000 variants were directly measured, and more than 90 million variants were imputed using the Haplotype Reference Consortium and UK10K + 1000 Genomes reference panels.

You can view the full Axiom array SNP list by downloading the CSV file. Alternatively, you can use the Genomic Search facility to find specific genetic loci of interest that are measured on the array. Imputed data based on other reference panels are planned for release in the future.


Genotyping and Imputation FAQs

Related publication

Whole exome sequencing

Whole-exome sequencing measures the protein-coding regions of the genome (about 2% of it) and is particularly well suited to identifying rare and disease-causing genetic variants.

A vanguard exome sequencing project on the first 50,000 participants was performed by Regeneron and GlaxoSmithKline. A further consortium (comprising Regeneron, AbbVie, Alnylam Pharmaceuticals, AstraZeneca, Biogen, Pfizer, Takeda and Bristol-Myers Squibb) has completed the exome sequencing project, and data for 470,000 participants are now available to researchers.

Exome sequencing FAQs

Related publication

We have identified an issue affecting some data that you may have used on the UK Biobank Research Analysis Platform (UKB-RAP). The issue affects only the alternatively reprocessed exome dataset subcategory (processed with the DRAGEN pipeline) and concerns two data fields.


Important update on exome data fields

Whole genome sequencing

Whole genome sequencing measures the entire genome and will complement and enhance the existing genotyping and exome data. It is the largest whole genome dataset available in the world and will transform the way scientists study the genetic determinants of a wide range of health outcomes.

The Medical Research Council have provided funding to UK Biobank for a pilot project (the Vanguard) to perform whole-genome sequencing on 50,000 participants, which is being undertaken by the Wellcome Sanger Institute, Cambridge.

A consortium of government, industry and charity partners came together to fund whole genome sequencing of the remaining 450,000 participants. This project is funded by:

  • The UK Government’s research and innovation agency, UK Research and Innovation (UKRI), through the Industrial Strategy Challenge Fund
  • The Wellcome Trust
  • A consortium of industry partners: Amgen, AstraZeneca, GlaxoSmithKline and Johnson & Johnson.

Data for 200,000 whole genomes is now available to approved researchers in the Data Showcase.

Whole Genome Sequencing FAQs

Whole genome sequencing will generate 15 petabytes of data!

