Whole Genome Sequencing data on 200,000 UK Biobank participants made available for research

First release from world’s largest whole genome sequencing project could help researchers to understand the genetic determinants of disease and accelerate innovative drug discovery work.

London, 17 November 2021 – In a major step forward for genomics research, Whole Genome Sequencing (WGS) data[1] for the first 200,000 UK Biobank participants have today been made available to researchers through the recently launched Research Analysis Platform.

This dataset represents the world’s largest single release of WGS data[2]. When combined with the extensive lifestyle, biochemical, imaging and health outcome data already held for UK Biobank participants, it will enable researchers to better understand the role of genetics in health outcomes and to advance drug discovery and development.

The whole genome sequencing of all 500,000 UK Biobank participants is the most ambitious project of its kind ever undertaken. It has been funded through a public-private partnership involving Amgen, AstraZeneca, GlaxoSmithKline (GSK) and Johnson & Johnson[3], alongside Wellcome and UK Research and Innovation (UKRI), and sequencing has been carried out by deCODE Genetics and the Wellcome Sanger Institute[4]. The release of these 200,000 whole genomes today will be followed up by the release of the WGS data for the remaining 300,000 participants in early 2023.

Importance of these data for understanding genetics and relationship with human health

Access to these WGS data will allow researchers from across the world to study the 98% of the genetic code that does not code for proteins and whose function was, until recently, unclear. Whole genome sequencing on this unprecedented scale is expected to advance research in the following ways:

  • WGS data will enable researchers to identify rare non-coding variants that contribute to disease onset and progression. By combining the WGS data with the rich clinical and lifestyle data of UK Biobank participants, researchers are now uniquely equipped to answer questions about why some individuals develop particular diseases but others do not, and why certain conditions worsen in some individuals over time.
  • The WGS data will help to accelerate drug discovery and development by allowing researchers to identify new drug targets. This is important because pharmaceutical companies have found that potential drug targets supported by clear genetic evidence are twice as likely to result in effective medicines[5].
  • The large-scale nature of the UK Biobank cohort and constellation of health outcome information available also afford an opportunity to assess patient stratification by identifying subgroups of individuals who are more or less likely to respond to treatment, or who are more or less likely to experience side-effects.

A collaborative achievement

This highly anticipated project has been made possible only through collaboration between government, industry and charity, and it showcases the strengths of the UK life sciences industry.

The collaborative effort of all partners has resulted in the £200m project being delivered according to plan and budget, despite the challenging conditions caused by the Covid-19 pandemic and remote working.

"Sequencing at such a large scale and speed would not have been possible without the long-term vision of UKRI and Wellcome, the support of the industry consortium, and the expertise of the sequencing teams. The WGS project will make UK Biobank the most detailed genomics database in the world and by sharing these data with the global research community our aim is to enable breakthroughs in understanding, diagnosis, prevention and treatment strategies for a range of common and life-threatening diseases."

Professor Sir Rory Collins, Principal Investigator at UK Biobank

The Chair of the Joint Steering Committee, Letizia Goretti, representing Johnson & Johnson Innovation[6] on behalf of the industry consortium parties, Amgen, AstraZeneca, GSK and Johnson & Johnson, said: “We are all incredibly proud of contributing to the creation of the largest whole genome sequencing data set in the world. These data, combined with the extensive lifestyle, biochemical and health outcome data already available, makes the UK Biobank an increasingly powerful resource for understanding the genetic architecture of diseases and accelerating drug discovery and development. It is a truly pivotal moment for scientific research aimed at improving human health.”

Dr Michael Dunn, Director of Discovery Research at Wellcome, said: “The release of the first 200,000 whole genome sequences is a tremendous achievement, not only for UK Biobank, but also for the sequencing partners, deCODE Genetics and the Wellcome Sanger Institute. The integration of the sequences with the other characteristic data sets from participants will create a powerful resource to enable major discoveries that will benefit health outcomes.”

Professor Dame Ottoline Leyser, Chief Executive at UK Research and Innovation, said: “The UK Biobank programme demonstrates how transformative science can be delivered much more rapidly by working in partnership. Through the Data to Early Diagnosis Challenge, UKRI is proud to have co-invested in this ambitious programme, which has delivered an unprecedented data resource to accelerate the application of genomics to improve health.”



Notes to Editors

[1] Whole Genome Sequencing analyses the entire human genome, a unique genetic code of 3 billion building blocks that contains the 24,000 genes inside a human cell, which control the biochemical processes that underpin life.

[2] Five petabytes of WGS data have been made available to the research community today.

[3] On behalf of Johnson & Johnson, the WGS contract was entered into by Janssen Biotech, Inc., one of the Janssen Pharmaceutical Companies of Johnson & Johnson, and the collaboration was facilitated by the Johnson & Johnson Innovation Centre in London, UK.  

[4] The whole genome sequencing of the first 50,000 UK Biobank participants was conducted by the Wellcome Sanger Institute and funded by the Medical Research Council (MRC). Following this pilot, the WGS consortium began sequencing the whole genomes of the remaining 450,000 UK Biobank participants.

[5] Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank (2021)

[6] Letizia Goretti is an employee of Janssen Pharmaceutica NV.