UK Biobank makes available vast trove of genetics information

Published: 20 July 2017

A vast trove of genetic data on half a million Britons becomes available today (20th July 2017) for research into a wide range of diseases.

UK Biobank is releasing the information to approved researchers after genetics experts at Oxford University spent the past two years checking and strengthening it.

All 500,000 UK Biobank participants provided blood samples for long-term storage and analysis, including genetic analysis, when they volunteered for the project between 2006 and 2010.

US-based Affymetrix undertook the original genotyping of samples in 2013-14. For each participant, more than 800,000 carefully selected letters (genotypes) were measured across the whole genome of three billion letters.

Oxford scientists have been able to estimate (impute) a further 90 million genotypes for each participant, because areas of the genome tend to be passed on together from parent to child, one generation to the next. They have also checked and re-checked the data for quality and have been able to infer HLA types for each participant, which is helpful in understanding how the immune system works.

“We believe that this is the single largest release of a genetic dataset in terms of number of individuals genotyped. The dataset is vast, but we hope it will drive innovative and exciting studies to transform research.”
Mark Effingham, UK Biobank Chief Information Officer

Data are provided in a form that does not identify participants, and are available to approved health researchers anywhere in the world.

UK Biobank is already one of the most detailed prospective studies of its kind. The study includes information about participants’ health and well-being, key body measurements, diets, occupational history, mental health and activity levels. Regular updates from hospital and GP records and from health statistics strengthen the resource immeasurably, and UK Biobank has embarked on a study to image the hearts, brains and abdomens of 100,000 participants.

The genetic data will be boosted shortly by the results of exome sequencing of at least 50,000 samples, undertaken by UK-based GSK and US-based Regeneron. Exome sequencing reads every letter, but only in those parts of the genome (the exome) that are directly used to produce proteins. The work could be especially useful in developing new treatments that interfere with protein production in diseased cells.

Genetics data are being stored and provided to approved researchers by the European Genome-phenome Archive (EGA), which is a resource developed by EMBL-EBI and the Centre for Genomic Regulation (CRG).

Mark Effingham said: “Working with the EGA has been crucial in delivering these data quickly and efficiently, so that scientists can get on with the work of improving health.”

There are many ways these data can be used:

Investigating the relationship between genes and diseases: are particular changes in inherited DNA associated with particular diseases?
Carrying out more sophisticated analyses of our genes to help identify the causes of disease and the right ways to intervene to improve health.
Learning about shared biology: the same changes in our DNA may be involved in quite different diseases.
Researching how genetic and lifestyle measures influence health in thousands of people.
Investigating how genetic risk factors interact with particular diets, lifestyles, environments and other aspects of our health.