Celebrating 15 years of UK Biobank
-Nuffield Department of Population Health
UK Biobank is one of the largest biomedical databases in the world, containing genetic, lifestyle and health information from 500,000 UK individuals.
Fifteen years ago in Stockport, the first participants were recruited for a pilot study that would later become the UK Biobank: one of the most detailed, long-term health research resources in the world. Here we celebrate the long association between this unique biomedical database and NDPH researchers, and the increasingly powerful methods helping to answer the most important questions about human health and disease.
In the early 2000s, the revolutions in genomics and big data were opening up exciting new possibilities in large-scale, biometric health research. At the same time, the UK Government recognised the value of data-driven, population-level studies in countering the alarming rise in non-communicable diseases, including dementia, cancer and diabetes. Genetic research was seen as key for transitioning towards a future where medicines were used as prevention, rather than treatment. Calls were made for a national DNA database that would enable research into how genetic, health and lifestyle data could help identify individuals at risk of chronic diseases.
On 29 April 2002, the MRC, the Wellcome Trust and the Department of Health announced the allocation of £45 million start-up funding to UK Biobank. Three years later, on 29 September 2005, Professor Rory Collins (now Head of NDPH) was appointed Principal Investigator and Chief Executive of the fledgling project. At the time, he was co-director of Oxford University’s Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), which became part of the newly formed NDPH in 2013. Professor Collins brought a clear vision and great enthusiasm to the UK Biobank project, informed by his experience of helping to lead other large-scale studies. These include the China Kadoorie Biobank (CKB), which recruited over half a million adults during 2004-08. CKB was initiated, designed and led by researchers who are now part of NDPH, in collaboration with scientists in China.
Thanks to Professor Collins’ expertise, UK Biobank was soon underway and recruitment began in earnest in April 2007. In July 2009, a new, £4.5 million dedicated high-tech storage facility designed to hold 10 million samples at -80°C was officially opened at Cheadle, Stockport, by HRH Princess Anne. By July 2010, the project had achieved its aim of recruiting half a million participants, just over three years after recruitment began in earnest. At that time, Professor Collins said: ‘This is a landmark achievement…The UK Biobank resource will be available to the best scientific minds wherever they might be, and I am convinced it will make many major contributions to improving the health of future generations.’ The years that followed have certainly proved this prediction true.
Putting data to work for health-related research
As a charity, UK Biobank is committed to making the data collected on its 500,000 participants available for approved researchers to carry out health-related studies that are in the public interest. This includes data on genomics, multi-modal imaging, biochemistry measures, physical measures, lifestyle and environmental factors as well as longitudinal data on health outcomes over time. Before researchers access the data, all personal identifiers are removed so that individual participants cannot be identified and the data are checked to ensure that no potentially disclosive information is included. The researchers are then provided with the de-identified data that they need for their research project.
By facilitating global access and operating as a not-for-profit, UK Biobank has enabled over 20,000 researchers across the world, from PhD students to professors, to use its data in studies. NDPH researchers frequently use the resource for their research, especially since May 2017, when the Li Ka Shing Centre for Health Information and Discovery officially opened at Oxford University’s Old Road Campus. The centre became the home of the Big Data Institute and of UK Biobank’s data analysis, IT, record linkage, epidemiological and communication teams, as well as many NDPH staff. Professor Collins remains Principal Investigator and Chief Executive of UK Biobank, and in 2019 Professor Naomi Allen (NDPH) became its chief scientist.
This high density of biobank and data analysis-related expertise in the same building means that NDPH researchers are well-placed to draw new insights from the UK Biobank. These include Dr Keren Papier, a nutritional epidemiologist within NDPH’s Cancer Epidemiology Unit, who recalled the first time she used UK Biobank for her research. ‘When I first accessed the UK Biobank resource, I remember feeling surprised and excited to learn about all of the different data options available, including repeated dietary surveys, genetic data, and health data.’
Finding risk factors for disease
A key focus of studies that use UK Biobank data is identifying and quantifying factors that make people more or less likely to develop a particular disease. This can be difficult to do using small-scale studies, due to the immense variation in physical, biochemical and genetic measures between individuals. But UK Biobank’s size means that researchers can take into account variables that might have an influence, such as socioeconomic status, smoking and pre-existing illnesses.
Using this approach, NDPH researchers have identified a range of factors associated with subsequent disease risk, both physical (such as waist circumference) and biochemical (for instance, hormones). This information can help screening programmes identify those who would benefit from preventative treatments.
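As a rough illustration of what "taking into account" a variable such as smoking can mean in practice, the sketch below compares a crude risk ratio with one pooled across smoking strata. It uses the standard Mantel-Haenszel estimator, chosen here purely as an example; the article does not say which statistical methods NDPH researchers used, and all counts are invented rather than drawn from UK Biobank.

```python
# Toy illustration only: the counts below are invented, not UK Biobank data.
# Each stratum is one level of a possible confounder (here, smoking status).
# Fields: (label, exposed_cases, exposed_total, unexposed_cases, unexposed_total)
STRATA = [
    ("smokers", 16, 80, 4, 20),
    ("non-smokers", 1, 20, 4, 80),
]


def crude_risk_ratio(strata):
    """Risk ratio that ignores the confounder by pooling all strata."""
    exposed_cases = sum(s[1] for s in strata)
    exposed_total = sum(s[2] for s in strata)
    unexposed_cases = sum(s[3] for s in strata)
    unexposed_total = sum(s[4] for s in strata)
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel risk ratio, pooled across the strata of the confounder."""
    numerator = 0.0
    denominator = 0.0
    for _, a, n1, c, n0 in strata:
        total = n1 + n0
        numerator += a * n0 / total    # exposed cases, weighted by unexposed share
        denominator += c * n1 / total  # unexposed cases, weighted by exposed share
    return numerator / denominator


if __name__ == "__main__":
    print("crude risk ratio:", crude_risk_ratio(STRATA))
    print("smoking-adjusted risk ratio:", mantel_haenszel_risk_ratio(STRATA))
```

In this made-up example the exposed group appears to have roughly twice the risk when smoking is ignored, but the adjusted ratio is 1.0, because the exposure is simply more common among smokers. Stratifying like this only works when each stratum contains enough people, which is where a cohort of half a million participants helps.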
Research from UK Biobank is also helping people to take control of their health themselves. The resource contains much information about lifestyle choices, such as smoking, alcohol intake, activity levels and diet. These data have revealed associations between, for instance, greater red meat intake and increased bowel cancer risk, and between watching TV for more than two hours each day and worse overall health.
In a recent study, Associate Professor Aiden Doherty from NDPH used UK Biobank to investigate whether physical activity protects against heart disease. Between 2013 and 2015, approximately 100,000 UK Biobank participants were sent wrist-worn accelerometers to record their activity over a seven-day period, and their health was followed up over the next five years. ‘Our results showed an inverse association between the amount of total physical activity and cardiovascular disease incidence, with no threshold of effect at low or high levels’ said Professor Doherty. ‘The UK Biobank wearables study has transformed our understanding of how physical activity is related to serious illnesses. This dataset is also driving advances in our understanding of sleep and circadian rhythms too. It is truly a phenomenal resource for important healthcare research.’
Entering the genetic age
The UK Biobank biomedical database is also a rich genetic resource: for all 500,000 participants, it includes data on the presence of approximately 850,000 genetic variants, many of which are known to predispose an individual towards developing a particular disease. This information is helping researchers to tackle a growing challenge in health research: multimorbidity, the presence of more than one illness in the same individual. As global populations age, a key research priority is to understand whether having one disease or condition increases the risk of developing another. Observational studies can indicate whether two illnesses appear to be associated (for instance, kidney disease being more common in people who are obese) but they can’t tell us whether these diseases have a common cause and/or if they are interlinked.
To address this, NDPH researchers are pioneering new approaches for analysing UK Biobank’s genetic data by testing whether having a higher overall genetic risk score for one disease is associated with an increased likelihood of developing another. So far, their discoveries include finding that obesity is a direct cause of chronic kidney disease and type 2 diabetes a cause of erectile dysfunction, besides shedding light on the biological mechanisms underpinning irregular heart rhythms. In addition, Dr Anthony Webster has been using advanced statistical methods to map diseases into clusters that likely share common risk factors. These discoveries could ultimately help focus health interventions to where they can be most effective.
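The article does not describe how a genetic risk score is calculated. One common construction, shown here only as a minimal sketch, is a weighted sum of the number of risk alleles a person carries at each variant. The variant names, weights and the treatment of missing genotypes below are all hypothetical choices made for illustration.

```python
# Hypothetical variant identifiers and effect sizes, for illustration only.
EXAMPLE_WEIGHTS = {
    "variant_a": 0.10,
    "variant_b": 0.20,
    "variant_c": -0.05,
}


def genetic_risk_score(genotype, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per variant).

    Variants absent from `genotype` are treated as zero copies, which is a
    simplifying assumption rather than a recommended way to handle missing data.
    """
    score = 0.0
    for variant, weight in weights.items():
        count = genotype.get(variant, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {variant} must be 0, 1 or 2")
        score += weight * count
    return score


if __name__ == "__main__":
    participant = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
    print("genetic risk score:", genetic_risk_score(participant, EXAMPLE_WEIGHTS))
```

Under this construction, a score for one disease can be computed for every participant and then tested for association with the occurrence of a second disease, which is the kind of comparison the paragraph above describes.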
UK Biobank is set to become an even more powerful genetic resource, however. In 2018, UK Biobank partnered with the Wellcome Sanger Institute to sequence the entire genomes of 50,000 UK Biobank participants. Oxford University’s Wellcome Trust Centre for Human Genetics played a leading role in this landmark achievement, particularly in designing the genotyping arrays and data analysis methods. Following this success, in 2019 the Sanger Institute, together with deCODE in Iceland, began sequencing the remaining 450,000 participants’ genomes. This is the biggest endeavour of its kind ever, and will result in sequencing data equivalent to around 600 billion pages of text. The first tranche of data for 200,000 participants is due to be released at the end of this year, and will be made available via a new cloud-based Research Analysis Platform. Together with UK Biobank’s existing whole-exome sequencing data for 200,000 participants, this unprecedented level of genetic detail will open the door for researchers to study some of the most complex questions in how genes interact with our lifestyles and environment to cause disease.
Prepared for the COVID-19 pandemic
As an established, data-rich resource, UK Biobank could immediately assist during the early days of the COVID-19 pandemic to help address many of the urgent unknowns about this new disease. In particular, there was a pressing need for data on how COVID-19 cases were distributed across the country, the factors that increased an individual’s risk of severe disease, and whether previous exposure gave any long-lasting protection. In response, Dr Alan Young (whose team built many of the UK Biobank data handling systems) and the UK Biobank Data Analyst team (both based at NDPH) worked closely with researchers across Oxford University, NHS Digital and Public Health England to make new datasets available. These included SARS-CoV-2 antigen test data, primary care data, hospital inpatient data (including critical care) and death data.
This information has been used by over 700 research groups worldwide, for instance to quantify how different genetic and environmental factors influence the risk of developing severe COVID-19. These studies have revealed strong associations between increased vulnerability to COVID-19 and sociodemographic factors (such as social deprivation and occupation), lifestyle factors (such as smoking and alcohol), and pre-existing diseases (including asthma, obesity and psychiatric disorders).
UK Biobank also rapidly launched a seroprevalence study, to generate important evidence about the proportion of people who had been infected with coronavirus and how long SARS-CoV-2 antibodies remain in the blood. The study involved 20,000 individuals, half of whom were existing UK Biobank participants and half of whom were their adult children and grandchildren (to increase the age representation across the study). They were asked to provide a blood sample every month for six months; the samples were sent to a laboratory at Oxford University’s Target Discovery Institute for SARS-CoV-2 antibody measurements. Participants also completed a survey asking about any symptoms they had experienced and potential risk factors for coronavirus exposure (such as household size, employment, use of personal protective equipment, and transport modes).
NDPH researchers from Dr Rishi Caleyachetty’s and Professor Sarah Parish’s teams played a leading role in analysing the data. The first results, published in July 2020, revealed that infection rates varied considerably across the UK and between different demographic groups. Later results indicated that, for the vast majority of individuals infected with coronavirus, antibodies lasted for at least six months, suggesting that individuals may be protected for a substantial period following infection. This information played an important role in guiding national and local social distancing policies.
UK Biobank continues to evolve, adding ever more forms of data. For instance, last March data on the telomere lengths for all 500,000 UK Biobank participants became available. This could help researchers to understand why some older adults succumb to chronic diseases while others do not. UK Biobank is also engaged in the world’s largest imaging study of internal organs. This will involve an imaging scan of the brain, heart and abdomen, a carotid ultrasound and a full-body scan of bone density and body fat composition for up to 100,000 UK Biobank participants. Ultimately, the resulting library will open up new ways to investigate disease mechanisms, particularly for diseases such as arthritis, coronary heart disease, osteoporosis and Alzheimer’s disease.
In response to the coronavirus pandemic, the imaging study has expanded its scope to investigate the effect of SARS-CoV-2 infection on internal organs. When combined with longitudinal health records, and data on past infection collected using home-based antibody kits, these data will generate a unique resource to enable research into the longer-term health effects of coronavirus infection (so-called ‘long-COVID’). Up to 3,000 UK Biobank participants who attended an imaging assessment before the pandemic are now being invited back for a repeat scan. Half of these have been identified by home antibody testing kits as having been infected with SARS-CoV-2, whilst the other half have not previously been infected.
Various Oxford University researchers have been heavily involved in developing the imaging protocols and data analysis methods for the imaging studies. These include Professor Stefan Neubauer (Radcliffe Department of Medicine) for Cardiovascular Magnetic Resonance Imaging (MRI); Professors Stephen Smith and Karla Miller (Nuffield Department of Clinical Neurosciences) for brain MRI; and Professor Paul Leeson (Radcliffe Department of Medicine) for carotid ultrasound scanning.
‘To achieve the high-throughput needed to scan 100,000 people, we had the challenge of condensing an hour-long protocol into 35 minutes’ said Professor Miller. ‘Drawing on our previous work with the Human Connectome Project, which pioneered next-generation fast imaging technologies, we were able to squeeze cutting-edge imaging into very short timescales. The richness of the resulting imaging data, alongside a completely unprecedented number of deeply characterised subjects, has been a game changer for the field of brain imaging.’
NDPH researchers are also collaborating with Professor Barbara Casadei’s team in the Radcliffe Department of Medicine on the Heart Monitor Project. This project aims to assess the prevalence and predictors of irregular heart rhythms in the middle-aged UK population, and their impact on health outcomes.