This message addresses claims in The Guardian on 14 March 2026 about the consequences of some researchers unintentionally adding de-identified participant data to online repositories when making their computer code available to other researchers.
A message from Professor Sir Rory Collins, Chief Executive and Principal Investigator of UK Biobank

Dear UK Biobank participant,
I want to reassure you that your personal information in UK Biobank is safe. After 14 years of making your data available for scientific discovery, we have no evidence of any of you being unwillingly identified.
Ensuring your personal information is used safely and correctly is our number one priority. There has not been any hack or data breach of UK Biobank, and if there had been you would have heard about it from us.
What is The Guardian writing about?
Over 20,000 scientists around the world have been approved to use your de-identified data to discover how to prevent and treat many different types of disease better.
In a small proportion of cases, some scientists have unintentionally put these de-identified data along with their research findings on websites which are publicly accessible. These data do not contain any personally identifying information about you or any other participant.
Even though the data are de-identified before being made available to researchers, we don’t want them to be used by researchers who have not gone through our rigorous access review process. Consequently, we’ve taken steps to help researchers avoid putting any de-identified data on code repositories, and to ensure that any such data are removed rapidly if this does occur. We have already had the data removed from a code archive website that is mentioned in The Guardian.
What has actually happened?
As part of their work, researchers may write computer instructions, or code, to assess patterns in data. Scientific journals often require researchers to publish this code, so that other researchers can check and replicate their methods. Researchers typically publish the code on websites known as code repositories, which are designed for technical users.
When publishing this code, a small proportion of researchers have unintentionally included portions of de-identified UK Biobank data.
All UK Biobank data provided to researchers are ‘de-identified’. This means that the data do not contain personally identifying information – such as your name, address, exact date of birth or NHS number – but may include health data you have provided and from your health records.
We take your privacy extremely seriously. I am also a UK Biobank participant, so I know how much this matters. We know that the possibility of you being identified from your data can never be completely removed, but identifying you would require someone to have specific, matching information from another source that they knew was personal to you.
That is exactly what The Guardian has done. The participant featured in the article chose to give specific, personal, health information to The Guardian. The Guardian then cross-referenced this information with UK Biobank data to establish that the person is a participant.
This is not a failure of our approach to data confidentiality because the participant shared the information to identify themselves and, significantly, has said she intends to remain part of UK Biobank because its work is so important. As with any personal information, we recommend always being careful not to reveal specific details about ourselves on social media or websites.
What actions have we taken?
- We introduced mandatory training on data security.
- We built a tool for researchers to check their code.
- We built automated search tools for participant data on code repositories.
- We will stop access to UK Biobank if researchers fail to comply.
All researchers and their institutions commit in their legal contract with UK Biobank never to share the data outside of their systems. We first detected de-identified participant data on an online code repository in 2022. We had it removed immediately by the researcher, and reminded all approved researchers of their legal requirement not to share the data.
We have continued to monitor these repositories and, as more researchers were encouraged to share their code by the journals in which they were publishing their research findings, we found more examples of de-identified data being shared unintentionally.
To stop this happening we:
- have introduced mandatory training for all researchers approved to use UK Biobank about how to keep all of the data from participants secure;
- have built a tool that researchers must use to check their code before publishing it in order to ensure that it does not contain any participant data;
- have built automated search tools that run on a daily basis to identify any participant data on these code repositories, so that we can have the data removed immediately if this occurs;
- will stop researchers or institutions from accessing UK Biobank if it becomes clear that they are not taking the necessary steps to manage the data securely.
In addition, we have changed the way that researchers access UK Biobank data.
When we first made data available in 2012, the only feasible way to do so (and the standard approach at the time) was to send researchers the de-identified data they needed for their research. In 2021, we launched the world’s first cloud-based research platform able to handle the scale of data and numbers of researchers in UK Biobank. By 2024, the platform was sufficiently refined to support the work of most researchers using UK Biobank. As a result, with a few carefully assessed exceptions, further releases of data are now only available on this secure platform, and researchers must delete any data that they had received previously when their current projects come to an end (the last of which will end early next year).
As an additional layer of security, we are creating a world-first automated checking system to prevent any participant data from being taken off our research platform. In particular, the coded GP data that will be provided to UK Biobank are only going to be made available to researchers when this checking system is in place.
Restricting access to all UK Biobank data in this way further reduces the opportunity for de-identified data to be inadvertently included with a researcher’s code on a public repository.
Our continued thanks for supporting discovery science
I hope that you are reassured that we have done everything that we should to ensure that those who use your data for research do so responsibly and securely. I understand that The Guardian’s article may be unsettling, but it does not accurately reflect the reality, and we do not believe there is any cause for concern.
We are extremely grateful for the contributions that you are making by being part of UK Biobank. By making data available responsibly to scientists around the world, UK Biobank is allowing discoveries to be made that would not otherwise have emerged, leading to the development of new ways to prevent and treat disease, and better health for all of us.
Professor Sir Rory Collins
CEO and Principal Investigator of UK Biobank
If you are a participant and have any questions about how UK Biobank data are used, please contact us on [email protected] so we can discuss these with you personally.