Important update re: exome data fields 24066 and 24067

We have identified an issue affecting some data that you may have used within the UK Biobank Research Analysis Platform (UKB-RAP). The issue affects only the subcategory of exome data that was alternatively reprocessed using the DRAGEN pipeline, and it relates to the following two fields: field 24066, "Exome variant call files DRAGEN VCFs", and field 24067, "Exome variant call files DRAGEN VCFs indices".

We have restricted access to these fields on the UKB-RAP, and researchers should immediately stop using any data derived from them.

An error occurred when these data were added to the UKB-RAP in February 2022. As a result, the data were dispensed into your workspace with incorrect Encoded Identifiers (EIDs). This means that any associations between these data and the rest of your dataset are incorrect.

Our immediate priorities have been to alert you and to investigate the extent of the issue. We have used two independent forms of verification to re-check our data release processes and can reassure you that this was an isolated error and no other data have been affected in this way.

We will correct these data and will write to researchers again with further information regarding when the data will be re-issued.

We very much regret that this error has occurred and understand the impact it may have had on your research.

We have provided additional information below. Please contact access@ukbiobank.ac.uk with any further queries.

FAQs

Does this mean that any research I have done with these fields should be considered incorrect?

Unfortunately, yes. The issue relates solely to the exome data reprocessed using the Illumina DRAGEN pipeline (fields 24066 and 24067), and it means that any research analyses undertaken using these data are incorrect.

How do you know that no other data are affected?

The mapping process by which these data were added to the UKB-RAP has been used on only a relatively limited number of occasions, to add returned datasets received from third-party providers. We have reviewed each case in which this process was applied and are satisfied that an issue has arisen only in relation to this reprocessed exome dataset. For our other datasets, we are carrying out further analyses on randomly selected data as an extra layer of assurance that there are no comparable issues elsewhere.

When will the correct data be re-introduced?

We are working hard to make the appropriate corrections. Once additional checks have confirmed that the corrected data are accurate, we will make them available. We will inform researchers of the timeline once we know more.

How did this error happen?

Our investigations are ongoing, as our immediate priority has been to alert researchers who may be affected. Initial investigations show that the EIDs were not re-mapped correctly, so when the data were subsequently dispensed into project workspaces, they were matched using incorrect identifiers.

When did this error occur?

This error occurred in February 2022, when the Whole Exome Sequence data alternatively reprocessed using the DRAGEN pipeline were uploaded to the UKB-RAP. These data were supplementary to the main data release and were of use specifically to researchers wishing to use the DRAGEN system.

How have you alerted affected researchers?

We immediately stopped researchers from accessing these data fields, so that they can no longer be dispensed into new project workspaces. We have also emailed all researchers who may have used these data, instructing them not to use any data they have already dispensed.

Has any research been published that uses these data?

Based on initial searches, and as far as we can ascertain, no papers using these data have been published to date. We will continue to monitor this closely.

Is this an error with the DRAGEN data or pipeline?

No, this is solely an issue with the mapping process used to upload these data to the UKB-RAP.

Last updated April 20th 2023