We have identified a problem with the UK Biobank imputed data, which came to light following discussion on the UKB-GENETICS mail list.
This problem relates to the imputed data and does not affect the genotyped data from the Affymetrix array.
The genetic data was imputed using two different reference panels. The Haplotype Reference Consortium (HRC) panel was used as the first choice, but for SNPs not in that reference panel the UK10K + 1000 Genomes panel was used. The problem arose in the second set of imputed data, from the UK10K + 1000 Genomes panel. The genotypes at these SNPs are imputed correctly, but they have been recorded with incorrect genome positions in the files.
We have established that the imputed data from the HRC panel is not affected and has the correct positions. This covers approximately 40M sites and includes the majority of the common SNPs, i.e. the sites most likely to show genetic associations. These sites are readily identified since the HRC site list is public.
The problem is not easy to fix post-hoc, so we intend to re-impute the data from the UK10K + 1000 Genomes panel and re-release the imputed data.
For now we recommend that researchers focus exclusively on SNPs in the HRC panel, or work with the directly genotyped data until the new release is available.
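As an illustration of the recommendation above, the following is a minimal sketch of how a variant list might be restricted to HRC sites. It is not an official UK Biobank tool. It assumes the public HRC site list is available as a tab-delimited file whose first five columns are chromosome, position, ID, reference allele and alternate allele, with header lines starting with "#". The function names load_hrc_sites and keep_hrc_variants are hypothetical, and matching on alleles as well as position is a choice made here for illustration only.

```python
import csv
from typing import Iterable, Iterator, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
SiteKey = Tuple[str, int, str, str]


def load_hrc_sites(lines: Iterable[str]) -> Set[SiteKey]:
    """Build a set of site keys from a tab-delimited site list.

    Assumed column order (an assumption, check against the real file):
    CHROM, POS, ID, REF, ALT, followed by any further columns.
    """
    sites: Set[SiteKey] = set()
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and header/comment lines beginning with '#'.
        if not row or row[0].startswith("#"):
            continue
        chrom, pos, _variant_id, ref, alt = row[:5]
        sites.add((chrom, int(pos), ref, alt))
    return sites


def keep_hrc_variants(
    variants: Iterable[SiteKey], hrc_sites: Set[SiteKey]
) -> Iterator[SiteKey]:
    """Yield only those variants whose key appears in the HRC site set."""
    for variant in variants:
        if variant in hrc_sites:
            yield variant
```

In practice the lines argument could be an open file handle for the downloaded site list, and the variants could be read from the variant information accompanying the imputed files. Both would need adapting to the actual file layouts.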
We will progress the re-imputation as quickly as we can and hope to release a new version of the imputed files in September. We will send more details about this data release and confirm timelines in due course.
We can only apologise that this error was not identified during the QA review and do not underestimate the frustration this will cause for the research community.
If you have any questions about this issue, please send them to the UKB-GENETICS mail list at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKB-GENETICS
UK Biobank & the Access Team