Fine-scale ancestry in the UK Biobank samples
Principal Investigator:
Mr David Nicholson
Approved Research ID:
28659
Approval date:
November 1st 2017
Lay summary
Our research questions are: (a) Can we build an accurate model of ancestry that can be deployed on a large scale? We aim to build a predictive genetic model of ancestry, analogous to the way genome-wide association studies (GWAS) build predictive genetic models of health outcomes. (b) When we infer ancestry for 500,000 people, what are the properties of these ancestry breakdowns, and what population genetic inferences can be made? For example, what proportion of people have Scottish ancestry, and where do these people now live? (c) Can we use these breakdowns in GWAS to control for the confounding effect of population structure?
By inferring genetic ancestry for all 500,000 UK Biobank participants, we will help other researchers carry out accurate GWAS, so the work will have an impact on many different health-related outcomes. Genetic ancestry is also of great interest to individuals, which is a stated aim of the UK Biobank, and we expect there to be substantial public interest in these results. Many UK Biobank participants may themselves be interested in these ancestry results. We hope that public engagement on this topic will help combat the misconceptions underlying racism. Genetic ancestry can also be important to individuals for health-related reasons; for example, response to drug treatment can depend on ancestry.
We will use statistical methods to examine the genetic ancestry of the UK Biobank individuals together with the baseline information provided by each individual. We aim to identify useful clusters of individuals and to create summaries of this information for use in a predictive model. We will then use the model to make predictions for all 500,000 individuals. For example, this might infer the proportion of each person's genome with ancestry from Scotland, Ireland, Wales, etc. We will study the patterns of these ancestry proportions to understand the population genetic history of the UK population.
We will also examine whether these proportions can be used in GWAS to guard against false-positive findings.
Full cohort