Principal Investigator: Professor Amy Williams
Department: Biological Statistics & Computational Biology
Cornell University, Biological Statistics & Computational Biology Department, 102G Weill Hall, Ithaca, NY 14856, United States
Tags: 19947, Identical by descent, relatedness
1a: Identifying genetic variants associated with disease requires understanding the relatedness among all study samples. Traditional association testing requires excluding related samples, whereas mixed models can include them by explicitly modeling their relatedness. We propose new methods for inferring identical by descent (IBD) segments and the relationship between every pair of individuals in a large cohort. These methods combine information from sequenced panels and other samples to improve inference accuracy. We will evaluate and apply the methods to the UK Biobank genotype data and make the software we develop available to other researchers.
1b: We aim to develop methods that infer IBD segments and relationships with greater accuracy than current methods. By better characterizing the samples' relatedness, this will enable more effective genome-wide association studies.
1c: We will use combined data from the UK Biobank, the 1000 Genomes Project, and other sources to detect IBD segments with high accuracy by leveraging information from all available samples. We will then use the IBD segments shared by every pair of individuals to infer the relatedness of the UK Biobank samples. Relatedness inference will identify clusters of individuals that share IBD segments and infer how the members of each cluster are related to one another. Jointly inferring relatedness from multiple samples should improve accuracy over existing approaches, which perform inference one pair at a time.
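The clustering step can be sketched as follows, assuming pairwise IBD segments are already available as `(id_a, id_b, length_cm)` tuples. The function name `ibd_clusters` and the 20 cM threshold on total shared length are hypothetical choices for illustration; this only groups samples into connected clusters and does not perform the joint relationship inference described above.

```python
from collections import defaultdict


def ibd_clusters(segments, min_total_cm=20.0):
    """Group samples into clusters linked by substantial IBD sharing.

    segments: iterable of (id_a, id_b, length_cm), one tuple per IBD segment.
    min_total_cm: hypothetical threshold on a pair's total shared length.
    Samples with no qualifying pair are not returned (no singleton clusters).
    """
    # Sum segment lengths for each unordered pair of samples.
    totals = defaultdict(float)
    for a, b, length in segments:
        key = (a, b) if a <= b else (b, a)
        totals[key] += length

    # Union-find (disjoint set) with path halving.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Link every pair whose total sharing passes the threshold.
    for (a, b), total in totals.items():
        if total >= min_total_cm:
            union(a, b)

    # Collect members of each connected component.
    groups = defaultdict(set)
    for x in parent:
        groups[find(x)].add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


segments = [("A", "B", 30.0), ("B", "C", 15.0), ("B", "C", 10.0), ("D", "E", 5.0)]
print(ibd_clusters(segments))  # [['A', 'B', 'C']]
```

Here A and C end up in the same cluster even though they share no segment directly, which is what lets relationships within a cluster be inferred jointly rather than pair by pair.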
1d: Full cohort.
We intend to do two things with both the UK Biobank and Generation Scotland (GS) data: (1) infer relatedness and pedigree structures across the samples, and (2) analyze patterns of migration within the UK using the sample data and the results of (1). To do this, we plan to merge the genotype data from both studies and remove any individuals who appear in both.
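One way to find individuals present in both studies is genotype concordance. The sketch below is a minimal illustration, assuming each study's genotypes are dicts mapping sample ID to a list of 0/1/2 codes (`None` for missing) aligned to the same SNPs; the function names and the 0.99 concordance cutoff are hypothetical. It keeps the UK Biobank copy of each duplicate and tags IDs by study so identical IDs from the two cohorts cannot collide.

```python
def concordance(g1, g2):
    """Fraction of sites called in both samples whose genotypes are identical."""
    called = [(x, y) for x, y in zip(g1, g2) if x is not None and y is not None]
    if not called:
        return 0.0
    return sum(x == y for x, y in called) / len(called)


def find_duplicates(ukb, gs, min_concordance=0.99):
    """Return (ukb_id, gs_id) pairs whose genotypes are near-identical."""
    return [
        (u, g)
        for u, gu in ukb.items()
        for g, gg in gs.items()
        if concordance(gu, gg) >= min_concordance
    ]


def merge_without_duplicates(ukb, gs, min_concordance=0.99):
    """Merge both studies, dropping the GS copy of any duplicated individual."""
    dup_gs = {g for _, g in find_duplicates(ukb, gs, min_concordance)}
    merged = {("UKB", u): geno for u, geno in ukb.items()}
    merged.update({("GS", g): geno for g, geno in gs.items() if g not in dup_gs})
    return merged
```

This all-against-all comparison is only practical for small subsets; at biobank scale, duplicate detection would be done with a dedicated relatedness tool.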
The key distinction is that we aim to infer a pedigree: who are cousins, who are aunts/uncles, and how all individuals with close and more distant familial relationships are related. We are less interested in how much of the genome two individuals share than in using that information to infer their specific relationship. We will do this by analyzing IBD segments, that is, per-SNP information about IBD status, rather than using an approach like PLINK that estimates a genome-wide IBD proportion. We wish to study these relationships in order to understand population movements within the UK, as well as population densities and related patterns.
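The difference between segment-level and proportion-level information can be sketched for one pair of individuals. Assuming per-SNP IBD states (0, 1, or 2 shared haplotypes) and genetic-map positions in cM are available, the hypothetical function `ibd_segments` collapses the states into segments, and `ibd_summary` reduces them to the genome-wide proportions k1 and k2 and a PLINK-style PI_HAT = k2 + k1/2. Segment lengths are approximated as the distance between the first and last SNP in each run.

```python
def ibd_segments(states, positions_cm):
    """Collapse per-SNP IBD states into (start_cm, end_cm, state) segments.

    Each segment is a maximal run of consecutive SNPs with the same nonzero state.
    """
    segments = []
    start = None
    for i, s in enumerate(states):
        # Close the current run when the state changes.
        if start is not None and s != states[start]:
            segments.append((positions_cm[start], positions_cm[i - 1], states[start]))
            start = None
        # Open a new run at any nonzero state.
        if start is None and s != 0:
            start = i
    if start is not None:
        segments.append((positions_cm[start], positions_cm[-1], states[start]))
    return segments


def ibd_summary(segments, genome_length_cm):
    """Reduce segments to genome-wide k1, k2 and PI_HAT = k2 + k1 / 2."""
    k1 = sum(end - start for start, end, st in segments if st == 1) / genome_length_cm
    k2 = sum(end - start for start, end, st in segments if st == 2) / genome_length_cm
    return k1, k2, k2 + k1 / 2


states = [0, 1, 1, 2, 2, 0, 1]
positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
segs = ibd_segments(states, positions)
print(segs)  # [(1.0, 2.0, 1), (3.0, 4.0, 2), (6.0, 6.0, 1)]
print(ibd_summary(segs, genome_length_cm=6.0))
```

The summary discards the number, lengths, and positions of the segments. Relationships with the same expected sharing, such as half-siblings and grandparent/grandchild pairs, cannot be distinguished from the proportion alone, which is why segment-level information is central to pedigree inference.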