Approved research

Exploring Diet/Lifestyle factors as causes and modifiers of genetic determinants of obesity and metabolic traits

Principal Investigator: Dr Zoltan Kutalik
Approved Research ID: 66995
Approval date: December 1st 2015

Lay summary

We have a broad interest and a long track record in developing scalable and accurate statistical methods for large genomic datasets. Developing such methods requires testing them on real datasets, and the UK Biobank is the best resource available so far in terms of the amount of genetic and phenotypic data it contains. Specifically, our project pursues the following four aims:

  1. Haplotype estimation. We will explicitly model cryptic relatedness between individuals (which is common in biobanks) in order (i) to substantially improve the quality of the haplotype estimates and (ii) to increase association discovery power compared to genotype-based approaches.
  2. Genotype imputation. We will improve the process of genotype imputation in terms of speed and accuracy, notably by optimizing methods for modern reference panels of haplotypes. Through simulations, we will also assess how alternative genotyping approaches, such as low-coverage sequencing, can affect association discovery power.
  3. Parent-of-origin effects. Using identity-by-descent between individuals, we will build surrogate parental genomes for many UK Biobank individuals. We will then determine the parent-of-origin of alleles for individuals whose parental genomes are unavailable. Finally, we will investigate the prevalence of parent-of-origin effects across a broad range of complex phenotypic traits.
  4. Rare regulatory variant effects. We will use the whole genome sequence (WGS) data to develop new statistical methods to test groups of rare variants in the non-coding regulatory regions for association with many complex phenotypic traits. We will develop efficient optimization approaches for choosing the most pertinent combination of non-coding variants to test, (i) supervised by known functional annotations of the genome and (ii) informed by the data derived in aims 1 and 3; that is, by making the statistical tests aware of phase and allelic parent-of-origin information.
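
To give a concrete sense of what "testing a group of rare variants" in aim 4 can mean, the sketch below shows a basic burden test, one widely used approach. It is not the method the project proposes: the summary does not say which test will be developed. The function name `burden_test`, its arguments, and the optional per-variant `weights` (standing in for functional annotation scores) are all hypothetical, and the sketch ignores covariates, relatedness, phase and parent-of-origin.

```python
from math import sqrt


def burden_test(genotypes, phenotype, weights=None):
    """Illustrative burden test (hypothetical helper, not the project's method).

    genotypes: list of rows, one per individual; each row holds the
               minor-allele counts (0, 1 or 2) for the variants in the group.
    phenotype: list with one quantitative trait value per individual.
    weights:   optional per-variant weights (assumed here to stand in for
               annotation scores); defaults to 1 for every variant.

    Returns (beta, t_stat) from a simple linear regression of the phenotype
    on the weighted sum of rare alleles. Needs more than two individuals.
    """
    n = len(phenotype)
    n_variants = len(genotypes[0])
    w = [1.0] * n_variants if weights is None else list(weights)

    # Collapse the variant group into one burden score per individual.
    burden = [sum(g * wj for g, wj in zip(row, w)) for row in genotypes]

    # Centre both variables so the intercept drops out of the regression.
    mean_b = sum(burden) / n
    mean_y = sum(phenotype) / n
    x = [b - mean_b for b in burden]
    y = [v - mean_y for v in phenotype]

    sxx = sum(v * v for v in x)
    if sxx == 0:
        raise ValueError("burden score does not vary across individuals")

    # Ordinary least-squares slope and its standard error.
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    rss = sum((b - beta * a) ** 2 for a, b in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else float("inf")
    return beta, t_stat
```

Calling `burden_test(G, y)` on a genotype matrix `G` and trait vector `y` returns the estimated effect per additional rare allele and its t-statistic. Passing `weights` lets some variants count more than others, which is one simple way annotations could inform such a test.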