Approved Research

The effect of sampling strategies on the observation of rare, deleterious alleles

Principal Investigator: Professor John Novembre
Approved Research ID: 88057
Approval date: April 19th 2023

Lay summary

New scope:

Under many models of geographic dispersal and mating, rare variants, and rare deleterious variants in particular, tend to be clustered in space around their geographic origin. How best to design sampling approaches to discover such variants is an open question in human genetics. Specifically, there are likely trade-offs between "focused" and "broad" sampling strategies.

The overall aim of this research is to characterize the discovery of rare variants as a function of sampling strategy, with the goal of informing sampling strategies for human genetics studies. For instance, we will address questions such as: How does the spatial distribution of rare variants within a sample vary across levels of deleteriousness and/or mutational type? And how does reducing the geographic breadth of individuals included in a sub-sample affect the discovery of deleterious alleles? This research will complement an existing line of work in our group focused on developing a theoretical model of the distribution of rare variants and of the impact of geographic sampling approaches on the expected site frequency spectrum.

We would like to extend the scope to:

1) Assess linkage disequilibrium (LD) among variants as a function of sample size and sampling breadth. This does not involve new data but does involve calculating LD statistics.

2) Assess phenotypic covariance patterns across scales of genetic relatedness and geographic sampling. This involves neither new data nor new analyses; it is a different framing of the analyses of the impacts of sampling breadth from that in our original scope.

3) Measure the rare variant spectrum against theoretical expectations under models of population growth, and investigate what sample size is needed to detect deleterious variants. This does not require new data and would involve analyses of site frequency spectra similar to those in our original project. The change is that we would investigate the impact of sample size rather than the geographic breadth of sampling.
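For readers unfamiliar with the two statistics named above, the following is a minimal illustrative sketch in Python of how a site frequency spectrum and a pairwise LD statistic (r^2) might be computed for a chosen sub-sample. It is not the project's actual pipeline: the function names, the 0/1 haplotype encoding, and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes, sample_indices):
    """Count variants by derived-allele count within a sub-sample.

    haplotypes: list of variants, each a list of 0/1 alleles (one per haplotype).
    sample_indices: indices of the haplotypes included in the sub-sample.
    Returns a list sfs where sfs[k] is the number of variants carried by
    exactly k haplotypes in the sub-sample (k = 0 means not observed).
    """
    n = len(sample_indices)
    counts = Counter(
        sum(variant[i] for i in sample_indices) for variant in haplotypes
    )
    return [counts.get(k, 0) for k in range(n + 1)]


def r_squared(variant_a, variant_b, sample_indices):
    """Squared correlation (r^2) between two biallelic sites in a sub-sample."""
    n = len(sample_indices)
    p_a = sum(variant_a[i] for i in sample_indices) / n
    p_b = sum(variant_b[i] for i in sample_indices) / n
    p_ab = sum(variant_a[i] * variant_b[i] for i in sample_indices) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # undefined when either site is monomorphic in the sub-sample
    d = p_ab - p_a * p_b  # the LD coefficient D
    return d * d / denom


# Hypothetical toy data: 4 variants (rows) by 6 haplotypes (columns).
toy = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1],
]
broad = [0, 1, 2, 3, 4, 5]  # all haplotypes
focused = [0, 1, 2]         # a narrower sub-sample

sfs_broad = site_frequency_spectrum(toy, broad)
sfs_focused = site_frequency_spectrum(toy, focused)
ld_broad = r_squared(toy[0], toy[1], broad)
```

In this toy example the third variant is carried only by a haplotype outside the narrower sub-sample, so it is counted in the zero class there and its r^2 with any other site is undefined. Comparing such summaries across sub-samples of differing size or breadth is the kind of question the scope describes.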