The effect of sampling strategies on the observation of rare, deleterious alleles
Approved Research ID: 88057
Approval date: April 19th 2023
In recent years, increasing attention has been placed on efforts to collect genetic information from large numbers of individuals in order to study human disease. This is in part motivated by research on rare genetic mutations and associated disorders, as the low frequency of these cases makes them difficult to identify from small groups of individuals. The way in which individuals are chosen for these studies with respect to geography (e.g. whether all participants are chosen from the same geographic region or a few participants are chosen from each of several regions) affects what kinds of genetic variation can be studied, because of the evolutionary processes underlying how new mutations arise and spread.
In the proposed project, we will investigate how sub-sampling strategies affect how often rare genetic variants are discovered. This will include developing and applying statistical and computational methods to characterize patterns of genetic variation as they relate to the frequency, type, and deleterious effects of rare mutations. Finally, we will compare the patterns we find in data from the Biobank to the results of our previous work, which used mathematical models to study similar problems.
The results of this research will help scientists to better design genetic studies by identifying circumstances in which it is beneficial to sample many individuals from a "focused" region versus the same number of individuals spread across a "broad" geographic range. For instance, "focused" sampling may be more useful for identifying genetic mutations that contribute to human disease and characterizing how they affect systems in the human body. Identifying these mutations and their mechanisms of action can help scientists to develop new treatments for patients with rare disorders. On the other hand, "broad" sampling may be preferable for developing genetic risk screenings for personalized medicine, as developing these tests using data from individuals with a wide range of environments and genetic ancestries can improve the accuracy of the tests in clinical settings and make their predictions more equitable. By investigating this trade-off empirically, our research will help to clarify how best to sample individuals for these studies in order to maximize the medical and public health benefits of genetic research.