Approved Research

Genetic Heterogeneity in Diseases with Complex Inheritance

Genetics & Genomics Services Inc

Lay summary

The aim of this project is to develop a new statistical method to understand the pattern of genetic variation in common diseases. We will use coronary artery disease, breast cancer, and Parkinson's disease as models because extensive prior knowledge exists about the genetic factors in these diseases. This prior knowledge will help us determine whether the new mathematical methods are useful and capable of adding new insights. Specifically, this project will examine whether new statistical techniques, known as causal inference, help to distinguish subtypes of these diseases. Some of these subtypes are already known, and we hope to identify more.

The scientific rationale for developing this new method is that it could address a major challenge in medical practice. Currently, medications and other treatments for most diseases do not benefit all patients. Randomized controlled trials often show that not all patients respond to otherwise effective treatments, but it is difficult to determine why. Doctors typically use trial and error to find individualized treatments. In this research, we will evaluate new methods to see whether they can define previously unrecognized subgroups of patients. We hope that recognizing these subgroups will make it possible to tailor effective treatments to them and to avoid ineffective or harmful treatments.

This project will last about two years. In the first year, we will develop the algorithms needed to analyze genetic and medical data. We will test the algorithms first in simulated data and then in real data from the UK Biobank (UKBB). The final months will be used to prepare the results for publication in a form that makes the mathematical results available to the general public without restriction.

Public health could benefit if the new method successfully identifies subtypes of common diseases. Knowing the factors that cause disease is the first step toward developing new, effective treatments.

Scope extension:

The purpose of this research is to evaluate novel analytical methods designed to detect genetic heterogeneity within diseases with complex genetic architectures. We are developing these methods and characterizing their performance using simulations. We would like to further investigate their relevance, usability, and validity in empirical data. Genome and exome data from the UK Biobank are ideal for this next stage of research.

The aim of this project is to apply novel statistical methods to DNA variants observed in cases and controls using three diseases as models. We will focus on Parkinson's Disease (ICD-10 G20), Atherosclerosis (I25.1, E78.00), and Breast Cancer (C50) occurrences in the UK Biobank cohort. Genome sequencing data will be used to comprehensively incorporate coding and non-coding sequence variants. We also plan to evaluate exome data, since a large fraction of the rare pathogenic variation discovered to date involves coding sequences. We propose to start with single-nucleotide variants (SNVs) and small insertions and deletions (indels), recognizing that a broader spectrum of variant types might be evaluated if the initial analyses yield promising results.
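As an illustration only, the case definitions above could be expressed as a lookup from phenotype to ICD-10 code prefixes. The sketch below is not the project's actual pipeline: the names `PHENOTYPE_CODES`, `normalize`, and `is_case` are hypothetical, the participant records are assumed to be simple lists of diagnosis codes, and prefix matching (so that C50 also covers its subcodes) is an assumption rather than something stated in the proposal.

```python
# Hypothetical mapping from the three model diseases to the ICD-10 codes
# listed in the text. The dictionary keys are illustrative labels.
PHENOTYPE_CODES = {
    "parkinsons_disease": ("G20",),
    "atherosclerosis": ("I25.1", "E78.00"),
    "breast_cancer": ("C50",),
}


def normalize(code):
    """Upper-case a code and drop dots so 'I25.1' and 'i251' compare equal."""
    return code.replace(".", "").strip().upper()


def is_case(participant_codes, phenotype):
    """Return True if any recorded code starts with one of the phenotype's prefixes.

    Prefix matching is an assumption made for this sketch.
    """
    prefixes = tuple(normalize(p) for p in PHENOTYPE_CODES[phenotype])
    return any(normalize(c).startswith(prefixes) for c in participant_codes)


if __name__ == "__main__":
    # Made-up diagnosis list for a single participant.
    codes = ["C50.9", "I10"]
    for phenotype in PHENOTYPE_CODES:
        print(phenotype, is_case(codes, phenotype))
```

Participants who match none of the prefixes for a phenotype would be candidate controls for it; any further exclusions would depend on study design choices that the text does not specify.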

We propose to extend the scope of this project to include a larger number of medically important phenotypes, biomarkers, and disease diagnoses. Our criteria for inclusion in the extended project are: (1) a well-defined complex disease (a single ICD-10 code or multiple related codes) for which there are publicly available summary statistics from one or more well-powered genome-wide association studies (GWAS), preferably with meta-analysis and/or multi-ancestry or diverse-ancestry cohorts; and (2) a proven subpopulation with monogenic forms of the disease (established definitive, strong, or moderate gene-disease associations). The intention is to investigate the underlying etiologic heterogeneity in these diseases more comprehensively, leveraging known disease genes and rigorously associated common variants through their derived polygenic risk scores (PRS).
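The two inclusion criteria can be read as a simple predicate over a candidate phenotype. The following is a minimal sketch under stated assumptions: the `CandidatePhenotype` record, its field names, and `meets_inclusion_criteria` are hypothetical, and the stated preference for meta-analysis or diverse-ancestry cohorts is treated as optional rather than required.

```python
from dataclasses import dataclass, field

# Evidence levels named in criterion (2) of the text.
ACCEPTED_EVIDENCE = {"definitive", "strong", "moderate"}


@dataclass
class CandidatePhenotype:
    """Hypothetical record describing a disease proposed for inclusion."""
    name: str
    icd10_codes: list
    has_public_gwas_summary_stats: bool
    # Maps a gene label to the evidence level of its gene-disease association.
    gene_evidence: dict = field(default_factory=dict)


def meets_inclusion_criteria(p):
    """Apply criteria (1) and (2) from the text to one candidate."""
    # Criterion 1: defined by at least one ICD-10 code, with public GWAS summary statistics.
    criterion_1 = len(p.icd10_codes) > 0 and p.has_public_gwas_summary_stats
    # Criterion 2: at least one gene with definitive, strong, or moderate evidence.
    criterion_2 = any(
        level.lower() in ACCEPTED_EVIDENCE for level in p.gene_evidence.values()
    )
    return criterion_1 and criterion_2


if __name__ == "__main__":
    # Placeholder values only; not real curation data.
    candidate = CandidatePhenotype(
        name="example_disease",
        icd10_codes=["X00"],
        has_public_gwas_summary_stats=True,
        gene_evidence={"GENE_A": "Strong", "GENE_B": "limited"},
    )
    print(meets_inclusion_criteria(candidate))
```

Encoding the criteria this way keeps each decision auditable, since every included phenotype can be traced to the codes, GWAS availability, and gene evidence that qualified it.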