Detecting heterogeneity in GWAS traits from expected correlations between polygenic risk score predictors
Principal Investigator:
Professor Itsik Pe'er
Approved Research ID:
55739
Approval date:
March 29th 2020
Lay summary
Genome-wide association studies (GWAS) have in the past decade discovered many variants in the genome that inform an individual's risk of various diseases. But conventional GWAS often label individuals strictly as cases or controls for a disease. In practice, this may be too reductive, as within a particular disease there may exist a number of distinct sub-types, each with its own set of observed symptoms. Some examples include bipolar disorder (manic or hypomanic), depression (typical and atypical), lung cancer, and breast cancer. Our goal is to detect the presence of these sub-types, which we term heterogeneity, without necessarily determining which individuals belong to which sub-type. By reducing the complexity of our model in this way, we have a reasonable chance, with current cohort sizes, of detecting the required signals, which are very small and widely dispersed throughout the genome. Our model CLiP (Correlated Liability Predictors) detects heterogeneity by calculating how often particular genetic variants co-occur with one another among affected individuals and comparing these values to what we would expect when no heterogeneity is present.
We expect the duration of this project to be 1 year. We have already devoted significant time to developing and validating CLiP with simulated GWAS cohorts, so this time will be spent on data processing.
While interest in disease heterogeneity has increased in recent years, there has been relatively little work on discovering hidden sub-types. We hope that CLiP is a first step toward routinely considering the possibility of heterogeneity when conducting GWAS. Additionally, recent work has shown that genetic variants associated with diseases may vary among populations separated by sex or ancestry, and that the overrepresentation of certain groups in GWAS may lead to disparities in diagnosis and care.
It stands to reason that there may be numerous diseases with distinct genetic associations across different groups, many of which may not divide neatly along readily visible attributes. Addressing heterogeneity in this scenario will help ensure that every population is properly served.
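The co-occurrence comparison described in the lay summary can be illustrated with a small simulation. The sketch below is not the CLiP implementation: it assumes a simple liability threshold model, and the function name `cooccurrence_score`, the cohort sizes, the heritability, the prevalence, and the equal effect sizes are all hypothetical choices made for illustration. It computes a weighted sum of pairwise correlations between variants in a simulated homogeneous case cohort (the expected value when no heterogeneity is present) and in a mixed cohort where half of the "cases" are unrelated to the modeled variants.

```python
import numpy as np


def cooccurrence_score(genotypes, weights):
    """Weighted sum of pairwise correlations between variants in one cohort."""
    corr = np.corrcoef(genotypes, rowvar=False)
    i, j = np.triu_indices(corr.shape[0], k=1)
    return float(np.sum(weights[i] * weights[j] * corr[i, j]))


# All parameters below are illustrative assumptions, not values from the project.
rng = np.random.default_rng(0)
n_people, n_snps, h2, prevalence = 60_000, 20, 0.5, 0.05

# Simulate allele counts (0, 1, 2) and standardize each variant.
freqs = rng.uniform(0.2, 0.8, size=n_snps)
genotypes = rng.binomial(2, freqs, size=(n_people, n_snps)).astype(float)
standardized = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

# Liability threshold model: a genetic component plus environmental noise.
effects = np.full(n_snps, np.sqrt(h2 / n_snps))
liability = standardized @ effects + rng.normal(0.0, np.sqrt(1.0 - h2), n_people)
is_case = liability > np.quantile(liability, 1.0 - prevalence)
cases = standardized[is_case]

# Homogeneous cohort: every case arises from the modeled variants.
homogeneous = cases[:2000]
# Mixed cohort: half true cases, half random individuals standing in for a
# sub-type that is unrelated to the modeled variants.
mixed = np.vstack([cases[:1000], standardized[-1000:]])

expected_score = cooccurrence_score(homogeneous, effects)
observed_score = cooccurrence_score(mixed, effects)
print(f"expected (no heterogeneity): {expected_score:.4f}")
print(f"observed (mixed cohort):     {observed_score:.4f}")
```

Under these assumptions, selecting individuals on high liability induces negative correlations between risk variants, so the homogeneous score is negative. Mixing in individuals unrelated to the variants shifts the score upward, and that departure from the expected value is the kind of signal the summary describes.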