Over the past decades, genome-wide association studies (GWAS) have been the standard tool for exploring the genetic basis of phenotypic variation. However, GWAS is limited to predefined phenotypes. Moreover, because it focuses on one phenotype at a time, GWAS typically ignores pleiotropy and is poorly suited to analyzing conditions with comorbidities. The growth of genomic data linked to the rich phenotypic data available in electronic medical record (EMR) systems, such as UK Biobank and BioVU, makes it possible to conduct a “reverse GWAS”: the phenome-wide association study (PheWAS), which complements GWAS. PheWAS inverts the traditional GWAS design by searching for associations between specific SNPs and thousands of human phenotypes. This enables unbiased phenotypic exploration and examination of the effects of genetic variants of particular interest.
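As a concrete illustration of the single-SNP scan described above, the sketch below tests one SNP against several binary phenotypes. It assumes genotypes are coded as allele dosages (0/1/2) and phenotypes as case/control indicators (0/1). It also uses the Cochran-Armitage trend test (computed as n·r²) as a simple stand-in for the covariate-adjusted regression models usually fitted in practice. Covariates such as age, sex, and ancestry principal components are omitted, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List


def trend_test_pvalue(genotypes: List[int], phenotype: List[int]) -> float:
    """Cochran-Armitage trend test for one SNP and one binary phenotype.

    Uses the identity chi2 = n * r^2, where r is the Pearson correlation
    between genotype dosage (0/1/2) and case status (0/1); chi2 has 1 df.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotype) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotype))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        # Monomorphic SNP, or a phenotype with no cases or no controls.
        return 1.0
    chi2 = n * (sxy * sxy) / (sxx * syy)
    # Upper tail of a 1-df chi-square: P(X > c) = erfc(sqrt(c / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def phewas_scan(genotypes: List[int],
                phenotypes: Dict[str, List[int]]) -> Dict[str, float]:
    """Test one SNP against every phenotype; return a p-value per phenotype."""
    return {name: trend_test_pvalue(genotypes, y) for name, y in phenotypes.items()}


if __name__ == "__main__":
    snp = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
    phenos = {
        "phecode_A": [0, 0, 0, 1, 1, 1, 0, 1, 1, 0],  # cases carry more alt alleles
        "phecode_B": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # little relation to the SNP
    }
    for name, p in phewas_scan(snp, phenos).items():
        print(f"{name}: p = {p:.4g}")
```

In a real PheWAS this loop would run over thousands of phecodes, so the resulting p-values still need a multiple-testing correction before any association is reported.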
Our study aims to validate the performance of our newly developed method, K-Means Clustering Linear Combination (KCLC), for jointly analyzing multiple phenotypes in phenome-wide association studies (PheWAS). Our simulation results showed that KCLC effectively controlled the false discovery rate (FDR) and outperformed the other methods we compared it with. We will further investigate whether KCLC performs well on real-world datasets by applying it to a set of EMR-based phenotypes from UK Biobank.
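This section does not describe the details of KCLC, so the sketch below does not implement it. As a reference point for the FDR criterion used in the simulations, it shows the standard Benjamini-Hochberg step-up procedure, which controls the FDR at level q when the tests are independent or positively dependent. The function name and example p-values are hypothetical.

```python
from typing import Dict, Set


def benjamini_hochberg(pvalues: Dict[str, float], q: float = 0.05) -> Set[str]:
    """Return the hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Sort the m p-values in ascending order, find the largest rank k with
    p_(k) <= k * q / m, and reject the k hypotheses with the smallest p-values.
    """
    items = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(items)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(items, start=1):
        if p <= rank * q / m:
            cutoff_rank = rank  # keep the largest passing rank (step-up)
    return {name for name, _ in items[:cutoff_rank]}


if __name__ == "__main__":
    pvals = {"p1": 0.001, "p2": 0.008, "p3": 0.039, "p4": 0.041, "p5": 0.2}
    print(sorted(benjamini_hochberg(pvals, q=0.05)))
```

Because the procedure is step-up, a hypothesis can be rejected even if its own p-value misses its rank threshold, provided a larger p-value further down the list passes. For example, with p-values 0.02, 0.021, and 0.03 at q = 0.05, all three are rejected.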