Principal Investigator: Dr Roy Perlis
Massachusetts General Hospital, Boston, Massachusetts, USA
Tags: 40404, cluster analysis, coded clinical data, genetic association, latent Dirichlet allocation, machine learning, reverse genetics
Biobanks and national registries are powerful tools for genomic discovery, but they rely on diagnostic codes that may be unreliable and that fail to capture the relationships between related diagnoses. We propose to apply a new method that identifies and groups related disorders for study. In preliminary studies in another biobank, we have demonstrated that this method makes better use of diagnostic codes than approaches that examine each code individually. We will use this method to identify genetic variations that contribute to groups of diseases rather than to single diseases. From a public health perspective, this study will help researchers understand how seemingly different diseases are related at both the clinical and the genetic level, which may in turn help clarify the causes of these diseases. We anticipate completing this project within 18-24 months.