Multimodal Data Analysis to Identify Biomarkers and Patient Cohorts in Neurodegenerative Diseases
Principal Investigator: Silvia Lopez de Diego
Approved Research ID: 42508
Approval date: October 10th 2018
Scientific and medical data are growing exponentially in size, complexity and number of datasets. To use these data accurately and effectively, several challenges must be overcome. In particular, bringing together data from disparate studies, and even different modalities from the same study, currently lacks software support. This project aims to address this gap by developing generalized ways to bring together different data modalities from UK Biobank and to normalize these data so that they can be integrated for detailed analysis. More specifically, subject data models, or subject networks, will be constructed from the data collected for each subject. This will be accomplished through an approach that combines identification of concept entities from available knowledge bases with inference of relationships between entities through natural language processing of published literature. The resulting knowledge graph will then be used to accomplish two aims: (1) identify biomarkers for neurodegenerative diseases through advanced querying, normalization and analysis over the connected graph in a data-driven way, and (2) develop graph-based algorithms that compute subject similarity scores in order to find better subject stratification techniques.
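As a rough illustration of the second aim, the sketch below shows one simple way a graph-based subject similarity score could be computed. It is not the project's method: the entity names, the toy knowledge graph, the one-hop expansion and the choice of Jaccard similarity are all assumptions made for the example, and the functions `expand`, `jaccard` and `subject_similarity` are hypothetical helpers.

```python
from itertools import combinations

# Hypothetical knowledge graph: concept entity -> directly related entities.
# All names are placeholders, not findings from any dataset.
KNOWLEDGE_GRAPH = {
    "gene_A": {"protein_B", "disease_X"},
    "protein_B": {"gene_A", "disease_X"},
    "imaging_marker_C": {"disease_X"},
    "protein_D": {"disease_Y"},
    "symptom_E": {"disease_Y"},
}

# Hypothetical subject networks: entities observed for each subject.
SUBJECT_ENTITIES = {
    "subject_a": {"gene_A", "imaging_marker_C"},
    "subject_b": {"protein_B"},
    "subject_c": {"protein_D", "symptom_E"},
}


def expand(entities, graph):
    """Return the entities plus their one-hop neighbours in the graph."""
    expanded = set(entities)
    for entity in entities:
        expanded |= graph.get(entity, set())
    return expanded


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def subject_similarity(subjects, graph):
    """Score every pair of subjects by the overlap of their expanded entity sets."""
    neighbourhoods = {s: expand(e, graph) for s, e in subjects.items()}
    return {
        (s, t): jaccard(neighbourhoods[s], neighbourhoods[t])
        for s, t in combinations(sorted(neighbourhoods), 2)
    }


if __name__ == "__main__":
    scores = subject_similarity(SUBJECT_ENTITIES, KNOWLEDGE_GRAPH)
    for pair, score in scores.items():
        print(pair, round(score, 2))
```

In this toy setup, subjects with no entities in common can still score as similar when their entities are linked through the graph, which is the property that would make a connected knowledge graph useful for stratification. A real implementation would likely weight relationships and look beyond one hop.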