Approved Research
Interpretable machine learning methods for finding sparse non-linear associations in multi-view health data
Approved Research ID: 147811
Approval date: November 22nd 2023
Lay summary
Many diseases are characterized by a variety of symptoms, signs, degrees of severity and related conditions. Despite this complexity, conventional studies and clinical risk assessment models often treat health outcomes as single variables (such as the presence or absence of a disease, or a single measurable characteristic like LDL cholesterol level). This type of approach does not consider the correlation structure among the various health outcomes and may oversimplify the complex reality of human disease. Another challenge arises from the variety of underlying risk factors and molecular mediators that influence susceptibility to disease, which adds another layer of complexity.
To address these limitations, we propose to analyze the data by representing it as two multivariate views: the first view represents a profile of underlying risk factors (such as a panel of SNPs, proteins, metabolites, or physiological characteristics) and the second view represents a set of health outcome data (such as a set of related diagnoses, levels of severity, laboratory measurements and medical treatments). Given this multivariate representation of the data in both views, the first aim of the project is to discover a limited set of meaningful risk factors that are related to the combination of health outcomes. To achieve this, we will use novel multi-view analysis methods developed by our research team that can associate multiple risk factors with multiple health outcomes and are suitable for analyzing large-scale datasets such as the UK Biobank. Building on the findings from the first part of the analyses, we also intend to develop practical risk prediction models based on the identified risk factors and assess whether they improve on other prediction strategies.
This research project is planned to span three years, during which we will analyze the findings from our methods and compare the results with those of other available methods. The project is expected to benefit public health by improving understanding of the molecular foundations of disease and of the contributions of various risk factors. This, in turn, has the potential to yield more advanced tools for predicting and diagnosing disease in precision medicine applications.