(1) Research question: Which biomarkers are causal for Alzheimer's disease (AD)?
(2) Research objectives: We will use processed proteomic data from the UK Biobank, specifically the data underlying the studies titled “Plasma Proteomic Profiles Predict Future Dementia in Healthy Adults” and “Identifying Multi-Level Biomarkers and Disease Mechanisms for Major Mental Disorders” (application number 19542), to identify causal biomarkers for AD. We require only processed proteomic data; no raw data or biological samples are needed. Our primary objective is to validate our robust causal discovery method [Fan et al., 2024], which has provable theoretical guarantees, and to apply it to identify biomarkers that are direct causes of AD. We will share the findings in a published paper, contributing to a deeper understanding of AD and supporting drug development from a novel, causality-driven perspective. This research aims to achieve two key outcomes: (i) enabling early detection of AD to facilitate timely intervention, and (ii) providing a scientifically rigorous foundation for targeted drug discovery.
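(3) Scientific rationale: Traditional methods, such as logistic regression, predict AD onset from biomarkers that are highly correlated with the disease. However, a correlated biomarker may be either a cause or an effect of AD, and correlation alone cannot tell the two apart. This project aims to distinguish causes from effects among highly correlated variables in observational data, using a novel causal discovery technique, environment invariant linear least squares (EILLS) [Fan et al., 2024]. The method leverages the invariance principle by analyzing data from multiple sub-populations. Unlike standard regression, EILLS minimizes prediction error while enforcing an invariance condition, so that the selected variables maintain stable associations with the outcome across sub-populations. The reasoning is that a direct cause acts on the outcome through the same mechanism in every sub-population, whereas the association between the outcome and one of its effects typically shifts from one sub-population to another. When the heterogeneity among sub-populations is sufficient, this approach is consistent with the structural causal model framework, in which the invariant variables are the direct causes of the outcome.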
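To make the invariance idea concrete, here is a minimal sketch of an EILLS-style estimator on simulated data. It assumes the objective takes the form "pooled least-squares risk plus γ times the squared residual-covariate correlations of the selected variables in each sub-population", minimized by exhaustive search over candidate supports. The toy data, the function names `simulate_environment` and `eills_subset_search`, and the value γ = 20 are hypothetical choices for illustration only. This is not the authors' implementation, and the exact objective should be checked against Fan et al. (2024). In the toy model, X1 causes Y, while X2 is an effect of Y whose mechanism changes between two sub-populations. Both variables are correlated with Y.

```python
import itertools
import numpy as np


def simulate_environment(rng, n, x1_scale, child_coef):
    """Hypothetical toy linear SCM (illustration only, not UK Biobank data).

    X1 -> Y is the invariant causal mechanism; X2 is a child (effect) of Y
    whose mechanism, child_coef, differs between sub-populations.
    """
    x1 = rng.normal(0.0, x1_scale, n)
    y = 1.0 * x1 + rng.normal(0.0, 1.0, n)          # cause: same in every env
    x2 = child_coef * y + rng.normal(0.0, 1.0, n)   # effect: shifts across envs
    return np.column_stack([x1, x2]), y


def eills_subset_search(envs, gamma):
    """Exhaustive-search sketch of an assumed EILLS-style objective:

    Q(beta) = sum_e w_e * mean((y_e - X_e beta)^2)
            + gamma * sum_{j in S} sum_e w_e * mean((y_e - X_e beta) * X_ej)^2,
    where S is the candidate support. For a fixed S, Q is quadratic in
    beta_S, so its minimizer solves a small linear system A beta_S = b.
    """
    n_total = sum(len(y) for _, y in envs)
    weights = [len(y) / n_total for _, y in envs]   # w_e proportional to n_e
    d = envs[0][0].shape[1]
    best = (np.inf, (), np.zeros(d))
    for k in range(d + 1):
        for support in itertools.combinations(range(d), k):
            cols = list(support)
            beta = np.zeros(d)
            if cols:
                A = np.zeros((k, k))
                b = np.zeros(k)
                for w, (X, y) in zip(weights, envs):
                    Xs = X[:, cols]
                    sigma = Xs.T @ Xs / len(y)   # second-moment matrix
                    u = Xs.T @ y / len(y)        # cross-moments with y
                    A += w * (sigma + gamma * sigma @ sigma)
                    b += w * (u + gamma * sigma @ u)
                beta[cols] = np.linalg.solve(A, b)
            # Evaluate Q at the support-restricted minimizer.
            risk, penalty = 0.0, 0.0
            for w, (X, y) in zip(weights, envs):
                resid = y - X @ beta
                risk += w * np.mean(resid ** 2)
                if cols:
                    corr = X[:, cols].T @ resid / len(y)
                    penalty += w * np.sum(corr ** 2)
            q = risk + gamma * penalty
            if q < best[0]:
                best = (q, support, beta)
    return best[1], best[2]


rng = np.random.default_rng(0)
envs = [simulate_environment(rng, 5000, 1.0, 0.5),
        simulate_environment(rng, 5000, 2.0, 2.0)]
support, beta = eills_subset_search(envs, gamma=20.0)

# Pooled ordinary least squares, ignoring the sub-population structure.
X_all = np.vstack([X for X, _ in envs])
y_all = np.concatenate([y for _, y in envs])
ols = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print("EILLS-style support:", support, "beta:", beta.round(2))
print("pooled OLS beta:", ols.round(2))
```

In this toy setup, the search should select only X1 (index 0), with a coefficient near 1. Pooled least squares, by contrast, assigns substantial weight to the effect variable X2 because it is predictive of Y. The exhaustive search visits all 2^d supports, so as written it is practical only for a handful of candidate variables. Applying it to a large proteomic panel would need a different search strategy, which this sketch does not attempt.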
Fan et al. (2024). Environment invariant linear least squares. Annals of Statistics, 52(5), 2268-2292.