We propose to identify causal protein and genetic determinants of hemorrhagic transformation (HT) in ischemic stroke patients through two-sample and bidirectional Mendelian randomization (MR), using plasma protein quantitative trait loci (pQTLs) and large-scale GWAS summary statistics from MEGASTROKE and UK Biobank. We will perform colocalization analyses to confirm shared causal signals between pQTLs and stroke outcomes. We also aim to develop and validate a high-performance machine-learning (ML) risk prediction model for HT by integrating demographic, clinical, genomic, and proteomic features using algorithms such as XGBoost. Model hyperparameters will be optimized via grid search to balance performance and generalizability. We will assess discrimination (area under the ROC curve, AUC), calibration (calibration curves), and clinical utility (decision curve analysis) through nested cross-validation and in external validation cohorts. We will interpret model predictions with SHapley Additive exPlanations (SHAP) to elucidate each feature’s contribution to, and nonlinear relationship with, HT risk.
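As an illustration of the two-sample MR step described above, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate from per-variant summary statistics. The function name `ivw_estimate` and the toy effect sizes are hypothetical and chosen only for illustration; this is a minimal sketch, not the analysis pipeline itself, which would also include instrument selection, harmonization, and sensitivity analyses.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    beta_exposure: variant effects on the exposure (e.g. a plasma protein level)
    beta_outcome:  variant effects on the outcome (e.g. HT)
    se_outcome:    standard errors of the outcome effects
    Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("All inputs must have the same length")
    if not beta_exposure:
        raise ValueError("At least one instrument is required")

    # First-order weights: bx^2 / se_y^2 for each variant
    denominator = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Toy summary statistics (invented numbers, not real pQTL or GWAS results)
toy_bx = [0.10, 0.20, 0.15]
toy_by = [0.02, 0.04, 0.03]
toy_se = [0.01, 0.01, 0.01]
ivw_beta, ivw_se = ivw_estimate(toy_bx, toy_by, toy_se)
```

In this toy input every variant has the same ratio of outcome effect to exposure effect, so the IVW estimate equals that common ratio. For the bidirectional analysis, the same function would be applied with the roles of exposure and outcome swapped and a separate set of instruments.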
Finally, we will generate SHAP-derived risk scores and share all analytic code and derived variables within six months of publication to enhance the utility of the UK Biobank resource.
To achieve these aims, we require sample-level data to identify potential biomarkers (including genes, proteins, and clinical variables) for early-stage prediction of HT using ML methods.
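The nested cross-validation with grid search proposed for the ML model could be structured roughly as follows. This is a sketch under stated assumptions: it runs on synthetic data from `make_classification`, uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, and the helper name `nested_cv_auc` and the small parameter grid are hypothetical. The inner loop tunes hyperparameters and the outer loop estimates AUC on folds that the tuning never saw.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score


def nested_cv_auc(X, y, param_grid, seed=0):
    """Return one outer-fold AUC per outer split of a nested cross-validation."""
    # Inner folds: used only for hyperparameter selection
    inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    # Outer folds: used only for performance estimation
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)

    search = GridSearchCV(
        estimator=GradientBoostingClassifier(random_state=seed),  # stand-in for XGBoost
        param_grid=param_grid,
        scoring="roc_auc",
        cv=inner_cv,
    )
    # Each outer training fold re-runs the full grid search before scoring
    return cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)


# Synthetic, imbalanced data standing in for the real feature matrix
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=4,
    weights=[0.8, 0.2],
    random_state=0,
)
grid = {"max_depth": [2, 3], "n_estimators": [25, 50]}  # illustrative grid only
auc_scores = nested_cv_auc(X, y, grid)
```

Reporting the mean and spread of `auc_scores` gives a performance estimate that is not biased by the hyperparameter search; calibration and decision curve analysis would be computed on the same outer-fold predictions.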