Blood-based biomarkers, proteomics and genetics to predict and understand hematologic malignancies in UK Biobank

Last updated:: 6 January 2026

ID:: 1146999
Start date:: 6 January 2026
Project status:: Current
Principal investigator:: Ms Hongbin Hu
Lead institution:: Nanfang Hospital, Southern Medical University, China

Background and rationale: Hematologic malignancies cause substantial morbidity and mortality. Blood-based biochemistry, hematology and proteomics capture immune, inflammatory and hematopoietic processes that may precede diagnosis. UK Biobank provides these data at scale, together with measured and imputed genotypes and approved enhancements (e.g., linked cancer/death registries), enabling biomarker- and protein-centric discovery anchored by genetics.
Research questions: (1) Which baseline biochemical, hematology and proteomic markers are associated with incident hematologic malignancies, overall and by subtype? (2) Can combinations of blood counts, routine biochemistry, proteomic signatures and polygenic risk improve short-term (3-5 year) risk prediction and early detection? (3) Do genetically anchored analyses (polygenic risk, and cis-pQTL Mendelian randomization with colocalization, based on measured and imputed genotypes) support causal roles for specific proteins or biomarkers? (4) Among cases, do baseline markers and proteins stratify survival?
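To make question (3) concrete, the sketch below shows one common way a cis-pQTL Mendelian randomization estimate can be formed from summary statistics: a Wald ratio per instrument, combined by fixed-effect inverse-variance weighting (IVW). This is an illustrative assumption, not the project's stated estimator; the `Variant` class and function names are hypothetical, the standard error uses the first-order approximation that ignores uncertainty in the SNP-protein effect, and colocalization is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical summary statistics for one cis-pQTL instrument."""
    beta_exposure: float  # SNP effect on the circulating protein level
    beta_outcome: float   # SNP effect on disease risk (e.g., log hazard or log odds)
    se_outcome: float     # standard error of beta_outcome


def wald_ratio(v: Variant) -> tuple[float, float]:
    """Single-instrument causal estimate and its first-order standard error."""
    ratio = v.beta_outcome / v.beta_exposure
    se = v.se_outcome / abs(v.beta_exposure)
    return ratio, se


def ivw_estimate(variants: list[Variant]) -> tuple[float, float]:
    """Fixed-effect inverse-variance-weighted combination of Wald ratios."""
    if not variants:
        raise ValueError("need at least one instrument")
    weighted_sum = 0.0
    weight_total = 0.0
    for v in variants:
        ratio, se = wald_ratio(v)
        w = 1.0 / se**2  # more precise instruments get more weight
        weighted_sum += w * ratio
        weight_total += w
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)


# Usage with made-up numbers (not UK Biobank results):
snps = [Variant(0.5, 0.10, 0.02), Variant(0.4, 0.06, 0.03)]
estimate, se = ivw_estimate(snps)
```

With a single instrument, the IVW estimate reduces to that instrument's Wald ratio; random-effects IVW or pleiotropy-robust estimators would be natural alternatives when several cis variants are available.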
We will assemble a cohort that excludes participants with cancer at baseline; ascertain incident leukemias, lymphomas, myeloma and myelodysplastic/myeloproliferative neoplasms through linked cancer registries and death records; and classify subtypes using ICD and ICD-O codes. Models will be adjusted for age, sex, assessment center, genetic ancestry principal components and technical factors. Associations will be estimated with time-to-event models, with control for multiple testing. We will run proteome-wide association scans, derive and evaluate subtype-specific polygenic risk scores, and apply cis-pQTL Mendelian randomization with colocalization to prioritize protein targets with causal support. Predictive models (regularized generalized linear models and gradient boosting) will undergo cross-validation, calibration assessment, decision-curve analysis and subgroup evaluation by sex and ancestry. Findings will be disseminated in accordance with UK Biobank policies.
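Two of the evaluation steps above can be sketched briefly. The summary does not name its multiple-testing procedure, so the first function assumes Benjamini-Hochberg false discovery rate control, one common choice for proteome-wide scans. The second computes net benefit, the quantity plotted in decision-curve analysis: at risk threshold p_t, net benefit = TP/n - (FP/n) * p_t / (1 - p_t). Both function names are hypothetical, and the net-benefit sketch treats the outcome as binary at a fixed horizon, ignoring censoring; a survival-aware version would replace the observed counts with estimated event probabilities.

```python
def benjamini_hochberg(pvalues: list[float], q: float = 0.05) -> list[bool]:
    """Reject flag per p-value under Benjamini-Hochberg FDR control at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


def net_benefit(y_true: list[int], risk: list[float], threshold: float) -> float:
    """Net benefit of intervening in everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    flagged = [(y, r) for y, r in zip(y_true, risk) if r >= threshold]
    tp = sum(1 for y, _ in flagged if y == 1)  # true positives among flagged
    fp = sum(1 for y, _ in flagged if y == 0)  # false positives among flagged
    # False positives are weighted by the odds of the risk threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy usage (illustrative values only):
flags = benjamini_hochberg([0.01, 0.04, 0.03, 0.20], q=0.05)
nb = net_benefit([1, 1, 0, 0], [0.9, 0.3, 0.6, 0.1], threshold=0.2)
```

A decision curve would evaluate `net_benefit` over a grid of thresholds and compare it with the "treat all" strategy (every participant flagged) and the "treat none" strategy (net benefit of zero).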