This project aims to investigate the risk factors and predict the incidence of venous thromboembolism (VTE) by leveraging the rich multimodal data from the UK Biobank. VTE, comprising deep vein thrombosis and pulmonary embolism, is a major global health burden with significant morbidity and mortality. Early identification of high-risk individuals through predictive modeling can enable targeted prevention strategies, reducing healthcare costs and improving outcomes.
Research Questions:
What genetic, lifestyle, clinical, and biochemical factors are independently associated with VTE risk?
How can these factors be integrated into a robust prediction model for VTE incidence?
What is the comparative performance of traditional statistical models versus machine learning approaches in VTE prediction?
Objectives:
To identify key VTE risk factors through multivariable Cox regression analyses, adjusting for covariates such as age, sex, BMI, smoking status, and genetic predisposition.
To develop and validate a VTE prediction model using machine learning techniques (e.g., random forests, gradient boosting) to handle complex interactions between variables.
To assess the incremental value of different data types (e.g., polygenic risk scores, imaging-based adiposity measures, blood biomarkers) in enhancing prediction accuracy beyond conventional risk factors.
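One way the incremental-value objective could be quantified is by comparing a discrimination metric between a model built on conventional risk factors and one extended with additional data types. The sketch below is illustrative only: it assumes Harrell's concordance index as the metric (the proposal does not name one), the function name `concordance_index` is hypothetical, and the follow-up times, event indicators, and risk scores are invented toy values, not UK Biobank data.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell's C: share of comparable pairs in which the participant with
    the earlier observed event also has the higher predicted risk score."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter follow-up ended in an event (not censoring).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (assumed values): follow-up in years, 1 = VTE event, 0 = censored.
times = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
events = [1, 1, 0, 1, 0, 0]

# Hypothetical risk scores from a conventional-factor model and from a model
# extended with, for example, a polygenic risk score.
baseline_scores = [0.6, 0.3, 0.5, 0.4, 0.2, 0.1]
extended_scores = [0.9, 0.7, 0.5, 0.6, 0.2, 0.1]

c_baseline = concordance_index(times, events, baseline_scores)
c_extended = concordance_index(times, events, extended_scores)
delta_c = c_extended - c_baseline

print(f"C-index baseline: {c_baseline:.3f}")
print(f"C-index extended: {c_extended:.3f}")
print(f"Incremental value (delta C): {delta_c:.3f}")
```

In a real analysis the two sets of scores would come from models evaluated on held-out participants, and the difference would be reported with a confidence interval (for instance via bootstrap resampling) rather than as a single point estimate.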
Scientific Rationale:
VTE etiology is multifactorial, involving genetic, environmental, and metabolic components. UK Biobank provides a unique opportunity to explore these factors longitudinally with extensive data, including genomics, physical measurements, and health records. This research will address gaps in current VTE risk stratification by creating a holistic prediction tool, potentially informing clinical guidelines and personalized interventions. The study aligns with UK Biobank’s mission to advance public health through large-scale data analysis.