Application of machine learning methodologies to the UK Biobank dataset to quantify the effect of lifestyle habits on health
Lay summary
The primary aim of this project is to use the large dataset available in the UK Biobank to establish a quantitative association between the lifestyle behavior of the subjects (e.g. diet, physical activity, smoking habits, stress, etc.) and their health status. The association between specific lifestyle behaviors and chronic diseases is widely studied (e.g. sedentary lifestyle and cardiovascular disease risk, consumption of processed meat and cancer, etc.). However, quantifying such risks according to lifestyle behavior remains a challenge, as does measuring the combined effect of multiple behaviors on multiple outcomes.
By analyzing data from the entire UK Biobank cohort, including lifestyle questionnaire data, primary care records, and additional clinical data records, we will construct multiple machine learning models tailored to predicting a wide set of disease risks. In addition, we plan to study the association between lifestyle behaviors and diseases through clinical biomarkers (e.g. CRP, ESR, glucose, albumin, BMI, TG, cholesterol, BP, etc.). This will allow us to study how lifestyle behavior affects the clinical biomarkers, and how these biomarkers can be used for health status evaluation (i.e. how clinical biomarkers mediate the association between lifestyle behavior and clinical status).
We will undertake this research using anonymous data. The project will last 18 months.
The potential impact of this study stems from its relevance to a very wide population. Better understanding and quantifying the effect of lifestyle behavior on health could help individuals choose the most suitable lifestyle change, taking into account its personalized impact on their health. This project may expand public knowledge of lifestyle habits and their impact on disease risk, and may also contribute to models that better assess and improve health, helping to avoid major health complications in the future.
Scope extension:
The primary goal of this research is to design an analytical tool that will quantify the effect of lifestyle behaviors (e.g. diet, physical activity, stress, smoking, etc.) on clinical conditions (e.g. disease risk). Such a tool will characterize the correlation between lifestyle behaviors and certain adverse health outcomes, generate knowledge useful for guiding clinician recommendations, and enable individuals to make more informed personal health decisions.
The secondary goal is to characterize the association between lifestyle behaviors and clinical conditions specifically through biomarkers (e.g. blood test results, physical measurements, etc.). Understanding the association of lifestyle habits with a wide set of biomarkers, and of these biomarkers with multiple clinical conditions, will increase our understanding of the physiological effects of these behaviors. This may allow individuals with specific lifestyle habits or diseases to be characterized based on their clinical biomarkers.
Within both goals we are looking to extend our scope as follows:
1. We are investigating the effect of lifestyle behaviors (e.g. diet) on clinical outcomes (e.g. myocardial infarction) through circulating metabolomic biomarkers, and on cellular aging through telomere length.
2. We are interested in expanding the analysis to look for causal effects, rather than merely associations. To this end, we will apply Mendelian randomization techniques, which use genetic information to highlight associations consistent with causal effects.
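As a rough illustration of the Mendelian randomization idea mentioned above, the sketch below computes an inverse-variance weighted (IVW) estimate from per-variant summary statistics. It is one common two-sample approach, shown here under simplifying assumptions (valid, independent genetic instruments and a fixed-effect model), and is not necessarily the method this project will use. The function name `ivw_estimate` and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted Mendelian randomization estimate.

    Each position describes one genetic variant (hypothetical inputs):
      beta_exposure: variant effect on the exposure (e.g. a lifestyle trait)
      beta_outcome:  variant effect on the outcome (e.g. a disease risk)
      se_outcome:    standard error of the variant-outcome effect
    Returns (causal effect estimate, standard error).
    """
    n = len(beta_exposure)
    if n == 0 or len(beta_outcome) != n or len(se_outcome) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(bx == 0 for bx in beta_exposure) or any(se <= 0 for se in se_outcome):
        raise ValueError("exposure effects must be non-zero and SEs positive")

    # Per-variant Wald ratio: outcome effect per unit of exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # Inverse of each ratio's approximate variance (first-order delta method).
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]

    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical summary statistics for three variants, constructed so that
# every outcome effect is exactly half of the exposure effect.
effect, se = ivw_estimate(
    beta_exposure=[0.10, 0.20, 0.15],
    beta_outcome=[0.05, 0.10, 0.075],
    se_outcome=[0.02, 0.02, 0.02],
)
print(f"IVW estimate: {effect:.3f} (SE {se:.3f})")
```

Variants with stronger effects on the exposure receive more weight because their ratio estimates are less noisy. In practice, such an estimate would be accompanied by sensitivity analyses, since it is only consistent with a causal effect when the instrument assumptions hold.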