Using UK Biobank data to build an obesity-related person and area classification for the UK
Principal Investigator: Dr Stephen Clark
Approved Research ID: 30846
Approval date: March 13th 2019
The aim of the research is to use the extensive UK Biobank resource to understand the dynamics of obesity in the older UK population. The primary research question will establish the links between the characteristics of individuals, and of their reported environment, and obesity outcomes. The secondary research question will examine how well existing machine learning/data mining architectures can process the volume of UK Biobank data to reveal insight. The primary morbidity under study will be obesity, but the research will also extend to related morbidities such as diabetes and heart conditions.

The purpose of the UK Biobank is to help researchers improve the prevention, diagnosis and treatment of a wide range of serious and life-threatening illnesses. Obesity is recognised as one of the most important health challenges facing society, with the potential to harm people's physical and mental health. A system for identifying the characteristics of individuals and areas prone to obesity allows resources to be targeted effectively to meet this challenge.

A range of factors are highlighted in the Obesity System Map of the Foresight project Tackling Obesities as influential in determining an individual's obesity risk. Information on many of these factors is collected by UK Biobank, and we have completed a mapping exercise to identify the corresponding variables in the Data Showcase. We will then use machine learning algorithms (supervised and unsupervised) to classify individuals and areas into differing levels of obesity risk, and we will assess the utility of this knowledge for identifying obesity in individuals.

Recognising that participants were aged 40-69 years when recruited during 2006-2010, this research will use the full set of around 500,000 participants from the initial assessment visit. For further validation exercises, the equivalent data for those who attended the repeat assessment visit (2012-2013) will be requested.
Initial examination of the Data Showcase suggests that many of the approximately 1,800 non-image variables are suitable for this research.